Advances in Intelligent Systems and Computing 1318
S. Smys João Manuel R. S. Tavares Robert Bestak Fuqian Shi Editors
Computational Vision and Bio-Inspired Computing ICCVBIC 2020
Advances in Intelligent Systems and Computing Volume 1318
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. Indexed by DBLP, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/11156
S. Smys · João Manuel R. S. Tavares · Robert Bestak · Fuqian Shi Editors
Computational Vision and Bio-Inspired Computing ICCVBIC 2020
Editors

S. Smys, Department of Computer Science and Engineering, RVS Technical Campus, Coimbatore, Tamil Nadu, India

João Manuel R. S. Tavares, Departamento de Engenharia Mecânica, SDI, Faculty of Engineering, Universidade do Porto, Porto, Portugal

Robert Bestak, Czech Technical University in Prague, Prague, Czech Republic

Fuqian Shi, College of Graduate Studies, University of Central Florida, Orlando, FL, USA
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-33-6861-3 ISBN 978-981-33-6862-0 (eBook) https://doi.org/10.1007/978-981-33-6862-0 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
We are honored to dedicate the proceedings of ICCVBIC 2020 to all the participants and editors of ICCVBIC 2020.
Preface
It is with deep satisfaction that I write this preface to the proceedings of ICCVBIC 2020, held at RVS Technical Campus, Coimbatore, Tamil Nadu, November 19–20, 2020. The conference brought together researchers, academics and professionals from all over the world, experts in computational vision and bio-inspired computing. It particularly encouraged the interaction of research students and developing academics with the more established academic community in an informal setting, to present and to discuss new and current work. The papers contributed the most recent scientific knowledge in the fields of computational vision, soft computing, fuzzy systems, image processing and bio-inspired computing, and these contributions helped to make the conference as outstanding as it has been. The Local Organizing Committee members and their helpers put much effort into ensuring the success of the day-to-day operation of the meeting. We hope that this program will further stimulate research in computational vision, soft computing, fuzzy systems, image processing and bio-inspired computing, and provide practitioners with better techniques, algorithms and tools for deployment. We feel honored and privileged to serve the best recent developments to you through this exciting program. We thank all the authors and participants for their contributions. Coimbatore, India
Dr. S. Smys Conference Chair ICCVBIC 2020
Acknowledgements
ICCVBIC 2020 would like to acknowledge the excellent work of our conference organizing committee and the keynote speakers for their presentations on November 19–20, 2020. The organizers also wish to acknowledge publicly the valuable services provided by the reviewers. On behalf of the editors, organizers, authors and readers of this conference, we wish to thank the keynote speakers and the reviewers for their time, hard work and dedication to this conference. The organizers wish to acknowledge Dr. Smys, Dr. João Manuel R. S. Tavares, Dr. Robert Bestak and Dr. Fuqian Shi for their discussions, suggestions and cooperation in organizing the keynote speakers of this conference. The organizers also wish to acknowledge the speakers and participants who attended this conference, and many thanks go to all the persons who helped and supported it. ICCVBIC 2020 would like to acknowledge the contribution made to the organization by its many volunteers, who contributed their time, energy and knowledge at local, regional and international levels. We also thank all the chairpersons and conference committee members for their support.
About the Conference
This conference proceedings volume contains the written versions of most of the contributions presented during ICCVBIC 2020. The conference provided a setting for discussing recent developments in a wide variety of topics, including computational vision, fuzzy systems, image processing and bio-inspired computing, and was a good opportunity for participants coming from various destinations to present and discuss topics in their respective research areas. ICCVBIC 2020 aimed to collect the latest research results and applications in computational vision and bio-inspired computing. This volume includes a selection of 63 papers from the 227 submitted to the conference from universities and industries all over the world. All of the accepted papers were subjected to strict peer review by 2–4 expert referees, and the papers have been selected for this volume on the basis of their quality and relevance to the conference. ICCVBIC 2020 would like to express sincere appreciation to all the authors for their contributions to this book. We would like to extend our thanks to all the referees for their constructive comments on all papers; especially, we would like to thank the organizing committee for their hard work. Finally, we would like to thank Springer for producing this volume. Dr. S. Smys Conference Chair ICCVBIC 2020
Contents
Smart Surveillance System by Face Recognition and Tracking Using Machine Learning Techniques . . . 1
D. K. Niranjan and N. Rakesh

Object-Based Neural Model in Multicore Environments with Improved Biological Plausibility . . . 15
R. Krishnan and A. Murugan

Advancement in Classification of X-Ray Images Using Radial Basis Function with Support of Canny Edge Detection Model . . . 29
C. M. A. K. Zeelan Basha, T. Sai Teja, T. Ravi Teja, C. Harshita, and M. Rohith Sri Sai

Brain Tumour Three-Class Classification on MRI Scans Using Transfer Learning and Data Augmentation . . . 41
C. A. Ancy and Maya L. Pai

Assessing the Statistical Significance of Pairwise Gapped Global Sequence Alignment of DNA Nucleotides Using Monte Carlo Techniques . . . 57
Rajashree Chaurasia and Udayan Ghose

Principal Integrant Analysis Based Liver Disease Prediction Using Machine Learning . . . 71
M. Shyamala Devi, Kamma Rahul, Ambati Aaryani Chowdary, Jampani Sai Monisha Chowday, and Satheesh Manubolu

Classification of Indian Classical Dance 3D Point Cloud Data Using Geometric Deep Learning . . . 81
Ashwini Dayanand Naik and M. Supriya

Fire Detection by Parallel Classification of Fire and Smoke Using Convolutional Neural Network . . . 95
A. Robert Singh, Suganya Athisayamani, S. Sankara Narayanan, and S. Dhanasekaran
A Split Key Unique Sudoku Steganography (SKUSS)-Based Reversible High Embedded Data Hiding Technique . . . 107
Utsav Kumar Malviya and Vivek Singh Rathore

Identification of Insomnia Based on Discrete Wavelet Transform Using Time Domain and Nonlinear Features . . . 121
P. Mamta and S. V. A. V. Prasad

Transfer Learning Techniques for Skin Cancer Classification . . . 135
Mirya Robin, Jisha John, and Aswathy Ravikumar

Particle Swarm Optimization Based on Random Walk . . . 147
Rajesh Misra and Kumar Sankar Ray

Signal Processing Algorithms Based on Evolutionary Optimization Techniques in the BCI: A Review . . . 165
Ravichander Janapati, Vishwas Dalal, N. Govardhan, and Rakesh Sengupta

Cancelation of 50 and 60 Hz Power-Line Interference from Electrocardiogram Using Square-Root Cubature Kalman Filter . . . 175
Roshan M. Bodile and T. V. K. Hanumantha Rao

A Comprehensive Study on the Arithmetic Operations in DNA Computing . . . 191
V. Sudha and K. S. Easwarakumar

Fuzzy C-means for Diabetic Retinopathy Lesion Segmentation . . . 199
Shalini and Sasikala

A Secured System for Tele Cardiovascular Disease Monitoring . . . 209
Azmi Shawkat Abdulbaqi, Saif Al-din M. Najim, Shokhan M. Al-barizinji, and Ismail Yusuf Panessai

Anomaly Detection in Real-Time Surveillance Videos Using Deep Learning . . . 223
Aswathy K. Cherian and E. Poovammal

Convolutional Neural Network-Based Approach for Potholes Detection on Indian Roads . . . 231
Noviya Balasubramanian, J. Dharneeshkar, Varshini Balamurugan, A. R. Poornima, Muktha Rajan, and R. Karthika

An Efficient Algorithm to Identify Best Detector and Descriptor Pair for Image Classification Using Bag of Visual Words . . . 245
R. Karthika and Latha Parameswaran

GUI-Based Alzheimer's Disease Screening System Using Deep Convolutional Neural Network . . . 259
Himanshu Pant, Manoj Chandra Lohani, Janmejay Pant, and Prachi Petshali
Performance Analysis of Different Deep Learning Architectures for COVID-19 X-Ray Classification . . . 273
K. S. Varshaa, R. Karthika, and J. Aravinth

Random Grid-Based Visual Cryptography for Grayscale and Colour Images on a Many-Core System . . . 287
M. Raviraja Holla and Alwyn R. Pais

A Generic Framework for Change Detection on Surface Water Bodies Using Landsat Time Series Data . . . 303
T. V. Bijeesh and K. N. Narasimhamurthy

A Machine Learning Approach to Detect Image Blurring . . . 315
Himani Kohli, Parth Sagar, Atul Kumar Srivastava, Anuj Rani, and Manoj Kumar

Object Detection for Autonomous Vehicles Using Deep Learning Algorithm . . . 327
E. J. Sai Pavan, P. Ramya, B. Valarmathi, T. Chellatamilan, and K. Santhi

CNN Approach for Dementia Detection Using Convolutional SLBT Feature Extraction Method . . . 341
A. V. Ambili, A. V. Senthil Kumar, and Ibrahiem M. M. El Emary

Classification of Ultrasound Thyroid Nodule Images by Computer-Aided Diagnosis: A Technical Review . . . 353
Siddhant Baldota and C. Malathy

A Transfer Learning Approach Using Densely Connected Convolutional Network for Maize Leaf Diseases Classification . . . 369
Siddhant Baldota, Rubal Sharma, Nimisha Khaitan, and E. Poovammal

Predicting Embryo Viability to Improve the Success Rate of Implantation in IVF Procedure: An AI-Based Prospective Cohort Study . . . 383
Dhruvilsinh Jhala, Sumantra Ghosh, Aaditya Pathak, and Deepti Barhate

Breast Cancer Detection and Classification Using Improved FLICM Segmentation and Modified SCA Based LLWNN Model . . . 401
Satyasis Mishra, T. Gopi Krishna, Harish Kalla, V. Ellappan, Dereje Tekilu Aseffa, and Tadesse Hailu Ayane

Detection of Diabetic Retinopathy Using Deep Convolutional Neural Networks . . . 415
R. Raja Kumar, R. Pandian, T. Prem Jacob, A. Pravin, and P. Indumathi

Hybrid Level Fusion Schemes for Multimodal Biometric Authentication System Based on Matcher Performance . . . 431
S. Amritha Varshini and J. Aravinth
Evolutionary Computation of Facial Composites for Suspect Identification in Forensic Sciences . . . 449
Vijay A. Kanade

Automatic Recognition of Helmetless Bike Rider License Plate Using Deep Learning . . . 457
K. V. L. Keerthi, V. Krishna Teja, P. N. R. L Chandra Sekhar, and T. N. Shankar

Prediction of Heart Disease with Different Attributes Combination by Data Mining Algorithms . . . 469
Ritu Aggrawal and Saurabh Pal

A Novel Video Retrieval Method Based on Object Detection Using Deep Learning . . . 483
Anuja Pinge and Manisha Naik Gaonkar

Exploring a Filter and Wrapper Feature Selection Techniques in Machine Learning . . . 497
V. Karunakaran, V. Rajasekar, and S. Iwin Thanakumar Joseph

Recent Trends in Epileptic Seizure Detection Using EEG Signal: A Review . . . 507
Vinod J. Thomas and D. Anto Sahaya Dhas

Measurement of Physiological Parameters Using Video Processing . . . 527
M. Spandana, Pavan Arun Deshpannde, Kashinath Biradar, B. S. Surekha, and B. S. Renuka

Low-Dose Imaging: Prediction of Projections in Sinogram Space . . . 541
Bhagya Sunag and Shrinivas Desai

Transfer Learning for Children Face Recognition Accuracy . . . 553
R. Sumithra, D. S. Guru, V. N. Manjunath Aradhya, and Raghavendra Anitha

A Fact-Based Liver Disease Prediction by Enforcing Machine Learning Algorithms . . . 567
Mylavarapu Kalyan Ram, Challapalli Sujana, Rayudu Srinivas, and G. S. N. Murthy

Novel Approach to Data Hiding in Binary Images Minimizing Distortion . . . 587
Gyankamal J. Chhajed and Bindu R. Garg

Deep CNN-Based Fire Alert System in Video Surveillance Networks . . . 599
P. J. Sunitha and K. R. Joy

Implementation of Chassis Number Recognition Model for Automatic Vehicle Identification . . . 617
Khine Htoo and Myint Myint Sein
Identification of Artificial Body Marks and Skin Disorder Marks Using Artificial Neural Network Approach . . . 627
Dayanand G. Savakar, Danesh Telsang, and Anil Kannur

Whale Optimization Algorithm Applied to Recognize Spammers in Facebook . . . 643
R. Krithiga and E. Ilavarasan

FNAB-Based Prediction of Breast Cancer Category Using Evolutionary Programming Neural Ensemble . . . 653
Vijaylaxmi Inamdar, S. G. Shaila, and Manoj Kumar Singh

Review on Augmented Reality and Virtual Reality Enabled Precious Jewelry Selling . . . 665
Nishita Hada, Sejal Jain, Shreya Soni, and Shubham Joshi

Age and Volume Detection of Heartwood and Sapwood in Scots Pine Species Using Machine Learning . . . 675
Piyush Juyal and Sachin Sharma

Multi-layer Perceptron Training Using Hybridized Bat Algorithm . . . 689
Luka Gajic, Dusan Cvetnic, Miodrag Zivkovic, Timea Bezdan, Nebojsa Bacanin, and Stefan Milosevic

Bayes Wavelet-CNN for Classifying COVID-19 in Chest X-ray Images . . . 707
S. Kavitha and Hannah Inbarani

Survey of Color Feature Extraction Schemes in Content-Based Picture Recovery System . . . 719
Kiran H. Patil and M. Nirupama Bhat

A New Method of Interval Type-2 Fuzzy-Based CNN for Image Classification . . . 733
P. Murugeswari and S. Vijayalakshmi

Validating Retinal Color Fundus Databases and Methods for Diabetic Retinopathy Screening . . . 747
S. Anitha and S. Madhusudhan

An Investigation for Interpreting the Epidemiological Occurrence of COVID-19 in India Using GP-ARIMA . . . 771
K. M. Baalamurugan, Tanya Yaqub, Akshat Shukla, and Akshita

Artificial Intelligence and Medical Decision Support in Advanced Healthcare System . . . 781
Anandakumar Haldorai and Arulmurugan Ramu

Survey of Image Processing Techniques in Medical Image Assessment Methodologies . . . 795
Anandakumar Haldorai and Arulmurugan Ramu
An Analysis of Artificial Intelligence Clinical Decision-Making and Patient-Centric Framework . . . 813
Anandakumar Haldorai and Arulmurugan Ramu

A Critical Review of the Intelligent Computing Methods for the Identification of the Sleeping Disorders . . . 829
Anandakumar Haldorai and Arulmurugan Ramu

Review on Face Recognition Using Deep Learning Techniques and Research Challenges . . . 845
V. Karunakaran, S. Iwin Thanakumar Joseph, and Shanthini Pandiaraj

Steganalysis for Images Security Classification in Machine Learning Using SVM . . . 855
P. Karthika, B. Barani Sundaram, Tucha Kedir, Tesfaye Tadele Sorsa, Nune Sreenivas, Manish Kumar Mishra, and Dhanabal Thirumoorthy

Author Index . . . 869
About the Editors
Dr. S. Smys received his M.E. and Ph.D. degrees, both in Wireless Communication and Networking, from Anna University and Karunya University, India. His main area of research activity is localization and routing architecture in wireless networks. He serves as Associate Editor of the Computers and Electrical Engineering (C&EE) Journal, Elsevier, and as Guest Editor of the MONET Journal, Springer, and has served as a reviewer for IET, Springer, Inderscience and Elsevier journals. He has published many research articles in refereed journals and IEEE conferences. He has been General Chair, Session Chair, TPC Chair and panelist in several conferences, is a member of IEEE and a senior member of the IACSIT wireless research group, and has served as Organizing Chair and Program Chair of several international conferences and on the program committees of several international conferences. Currently, he is working as Professor in the Department of CSE at RVS Technical Campus, Coimbatore, India.

João Manuel R. S. Tavares graduated in Mechanical Engineering at the Universidade do Porto, Portugal, in 1992. He earned his M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from the Universidade do Porto in 1995 and 2001, and attained his Habilitation in Mechanical Engineering in 2015. He is a senior researcher at the Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial (INEGI) and Associate Professor at the Department of Mechanical Engineering (DEMec) of the Faculdade de Engenharia da Universidade do Porto (FEUP). His main research areas include computational vision, medical imaging, computational mechanics, scientific visualization, human–computer interaction and new product development.

Robert Bestak obtained his Ph.D. degree in Computer Science from ENST Paris, France (2003) and his M.Sc. degree in Telecommunications from Czech Technical University in Prague, CTU, Czech Republic (1999). Since 2004, he has been an Assistant Professor at the Department of Telecommunication Engineering, Faculty of Electrical Engineering, CTU. He has participated in several national, EU and third-party research projects. He is the Czech representative in the IFIP TC6 organization and chair of the working group TC6 WG6.8. He annually serves as a Steering and Technical Program Committee member of numerous IEEE/IFIP conferences (Networking, WMNC, NGMAST, etc.) and is a member of the editorial boards of several international journals (Computers & Electrical Engineering, Electronic Commerce Research Journal, etc.). His research interests include 5G networks, spectrum management and big data in mobile networks.

Dr. Fuqian Shi is currently working as Graduate Faculty Scholar at the College of Graduate Studies of the University of Central Florida, USA. He has published many papers in standard journals and conferences, has acted as a committee member for many international conferences and on the editorial boards of refereed journals, and has served as a reviewer for many reputed journals. He is a member of professional bodies including IEEE, the IEEE Computer Society, the IEEE Systems, Man, and Cybernetics Society, Computational Information Systems, ACM, the Zhejiang Provincial Industrial Design Association and the Information Management Association of Wenzhou City. His research interests include computer networks, computer programming, computer graphics, image processing, data structures, operating systems and medical informatics.
Smart Surveillance System by Face Recognition and Tracking Using Machine Learning Techniques D. K. Niranjan and N. Rakesh
Abstract With recent technological advancements, theft and other such unwanted activities can be controlled. At present, many traditional security systems are available that merely record the scene, by which time the unwanted activity has already taken place. The main objective of this research work is to control and stop theft by tracking the face image of the burglar or stranger and sending a notification to the owners. The system not only alerts the owners but can also grant access through the door lock with its face recognition feature, thereby replacing traditional security features such as RFID, PIN entry, etc. The accuracy of the proposed system is high, and it works in all situations, as it can be operated on both battery and mains power. The system has its own database for recognizing persons, and only a person with a registered facial image is given access through the door. The developed system is a low-powered device with high accuracy, and it also helps to reduce unwanted activities. Keywords Face tracking · Smart door lock · Alert notification with image
1 Introduction

Nowadays, culprits abound, and every day at least one case of theft or other unwanted activity is registered. Such activities can be controlled by deploying a security system that monitors events like crime and kidnapping. To stop these activities, a security system is much needed for a home, where each member arrives at a different time and requires access through the door. Solutions like keyless entry and theft control can be realized through facial recognition.

D. K. Niranjan (B)
Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, India

N. Rakesh
Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Bangalore, India
e-mail: [email protected]
Fig. 1 Cases registered but not solved
To access the auto-lock door system, the proposed image recognition system automatically recognizes the stored facial images. Face detection works on faces registered with the system, and every member of the household should register their face to gain access. With the proposed system in place, no keys or cards are needed. Whenever a person wants to enter the house, a camera detects and recognizes the face and takes the necessary action. If an unknown person arrives in the company of a known family member, the system behaves normally and unlocks the door; if an unknown person tries to enter alone, the system captures the face image and sends it by mail notification to the registered person. The whole world is scared of these types of activities, and this system is one type of solution to control them. It will majorly help old people and children, keeping them safe when no one else is at home. Figure 1 shows the number of cases registered but not solved; the greater number of registered cases are due to theft and kidnapping. To control these activities, owners need to adopt the latest technologies for maintaining safety. The surveillance system helps on the safety side: with the proposed system, the face is detected and recognition starts; if it matches, the system proceeds with further actions such as opening the door, and if the data does not match, it sends a notification to the user. Face recognition is performed through the Haar cascade method. Face recognition remains a complex process because some people wear spectacles, and even in that situation the system should recognize the face with no lag in time. The system works normally even at night, using its own light whenever it is dark for better efficiency. After sending a mail notification, the system records the intruder's activities in a special folder. The system keeps working even when the power is turned off by the intruder, with a battery backup of about 3 h. The system captures the intruder and tracks his activities, the camera moving along with him within the covered area. The mail notification is sent to the owners as well as to the nearby police station so that the activity can be stopped. If the person tries to hit the door, the system's buzzer beeps loudly enough to be heard within a radius of nearly 50 m. If the user's car is moved by an unknown person, the system sends an
alert mail; if the car is moved along with a family member, no alert is sent. If an unknown person attempts mischievous activities against, or harassment of, the registered family members, the system alerts the user. This paper is organized as follows. Section 2 discusses the background work in the proposed research domain. In Sect. 3, the proposed system is explained. In Sect. 4, the hardware description is provided. Section 5 describes the set-up and results of the system. Finally, Sect. 6 concludes the proposed research work and discusses the future research scope.
2 Background Work

Paper [1] explains a real-time attendance system based on a facial recognition technique, which analyzes the attendance of students and employees with check-in and check-out timings to maintain their status; data about unknown persons is sent to a mobile application. The main drawback of this system lies at the entry and exit points: if a person exits at the entry point, the system fails. The authors used OpenCV and the Haar cascade method to recognize faces, but for only seven persons with only 30 training images per person. The accuracy level of the system is low because the dataset is small.

Paper [2] explains door access through face recognition. A PIR sensor is used: when a person comes near the door, the system recognizes the face and opens the door. The drawback of this paper is that the system remains in sleep mode until a person approaches, and it does not work when the power is off, because the system needs mains power.

Paper [3] explains a system designed only for sensitive areas, where only authorized persons are granted entry. The paper reports failures in recognition (two persons out of 100), which indicates that the training data is too limited. The system is used only in small areas and aims to restrict unknown persons.

Paper [4] explains a system that controls access permission using principal component analysis for face recognition, with a GUI and an Arduino board. In this method, there is no training data, so the facial expression of the same person changes every time, and the number of stored images per person is also limited. For any facial recognition, there should be more data on the same person, so the accuracy of this system also remains low.

Paper [5] explains a system that implements a PIN-based smart door lock. If the entered PIN is wrong, the system sends an alert message to the owner, so the designed security is high. The system uses an ultrasonic sensor: only when a person comes near the door does the system come into activation. If an unknown person enters a wrong PIN, the system sends a message to the owner but does not capture the intruder.
3 Proposed System

The main motivation for this research work is to leverage new security devices for personal care [6]. The system is designed for safety and works on the facial recognition concept using the Haar cascade method. If a person with a registered face wants to enter the home, they need only look at the camera once; the system then detects and recognizes the person and opens the door [7, 8]. If the face is not registered, the system considers it an unknown person and sends an emergency alert message to the registered owner as well as to the nearest police station's mail. If the mains power is off, the system works on battery power for around 3 h. If the unknown person engages in mischievous activities, the device sends the captured image to the registered mail ID, then starts tracing the person's activities by recording them into a separate folder for future reference. If the person damages the door or misbehaves, the system's buzzer sounds loudly enough to cover a radius of around 50 m. The block diagram is shown in Fig. 2.

The proposed system first requires the faces to be registered [9]. The registered faces are stored in the dataset folder. In our system, we have registered ten persons' data; each person's data is trained on 1000 images for the best accuracy, and the Haar cascade method is used to detect the face. For registration, the device runs the face-registration routine, and the system records a video of around 4 min to capture all 1000 images of each person [10].
Fig. 2 Block diagram of the process of the system
The system converts the images of the registered person into grayscale images, which are stored against their IDs; the dataset is kept in the database for recognition of the face. The process of registering images into the dataset is shown in Fig. 3. With the dataset stored in the device, the system starts working as a surveillance system: it detects a registered person based on the stored dataset. Whenever a person arrives, the device detects and recognizes the face and gives access to open the door to a registered user. If the person is not registered, the system sends a notification and sounds the buzzer, so that action can be taken at the earliest and the activity stopped. The process of recognition is shown in Fig. 4.
Fig. 3 Block diagram for registering the face
Fig. 4 Process of face recognition and door access
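The registration flow of Figs. 3 and 4 can be sketched in a few lines of Python with OpenCV. This is only an illustrative sketch, not the authors' code: the dataset/ folder layout and the User.<id>.<count>.jpg naming scheme are assumptions introduced here.

```python
# Illustrative sketch of the registration step: grab frames, detect the face
# with a Haar cascade, convert to grayscale and store the cropped face under
# a numeric user ID. Folder layout and file naming are assumptions.
import cv2
from pathlib import Path

def register_face(user_id: int, samples: int = 1000) -> None:
    Path("dataset").mkdir(exist_ok=True)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cam = cv2.VideoCapture(0)            # Pi camera exposed as /dev/video0
    count = 0
    while count < samples:
        ok, frame = cam.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
            count += 1
            cv2.imwrite(f"dataset/User.{user_id}.{count}.jpg",
                        gray[y:y + h, x:x + w])
    cam.release()
```

At roughly 25 frames per second, a 4-minute capture comfortably yields the 1000 face samples per person described above.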
Fig. 5 Haar cascade feature
In this module, the input images are detected and recognized. The module is mounted on the outer side of the door; the captured image is processed with Haar feature-based cascade classifiers, and the extracted face is compared and matched with the database [11]. The application-specific unit consists of the door lock security system: it is associated with the authentication module and, according to that module's result, performs the door lock open or close operation based on face recognition. If the face is matched, access is given to open the door; if the compared face does not match the stored data, an email notification is sent to the user.

In our proposed system, we use the Haar cascade approach for the best efficiency, on the OpenCV platform, with the local binary patterns histogram (LBPH) algorithm based on the Haar cascade method. The system needs positive images (images of faces) and negative images (images without faces) to train the classifier. The Haar features used are shown in Fig. 5. There are three kinds of Haar features: edge, line and four-rectangle features [12, 13]. These features pick out only the face, removing the non-facial background, and are used to train the classifier on the parts of the face in detail. This classifier is used because it gives 95% accuracy in the result, better than other methods. The proposed home security system is a machine learning-based approach in which a cascade function is trained from a large number of positive and negative images and is then used to detect objects in other images. A procedural view of how this person detection works is shown in the flow chart of Fig. 6.

The mischievous condition is as follows: if an unknown person tries to access the door, damages the door, or damages items within the camera's field of view, it is treated as mischievous. If the culprit turns off the mains power supply, the system runs on battery power for around 3 h and stores its data in the database as well as in the cloud. If the culprit damages the device, the data in the device is uploaded to the cloud for future assistance. Each and every one of these activities is recorded by tracking the culprit.
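A minimal sketch of the LBPH train-and-match step described above is given below, assuming the opencv-contrib-python package (which provides cv2.face). The confidence threshold of 70 and the open_door()/send_alert() hooks are placeholders introduced here, not values or routines from the paper.

```python
# Illustrative sketch of the LBPH train-and-match flow of Fig. 6.
# Requires opencv-contrib-python; the threshold and hooks are placeholders.
import cv2
import numpy as np
from pathlib import Path

recognizer = cv2.face.LBPHFaceRecognizer_create()

def open_door(label: int) -> None: ...   # stub for the door-servo routine
def send_alert(img) -> None: ...         # stub for the e-mail routine

def train(dataset: str = "dataset") -> None:
    faces, ids = [], []
    for p in Path(dataset).glob("User.*.jpg"):
        ids.append(int(p.name.split(".")[1]))        # ID encoded in file name
        faces.append(cv2.imread(str(p), cv2.IMREAD_GRAYSCALE))
    recognizer.train(faces, np.array(ids))
    recognizer.write("trainer.yml")                  # persist the model

def decide(face_roi) -> None:
    # LBPH returns a distance-like confidence: lower means a closer match.
    label, confidence = recognizer.predict(face_roi)
    if confidence < 70:          # assumed threshold, tuned per installation
        open_door(label)         # registered face: unlock the door
    else:
        send_alert(face_roi)     # unknown face: mail the captured image
```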
Fig. 6 Flow chart of the proposed system
These activities are traced with the help of the IR sensor in place. As the IR sensor is interfaced with the servo motor, the camera is rotated along the horizontal as well as the vertical axis. Our proposed system also turns on its lights automatically when it is dark, to obtain the best accuracy. The Haar cascade is considered the best method for facial recognition compared to other methods, giving 95% accuracy on the face. The system tracks a person depending on the condition: if the person is merely passing on the road or a courier person is arriving, the system captures and stores the face; under other conditions, it raises an alert. The system sends a mail notification with the image to the registered person, and in the case of a serious security breach it alerts with an emergency alarm and light. To avoid high risk, the system can be used in areas such as homes, bank lockers, ATMs and government services.
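The "EMERGENCY" mail with the captured image attached (see Sect. 5) could be assembled with Python's standard library along the following lines; the SMTP host, port, credentials and addresses are placeholders, not values from the paper.

```python
# Illustrative sketch of the e-mail alert with the intruder snapshot attached.
# SMTP host, credentials and addresses below are placeholders.
import smtplib
from email.message import EmailMessage

def send_alert(image_path: str, to_addrs) -> None:
    msg = EmailMessage()
    msg["Subject"] = "EMERGENCY: unknown person detected"
    msg["From"] = "system@example.com"
    msg["To"] = ", ".join(to_addrs)
    msg.set_content("Unknown person detected at the door; image attached.")
    with open(image_path, "rb") as f:
        msg.add_attachment(f.read(), maintype="image",
                           subtype="jpeg", filename="intruder.jpg")
    with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
        server.login("system@example.com", "app-password")
        server.send_message(msg)
```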
4 Hardware Description

In our system, we have used a Raspberry Pi 3B+ to run the whole process and recognize faces in a low-cost, low-powered system [14]. It has built-in WiFi, so mails can be sent to the registered mail IDs, and a 64-bit processor running at 1.4 GHz. It provides a built-in CSI port for the Pi V2 camera and, for display, an HDMI port or the Pi display port. The memory can be expanded according to the user's requirements. The module is shown in Fig. 7.

The proposed system uses a 5MP camera for the best results; the Pi camera is connected to the CSI port by a flex cable. This camera captures only the picture and video of the activities, not the sound. It is capable of taking video or pictures at 1080p, which makes it most effective (Fig. 8).

The proposed system uses an IR sensor for tracing the person. If the person moves, the sensor detects the position, and the IR sensor is interfaced with the servo motors for tracking the unregistered person. The IR sensor detects the person and measures the distance.
Fig. 7 Raspberry Pi 3B+
Fig. 8 PI V2 camera (5MP)
The system then changes the camera angle to cover the area. The IR sensor is shown in Fig. 9. The servo motor is used to access the door and to control the camera angle; both functions are driven from the IR sensor, whose measured distance is converted into a pulse width modulation (PWM) signal sent to the motor. As the PWM value is varied, the rotation angle also varies, so the camera attains the position of the activity. The servo motor is shown in Fig. 10. In our model, we have also used an LED light interfaced with an LDR to turn the lights ON/OFF when needed.

Fig. 9 IR sensor
Fig. 10 Servo motor (SG-90)
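Driving the SG-90 from the Pi can be sketched with software PWM as below; the GPIO pin and the duty-cycle-to-angle mapping (a 50 Hz signal with roughly 2.5–12.5% duty for 0–180°) are assumptions typical of this servo, not values given by the authors.

```python
# Illustrative sketch of SG-90 servo control via software PWM on the Pi.
# GPIO pin 18 and the 2.5-12.5% duty range for 0-180 degrees are assumptions.
import time
import RPi.GPIO as GPIO

SERVO_PIN = 18

GPIO.setmode(GPIO.BCM)
GPIO.setup(SERVO_PIN, GPIO.OUT)
pwm = GPIO.PWM(SERVO_PIN, 50)      # SG-90 expects a 50 Hz control signal
pwm.start(0)

def set_angle(angle: float) -> None:
    duty = 2.5 + (angle / 180.0) * 10.0   # map 0-180 deg to 2.5-12.5 % duty
    pwm.ChangeDutyCycle(duty)
    time.sleep(0.3)                       # give the horn time to travel
    pwm.ChangeDutyCycle(0)                # stop pulsing to reduce jitter

set_angle(90)    # e.g. point the camera straight ahead, or unlock the door
GPIO.cleanup()
```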
5 Results

The proposed system consists of a Pi camera connected through the camera serial interface (CSI) to the Raspberry Pi to register faces into the dataset. Our model uses OpenCV with the LBPH algorithm and Haar cascade feature extraction to recognize a person's face. We collected the faces of ten persons to train the system, with 1000 face images trained per person, registered against their IDs. The 1000 images are captured by the system in video format; each video is about 4 min long, during which the person's facial expression changes every second, and only that one person's face is registered and stored in the dataset with an ID. Whenever they look at the camera, the smart door lock opens. The trained image data is shown in Fig. 11.

Figure 12 shows a recognized face, which is compared with the trained dataset of images. If the recognized image matches the dataset images, the door opens; otherwise, a mail notification is sent with the image of the person. Figure 13 shows the email notification sent to the registered mail ID of the person. The mail carries an "EMERGENCY" message, and the captured image of the person is sent both to the user and to the nearest police station, depending on the situation. If any mischievous activity is carried out, the notification goes to the nearest police station so that the activity can be stopped and the necessary action taken, as shown in Fig. 13.

The proposed model is shown in Fig. 14; the system consists of the sensors and actuators shown in the figure. The servo motors track the face and operate the door. The buzzer is used to alert within a radius of 50 m if any mischievous activity happens.
Fig. 11 1000 images registered for one ID
Fig. 12 Face recognized with its unique ID (door will be opened)
Fig. 13 Face not-recognized and emergency mail is sent
The designed model is shown in Figs. 14 and 15; Fig. 15a shows the night vision of the security system. The proposed system gives the best accuracy and efficiency.
Fig. 14 Proposed model, showing the LED and LDR, the Pi V2 camera (5MP), the IR sensors, and the servo motors (SG-90) for camera tracking and door access
Fig. 15 Image (a) is at night time, with the lights turned ON for recognition of the face. Image (b) shows the circuit deployment of the system, with the Raspberry Pi and buzzer
6 Conclusion and Future Scope

6.1 Conclusion

A surveillance system is essentially required to enhance the safety of individuals, as theft and other unwanted activities are increasing in recent times. Some devices exist, but they can only accommodate a limited number of persons, with a limited
database for training the classifier. In this system, the database accommodates 1000 images of each person, and the memory required for storing the database is minimal: it takes only around 300 MB for one person. This device helps every age group, sparing them biometrics and passwords. The device automatically moves to the required angle, detects and recognizes the person, and opens the door; this process takes 2–3 s. The calibration of the device has been performed accurately for better efficiency. If an unregistered person is detected, the system sends an alert message to the owner, and if any mischievous activity occurs, it sends an emergency message to the user as well as to the nearest police station. This method is faster and more user-friendly than the traditional method, which only implements a smart lock that activates when a person comes near the door; the proposed method detects any person who enters the secured area. The traditional method is also time consuming, and it stops working if the power goes off, whereas our system runs on battery and does its work efficiently at low power. This device can be deployed in homes, ATMs, bank lockers and government services.
6.2 Future Scope

It is planned to elaborate the design by sending SMS over GSM, creating an application for mobile devices for easy access, adding the device's own Internet data network, and using renewable energy.
References

1. M. Srivastava, A. Kumar, A. Dixit, Real time attendance system using face recognition technique, in International Conference on Power Electronics & IoT Application in Renewable Energy and its Control (PARC) (2020)
2. A. Nag, J.N. Nikhilendra, M. Kalmath, IoT based door access control using face recognition, in International Conference for Convergence in Technology (I2CT) (2018)
3. D.A. Chowdhry, A. Hussain, M.Z.U. Rehman, F. Ahmad, Smart security system for sensitive area using face recognition, in IEEE Conference on Sustainable Utilization and Development in Engineering and Technology (2013)
4. N. Bakshi, V. Prabhu, Face recognition system for access control using principal component analysis, in International Conference on Intelligent Communication and Computational Techniques (ICCT) (2017)
5. Kyungil University, Gyeongsan, Security and usability improvement on a digital door lock system based on internet of things. Int. J. Sec. Appl. 9(8), 45–54 (2015)
6. V. Kamesh, M. Karthick, K. Kavin, M. Velusamy, R. Vidhya, Real time fraud anomaly detection in e-banking using data mining algorithm. South Asian J. Eng. Technol. 8(1), 144–148 (2019)
7. V.S. Sureshkumar, D. Joseph Paul, N. Arunagiri, T. Bhuvaneshwaran, S. Gopalakrishnan, Optimal performance and security of data through FS-drops methodology. Int. J. Innov. Res. Eng. Sci. Technol. 3(5), 1–7 (2017)
8. J.D. Irawan, E. Adriantantri, A. Farid, RFID and IoT for attendance management system. MATEC Web Conf. (ICESTI) 164, 01020 (2018)
9. M. Vijayakumar, E. Prabhakar, A hybrid combined under-over sampling method for class imbalanced datasets. Int. J. Res. Adv. Develop. (IJRAD) 02(05), 27–33 (2018)
10. N. Rakesh, Performance analysis of anomaly detection of different IoT datasets using cloud micro services, in International Conference on Inventive Computation Technologies (ICICT) (2016)
11. S. Sreedharan, N. Rakesh, Securitization of smart home network using dynamic authentication, in International Conference on Computer Networks and Inventive Communication Technologies (ICCNCT) (2018)
12. R.S. Anuradha, R. Bharathi, K. Karthika, S. Kirithika, S. Venkatasubramanian, Optimized door locking and unlocking using IoT for physically challenged people. Int. J. Innov. Res. Comput. Commun. Eng. 4(3) (2016)
13. N. Majgaonkar, R. Hodekar, P. Bandagale, Automatic door locking system. Int. J. Eng. Develop. Res. (IJEDR) 4
14. L.S. Oliveira, D.L. Borges, F.B. Vidal, L. Chang, A fast eye localization and verification method to improve face matching in surveillance videos, in IEEE International Conference on Systems, Man, and Cybernetics (SMC) (2012), pp. 840–845
Object-Based Neural Model in Multicore Environments with Improved Biological Plausibility R. Krishnan and A. Murugan
Abstract It is generally known that computational neuroscience is usually treated as a mathematics-driven model: it maps the functions of neural populations to computer-based mathematical functions. Presently, numerous efforts are being made to provide neural models that mimic the biological functions of neurons. This endeavor suggests the need for deploying comprehensive object-based basic neural models in place of the mathematical models that have so far been implemented merely by using object-oriented programming languages. This paper describes the object-based model (OBM) and how it differs from models that simply use object-oriented languages. The consistent premise 'Everything is not learning' is set out and justified by the biological support produced here. The paper also discusses the alternate methods adopted by various researchers, to argue that only a complete object-based design of neurons will converge toward the objective of mimicking the actions of the brain. It then analyzes the need for different programming paradigms and concludes with the suggestion that the adaptive programming language 'Go' suits the implementation of such a model. Keywords Brain mimicking · Intelligence · Artificial neural network · Object-based model (OBM) · Go language · Concurrency
1 Introduction

"Even with a million processors, we can only approach 1% of the scale of the human brain, and that is with a lot of simplifying assumptions," says Steve Furber [1]. This bold and truthful proclamation by the top scientist strengthened the following proposal as an endeavor toward a fresh approach to mimicking the brain.

R. Krishnan (B) · A. Murugan
PG & Research Department of Computer Science, Dr. Ambedkar Government Arts College (Affiliated to University of Madras), Vyasarpadi, Chennai 600039, India
The remainder of this paper is divided into five parts. The first part pins down the boundaries so that the terminologies and jargon around computational neuroscience do not lead to misinterpretations about what is considered brain mimicking here. The second part sets the objective for the research work. The third part analyzes the available models and the need and possibilities for change. The fourth part explains the requirements and structure of an OBM. The fifth part selects the 'Go' language for the implementation of this model and also describes the proposal of the OBM.
1.1 Artificial Intelligence and Thinking

Artificial intelligence (AI) is defined by computational scientists in many ways, from making machines intelligent to copying expertise, and the notion still wobbles in their minds according to their perceptions and the goals they set. Before adding another perception, it was imperative to arrive at a sensible definition of intelligence. Interestingly, intelligence is itself a thing with endless definitions by psychologists and scientists [2]. In the work [3], there has been an effort to give a precise definition of intelligence: 'Intelligence measures an agent's ability to achieve goals in a wide range of environments' is the definition S. Legg and M. Hutter arrived at, which they themselves call informal. In their later work [3], they formulate a formal measure of machine intelligence using the most accepted and most general framework, reinforcement learning. Their emphatic conclusion that IBM's DeepBlue chess supercomputer would earn a low universal intelligence measure, because it is specific to a single environment, raises the following question: if intelligence is a measure, then what is measured? Intelligence measures the ability of the agent in achieving the goal. The goal is set by the environment, and achieving it by itself will not yield a high intelligence measure. From a human intelligence point of view, for example, losing a game to motivate a child, or to avoid the expenditure of the winner's party, may yield a higher intelligence measure. Even though this is sidelined as 'humanness' in contrast to 'artificial intelligence,' this work considers this point to be part of the agent's ability enhancement. If intelligence is the measure of ability, then what is the measured ability of the brain? If a machine thinks and decides, then the measure of it will be intelligence, and a machine having the highest measure will be called an AI machine. Knowing the consequences of the 'thinking machine,' which was an effort to build a dream machine, the proposed work as of now is only an academic endeavor in analyzing the ability of the brain and evaluating the possibilities of its machine implementation with the available technologies and programming paradigms. Figure 1 illustrates a perception of the process of thinking; the keywords extracted out of it are parallel/concurrent and non-deterministic.
Fig. 1 Components and characteristics of thinking
2 The Prime Objectives of This Research

The prime objectives of the whole work are proposed as follows:

i. To have mimicking as a prime objective rather than utility-based short-term goals. AI, or brain mimicking, is the ultimate goal in the field of computer science and information technology. While psychologists and biologists probe the behavior and structure of the brain to ascertain what intelligence is, computational scientists concurrently endeavor to implement the already established truths about the brain, or imaginative perceptions of it, under the nomenclature AI. The usual approach to this universal goal is first to set short-term, goal-based or utility-based objectives. This work instead seeks other avenues toward brain mimicking rather than emphasizing only utility factors. It also becomes necessary to emphasize that the findings established by computational scientists will be utilized with caution and proper reasoning, keeping in mind that our objective is to deviate from utility-based models toward functional mimicking of the brain.

ii. To have a different functional locus of the brain, such as ‘thinking’ (a process of the brain), as the objective, apart from the usual learning, to move closer toward mimicking. It is not certain that a person reacts identically to identical events; likewise, it is not certain that two persons react identically to the same event. This work presumes that the differentiating factor between any two reactions is where the root of intelligence is hiding, and that the probing method and avenue into the brain have to be different to find that root.

iii. To have ‘emotion’ (a temporal status of the brain) as an additional objective alongside the computer-based implementations of cognition and decision making. Psychologists have studied the interplay between emotion, cognition, and decision making. The work [4] describes how a happy person decides on the basis of (positive) experience, whereas the same person, when sad, thinks deeply (pessimistically) and decides on the basis of an exhaustive analysis of pros and cons over all the data and knowledge he possesses; the author hints that the first approach is top-down and the second bottom-up. Therefore, programming paradigms suitable to the aspects defined by psychologists are to be searched for. Applying them to the same decision-making problem of our previous works [5, 6] and achieving feasible results, optimal or sometimes sub-optimal according to the emotional constraints, will be a different approach. This may move brain mimicking toward the emotion-based temporal behavior of brains.

iv. To have logical decision making combined with mathematical decision making in the set objectives. Neural networks, support vector machines (SVM) [7, 8], and deep learning (DL) [9, 10] are all loaded with mathematical algorithms that map brain functions, producing outputs similar to a brain’s for any given input using massive datasets. In contrast, the objective here is to mingle in ‘logical decision making’ (non-mathematical decision making) such as first-order logic, resolution principles, etc.

v. To have implementations that are ‘multicore centric’ to mimic the concurrent cum cooperative functioning of a neural population. Learning algorithms use concurrent processing to perform independent mathematical operations concurrently, whereas this work builds a model that mimics the cooperative functioning of neurons in spite of being concurrent.

vi. To have uncertainty depicted in the proposed models. The models developed should exhibit uncertainty within a finite range, maybe in speed or even in decision making.

vii. To have a modern programming paradigm utilized. The models developed should imbibe the suitable characteristics of any new programming paradigm; by doing so, the work may suggest modifications in programming language implementations.

It is ascertained that all the above objectives are an endeavor to extend the available work in a new direction. This paper takes up the above objectives to an initial extent and analyzes the possibilities of moving closer to brain mimicking.
3 Traditional Versus Modern Artificial Neural Networks

It is important to emphasize that whatever second-generation artificial neural networks (ANNs) have achieved has been achieved through mathematical modeling. Such emphasis does not brush aside the quality with which those neural models duplicate brain functions; it only acknowledges that such methods have limitations in their aspiration of mimicking a brain. This is also accepted by the nanoscientists trying to build a neuromorphic computer, who describe their effort as at best a baby step toward the objective of mimicking the brain [11]. It is evident that no computer-based model is going to be completely biological in nature, but such models can virtualize the biological elements. Computational neuroscientists rightly describe as mathematical those models that mathematically implement or correlate a brain function. These mathematical models take the outputs generated by a human brain for the inputs in a particular situation and fit a mathematical function that produces similar outputs for the same inputs. They achieve only mathematical implementations of a specific brain function under a specific condition, not implementations of the way the brain does it. Therefore, such models are considered ‘weak’ models of the brain, even though their success on the utility factor has led to an alternative description for them, ‘Engineered AI’ [12].
3.1 Influencing Factors of Third-Generation ANNs

The analysis of second-generation ANNs created the perception behind the hypothesis of the proposed model. The objective is to compute outputs that match those of a human brain while mimicking the brain’s functional flow. The emphasis is that the main objective of the proposed ANN is to imitate the human brain, in contrast to the mathematical models that have been built to find solutions for specific problem situations. This work scrutinizes the methods used, and also those left unused; in other words, it revisits the possibilities of finding alternative models that may give the proposed neural networks better brain-mimicking capabilities. It becomes important to depart from the available definition of the artificial neuron itself; therefore, a new model has to be proposed for the artificial neuron, one that also removes the emphasis on mathematical calculation alone.

The search for support toward the set objective confirmed that it is a prospective one and that it supports the proposal of suitably modifying the basic computational model itself. In particular, the work [13] was of immense support to this endeavor of looking back; its authors say that looking back should not be considered a failure of an endeavor and may, on the contrary, provide broader future prospects. In the same work [13], spiking neurons are described as ‘sparsely connected and spiking on discrete timings,’ which supports our earlier work [6], of which this effort forms a part. Third-generation spiking neurons are also perceived as a small step toward a biologically plausible model. Further study into spiking neurons confirmed that perception, where the work [14] necessitates the inclusion of gene/protein parameters in the functioning of neural models. This work gave an insight into the biologically plausible modifications needed in computational neural models. In particular, the following two questions raised by the authors [14] gave the idea of the need to modify the basic perception of an artificial neuron:

(a) How to link the gene/protein functionalities to the neuronal behavior? and
(b) Which gene/protein should be included in the model, and how to match the timing of gene transcription and translation into proteins to the spiking speed of the neuron?
It is observed that the tone of the questioning in the above work [14] was the prime factor highlighting the perceptional change required of computational model proposers to become more biologically centered. It is also observed that the authors’ terminology, ‘neural parameters,’ for gene and protein selection sounds more design-centered than biologically perceived. It was therefore decided to think of a neuron as a functional unit that receives external inputs, in contrast to the traditional way of defining it as a programmatic function with specified parameters. It is perceived that the genetic code determines the functioning of each neuron, rather than a gene pattern being used as data as suggested in the work [14].
4 Need for an Object-Based Model

The following discussion, in specific terms a re-discussion, is of high importance in evaluating, elucidating, and emulating the current models of ANNs; it paved the way for finding the scope of new possibilities and for designing a new model. The first task, evaluating the current models, led to the conclusion that, being mathematical in nature, their probability of becoming a brain mimic is low. That is why it was decided that any proposed model has to be adaptive in nature, like a biological neuron, to become a nearer mimic of the brain, and has to place less emphasis on mathematical calculation. It is observable that the neurons depicted by most ANNs in computational neuroscience are numeric data with temporally varying numeric values. A quick review confirmed that this was the case when the structured programming approach was used and a network was treated as a matrix. Surprisingly, even the advent of the object-oriented programming paradigm did not move the matrix representation of ANNs toward object-based models: the paradigm gave ease of code reuse but did not facilitate any change in neural models. Further probing into available works brought up spiking neurons [15, 16], nano-inductors cum memristors [17, 18], SpiNNaker, and modern concurrent programming languages, each of which contributed to the object-based model proposed in this paper. Spiking neurons sustained the idea of the need for more biological plausibility and gave the idea of intercommunicating computational objects. Nano-inductors cum memristors [17] supported the need for tiny objects working in synergy. SpiNNaker and nano-inductors are two contrasting approaches, the former interconnecting millions of massively capable processors and the latter aiming to be a massive collection of tiny objects; since both are still valued at single-digit percentages of brain mimicking, they prompted the search for further ANN models. Both endeavors build physical objects, which directed the search toward paradigm shifts in architecture and programming in computational science. The scope was found in the area of multicore servers using concurrent languages. Figure 2 illustrates the structural variations proposed for this OBM; the proposed structure is believed to provide diversified functioning of neurons in a network.

Fig. 2 Object-based neuron’s structural variation from traditional artificial neurons
4.1 Object-Based Neuron Should Be a Firmware

The probability of achieving a firmware model of a neuron, and further of a neural network, that mimics a brain is perceived to be high in comparison with functional hardware models. Functional hardware models [19] increase speed, reduce size to nano-units, and confirm precise inter-communication, but they lack the dynamicity of firmware models. It is natural for firmware models to depend on whatever computational hardware is available from time to time, and the improved multicore computational environments of the current generation have facilitated this work.
4.2 Object-Based Model Should Be Spiking

Spiking neurons are said [20] to be more biologically plausible models of neural networks. A spike is a discrete event, and such networks concentrate on the membrane potential and its response to each quantum of spike received. The huge number of endeavors in this area of research is making spiking neurons [21] a sought-after topic for researchers. Physical models [22, 23] of a neuron are built through physical prototypes (nano-inductors, memristors) [24, 25], and virtual modeling is done through software simulation tools [26]. This work, which is purely a software computational model, accommodates only the term spike; it does not endeavor toward any physical model. This model perceives that an artificial neuron has to be a running process or routine. A routine that imitates a neuron has to be as tiny as possible and should be alive all the time but active only when required; that is, it should be alive but idle until a spike makes it active.
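As a minimal sketch of this ‘alive but idle’ behaviour, the Go routine below stays alive for the life of the program but remains blocked on a channel until a spike arrives. The names, the spike values, and the firing threshold are invented for the illustration and are not part of the published model:

package main

import "fmt"

// neuron stays alive but idle: it blocks on its input channel and
// becomes active only when a spike arrives.
func neuron(in <-chan float64, out chan<- float64, threshold float64) {
	potential := 0.0
	for spike := range in { // blocks here while idle
		potential += spike
		if potential >= threshold {
			out <- potential // fire
			potential = 0
		}
	}
}

func main() {
	in := make(chan float64)
	out := make(chan float64)
	go neuron(in, out, 1.0)
	in <- 0.6
	in <- 0.6                    // second spike crosses the threshold
	fmt.Println("fired:", <-out) // prints: fired: 1.2
	close(in)                    // the routine ends with its input channel
}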
5 Object-Based Model (OBM) Should Be Functional in Nature

A neuron is a member of a family of billions of neurons and is individually a functional unit. For simulating a neuron as a computational model, the easiest term that qualifies is a ‘function.’ Yet if a ‘function’ is seen as a neuron, its suitability is immediately questioned, because functions are not separate processes from the invoking process. The two immediate alternative terms for running units within a computer are ‘threads’ and ‘p-threads.’ The p-threads of UNIX and the threads of Java, which look to be suitable alternatives, are ruled out due to inter-communication and context-switching overheads; their dependency on the operating system (OS) and the lack of application-level control are further reasons to find suitable alternatives to them. The third alternative, the programming language term ‘object,’ which has individual properties and common functionalities, may coincide with the definition of the biological neuron. It too fails to duplicate a biological neuron, because the data members of an object are individual, whereas the methods are not like threads that are always alive. Methods are called as functions whenever necessary and are even shared by objects of the same type; in other words, for multiple instances of a class the methods are not loaded independently, and all implementations use shared-segment concepts.
After knocking on the closed doors of structural, object-oriented, and functional programming, the endeavor turned to the modern trends in programming languages, as hinted earlier. The search for thread types in concurrent processing that are non-preemptive in nature led to the doors of the ‘Go’ language.
5.1 Search for a Modern Programming Paradigm

The objectives set for the OBM must pass a feasibility test against the contemporary hardware architectures available and the programming paradigms that can suit the model. The availability of multicore architectures immediately satisfies the proposal of multiple neural objects occupying multiple cores concurrently. The vital part of the proposal relies on the search conducted for a suitable programming paradigm to make the model feasible. The criteria set for the desired programming paradigm are as follows:

1. The programming language or environment must have the feature of looking back and changing its basics according to architectural change.
2. The language should support multiple processes that communicate easily [27].
3. Beyond inter-process communication, the environment should provide control over process execution and hierarchy.
4. The application or network must embed execution control, rather than having an intermediate environment (a virtual machine) other than the operating system hinder the liberty of the process to directly exploit the services of the operating system.
5. The language should be mechanically sympathetic [27] to the preferred multicore architecture.
The Go language is an ace up this paper’s sleeve, as Tables 1 and 2 illustrate. Table 1 correlates the neural traits with the Go language features that form the basis of the proposed neural model, and Table 2 maps those traits onto goroutine structures. Figure 3 illustrates the neural network functioning of the proposed model in the Go language.
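To illustrate the feasibility of the traits gathered in Tables 1 and 2 (anonymous goroutines, many-to-one channel communication, and wait groups for sequencing), a toy Go sketch follows. The three-sender fan-in and the spike values are assumptions made for the example, not the OBM itself:

package main

import (
	"fmt"
	"sync"
)

func main() {
	spikes := make(chan float64) // many-to-one: three senders, one receiver
	var wg sync.WaitGroup

	// Three anonymous neuron goroutines share one output channel; they
	// differ only through the external parameter passed to them.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(weight float64) {
			defer wg.Done()
			spikes <- 0.5 * weight
		}(float64(i))
	}

	// Close the channel once every sender has finished.
	go func() {
		wg.Wait()
		close(spikes)
	}()

	// A collector neuron accumulates the incoming spikes.
	sum := 0.0
	for s := range spikes {
		sum += s
	}
	fmt.Println("membrane potential:", sum) // 0.5 + 1.0 + 1.5 = 3.0
}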
Table 1 Correlation of neural traits and Go language features

Neural trait: A neuron is a functional unit performing a part of a larger function.
Go language feature: Goroutine.
Remarks: A goroutine is a lightweight processing unit that can perform a part of a larger application.

Neural trait: Neurons are anonymous.
Go language feature: Goroutines are anonymous.
Remarks: (i) Goroutines need not have a name. (ii) Named functions can become goroutines of multiple instances, which still have no ID. (iii) External factors such as parameters and messages make similar routines function differently.

Neural trait: Neurons cannot be controlled individually [28].
Go language feature: Goroutines are scheduled cooperatively.
Remarks: This work prefers the cooperative scheduling followed by the Go language before Go version 1.14: a goroutine running on an OS thread will not be preempted by the Go scheduler, just as a neural process, once initiated, cannot be controlled externally.

Neural trait: A neuron’s functions are temporal [29].
Go language feature: Go has time-based libraries.
Remarks: Goroutines respond to time library functions such as a ticker.

Neural trait: Neurons respond to signals and messages.
Go language feature: Go has channels, the ‘sync’ package, and wait groups.
Remarks: Goroutine intercommunication is very handy and carries minimal process overheads compared with the threads of familiar languages like ‘Java’ and ‘Python’.

Neural trait: Neurons work concurrently in groups.
Go language feature: Goroutines suit concurrency.
Remarks: (i) The goroutine design is multicore-concurrency centric. (ii) The performance of goroutines in a multicore environment makes them the automatic choice for process-centric neural implementations. (iii) The improved concurrency makes them suitable for working in non-deterministic situations.
Table 2 Neural character and model structure mapping

Neural trait: Neurons perform tasks indefinitely between cell birth and death.
Goroutine structure: A goroutine performs its task within an infinite loop:
    func neuron() {
        for {
            // neural task
        }
    }

Neural trait: Neurons are initiated by an external task.
Goroutine structure: main invokes many instances of neurons:
    func main() {
        go neuron()
        go neuron()
    }

Neural trait: Neurons perform short tasks and wait for other neurons.
Goroutine structure: neuron() blocks on signals within its infinite loop:
    func neuron() {
        for {
            // channel receive blocks here: a synchronous wait
        }
    }

Neural trait: Neurons, when alive, can be active or inactive.
Goroutine structure: Goroutines are invoked and can be in a running or a blocked state.

Neural trait: Neurons wait for signals and messages; they use signals for state change and messages for process control.
Goroutine structure: Go supports goroutine blocking on channels and on Broadcast() of a sync.Cond. Channels can be used for messages and Broadcast() for signaling. (Here the word ‘state’ is used for routine states within an application; therefore, ‘phase change’ is used for changes made within the routine.)

Neural trait: Inter-neuronal communication can be one-to-one, many-to-one, or one-to-many.
Goroutine structure: (i) The neural implementation will use channels for one-to-one and many-to-one communication. (ii) Broadcast() of a sync.Cond can be used for one-to-many.

Neural trait: Neural performance is massively parallel, but controlled parallelism is evident in temporal operations.
Goroutine structure: (i) Millions of goroutines can be invoked, and Go’s wait groups can be used for sequencing neural functions. (ii) The controlling functions will be done by separate goroutines.
6 Conclusion

This paper repurposes available technologies, programming paradigms, and ANN models toward the objective of mimicking the brain. The proposed ‘object-based model’ of a neuron changes the perception of an artificial neuron into that of a running routine/thread. The paper analyzes the reasoning behind the need for change and describes the possibilities for such a model in the modern programming language Go. The challenge to the success of the model lies in the problem selections, the concurrency, and the non-deterministic approach of the model in reaching its goal. Future work can include real-time applications such as human behavior mapping to support medical diagnostics, crime detection, or guided marketing.
Fig. 3 Neural network functioning of the proposed model in Go language
References

1. A New Supercomputer Is the World’s Fastest Brain-Mimicking Machine, https://www.scientificamerican.com/article/a-new-supercomputer-is-the-worlds-fastest-brain-mimicking-machine/
2. S. Legg, M. Hutter, A collection of definitions of intelligence. Frontiers Artif. Intell. Appl. 157, 17 (2007)
3. S. Legg, M. Hutter, A formal measure of machine intelligence. arXiv preprint cs/0605024 (2006)
4. N. Schwarz, Emotion, cognition, and decision making. Cogn. Emot. 14(4), 433–440 (2000)
5. K. Shyamala, P. Chanthini, R. Krishnan, A. Murugan, Artificial neural network model adopting combinatorial inhibition process in multiple solution problems. Int. J. Eng. Technol. 7(3.4), 167–173 (2018)
6. K. Shyamala, P. Chanthini, R. Krishnan, A. Murugan, Adoption of combinatorial graph for inhibitory process in optimization problems. Int. J. Appl. Eng. Res. 13(13), 11261–11266 (2018)
7. M. Saber, A. El Rharras, R. Saadane, H.K. Aroussi, M. Wahbi, Artificial neural networks, support vector machine and energy detection for spectrum sensing based on real signals. Int. J. Commun. Netw. Inf. Sec. 11(1), 52–60 (2019)
8. D. Thukaram, H.P. Khincha, H.P. Vijaynarasimha, Artificial neural network and support vector machine approach for locating faults in radial distribution systems. IEEE Trans. Power Deliv. 20(2), 710–721 (2005)
9. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016)
10. J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
11. Researchers Develop Device that Mimics Brain Cells Used for Human Vision, https://phys.org/news/2020-02-device-mimics-brain-cells-human.html
12. Beyond Deep Learning—3rd Generation Neural Nets, https://www.datasciencecentral.com/profiles/blogs/beyond-deep-learning-3rd-generation-neural-nets
13. Spiking Neural Networks, The Next Generation of Machine Learning, https://towardsdatascience.com/spiking-neural-networks-the-next-generation-of-machine-learning-84e167f4eb2b
14. N. Kasabov, L. Benuskova, S.G. Wysoski, A computational neurogenetic model of a spiking neuron, in Proceedings, IEEE International Joint Conference on Neural Networks, vol. 1 (IEEE, 2005)
15. S. Ghosh-Dastidar, H. Adeli, Third generation neural networks: spiking neural networks, in Advances in Computational Intelligence (Springer, Berlin, Heidelberg, 2009), pp. 167–178
16. P.A. Merolla, J.V. Arthur, R. Alvarez-Icaza, A.S. Cassidy, J. Sawada, F. Akopyan, B. Brezzo, A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197), 668–673 (2014)
17. Scientists Want to Mimic the Human Brain and They’ve Made a Breakthrough, https://www.weforum.org/agenda/2016/10/scientists-want-to-mimic-the-human-brain-and-they-ve-made-a-breakthrough/
18. J. Grollier, D. Querlioz, M.D. Stiles, Spintronic nanodevices for bioinspired computing. Proc. IEEE 104(10), 2024–2039 (2016)
19. A. Baddeley, Working memory: theories, models, and controversies. Annu. Rev. Psychol. 63, 1–29 (2012)
20. Y. Hao, X. Huang, M. Dong, B. Xu, A biologically plausible supervised learning method for spiking neural networks using the symmetric STDP rule. Neural Netw. 121, 387–395 (2020)
21. J. Choi, M. Ahn, J.T. Kim, Implementation of hardware model for spiking neural network, in Proceedings of the International Conference on Artificial Intelligence (ICAI) (WorldComp, 2015), p. 700
22. D. Sarkar, J. Tao, W. Wang, Q. Lin, M. Yeung, C. Ren, R. Kapadia, Mimicking biological synaptic functionality with an indium phosphide synaptic device on silicon for scalable neuromorphic computing. ACS Nano 12(2), 1656–1663 (2018)
23. A. Trafton, Mimicking the Brain in Silicon, https://news.mit.edu/2011/brain-chip-1115
24. V.K. Sangwan, D. Jariwala, I.S. Kim, K.S. Chen, T.J. Marks, L.J. Lauhon, M.C. Hersam, Gate-tunable memristive phenomena mediated by grain boundaries in single-layer MoS2. Nat. Nanotechnol. 10(5), 403–406 (2015)
25. Y. Babacan, F. Kaçar, K. Gürkan, A spiking and bursting neuron circuit based on memristor. Neurocomputing 203, 86–91 (2016)
26. R. Brette, M. Rudolph, T. Carnevale, M. Hines, D. Beeman, J.M. Bower, M. Zirpe, Simulation of networks of spiking neurons: a review of tools and strategies. J. Comput. Neurosci. 23(3), 349–398 (2007)
27. Scheduling In Go: Part I—OS Scheduler, https://www.ardanlabs.com/blog/2018/08/scheduling-in-go-part1.html
28. Brain Basics: The Life and Death of a Neuron, https://www.ninds.nih.gov/Disorders/Patient-Caregiver-Education/Life-and-Death-Neuron
29. A.M. Rossi, V.M. Fernandes, C. Desplan, Timing temporal transitions during brain development. Curr. Opin. Neurobiol. 42, 84–92 (2017)
Advancement in Classification of X-Ray Images Using Radial Basis Function with Support of Canny Edge Detection Model C. M. A. K. Zeelan Basha, T. Sai Teja, T. Ravi Teja, C. Harshita, and M. Rohith Sri Sai
Abstract With recent technological innovations, medical image processing plays an important role in offering better diagnosis and treatment. In this work, X-ray images are used to identify various orthopaedic and radiology-based muscle disorders. In the preprocessing stage, mean, median, and Wiener filters are used for noise removal; Canny edge segmentation is proposed for image acquisition; and a radial basis function neural network (RBFNN) with machine learning optimization is developed for the classification of disorders. To classify the X-ray images, the above methods are applied to various body parts such as the head, neck, skull, palm, and spine. The results compete with present methods and achieve 97.82% accuracy for the classification of X-ray image diseases.

Keywords X-ray image · RBFNN · Mean · Median · Wiener filters · Canny edge segmentation
1 Introduction

X-ray images are utilized to image body parts and muscles. They help physicians identify specific diseases affecting their patients. X-rays are most often used to detect broken bones, as cuts and fractures display clearly in X-ray pictures, enabling doctors to rule out sprains and pains. These images are shaded in black and white because different tissues and bones absorb radiation at different rates; the calcium in hard materials like bones absorbs more, so bones appear white in X-rays. Flesh and soft fatty tissues absorb fewer X-rays and so appear gray. This chromatic visualization is shown clearly in Fig. 1.
Fig. 1 X-ray image
As the figure explains, any X-ray image is differentiated in this way: soft tissues and flesh absorb fewer photons from the X-rays than bone does, and the resulting pattern is detected using film or image detectors. X-rays are two-dimensional raw images, projections of the radiation phenomenon. In this work, the main focus is the automatic detection of brain-related fractures; fractures and tumours can be classified from the X-ray images in an effective manner. In general, X-ray-based disorder detection must handle the area of interest and its structures. Automatic classification of skull X-ray images remains a challenging task, for two reasons:

1. Intensity values and the structure of interest.
2. Individual dimensions.

The relationship between X-ray images and the detection of disorders in hard materials has been given priority. This work presents skull X-ray image classification using RBFNN classification and image acquisition with the help of segmentation and filters. A dataset of 500 X-ray images was collected from KIMS hospital, Hyderabad, of which 300 are normal and 100 are fractured cases.

Organization of the paper: Sect. 1 gives a general introduction to X-ray images and image acquisition methods, Sect. 2 describes the literature survey related to skull X-ray images, Sect. 3 explains the methodology, and Sects. 4 and 5 briefly present the results and the conclusion of the paper.
2 Literature Survey

Earlier articles have used CAD structures to detect ILD in chest radiographs through texture assessment. For instance, the CAD system of the Kurt Rossmann Laboratory in Chicago divided the lung into multiple regions of interest and analyzed the lungs’ ROIs to decide whether or not there were any abnormalities [1]; pretrained NNs were then used to classify the suspicious regions detected. This system can help doctors improve the accuracy of interstitial lesion detection. Plans et al. developed a flexible scheme for CAD of ILD [2]. This technique can discover a variety of pathological features of interstitial lung tissue and is based on an active contour algorithm that selects the lung region. The region is then divided into 40 separate regions of interest [3], and a two-dimensional Daubechies wavelet transform is performed on each ROI to calculate texture measures. However, despite the successful application of deep learning to the detection of lung ailments, there is little literature on the detection of interstitial lung disease, in the absence of a large chest X-ray dataset on ILD [4]; most of the literature has used CT datasets to locate ILD. Besides pulmonary nodules, tuberculosis, and ILD [5–7], other diseases can be detected in chest X-rays, including cardiomegaly, pneumonia, pulmonary edema, and emphysema. There is much less literature on these diseases, so only a brief discussion is given here. Detecting cardiomegaly commonly calls for analyzing the heart size, calculating the cardiothoracic ratio (CTR), and developing a cardiac screening system. Candemir et al. used 1D-CTR, 2D-CTR, and CTAR as features and applied an SVM to classify 250 cardiomegaly images and 250 normal images, obtaining an accuracy of 76.5% [8, 9]. Islam et al. used multiple CNNs to detect cardiomegaly; the network was appropriately tuned on 560 image samples and tested on a hundred images, obtaining a maximum accuracy of 93%, 17 percentage points higher than in the literature [10–12]. Pneumonia and pulmonary edema may be classified with the aid of extracted texture features. Parveen et al. used an FCM clustering algorithm to find pneumonia; the results showed that a healthy lung region of the chest appears black or dark gray, whereas when a patient has pneumonia the lungs are full of water or sputum, more radiation is absorbed, and the lung areas appear white or light gray. This approach can help doctors detect the degree of infection easily and accurately. Kumar et al. used a machine learning algorithm to carry out texture analysis of chest X-rays.
3 Methodology

In the preprocessing stage, image acquisition methods such as the Wiener filter are selected, since filters like the mean and median filters face various difficulties during image noise reduction. For segmentation, Canny edge-based segmentation is used; this method supports a brief study of the selected skull image information. After that, a radial basis function neural network is applied for classification. Figure 2 shows the proposed X-ray image classification pipeline with its deep learning mechanism (input X-ray skull image, Wiener filtering, Canny edge segmentation, RBFNN, classification), in which these steps are used to identify problems in the skull.

Fig. 2 Block diagram
3.1 Wiener Filtering

In X-ray image analysis, Wiener filters are very useful for noise elimination, and image de-noising is a significant task in medical image processing. Filters such as the median and mean filters are available but have notable limitations, which the Wiener filter can handle. In its standard adaptive (local) form, the Wiener noise-removal function estimates the clean pixel value as

$$\hat{f}(x,y) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2}\left(g(x,y) - \mu\right) \quad (1)$$

where g is the noisy image, μ and σ² are the local mean and variance in a neighbourhood of (x, y), and ν² is the noise variance. Equation (1) handles the noise in the X-ray images in an efficient manner (Fig. 3).
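The paper’s experiments were run in MATLAB 2015b; purely as an illustration of the same step, an adaptive Wiener de-noising pass can be sketched in Python with SciPy. The synthetic input array below stands in for a real scan:

import numpy as np
from scipy.signal import wiener

# Synthetic noisy 512 x 512 "X-ray" slice standing in for a real scan.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(512, 512)).astype(float)
noisy = clean + rng.normal(0.0, 10.0, clean.shape)

# Adaptive Wiener filtering over a 5 x 5 neighbourhood; SciPy estimates
# the local mean and variance used in Eq. (1) from the image itself.
denoised = wiener(noisy, mysize=(5, 5))
print(denoised.shape)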
Fig. 3 X-ray image process
3.2 Canny Segmentation

In this segmentation, we use edge-based detection with multi-stage operations; it is a technique for extracting useful structural information from images. This computer vision step requires coefficients like those of the Otsu segmentation method. Canny segmentation is designed to satisfy the criteria below:

1. Edge detection with a low error rate, catching as many of the image’s edges as possible.
2. Good edge-point localization, with detected points at the center of the true edges.
3. Suppression of image noise, so that no false edges are marked.

Edges are detected clearly using Canny segmentation, so that fractures can be detected easily; on the whole, the detection accuracy increases. The smoothing kernel is

$$H_{ij} = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{(i-(k+1))^2 + (j-(k+1))^2}{2\sigma^2}\right), \quad 1 \le i, j \le 2k+1 \quad (2)$$
Here H represents the impulse response of the Canny edge segmentation method. For the classic 5 × 5 case, the smoothing step convolves the input with

$$B = \frac{1}{159}\begin{bmatrix} 2 & 4 & 5 & 4 & 2 \\ 4 & 9 & 12 & 9 & 4 \\ 5 & 12 & 15 & 12 & 5 \\ 4 & 9 & 12 & 9 & 4 \\ 2 & 4 & 5 & 4 & 2 \end{bmatrix} * A \quad (3)$$

where B is the output of the segmentation (smoothing) process and A is the input. The gradient magnitude and the adaptive weight are

$$d(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}, \qquad w(x,y) = \exp\left(-\frac{\sqrt{d(x,y)}}{2h^2}\right) \quad (4)$$

Equation (4) provides the weight and distance calculations that give the segmentation process its adaptiveness.
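As an illustrative counterpart in Python with OpenCV, the same smoothing and edge steps can be sketched as follows; the file name is hypothetical and the hysteresis thresholds are example values, not the paper’s settings:

import cv2
import numpy as np

# Eq. (3): the normalized 5 x 5 Gaussian kernel used for smoothing.
B = np.array([[2, 4, 5, 4, 2],
              [4, 9, 12, 9, 4],
              [5, 12, 15, 12, 5],
              [4, 9, 12, 9, 4],
              [2, 4, 5, 4, 2]], dtype=np.float64) / 159.0

img = cv2.imread("skull_xray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
smoothed = cv2.filter2D(img, -1, B)

# Eq. (4): gradient magnitude plus non-maximum suppression and hysteresis
# are bundled inside cv2.Canny; 50/150 are example thresholds.
edges = cv2.Canny(smoothed, threshold1=50, threshold2=150)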
Fig. 4 RBFNN deep learning model
3.3 RBFNN

The radial basis function network is a deep learning process used here to classify the selected skull X-ray image. This classification method competes with the SVM and KNN models. RBF network learning requires the determination of the RBF centers and the weights, and the selection of the RBF centers is the most important part of an RBF network implementation. The centers can be positioned on a random subset of the training examples, determined by clustering, or obtained through a learning procedure. One can also start with all data points as centers and then selectively remove centers using the k-NN classification scheme [27] (Fig. 4). Here X is the input vector and C the weighted sum. For some RBFs, including the Gaussian, it is also crucial to determine the smoothness parameter. Existing RBF network learning algorithms are mainly derived for the Gaussian RBF network and can be modified accordingly when different RBFs are used.

The mathematical computations of the RBFNN are as follows:

$$P = P_j - \frac{P_j h_j h_j^T P_j}{\lambda_j + h_j^T P_j h_j}, \quad (5)$$

where P is the projection matrix, which carries the analysis of linear networks over to this setting.

$$A_m^{-1} = \left(H_m^T H_m + \lambda U_m^T U_m\right)^{-1} = U_m^{-1}\left(\tilde{H}_m^T \tilde{H}_m + \lambda I_m\right)^{-1} U_m^{-T} = U_m^{-1}\operatorname{diag}\left(\frac{1}{\lambda + \tilde{h}_1^T \tilde{h}_1}, \frac{1}{\lambda + \tilde{h}_2^T \tilde{h}_2}, \ldots, \frac{1}{\lambda + \tilde{h}_m^T \tilde{h}_m}\right) U_m^{-T}, \quad (6)$$

where $A^{-1}$ represents the variance matrix.

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad (7)$$

$$H = \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1m} \\ H_{21} & H_{22} & \cdots & H_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ H_{p1} & H_{p2} & \cdots & H_{pm} \end{bmatrix}, \quad (8)$$

where H represents the design matrix.

$$\text{Sensitivity} = \frac{tp}{tp + fn} \quad (9)$$

$$\text{Specificity} = \frac{tn}{tn + fp} \quad (10)$$

$$\text{Accuracy} = \frac{tp + tn}{tp + fp + tn + fn} \quad (11)$$

Equations (5)–(11) summarize the mathematical modeling of the RBFNN and the performance metrics used for the proposed X-ray image classification (Fig. 5).
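A compact Python sketch of these computations follows. It is one common Gaussian-RBF reading of Eqs. (6)–(11) (k-means centers, ridge-regularized least-squares output weights), not the authors’ released code, and the data shapes and hyperparameters are hypothetical:

import numpy as np
from sklearn.cluster import KMeans

def design_matrix(X, centers, sigma):
    # H[p, m] = exp(-||x_p - c_m||^2 / (2 sigma^2)), cf. Eq. (8).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbfnn(X, Y, m=20, sigma=1.0, lam=1e-3):
    # Centers by clustering; weights from (H^T H + lambda I) W = H^T Y, cf. Eq. (6).
    centers = KMeans(n_clusters=m, n_init=10, random_state=0).fit(X).cluster_centers_
    H = design_matrix(X, centers, sigma)
    W = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ Y)
    return centers, W

def metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)                # Eq. (9)
    specificity = tn / (tn + fp)                # Eq. (10)
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # Eq. (11)
    return sensitivity, specificity, accuracy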
4 Results and Discussion

In this work, a skull image is taken as input, the image acquisition methods (Wiener filtering and Canny segmentation) are applied, and the RBFNN is applied for classification; results are simulated using MATLAB 2015b. Tables 1 and 2 present performance metrics of the proposed method compared with existing methods (Figs. 6, 7, and 8).
Fig. 5 Architecture of classification of X-rays
Table 1 Confusion matrix parameters

Actual \ Predicted   Positive              Negative
Positive             Tp = true positive    Fn = false negative
Negative             Fp = false positive   Tn = true negative

Table 2 De-noising PSNR values

Filter            PSNR       Reference
Wavelet           18.51923   [11]
Median            20.2567    [13]
High boost        23.1678    [12]
Proposed Wiener   31.52      Present method
Tables 3, 4, and 5 give the results of the various methods implemented; these are compared with the proposed RBFNN model in Table 6 and Fig. 9.
Fig. 6 KNN model
Fig. 7 SVM model
Fig. 8 RBFNN model

Table 3 KNN classification

Type     Sensitivity   Accuracy   Specificity
Skull-1  95.23         89.12      87.23
Skull-2  96.12         90.12      88.42
Skull-3  95.12         88.78      89.13
AVG      95.49         89.34      88.26

Table 4 SVM classification

Type     Sensitivity   Accuracy   Specificity
Skull-1  96.23         90.12      88.23
Skull-2  97.12         91.12      89.42
Skull-3  96.12         89.78      90.13
AVG      96.49         90.34      89.26

Table 5 RBFNN classification

Type     Sensitivity   Accuracy   Specificity
Skull-1  97.23         91.12      89.23
Skull-2  98.12         92.12      90.42
Skull-3  97.12         90.78      91.13
AVG      97.49         91.34      90.26

Table 6 Comparison of work

Method   Sensitivity   Accuracy   Specificity
KNN      95.49         89.34      88.26
SVM      96.49         90.34      89.26
RBFNN    97.49         91.34      90.26
Fig. 9 RBFNN versus remaining models
5 Conclusion

In this investigation, skull X-ray image de-noising, segmentation, and contrast adjustment were deployed to improve the quality of the X-ray images. Statistical analysis was then performed with the SVM, KNN, and RBFNN methods; the RBFNN model achieves the best performance and competes with present computer-aided designs. This work is very helpful for the X-ray image diagnosis process in various medical laboratories and can help radiologists and doctors to perform an accurate and reliable diagnosis. Performance metrics of about 91% accuracy, 97% sensitivity, and 90% specificity were achieved (Table 5), a better result when compared with the present research work.
References

1. M. Loog, B. van Ginneken, M. Nielsen, Detection of interstitial lung disease in PA chest radiographs, in Medical Imaging 2004: Physics of Medical Imaging, vol. 5368 (SPIE, San Diego, 2004). https://doi.org/10.1117/12.535307
2. H. Abe, H. MacMahon, J. Shiraishi, Q. Li, R. Engelmann, K. Doi, Computer-aided diagnosis in chest radiology. Semin. Ultrasound CT MRI 25(5), 432–437 (2004)
3. M.T. Islam, M.A. Aowal, A.T. Minhaz, K. Ashraf, Abnormality detection and localization in chest X-rays using deep convolutional neural networks. arXiv preprint arXiv:1705.09850 (2017)
4. A. Kumar, W. Yen-Yu, L. Kai-Che, I.C. Tsai, H. Ching-Chun, H. Nguyen, Distinguishing normal and pulmonary edema chest X-ray using Gabor filter and SVM, in 2014 IEEE International Symposium on Bioelectronics and Bioinformatics (IEEE ISBB, Taiwan, 2014), pp. 1–4. https://doi.org/10.1109/isbb.2014.6820918
5. U. Avni, H. Greenspan, E. Konen, M. Sharon, J. Goldberger, X-ray categorization and retrieval on the organ and pathology level, using patch-based visual words. IEEE Trans. Med. Imaging 30(3), 733–746 (2011)
6. N.M. Noor, O.M. Rijal, A. Yunus, A.A. Mahayiddin, C.P. Gan, E.L. Ong, et al., Texture-based statistical detection and discrimination of some respiratory diseases using chest radiograph, in Advances in Medical Diagnostic Technology (Springer, Singapore, 2014), pp. 75–9
7. V. Bindhu, Biomedical image analysis using semantic segmentation. J. Innov. Image Process. (JIIP) 1(02), 91–101 (2019)
8. C.Z. Basha, K.M. Sricharan, C.K. Dheeraj, R. Ramya Sri, A study on wavelet transform using image analysis. Int. J. Eng. Technol. (UAE) 7(2), 94–96 (2018)
9. C.M.A.K. Zeelan Basha, T. Maruthi Padmaja, G.N. Balaji, Automatic X-ray image classification system, in Smart Innovation, Systems and Technologies, vol. 78 (Springer Science and Business Media, Deutschland GmbH), pp. 43–52
10. C.M.A.K. Zeelan Basha, T. Maruthi Padmaja, G.N. Balaji, Computer aided fracture detection system. J. Med. Imaging Health Inf. 8, 526–531 (2018)
11. C.M.A.K. Zeelan Basha, T. Maruthi Padmaja, G.N. Balaji, EAI Endorsed Trans. Pervasive Health Technol. 5(18), 1–6 (2019)
12. C.Z. Basha, M.R.K. Reddy, K.H.S. Nikhil, P.S.M. Venkatesh, A.V. Asish, Enhanced computer aided bone fracture detection employing X-ray images by Harris corner technique, in 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC) (Erode, India, 2020), pp. 991–995
Brain Tumour Three-Class Classification on MRI Scans Using Transfer Learning and Data Augmentation C. A. Ancy and Maya L. Pai
Abstract Accurate classification is a prerequisite for brain tumour diagnosis. The proposed method is a modified computer-aided detection (CAD) technique for leveraging automatic classification in brain magnetic resonance imaging (MRI), in which a pipeline of convolutional neural networks (CNNs) is trained using transfer learning (TL) on ResNet 50 with PyTorch. The proposed method employs benchmarked datasets from the figshare database, where data augmentation (DA) is applied to increase the number of samples, which can further increase training efficiency. The retrained model can thus classify the tumour images into three classes, i.e., glioma, meningioma, and pituitary tumours. Classification accuracy was tested by comparing the accuracy metrics, loss metrics, and confusion matrix and was found to be 99%. The proposed model is the first of its kind to employ both DA and TL on the ResNet 50 model for performing a three-class classification of brain tumours, and the results reveal that it outperforms the other existing methods.

Keywords Computer-aided detection · Magnetic resonance imaging · Convolutional neural network · Transfer learning · ResNet 50 · Data augmentation
1 Introduction

The use of machine learning techniques for classification has changed the facets of CAD systems [1–3]. Brain tumour has become one of the most perilous diseases across the globe. Generally, a tumour is caused by the anomalous development of cells anywhere within the skull. Computer vision and deep learning techniques widely employ trained CNN models to facilitate diagnosis and treatment triage [4]. Many different imaging techniques exist that give information about the size, shape, location, and type of brain tumours, such as magnetic resonance imaging (MRI), magnetic resonance spectroscopy (MRS), computed tomography (CT), single-photon emission
computed tomography (SPECT), and positron emission tomography (PET). Figure 1 depicts the different brain imaging techniques used by radiologists.

Fig. 1 Different brain imaging techniques

Many factors, such as angle, lighting, and resolution, can disrupt classification results. Among the different imaging techniques available, MRI is widely employed due to its strong resolution of soft tissues and the detailed description it gives of the brain. MRI scans are of different types: FLAIR, T1-weighted, T2-weighted, etc. The proposed work uses T1-weighted contrast-enhanced magnetic resonance images (CE-MRI), which cover all three image planes (axial, sagittal, and coronal views) and use the gadolinium-chelate injection method to highlight the contrast in the images. Figure 2 shows images of the different types of MRI scans.

Fig. 2 Different types of MRI scans

Whatever the imaging technique, early detection plays a very important role in the diagnosis and treatment outcome of the disease. According to World Health Organization (WHO) standards, neurologists have classified brain tumours into more than 120 different types [5]. The proposed method deals with multiclass classification, where the data is taken from the publicly available figshare dataset. Of the 120 different tumour types, the dataset contains 2D images of three kinds of brain tumour (glioma, meningioma, and pituitary tumours), i.e., a three-class classification.

CAD methods first evolved using traditional machine learning algorithms, which take a long time since they involve several steps: image preprocessing, segmentation, feature extraction, and classification. Despite these drawbacks, many such CAD models performed well [6]; their accuracy depends on the handcrafted features obtained and on efficient modelling of the machine learning algorithms employed. It was after 2014 that much of the work in brain tumour detection moved to deep learning (DL) techniques. DL methods do not require handcrafted feature extraction; they have the capability to learn from the datasets themselves. Among DL methods, CNN models are widely employed and have achieved good results for several reasons: (1) they are powerful for complex image segmentation and classification problems; (2) a large amount of labelled training data is available in medical imaging; (3) different transfer learning techniques are available to transfer knowledge; and (4) powerful graphics processing units (GPUs) are available. The proposed work also addresses hardware restrictions: medical image datasets are heavy and require better hardware specifications for storing, training, and deployment to ensure maximum accuracy.

Figure 3 shows a midline post-contrast sagittal T1-weighted MRI image, where different parts of the brain are marked. Tumour types are classified based on the nature of the growth, the type of spreading, and the location in which the growth appears. Figure 4 depicts the three views (sagittal, axial, and coronal) in which MRI scanning is performed to obtain T1-weighted images, and Fig. 5 shows the three prominent brain tumour types taken from the figshare database and considered for the three-class classification in the present work; the tumour portions are marked with red shading.

The structure of the paper is as follows. The state of the art is described in Sect. 2, and the architecture of the proposed method and the experimental setup are discussed in Sect. 3. Section 4 gives the test results and comparisons with brief discussions. Section 5 gives the conclusion, and Sect. 6 discusses the outlook for future scope.
Fig. 3 Parts of brain

Fig. 4 Sagittal, axial, and coronal views of MRI scans

Fig. 5 MRI samples of meningioma, glioma, and pituitary tumour types

2 State of the Art

Research using CNNs, particularly in the field of brain tumour detection, has increased since 2014. According to the statistics [7], the number of works per year in Google Scholar containing the keywords CNN and brain tumour was 19 in 2014, 47 in 2015, 137 in 2016, 468 in 2017, and 977 in 2018, increasing to 1790 in 2019, with 351 in 2020. From all these studies, it was observed that the
availability of brain tumour images was a major problem in many of the works; due to this problem, overfitting can occur in CNN architectures. In 2018, Mohsan et al. [8] proposed a method combining the discrete wavelet transform (DWT) and principal component analysis (PCA) and obtained good accuracy for four-class brain tumour classification. Another work in 2018, by Naceur et al. [9], used fully automatic brain tumour segmentation with end-to-end incremental deep neural networks on MRI images. In 2019, Khaled [10] produced a review article describing the different works in the brain tumour detection domain using different CNN models. Another 2019 method, by Deepak et al. [11], addressed the three-class brain tumour classification problem using CNN features via transfer learning; they employed deep transfer learning with a pretrained GoogleNet to extract features from brain MRI images and obtained a classification accuracy of 98%. Also in 2019, Vijayakumar [12] proposed a capsule neural network model that can work well even with a small number of datasets, unlike CNN models, and Hossam et al. [13] came up with a multiclass classifier that used two publicly available datasets and obtained appreciable overall accuracy. In 2020, Maglogiannis et al. [7] presented a work that uses MRI images and transfer learning for brain tumour classification; their CAD systems employ transfer learning for feature extraction across nine deep pretrained convolutional neural network (CNN) architectures. The proposed method employs the data augmentation technique to avoid the insufficient-data problem and overfitting. Using pretrained CNN models along with transfer learning, and adjusting the learning and hyperparameters, can achieve good results and works much faster and more simply than models trained with randomly initialized weights [14, 15]. The proposed method performs three-class tumour classification in brain MRI by data augmentation and transfer learning using the ResNet 50 model, one of the best-performing models.
3 Architecture of Proposed Method and Experimental Setup

3.1 Proposed Method

In the proposed method, we employ the pretrained architecture of ResNet 50, and the extracted features are used by deep transfer learning to produce knowledge. We use CE-MRI images from the figshare database for classification (glioma, meningioma, and pituitary). To avoid the problems of a small number of training samples and of overfitting, our classification system uses data augmentation techniques to increase the number of samples and deep transfer learning for feature extraction. The proposed method is the first of its kind to employ both DA and TL for the three-class classification of brain MRI. ResNet is one of the best-performing pretrained models and is widely employed in this area because of its simple structure and low complexity. ResNet employs residual learning methods for the better training of networks and thus reduces the error caused by increasing depth. It has many variants available, of which we employ the ResNet 50 model as it gives better results. Thus, after fine-tuning and learning, the retrained model can accurately classify a tumour into the three types. Classification accuracy and various performance metrics are computed, and good results are obtained in accuracy, training time, and computation cost compared with related work. Figure 6 shows the generic model of our proposed system, and Fig. 7 depicts the architecture of the pretrained ResNet 50 model.

In the retrained model, instead of training the model from scratch, the learned features are transferred from the pretrained model. The pretrained model employed in this work is ResNet 50, an already built model containing many convolution, pooling, and fully connected layers, used as the starting point; it learns and passes the knowledge to the retrained model. By using a pretrained model, we can directly take the weights and architecture obtained from previous learning on large datasets and apply them to our problem using transfer learning. The ResNet 50 pretrained model is directly available in the Keras library and is trained on the ImageNet dataset. ResNet 50 is a residual network variant with 50 layers. The retrained model is created by fine-tuning the already-trained ResNet 50: the architecture of the pretrained model is kept, the upper weights are re-initialized, and the model is trained again on our dataset.
Fig. 6 Generic architecture of the new model
Fig. 7 Architecture of ResNet 50
Fig. 8 Proposed model
Images acquired from the benchmarked, publicly available figshare database undergo preprocessing to remove noise and data augmentation to multiply their number. The retrained model thus uses the modified data for training along with the knowledge from transfer learning. Retrained models are used to speed up learning and to avoid the need to train on large datasets again. When using the pretrained model, it is important to retrain the upper layers, as the higher-level features are class-specific.
48
C. A. Ancy and M. L. Pai
Figure 8 shows the diagram of the proposed model of our work. The objective of the proposed model is to provide a fully automated CAD that increases the overall accuracy, reduces overfitting, and speeds up the training time.
3.2 Experimental Setup

Running and training DL models on large, real datasets has been a major hindrance for researchers, as it requires huge computational power. We implemented the proposed classification model using Python and GPU-computing libraries (CUDA, OpenCL) in Google Colaboratory (Colab), a free research project of Google. It is a cloud-based Jupyter notebook that gives users a fully fledged runtime for DNN architectures, hence requiring no setup, and comes with many ML libraries preinstalled. We can also add our own libraries and store the code in Google Drive so that it can be worked on from anywhere. It provides a robust GPU free of charge and is hence widely employed in CNN training, being secure and durable.
3.3 Dataset Used and Preprocessing Employed

The datasets from the public database figshare [16] are used in the proposed work. The database contains 3064 T1-weighted CE-MRI images, collected from 233 patients of three different hospitals in China during 2005–2010. The dataset contains 2D images of three kinds of tumour: meningioma (708 images), glioma (1426 images), and pituitary tumour (930 images), stored in (.mat) format and covering three different views: axial, sagittal, and coronal. Before the data is fed to augmentation, an RoI-based preprocessing step for data cleaning is performed to remove the small amounts of salt-and-pepper and Gaussian noise present in the MRI scans. The image size is 512 × 512 pixels, with a pixel size of 0.49 mm × 0.49 mm. The grey images are first normalized and converted into RGB format, represented as an m × n × 3 array with values in [0, 1], and then resized to 224 × 224 × 3 (for the ResNet 50 model). After this reduction, the dataset is divided into 70% (2145 images) for training and the remaining 30% (919 images) for validation, following the Pareto principle [17]. The split has also been tried with other popular ratios such as 80–20 and 75–25, but 70–30 gave the highest overall accuracy.
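A sketch of this preprocessing in Python is given below. The figshare archive stores each slice as a MATLAB v7.3 (.mat) file commonly read through its ‘cjdata’ struct with h5py; those key names should be verified against the downloaded files, and the code is illustrative rather than the authors’ pipeline:

import h5py
import cv2
import numpy as np

def load_slice(path):
    # Read one figshare .mat file; key names follow the dataset's
    # 'cjdata' struct and should be checked against the archive.
    with h5py.File(path, "r") as f:
        img = np.array(f["cjdata/image"], dtype=np.float32)
        label = int(np.array(f["cjdata/label"]).squeeze())  # 1/2/3
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # normalize to [0, 1]
    img = cv2.resize(img, (224, 224))                          # ResNet 50 input size
    return np.stack([img, img, img], axis=-1), label           # grey -> 3 channels

# A 70/30 train/validation split, as in the text, could then be drawn with
# sklearn.model_selection.train_test_split(..., test_size=0.3, stratify=labels).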
3.4 Data Augmentation Data augmentation is a method applied on-the-fly during training to counter the lack-of-data problem, which causes overfitting, and to increase generalization performance by enhancing the training dataset itself. It also mitigates class imbalance by oversampling the minority class. With insufficient data, a model can memorize the details of the training set but cannot generalize to the validation set. Deep learning works better with adequately sized datasets, so to obtain the desired accuracy, we need to address the lack of a satisfactory amount of data. We applied eight different augmentation techniques to extend the available data. This section briefly describes the transformations applied: edge detection, sharpening, Gaussian blur, and emboss for noise invariance, and skewing, flipping, rotation, and shearing for invariance to geometric transformations. The original dataset contains 3064 MRI images; after augmentation, it increased by a factor of 8, making the modified dataset considerably larger. It was observed that the augmentation techniques increased both the individual accuracy of the three classes and the overall accuracy of the system. The accuracies for the glioma, meningioma, and pituitary classes and the overall accuracy of the model were 95%, 96%, 95%, and 95%, respectively, before augmentation, and 99.46%, 98.10%, 99.84%, and 99%, respectively, after augmentation. Thus, by applying these techniques, a considerable amount of high-quality, abundant data was obtained for the training phase.
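A hedged sketch of how the eight transformations could be realized is given below: the geometric transformations use the Keras ImageDataGenerator, while the filter-based ones use Pillow. The specific ranges and radii are illustrative choices, not the settings reported here.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from PIL import ImageFilter

# Geometric transformations: rotation, shearing, skew-like shifts, and flips
geometric_aug = ImageDataGenerator(rotation_range=15,      # illustrative range
                                   shear_range=0.2,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True)

# Filter-based transformations for noise invariance
def filter_variants(pil_img):
    """Return edge-detected, sharpened, blurred, and embossed copies."""
    return [pil_img.filter(f) for f in (ImageFilter.FIND_EDGES,
                                        ImageFilter.SHARPEN,
                                        ImageFilter.GaussianBlur(radius=1),
                                        ImageFilter.EMBOSS)]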
3.5 Deep Transfer Learning Transfer learning is an optimization technique in machine learning primarily employed to improve performance and speed up the training process. We use the pretrained-model approach, the most commonly employed form in DL problems: a model trained for one task is repurposed for another related task through inductive transfer. In the proposed work, the pretrained ResNet 50 model was retrained using the transfer learning technique on the data-augmented training set from the figshare database. We use stochastic gradient descent (SGD) with a momentum of 0.9 and a decaying learning rate for training, which brings us progressively closer to the desired parameters. The learning rate was chosen to be 10^−4 for this work. The value of the learning rate must be chosen suitably: if it is too small, optimization takes a long time, whereas if it is too large, the optimization may overshoot. The minibatch size is set to 128 images, as a very large batch size can adversely affect model quality. Finally, fine-tuning is done to lightly adjust the weights of the model. To perform the transfer learning, we train for 50 epochs, and the networks are validated at appropriate iterations.
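The training configuration described above translates directly into a Keras compile-and-fit call; the sketch below assumes the model and the data arrays built in the earlier sketches.

from tensorflow.keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# 50 epochs with minibatches of 128 images, validated on the 30% hold-out set
history = model.fit(X_train, y_train,
                    batch_size=128,
                    epochs=50,
                    validation_data=(X_val, y_val))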
4 Test Results and Comparisons with Brief Discussions In this section, the evaluation criteria and the performance metrics computed are described, and the results of the proposed method are presented. The proposed classifier assigns the output to one of three classes: glioma, meningioma, and pituitary tumours. Figures 9, 10, and 11 show sample MRI images of the final outputs obtained from the three-class classifier. The model classifies with a minimum number of incorrect predictions.
4.1 Performance Metrics and Evaluation The performance of the given model was computed using various evaluation indices, namely accuracy metrics, loss metrics, and the confusion matrix. The training and validation accuracy and loss of the proposed model with 70% of the samples used for training are given below.
Fig. 9 Image classified as glioma
Fig. 10 Image classified as meningioma
Fig. 11 Image classified as pituitary
Fig. 12 Training and validation accuracy
Classification accuracy is defined as the ratio of the number of correctly classified samples to the total number of data samples. The accuracy metrics for training and validation are depicted in Fig. 12; they give the overall accuracy of the proposed model and show how often its predictions agree with the true data. A good accuracy value reveals that the model classifies well. From the obtained results, the overall accuracy of the proposed model was found to be 99%. Figure 13 shows the loss metrics obtained. Loss is not measured as a percentage; it is a number computed on the training and validation datasets that indicates how bad the prediction was on a single sample, summed over all samples in each dataset. A decline in the loss metrics reveals that the model's predictions were good, which in turn shows that the proposed model is accurate. Figure 14 shows the confusion matrix obtained using the Scikit-learn library, where the x-axis denotes the true label and the y-axis denotes the predicted label. Table 1 shows the obtained values of the confusion matrix, which is used to quantify the performance of the classifier. This table gives a summary of the prediction results of the three-class classification problem.
Fig. 13 Training and validation loss
Fig. 14 Confusion matrix

Table 1 Confusion matrix for the proposed work obtained after training with 70% of train data

              Predicted
Actual     M        G        P
M          891      4        0
G          15       1700     0
P          6        0        1064
P, M, G denote Pituitary, Meningioma, and Glioma images, respectively
Here the diagonal values give the correct predictions, i.e., 891, 1700, and 1064 for the classes meningioma, glioma, and pituitary, respectively. From these values, accuracy, precision, recall, specificity, and F1 score can be computed using the true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) cases.
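Since the confusion matrix is reported to come from Scikit-learn, the computation likely resembles the following sketch, assuming y_val holds one-hot labels and model is the trained network from the earlier sketches.

import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(X_val), axis=1)
y_true = np.argmax(y_val, axis=1)
cm = confusion_matrix(y_true, y_pred)
print(cm)  # diagonal entries are the correct predictions per class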
4.2 Test Results Accuracy, precision, recall, specificity, and F1 score of the individual samples from the three classes were also computed to assess per-class performance; they are derived from the values in the confusion matrix. Table 2 gives the accuracy, precision, recall, specificity, and F1 score of the individual class predictions for glioma (G), meningioma (M), and pituitary (P), computed using formulas (1) to (5). From the results in Table 2, the individual classification efficiency was good for the glioma and pituitary classes, while for the meningioma class it was slightly weaker.

Accuracy = (TP + TN)/(TP + FP + FN + TN)  (1)

Precision = TP/(TP + FP)  (2)

Recall = TP/(TP + FN)  (3)

Specificity = TN/(TN + FP)  (4)

F1 Score = 2TP/(2TP + FP + FN)  (5)
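Formulas (1) to (5) can be evaluated directly from the confusion matrix; the helper below is a minimal sketch of that computation for a single class k.

import numpy as np

def per_class_metrics(cm, k):
    """Derive TP, FP, FN, and TN for class k from confusion matrix cm
    and return accuracy, precision, recall, specificity, and F1 score."""
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp
    fn = cm[k, :].sum() - tp
    tn = cm.sum() - tp - fp - fn
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # formula (1)
    precision = tp / (tp + fp)                   # formula (2)
    recall = tp / (tp + fn)                      # formula (3)
    specificity = tn / (tn + fp)                 # formula (4)
    f1 = 2 * tp / (2 * tp + fp + fn)             # formula (5)
    return accuracy, precision, recall, specificity, f1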
Table 2 Performance matrix values for three-class classifier outputs

Class    Accuracy    Precision    Recall    Specificity    F1 score
G        99.46       99.13        99.77     99.24          0.99
M        98.10       99.55        97.7      99.24          0.98
P        99.84       99.43        1         1              0.99

P, M, G denote Pituitary, Meningioma, and Glioma images, respectively
Table 3 Comparison of the overall accuracy of the model with existing works

Author(s)                Methodology                                             Training data (%)    Accuracy (%)
Pashaei [18]             CNN ELM                                                 70                   93.68
Afshar [19]              CapsNet                                                 70                   90.89
Deepak and Ameer [11]    CNN & transfer learning                                 5                    97.1
Swati et al. [20]        Transfer learning and fine tuning                       75                   94.82
Proposed method          Data augmentation and transfer learning using ResNet 50 70                   99
4.3 Comparison with Existing Works The overall accuracy of the model is also compared with other existing works. Table 3 compares the overall accuracy of the proposed method with that of other existing related works. The proposed method took 50 epochs to train and used 70% of the data for training to obtain an overall accuracy of 99%. The table displays the methodology employed in each state-of-the-art work, along with the percentage of data used for training and the accuracy obtained. The results reveal that the proposed method outperformed all the other existing methods.
5 Conclusion An accurate and fully automated three-class classifier is proposed for classifying brain tumours, which takes benchmarked CE-MRI datasets from the figshare database. The problems of insufficient data samples and overfitting are addressed by the necessary preprocessing and by data augmentation, which significantly increased the number of image samples for training. The proposed system uses the pretrained ResNet 50 architecture with transfer learning and fine-tuning of the hyperparameters, which eases classification with a simpler architecture and less time taken. The proposed method employs the combination of both DA and TL from a pretrained network, which is the first of its kind, and the test results and comparisons revealed that the proposed model outperformed the state of the art. The work stands out for its simpler architecture, the smaller number of epochs taken for training, reduced time consumption, reduced overfitting, and a better overall classification accuracy of 99%. The limitations of the work include a lower classification accuracy for meningioma tumours.
6 Outlook for Future Scope The work can be extended in several directions. First, the classification result for meningioma was a little weak, and appropriate tuning of the transfer learning model may solve this issue. Second, different image fusion techniques can be tried in the preprocessing phase to improve the accuracy of the deep neural networks. Third, the same approach can be extended with other existing models such as GoogleNet, AlexNet, SENet, and the VGG models, and their performances can be compared. Fourth, images from different modalities, such as X-ray, PET, and CT, can be tried with the models. Fifth, the effect of the number of training epochs on the classification results can be studied further.
References
1. M.S. Suchithra, M.L. Pai, Improving the prediction accuracy of soil nutrient classification by optimizing extreme learning machine parameters. Inform. Process. Agricul. (2019). https://doi.org/10.1016/j.inpa.2019.05.003
2. P. Aswathi Anand, M.L. Pai, Artificial neural network model for identifying early readmission of diabetic patients. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(6) (2019)
3. K.S. Varsha, M.L. Pai, Rainfall prediction using fuzzy c-mean clustering and fuzzy rule-based classification. Int. J. Pure Appl. Mathe. 119, 597–605 (2018)
4. S. Shakya, Analysis of artificial intelligence based image classification techniques. J. Innov. Image Process. (JIIP) 2(01), 44–54 (2020)
5. N.J. Tustison, K.L. Shrinidhi, M. Wintermark, C.R. Durst, B.M. Kandel, J.C. Gee, M.C. Grossman, B.B. Avants, Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR. Neuroinformatics 13(2), 209–225 (2014). https://doi.org/10.1007/s12021-014-9245-2
6. E.I. Zacharaki, S. Wang, S. Chawla, D.S. Yoo, R. Wolf, E.R. Melhem, C. Davatzikos, Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn. Reson. Med. 62(6), 1609–1618 (2009). https://doi.org/10.1002/mrm.22147
7. R. Chelghoum, A. Ikhlef, A. Hameurlaine, S. Jacquir, Transfer learning using convolutional neural network architectures for brain tumor classification from MRI images. IFIP Adv. Inform. Commun. Technol. 189–200 (2020). https://doi.org/10.1007/978-3-030-49161-1_17
8. H. Mohsen, E.-S. El-Dahshan, E.-S. El-Horbaty, A.-B. Salem, Classification using deep learning neural networks for brain tumors. Fut. Comput. Inf. J. 3(1), 68–71 (2018). https://doi.org/10.1016/j.fcij.2017.12.001
9. M.R. Naceur, S. Rachida, A. Akil, K. Rostom, Fully automatic brain tumor segmentation using end-to-end incremental deep neural networks in MRI images. Comput. Methods Programs Biomed. 166, 39–49 (2018). https://doi.org/10.1016/j.cmpb.2018.09.007
10. M.K. Abd-Ellah, A.I. Awad, A.A.M. Khalaf, H.F.A. Hamed, A review on brain tumor diagnosis from MRI images: practical implications, key achievements, and lessons learned. Magn. Reson. Imaging 61, 300–318 (2019). https://doi.org/10.1016/j.mri.2019.05.028
11. S. Deepak, P.M. Ameer, Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 111, 103345 (2019). https://doi.org/10.1016/j.compbiomed.2019.103345
12. T. Vijaykumar, Classification of brain cancer type using machine learning. J. Artif. Intell. Caps. Netw. 2, 105–113 (2019)
13. H.H. Sultan, N.M. Salem, W. Al-Atabany, Multi-classification of brain tumor images using deep neural network. IEEE Access 7, 69215–69225 (2019). https://doi.org/10.1109/access.2019.2919122
14. M.I. Sharif, J.P. Li, M.A. Khan, M.A. Saleem, Active deep neural network features selection for segmentation and recognition of brain tumors using MRI images. Pattern Recogn. Lett. 129, 181–189 (2020). https://doi.org/10.1016/j.patrec.2019.11.019
15. J. Bernal, K. Kushibar, D.S. Asfaw, S. Valverde, A. Oliver, R. Martí, X. Lladó, Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: a review. Artif. Intell. Med. 95(April), 64–81 (2019). https://doi.org/10.1016/j.artmed.2018.08.008
16. Figshare brain tumor dataset. https://doi.org/10.6084/m9.figshare.1512427.v5
17. A. Sarah, I. Abdelaziz, M. Ammar, H. Hesham, An enhanced deep learning approach for brain cancer MRI images classification using residual networks. Artif. Intell. Med. 102, 101779 (2020). https://doi.org/10.1016/j.artmed.2019.101779
18. A. Pashaei, H. Sajedi, N. Jazayeri, Brain tumor classification via convolutional neural network and extreme learning machines, in IEEE 8th International Conference on Computer and Knowledge Engineering (ICCKE, 2018), pp. 314–319
19. P. Afshar, K.N. Plataniotis, A. Mohammadi, Capsule networks for brain tumor classification based on MRI images and coarse tumor boundaries, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP, 2019), pp. 1368–1372
20. Z.N. Swati, Q.Z. Khan, M. Kabir, F. Ali, Z. Ali, S. Ahmed, Lu. Jianfeng, Brain tumor classification for MR images using transfer learning and fine-tuning. Comput. Med. Imaging Graph. 75(July), 34–46 (2019). https://doi.org/10.1016/j.compmedimag.2019.05.001
Assessing the Statistical Significance of Pairwise Gapped Global Sequence Alignment of DNA Nucleotides Using Monte Carlo Techniques Rajashree Chaurasia and Udayan Ghose
Abstract Generally, global pairwise alignments are used to infer homology or other evolutionary relationships between any two sequences. The significance of such sequence alignments is vital to determine whether an alignment algorithm is generating the said alignment as evidence of homology or by random chance. Gauging the statistical significance of a sequence alignment obtained through the application of a global pairwise alignment algorithm is a difficult task, and research in this direction has only provided us with nebulous solutions. Moreover, the case of nucleotide alignments with gaps has been scarcely explored. Very little literature exists on the statistical significance of gapped global alignments employing affine gap penalties. This manuscript aims to provide insights into how the statistical significance of gapped global pairwise alignments may be inferred using Monte Carlo techniques. Keywords Global pairwise alignment · Scoring matrix · Statistical significance · Gapped alignments · Affine gap penalty · Monte Carlo method · Extreme value distribution
1 Background Global pairwise alignments are generally used to measure the evolutionary relatedness or homology of two sequences over their entire lengths. The most popular pairwise global alignment algorithm is the Needleman–Wunsch algorithm [19]. However, the standard Needleman–Wunsch works well for minimal length sequences (of the R. Chaurasia (B) Guru Nanak Dev Institute of Technology, Directorate of Training and Technical Education, Government of NCT of Delhi, Delhi, India e-mail: [email protected]; [email protected] R. Chaurasia · U. Ghose University School of Information, Communication & Technology, Guru Gobind Singh Indraprastha University, New Delhi, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_5
order of a few hundred nucleotides in a reasonable time) only. Several improvements over this basic algorithm have been devised in [5, 10, 12, 18, 23] to alleviate the problem of scaling up to longer sequences, the most notable among these being the Myers-Miller algorithm for linear space alignment [18]. Many online tools are now available that implement both the basic Needleman–Wunsch algorithm for short sequences and faster variants based on these improvements; two such tools, found at [4, 21], are more widely used than the others. The optimal global alignment scores obtained using these algorithms are not considered sufficient proof of shared homology. It is further required to measure their statistical significance to be sure that such an optimal alignment could not have been obtained by coincidence. The statistical significance of a global pairwise optimal alignment can be obtained from the optimal alignment score [7]. However, the alignment and the corresponding optimal score depend on two parameters, viz. the substitution matrix used and the gap penalty employed. For amino acid sequences, standard substitution matrices (also known as scoring matrices) like Point Accepted Mutation (PAM) [6] and BLOcks SUbstitution Matrix (BLOSUM) [9] are applied. Nevertheless, there are no standard matrices for nucleotide sequences; scoring matrices for nucleotides consist of match and mismatch scores among the nucleotide bases. Further, gap penalties are either linear or affine. Linear gap penalties impose a penalty proportional to gap length. Affine gap penalties, on the other hand, impose distinct penalties for introducing a gap (higher) and widening an open gap (lower); the extension penalty is proportional to gap length after the introduction of the gap. In practice, affine gap penalties are much more widely used than simple linear penalties, as they combine the benefits of both a constant gap penalty and a linear penalty. More complex gap penalty models, like convex and profile-based gap penalties, also exist, though they are not widely employed in global alignment. The significance of an optimal alignment score can be evaluated by its p-value, which conveys the probability that the best possible alignment with the given optimal score could have occurred by random chance. In other words, if the optimal score is higher than all scores obtained in a random model, it can be said that the optimal alignment given by the alignment algorithm is statistically significant, and its p-value will remain close to zero. The choice of the random model determines the p-value estimation process. There are a few techniques for generating the random model in question. For instance, a random sequence generator function can be used to create independent random sequences of a specified length without providing a template sequence as input. Some random sequence generators preserve the G–C (Guanine–Cytosine) content or composition of a template, which is a real sequence from a sequence database [8, 24, 26, 27]. In contrast, others are based on more complex Markov models that are organism-specific, or on models that preserve dinucleotide frequencies and codon usage [3]. Independent random sequences of similar length can also be taken from real sequence databases. It is a recognized fact that the sum of a large group of independent identically distributed random variables follows the Gaussian distribution.
It is further substantiated that, for local sequence alignments containing gaps, the distribution of
the maximum of such random variables tends towards the extreme value distribution [13]. For local alignments of gapped sequences, some empirical evidence exists [1, 29] that the distribution of scores tends to follow the extreme value distribution (EVD), with some studies further pointing out the Gumbel-type EVD as an approximate distribution for scores [11, 16, 17, 20, 22]. However, for global pairwise gapped nucleotide sequence alignments, no theoretical results for optimal score distributions are known, even for the simplest random models and fixed sequence-specific alignment. Reich et al. [25] have studied the score distributions for global alignments of nucleotides using a static scoring model and zero gap penalties. The authors used the Z-score, Monte Carlo techniques, and a random model that generated independent sequences with individual nucleotide base frequencies around 25%. They found that the score distributions were close to the distributions of sequences randomly retrieved from a sequence database. However, the case of gapped global nucleotide alignments was not considered in their work. In another study, Altschul and Erickson [3] used a random shuffling generator that preserved dinucleotide and codon usage and used the p-value for estimation of score distributions, demonstrating that the score distribution is marginally non-normal. However, the tail behaviour of the distribution was not expounded in their study. Monte Carlo techniques employ statistical analysis through repeated random sampling and are generally used in resolving problems that can be construed in terms of probability. This study generates the null model of alignment scores through random shuffling of the query sequence. By repeatedly sampling the query sequence randomly, the proposed research work infers a probabilistic explanation of the statistical significance of the alignment scores. The main focus is on affine gapped pairwise nucleotide global alignments, and an attempt is made to provide a distribution for the score significance of such alignments. It is worthwhile to note, however, that assessing the significance of an alignment can only point us in the direction of a target exhibiting an interesting pattern and should not be taken as valid proof of biological significance [2, 15, 28]; only experimental studies in a wet lab can confirm biological relevance. Section 1 of this paper gives a brief background of the research done in this direction, followed by Sect. 2, which details the methodology used in the study. Section 3 discusses the results obtained, and Sect. 4 concludes the paper with comments on laying the groundwork for further extensions to the study.
2 Methods The algorithm for the global pairwise alignment of DNA sequences employed in our study is the Myers-Miller linear space alignment algorithm implemented in the EMBOSS 6.5.0.0 package for Windows (see ftp://emboss.open-bio.org/pub/EMBOSS/windows/, free and open-source). The EMBOSS program 'stretcher' implements this improvement over the standard Needleman–Wunsch algorithm and takes many parameters, some of which are the two sequences to be aligned, the scoring matrix file, the
gap open penalty, and the gap extension penalty. The user manual and documentation for this function can be found online at several websites (e.g. see https://bioinfo.ccs.usherbrooke.ca/cgi-bin/emboss/help/stretcher). The fixed-parameter set model is employed for alignment, wherein the standard scoring matrix with symmetric match and mismatch scores and the default gap penalties used by the online version of this tool (see https://www.ebi.ac.uk/Tools/psa/emboss_stretcher/) are applied for the pairwise global alignment. In a fixed-parameter set model, the model parameters are static quantities; these values are not drawn from a random distribution. In our study, the model parameters, as described in section one, are the substitution matrix and the affine gap penalties. The significance of alignment scores is studied using Monte Carlo methods, where the model parameters are required to remain fixed so that they do not bias the alignment score distributions unnecessarily. The scoring matrix used in this study specifies a match score of '5' and a symmetric, uniform mismatch score of '−4'. These are the default scores used by the EMBOSS 'stretcher' program for bases A, T, G, and C (see the 'EDNAFULL' datafile section at https://bioinfo.ccs.usherbrooke.ca/cgi-bin/emboss/help/stretcher). The gap open penalty (the default value for nucleotides as per the EMBOSS 'stretcher' program) is set to 16, and the gap extension penalty (likewise the default for nucleotides) is set to 4. Specific sequences of varied lengths are carefully selected from the National Center for Biotechnology Information (NCBI) database (see https://www.ncbi.nlm.nih.gov/nuccore), followed by a Basic Local Alignment Search Tool for Nucleotides (BLASTN) search [4] for highly similar sequences. The sequence pairs thus selected vary in length from less than 1 Kb (where Kb implies kilo basepair) up to 10 Kb. Each of these sequence pairs is globally aligned using the 'stretcher' command, and the optimal alignment score is noted along with other details. A brief representation of the selected sequence pairs is given in Table 1. In order to generate the random or null model, this research work has utilized the EMBOSS 'shuffleseq' program (see https://emboss.bioinformatics.nl/cgi-bin/emboss/shuffleseq), which shuffles a template nucleotide sequence and generates random sequences based on the composition of the template. Each of these randomly generated sequences preserves the composition of the second sequence that was used as an input to the global alignment algorithm. The sequences from the null model are then aligned globally to the first sequence used as an input to the 'stretcher' program. Hereafter, the first sequence used as input in aligning via EMBOSS 'stretcher' will be referred to as the 'target' sequence, and the second sequence, which is used to generate the random model, will be referred to as the 'query' sequence. Three sets of random sequences are generated for each sequence pair: the first set contains 200 randomly shuffled sequences, the second set comprises 500 such sequences, and the third set holds 1000. Lastly, the alignment scores for a specific sequence pair are fitted to three types of extreme value distributions (EVDs) by estimating their shape, location, and scaling parameters. The p-value is then calculated from these parameter estimates using the cumulative distribution function [14] of the said EVDs.
Table 1 A brief representation of selected sequence pairs for alignment

S.No. | Pair (with sequence lengths in base pairs, bp) | Alignment length | Alignment score | Percent identity | Percent similarity | Percent gaps
1 | NC_000079.6: 23,763,668–23,764,412 (Mus musculus H1.1 linker histone), 745 bp vs. NC_030679.2: c135757942–135,758,641 (Xenopus tropicalis H1.3 linker histone), 700 bp | 765 bp | 819 | 60.0 | 60.0 | 11.1
2 | NM_000523 (Homo sapiens homeobox D13), 1008 bp vs. NM_008275 (Mus musculus homeobox D13), 1020 bp | 1032 bp | 3678 | 84.3 | 84.3 | 3.5
3 | KC978991.1 (Faba bean necrotic stunt alphasatellite 1 isolate), 1001 bp vs. NC_038958.1 (Pea yellow dwarf alphasatellite 1 isolate), 1021 bp | 1038 bp | 1449 | 62.9 | 62.9 | 5.2
4 | NM_000522 (Homo sapiens homeobox A13), 1167 bp vs. NM_008264 (Mus musculus homeobox A13), 1151 bp | 1167 bp | 4743 | 90.4 | 90.4 | 1.4
5 | AF531299.1 (Homo sapiens histone H1), 1620 bp vs. NC_000079.6: 23,763,668–23,764,412 (Mus musculus H1.1 linker histone), 745 bp | 1629 bp | -1665 | 35.1 | 35.1 | 54.8
6 | NC_038298.1 (Bayou virus nucleocapsid), 1958 bp vs. KX066124.1 (Muleshoe hantavirus strain HV segment S), 1996 bp | 2015 bp | 5434 | 76.3 | 76.3 | 3.8
7 | NC_000006.12: 31,575,565–31,578,336 (Homo sapiens tumor necrosis factor), 2772 bp vs. M64087.1 (Equus caballus tumor necrosis factor-alpha), 2610 bp | 2963 bp | 6004 | 68.7 | 68.7 | 18.4
8 | NC_026662.1 (Simian torque teno virus 31 isolate), 3907 bp vs. NC_014480.2 (Torque teno virus 2), 3322 bp | 4004 bp | 46 | 50.6 | 50.6 | 19.5
9 | NC_026662.1 (Simian torque teno virus 31 isolate), 3907 bp vs. MH649256.1 (Anelloviridae sp. isolate), 3904 bp | 4084 bp | 587 | 50.8 | 50.8 | 8.7
10 | NC_004764.2 (Budgerigar fledgling disease virus-1), 4981 bp vs. AB453162.1 (Budgerigar fledgling disease polyomavirus strain APV4), 4981 bp | 4981 bp | 24,635 | 99.4 | 99.4 | 0.0
11 | NC_048296.1 (Bird's-foot trefoil enamovirus 1 isolate), 5736 bp vs. KY985463.1 (Alfalfa enamovirus 2 isolate), 5729 bp | 5835 bp | 12,135 | 69.2 | 69.2 | 3.5
12 | U31789.1 (Human papillomavirus type 48), 7100 bp vs. U31790.1 (Human papillomavirus type 50), 7184 bp | 7257 bp | 12,975 | 65.7 | 65.7 | 3.2
13 | NC_001362.1 (Friend murine leukemia virus), 8323 bp vs. AB187565.1 (Murine leukemia virus graffi), 8319 bp | 8368 bp | 29,594 | 84.1 | 84.1 | 1.1
14 | M10060.1 (Human T-lymphotropic virus 2), 8952 bp vs. Y14570.1 (Simian T-lymphotropic virus 2), 8855 bp | 9039 bp | 23,497 | 74.3 | 74.3 | 3.0
15 | NC_001802.1 (Human immunodeficiency virus 1), 9181 bp vs. KT284376.1 (Human immunodeficiency virus 1 isolate from the USA), 9579 bp | 9611 bp | 39,052 | 89.9 | 89.9 | 4.8

The selected sequences are taken from the NCBI nucleotide database (reference numbers are indicated). Note that sequence pair 10 has no gaps (ungapped alignment of same size sequences), and sequence pair 5 aligns sequences of different lengths. Percent identity and percent similarity in the case of nucleotides are equal and imply a ratio of the number of matching bases to the total alignment length. Percent gaps give the ratio of the number of gaps in the alignment to the total length of the alignment
The three types of extreme value distributions that have been used in our study are the Generalized EVD, the Gumbel distribution (also known as the Type I EVD), and the Fréchet distribution (also known as the Type II EVD). The Generalized EVD is a distribution that combines the Type I, Type II, and Type III EVDs. The Type III EVD is also known as the Weibull distribution and is not included in this study. All processing of data and its analysis has been programmed using the Python programming language.
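As an illustration of this pipeline, the sketch below drives 'shuffleseq' and 'stretcher' from Python via subprocess. The option names follow the EMBOSS documentation as understood here and should be verified against a local installation; the file names are placeholders.

import subprocess

# Generate a null model of 200 shuffled copies of the query sequence
# (option names are assumptions based on the EMBOSS documentation)
subprocess.run(['shuffleseq', '-sequence', 'query.fa',
                '-outseq', 'shuffled.fa', '-shuffle', '200'], check=True)

# Align the shuffled sequences to the target with the fixed parameter set
subprocess.run(['stretcher', '-asequence', 'target.fa',
                '-bsequence', 'shuffled.fa',
                '-gapopen', '16', '-gapextend', '4',
                '-outfile', 'null_alignments.stretcher'], check=True)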
3 Results and Discussion For each sequence pair listed in Table 1, nine curve plots are generated: every sequence pair is fitted to the three types of EVDs for each of the three random sequence sets (denoted in the plots as RN = 200, RN = 500, and RN = 1000). The corresponding p-values are also saved to an Excel sheet carrying other summary data, some of which are included in Table 1. Figure 1 shows all nine plots for sequence pair 1. In each graphical plot, the optimal scores obtained by aligning every randomly generated sequence with the query sequence are plotted against the number of alignments achieving that optimal score.
Fig. 1 Curve plots for EVD fits for sequence pair 1. EDNAS54 signifies the substitution matrix file used for EMBOSS ‘stretcher’ program. RN indicates the random model size
The histogram of scores so obtained is then fitted to a generalized EVD, a Gumbel-type EVD, and a Fréchet-type EVD. From Fig. 1, it is clearly evident that all three types of EVDs fit the data sufficiently well. This means that the alignment score obtained for sequence pair 1 is statistically significant, and it may be inferred that for this sequence pair, the generalized EVD as well as the type I and type II EVDs fit the data well. Table 2 shows the p-value estimates for each of these nine cases, for all sequence pairs. It can be discerned from this table that for sequences having alignment lengths up to 9 kb and a random model size of 200 randomly shuffled sequences, the data fit the generalized, Gumbel-, as well as Fréchet-type EVDs. This suggests that the alignment scores obtained for all these pairs of sequences are highly statistically significant and closely follow the EVD distributions considered in this study. In other words, the probability of the random model score distribution exceeding the optimal alignment score becomes extremely small. For random model sizes of 500 and 1000, the last two pairs of sequences show score distributions that follow only the Gumbel-type EVD, whereas the remaining pairs fit all types of EVDs considered here sufficiently well. For brevity, the plots for a few notable cases are presented in the figures that follow. Since p-values less than 0.05 are considered statistically significant, all the estimates in Table 2 that are not in boldface are highly statistically significant. Further, a p-value of 0.0 indicates that the null hypothesis (that the optimal alignment score was obtained by mere chance) is an impossibility. Higher alignment scores are generated as the sequence lengths increase, provided the sequence similarity is reasonable. As the optimal alignment scores increase, the random model optimal alignment scores shift farther away from the optimal maximum global pairwise score. For high alignment scores (around 30,000), the generalized EVD and the Type II EVD (Fréchet) do not fit the data consistently. However, the Gumbel Type I EVD fits all cases satisfactorily regardless of the percent identity/similarity, percent gaps, alignment length, or sequence lengths. This may be because the Gumbel-type EVD is known to hold for a wide variety of underlying densities [7], including the Normal or Gaussian distribution (which is the case for the null model here). Note that in Fig. 2, the target sequence is approximately twice the length of the query. Due to the low sequence similarity and high gap percentage, the optimal score is negative. Nevertheless, the maximal random scores are less than the optimal alignment score by a considerable distance, and the score is highly significant in the sense that it is nearly impossible that it could have been obtained by chance. It is also worth noting that sequence pair 10 has the highest similarity of 99.4% among all the sequence pairs, and its optimal alignment contains no gaps. Figure 3 shows the curve plots for this case; it is clear that the alignment score is highly significant in all cases, and the score distribution follows the EVD distribution faithfully. Further, the curve plots of sequence pair 15 in Fig. 4 show that the scores follow the Gumbel-type EVD sufficiently well, while the other two EVDs may or may not fit reliably.
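A minimal sketch of the EVD fitting and p-value estimation with SciPy is shown below, where scores is the array of null-model optimal scores and opt_score is the real pair's optimal score; in SciPy, the Fréchet (Type II) EVD corresponds to the invweibull distribution.

from scipy.stats import genextreme, gumbel_r, invweibull

# scores: null-model optimal alignment scores; opt_score: the real pair's score
for dist, name in [(genextreme, 'Generalized EVD'),
                   (gumbel_r, 'Type I (Gumbel) EVD'),
                   (invweibull, 'Type II (Frechet) EVD')]:
    params = dist.fit(scores)               # shape/location/scale estimates
    p_value = dist.sf(opt_score, *params)   # P(random score >= optimal score)
    print(f'{name}: p-value = {p_value:.3g}')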
Table 2 P-value estimates for each sequence pair listed in Table 1 (pairs 1–15). For each random model size (RN = 200, RN = 500, and RN = 1000), p-values are reported for the Gen EVD, Type I EVD, and Type II EVD fits. Most entries are 0.0 or below 1E−09; the boldface, non-significant entries lie between 0.057849298 and 0.094655538.
The numbers in boldface indicate high p-values, and the corresponding result is not significant. Gen EVD = Generalized extreme value distribution, Type I EVD = Gumbel EVD, and Type II EVD = Fréchet EVD
Fig. 2 Curve plots for EVD fits for sequence pair 5. Pair 5 aligns two sequences of different lengths (the target being 1620 bp and the query being 745 bp long). Further, the optimal alignment score so obtained is negative, the percent similarity/identity is the least (35.1%), and the percentage gaps is the highest (54.8%) among the pairs of sequences selected for the study
4 Conclusion and Future Work The proposed research work has attempted to provide empirical evidence of the statistical significance of optimal alignment scores acquired through global pairwise alignment with gaps. This research work has considered the fixed-parameter model, wherein a single set consisting of a predetermined scoring matrix and default gap penalties is applied to all the sequence pairs for each random model. By applying the repeated random sampling technique (Monte Carlo methods), it has been attempted to assess the statistical significance of gapped global pairwise alignments for the case of nucleotides. The optimal alignment scores generated for short sequences in the range 1–10 kb are found to be highly significant. Further, the statistical significance of such alignments tends to follow the extreme value distribution, as is evident from the p-value estimates obtained. However, as the alignment lengths increase beyond 9 kb (and as the alignment scores increase as a consequence of moderate to high
Fig. 3 Curve plots for sequence pair 10. The histogram is barely visible in this case owing to the very high globally optimal score obtained compared to the optimal scores of the null model
similarity), it is observed that the alignment scores closely follow the Gumbel-type EVD. Therefore, it is reasonable to conclude that for longer sequences, the statistical significance of scores may be explained by the Type I extreme value distribution. Further, looking at all the plots shown in the figures in section three, it can be seen that the distributions are positively skewed with a right tail; however, these tails are thinner than the tails of the Gaussian distribution. The p-value calculated in this study considers the right-tailed behaviour of the distribution, and the very low p-values (those not in boldface) clearly show that the probability of obtaining an optimal alignment score by chance is almost negligible. In cases where the p-value turns out to be zero, this probability also reduces to zero, indicating that arriving at a score as extreme as the optimal one by chance is impossible. Regardless, further studies are necessary that scale up well to moderate-length and large sequences. In order to do away with the runtime overhead of generating and aligning the randomly shuffled sequences with the target to obtain the null model, other techniques that utilize a precompiled random model may be devised. Further, the results may be validated against a random sampling of real sequences of similar lengths as the target alignment. Complex random models employed by Altschul and Erickson [3] can also be explored to further strengthen the empirical analysis of global gapped pairwise nucleotide alignment significance.
Fig. 4 Curve plots for sequence pair 15. For a random model size of 200 and 1000, the generalized EVD fails to fit the alignment score histogram. Likewise, the Fréchet EVD distribution does not fit the data for the null model size of 1000 shuffled sequences
This study can be extended to include a multiple fixed-parameter set model in conjunction with multiple null models, and the results obtained may then be validated against a real sequence database search to provide robust empirical evidence of the statistical significance of alignment. Work in this direction is currently underway.

Conflicts of Interest The authors declare no conflicts of interest.
References 1. S.F. Altschul, W. Gish, Local alignment statistics. Methods Enzymol. 266, 460–480 (1996). https://doi.org/10.1016/s0076-6879(96)66029-7 2. S.F. Altschul, M.S. Boguski, W. Gish, J.C. Wootton, Issues in searching molecular sequence databases. Nat. Genet. 6(2), 119–129 (1994). https://doi.org/10.1038/ng0294-119
3. S.F. Altschul, B.W. Erickson, Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage. Mol. Biol. Evol. 2(6), 526–538 (1985). https://doi.org/10.1093/oxfordjournals.molbev.a040370
4. BLAST Global Alignment (n.d.) Needleman-Wunsch Global Align Nucleotide Sequences. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&PROG_DEF=blastn&BLAST_PROG_DEF=blastn&BLAST_SPEC=GlobalAln&LINK_LOC=BlastHomeLink
5. A. Chakraborty, S. Bandyopadhyay, FOGSAA: Fast Optimal Global Sequence Alignment Algorithm, 3 (2013). https://doi.org/10.1038/srep01746
6. M.O. Dayhoff, R. Schwartz, R.C. Orcutt, A model of evolutionary change in proteins, in Atlas of Protein Sequence and Structure, vol. 5, supplement 3. Nat. Biomed. Res. (1978), pp. 345–358. ISBN 978-0-912466-07-1
7. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, Cambridge, 1998). https://doi.org/10.1017/CBO9780511790492
8. EMBOSS Shuffleseq (n.d.). EMBOSS Shuffleseq Tool. https://emboss.bioinformatics.nl/cgi-bin/emboss/shuffleseq
9. S. Henikoff, J.G. Henikoff, Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 89(22), 10915–10919 (1992). https://doi.org/10.1073/pnas.89.22.10915
10. D.S. Hirschberg, A linear space algorithm for computing maximal common subsequences. 18, 341–343 (1975). https://doi.org/10.1145/360825.360861
11. X. Huang, D.L. Brutlag, Dynamic use of multiple parameter sets in sequence alignment. Nucleic Acids Res. 35(2), 678–686 (2007). https://doi.org/10.1093/nar/gkl1063
12. T. Kahveci, V. Ramaswamy, H. Tao, T. Li, Approximate global alignment of sequences (n.d.). https://doi.org/10.1109/bibe.2005.13
13. S. Karlin, S.F. Altschul, Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. U.S.A. 87(6), 2264–2268 (1990). https://doi.org/10.1073/pnas.87.6.2264
14. MATLAB Help Center (n.d.) Assessing the significance of an alignment. https://www.mathworks.com/help/bioinfo/examples/assessing-the-significance-of-an-alignment.html
15. A.Y. Mitrophanov, M. Borodovsky, Statistical significance in biological sequence analysis. Brief. Bioinform. 7(1), 2–24 (2006). https://doi.org/10.1093/bib/bbk001
16. R. Mott, Accurate formula for P-values of gapped local sequence and profile alignments. J. Mol. Biol. 300(3), 649–659 (2000). https://doi.org/10.1006/jmbi.2000.3875
17. R. Mott, Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores. Bltn. Mathcal. Biol. 54, 59–75 (1992). https://doi.org/10.1007/BF02458620
18. E.W. Myers, W. Miller, Optimal alignments in linear space. Bioinformatics 4, 11–17 (1988)
19. S.B. Needleman, C.D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970). https://doi.org/10.1016/0022-2836(70)90057-4
20. R. Olsen, R. Bundschuh, T. Hwa, Rapid assessment of extremal statistics for gapped local alignment, in Proceedings, International Conference on Intelligent Systems for Molecular Biology (1999), pp. 211–222
21. Pairwise Sequence Alignment (n.d.) EMBL-EBI Pairwise Sequence Alignment Tools. https://www.ebi.ac.uk/Tools/psa/
22. W. Pearson, Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276(1), 71–84 (1998)
23. D.R. Powell, L. Allison, T.I. Dix, A versatile divide and conquer technique for optimal string alignment. 70, 127–139 (1999). https://doi.org/10.1016/s0020-0190(99)00053-8
24. Random DNA Sequence Generator (n.d.). https://www.faculty.ucr.edu/~mmaduro/random.htm
25. J.G. Reich, H. Drabsch, A. Däumler, On the statistical assessment of similarities in DNA sequences. Nucleic Acids Res. 12(13), 5529–5543 (1984). https://doi.org/10.1093/nar/12.13.5529
26. RSAT (n.d.). Random Sequence Web Tool. https://rsat.sb-roscoff.fr/random-seq_form.cgi
27. SMS (n.d.). Sequence Manipulation Suite. https://www.bioinformatics.org/sms2/random_dna.html
28. M.S. Waterman, Mathematical Methods for DNA Sequences (CRC Press Inc., United States, 1989)
29. M.S. Waterman, M. Vingron, Rapid and accurate estimates of statistical significance for sequence database searches. Proc. Natl. Acad. Sci. U.S.A. 91(11), 4625–4628 (1994). https://doi.org/10.1073/pnas.91.11.4625
Principal Integrant Analysis Based Liver Disease Prediction Using Machine Learning M. Shyamala Devi, Kamma Rahul, Ambati Aaryani Chowdary, Jampani Sai Monisha Chowday, and Satheesh Manubolu
Abstract The human liver is one of the most essential body organs, aiding the process of food digestion and removing toxic substances. Late diagnosis of liver disease can lead to a life-threatening condition that endangers the life of an individual. Machine learning can be used to analyze the clinical features thoroughly and predict the severity of liver disease. This paper attempts to analyze each clinical feature that influences the target, with the following contributions. Firstly, the liver dataset from the UCI machine learning repository is subjected to data processing and cleansing. Secondly, the ANOVA test is applied to identify the features with PR(>F) < 0.05 that highly influence the target. Thirdly, the data is reduced with principal component analysis and then fitted to all the classifiers to analyze the performance metrics. Fourth, the data is reduced with linear discriminant analysis and then fitted to all the classifiers to analyze the performance. Experimental results show that "total proteins" does not contribute to the target, and the passive aggressive classifier is found to provide an accuracy of 71% before and after feature scaling for both the principal component analysis and the linear discriminant analysis. Keywords Machine learning · Classification · Accuracy and feature scaling
1 Introduction People are facing several diseases due to the evolving contemporary lifestyle, food habits, and adverse changes in environmental conditions. Predicting a disease at an early stage may help the patient fight a critical condition. Several attempts have been made to diagnose and predict diseases using machine learning models. Data mining finds hidden pattern information in large amounts of medical data. In [1], the disease is predicted based on patient symptoms, and KNN and CNN are used for accurate prediction of the diseases. The work in [2–4] develops a medical cost prediction model along with a statistical machine learning model. M. Shyamala Devi (B) · K. Rahul · A. A. Chowdary · J. S. M. Chowday · S. Manubolu Department of Computer Science & Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_6
The coefficient of determination of that prediction model (0.42) was higher than that of a traditional linear model (0.25). A review of predictive models for chronic disease diagnosis is provided in [5–8]. It was found that about 45% of studies have used SVM models, 23% of the studies have used K-nearest neighbor and Naïve Bayes models, 18% of studies have applied logistic regression, and 14% of studies have applied random forest for disease prediction [9, 10].
1.1 Disadvantages of the Existing Model In the existing models, random forest classifiers and KNN [13] attempt to predict liver disease [11, 12]. The review shows that no attempts have been made to analyze the performance of all the classifiers with and without feature scaling [14, 15]. Various diseases are diagnosed using neural networks to increase the speed of decision making and to lower false positive rates. The accuracy of a machine learning algorithm depends mainly on the quality of the dataset.
2 Overall Architecture 2.1 Dataset Preparation The Indian liver patient dataset from the UCI machine learning repository has been utilized in this research work [https://archive.ics.uci.edu/ml/machine-learning-databases/00225/]. The overall workflow is shown in Fig. 1. The paper's contributions are given below.
(i) Firstly, the liver dataset is subjected to data processing and cleansing.
(ii) Secondly, the ANOVA test is applied to identify the features with PR(>F) < 0.05 that highly influence the target.
(iii) Thirdly, the data is reduced with principal component analysis and then fitted to all the classifiers, before and after feature scaling, to analyze the performance.
(iv) Fourth, the data is reduced with linear discriminant analysis and then fitted to all the classifiers, before and after feature scaling, to analyze the performance.
Fig. 1 Overall architecture flow (liver dataset → partition of dependent and independent attributes → ANOVA test analysis → feature scaling → PCA / LDA → analysis of precision, recall, F-score, and accuracy → liver disease prediction)
3 Feature Analysis 3.1 ANOVA Test Analysis The ANOVA test is applied to the dataset features, and the results show that "total proteins" has a PR(>F) value greater than 0.05 and does not contribute to the target, as shown in Table 1. The ANOVA test analyzes the features of the dataset by comparing the null and alternate hypotheses. If the p-value associated with the F-statistic is less than 0.05, then that feature highly influences the target.
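A minimal sketch of this test with statsmodels is shown below; the dataframe df and the 'Target' column name are assumptions about how the data is loaded.

import statsmodels.api as sm
from statsmodels.formula.api import ols

# df is the liver dataframe; 'Target' is an assumed name for the label column
for feature in ['Age', 'Gender', 'Total_Bilirubin', 'Total_Protiens']:
    model = ols(f'Target ~ {feature}', data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # columns: sum_sq, df, F, PR(>F)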
3.2 Data Exploratory Analysis The correlation of each feature in the dataset is analyzed and is shown in Figs. 2 and 3. The correlation quantifies the relationship of the features present in the dataset. It shows the association level of two features and their contribution toward predicting the target. The associative level of two features can either be strong or weak. Here the pair (Gender, Total_Bilirubin) is a highly correlated pair toward target prediction.
Table 1 ANOVA test analysis with the dataset features

Features                     sum_sq     df   F          PR(>F)
Age                          2.248032   1    11.17143   0.00088
Gender                       0.8094     1    3.973363   0.04669
Total_Bilirubin              5.778375   1    29.60928   7.80E-08
Direct_Bilirubin             7.213982   1    37.43959   1.73E-09
Alkaline_Phosphotase         4.072429   1    20.55844   0.000007
Alamine_Aminotransferase     3.182228   1    15.94122   0.000074
Aspartate_Aminotransferase   2.750741   1    13.72864   0.000231
Total_Protiens               0.146043   1    0.712934   0.398819
Albumin                      3.103722   1    15.53743   0.000091
Fig. 2 Dataset information and kernel density estimation plot
3.3 Dimensionality Reduction with PCA and LDA Principal component analysis is a linear dimensionality reduction technique that uses singular value decomposition to project the features onto a reduced dataset. PCA decomposes a multivariate dataset into a set of orthogonal components that capture the highest variance. Linear discriminant analysis fits the dataset to class-conditional densities using Bayes' rule within a linear decision boundary. It assumes that every class shares the same covariance matrix and fits a Gaussian density to each class in the dataset. The dataset is turned into a reduced dataset by projecting it onto the most discriminative features using the transform method. This paper uses both PCA and LDA to perform dimensionality reduction. The relationship between the components and the cumulative variance with PCA is shown in Fig. 4.
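A short scikit-learn sketch of the two reductions is given below. Note that for a two-class target, scikit-learn's LDA yields at most one discriminant component, so the five-component setting applies to the PCA here.

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

pca = PCA(n_components=5)
X_pca = pca.fit_transform(X)  # five orthogonal components
print(pca.explained_variance_ratio_.cumsum())  # cumulative variance curve

# LDA is supervised: it uses the class labels y for its projection
lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)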
Fig. 3 Dataset correlation information
Fig. 4 PCA analysis—five components (left) before (right) after feature scaling
4 Results and Discussion 4.1 Implementation Setup The Indian liver dataset extracted from the UCI machine learning repository is used for implementation. The dataset consists of the features age, gender, total bilirubin, direct bilirubin, alkaline phosphatase, alamine aminotransferase, aspartate aminotransferase, total proteins, albumin, and the target. Data preprocessing is done by label encoding the gender values and handling the missing values. Data cleansing is done after performing the ANOVA test: the test shows that the PR(>F) value of "total proteins" is greater than 0.05 and that its existence does not influence the target, so the feature "total proteins" is eliminated before passing the data to the classifiers. The correlation of all the features with the target is computed, and dimensionality reduction is applied. The dataset is split 80:20 into training and testing sets.
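The setup described above could be scripted as in the sketch below; the file name and the 'Dataset' target column name follow the common distribution of this UCI dataset and are assumptions.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

df = pd.read_csv('indian_liver_patient.csv')  # file name is an assumption
df['Gender'] = LabelEncoder().fit_transform(df['Gender'])  # label encoding
df = df.fillna(df.mean(numeric_only=True))    # handle missing values
df = df.drop(columns=['Total_Protiens'])      # dropped after the ANOVA test

X = df.drop(columns=['Dataset'])              # 'Dataset' is the target column
y = df['Dataset']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)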
4.2 PCA Analysis Before and After Feature Scaling Feature scaling is done to normalize the values of the independent variables in the dataset; it greatly affects the performance of any algorithm in predicting the target. We have therefore implemented dimensionality reduction both before and after feature scaling in order to view the deviation in the performance metrics. Principal component analysis with five components is applied to the dataset, the five-component PCA-reduced dataset is fitted with the classifiers logistic regression, KNN, kernel SVM, decision tree, random forest, gradient boosting, AdaBoost, ridge, RidgeCV, SGD, passive aggressive, and bagging, and the performance metrics are shown in Figs. 5 and 6.
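One plausible way to fit the listed classifiers on the PCA-reduced data and collect the metrics is sketched below; the variables X_pca_train and X_pca_test are assumed to come from splitting the PCA-reduced dataset.

from sklearn.linear_model import (LogisticRegression, RidgeClassifier,
                                  RidgeClassifierCV, SGDClassifier,
                                  PassiveAggressiveClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier, BaggingClassifier)
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

classifiers = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'KNN': KNeighborsClassifier(),
    'Kernel SVM': SVC(kernel='rbf'),
    'Decision Tree': DecisionTreeClassifier(),
    'Random Forest': RandomForestClassifier(),
    'Gradient Boosting': GradientBoostingClassifier(),
    'AdaBoost': AdaBoostClassifier(),
    'Ridge': RidgeClassifier(),
    'RidgeCV': RidgeClassifierCV(),
    'SGD': SGDClassifier(),
    'Passive Aggressive': PassiveAggressiveClassifier(),
    'Bagging': BaggingClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_pca_train, y_train)
    y_pred = clf.predict(X_pca_test)
    prec, rec, fscore, _ = precision_recall_fscore_support(
        y_test, y_pred, average='weighted')
    print(name, accuracy_score(y_test, y_pred), prec, rec, fscore)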
4.3 LDA Analysis Before and After Feature Scaling Linear discriminant analysis with five components is applied to the dataset, the five-component LDA-reduced dataset is fitted with the classifiers logistic regression, KNN, kernel SVM, decision tree, random forest, gradient boosting, AdaBoost, ridge, RidgeCV, SGD, passive aggressive, and bagging, and the performance metrics are shown in Figs. 7 and 8.
Fig. 5 PCA dataset with classifier performance metrics analysis before feature scaling
Fig. 6 PCA dataset with classifier performance metrics analysis after feature scaling
Fig. 7 LDA dataset with classifier performance metrics analysis before feature scaling
Fig. 8 LDA dataset with classifier performance metrics analysis after feature scaling
5 Conclusion This paper attempts to explore the feature analysis of the dataset by interpreting the relationships of the features with each other. The data correlation matrix is extracted to identify the highly correlated feature pairs for target prediction. Dimensionality reduction of the dataset is done with principal component analysis and linear discriminant analysis. The PCA-reduced and LDA-reduced datasets are fitted with the classifiers logistic regression, KNN, kernel SVM, decision tree, random forest, gradient boosting, AdaBoost, ridge, RidgeCV, SGD, passive aggressive, and bagging, before and after feature scaling. Experimental results show that the PCA-reduced dataset with random forest achieves 74% accuracy before feature scaling, and the passive aggressive classifier achieves 71% accuracy after feature scaling. Experimental results also show that the LDA-reduced dataset with gradient boosting and the passive aggressive classifier achieves 68% accuracy before and after feature scaling. It is observed that the passive aggressive classifier provides consistent accuracy for predicting the target variable.
Classification of Indian Classical Dance 3D Point Cloud Data Using Geometric Deep Learning Ashwini Dayanand Naik and M. Supriya
Abstract Indian classical dances have many unique postures that need to be identified and classified correctly. Though many classification techniques exist for two-dimensional dance images, there is a need to classify three-dimensional images, as this area is still evolving. Geometric deep learning is one of the growing fields in machine learning and deep learning: it enables learning from complex types of data represented in the form of graphs and 3D objects (manifolds). Deep learning algorithms like convolutional neural networks (CNN) and recurrent neural networks (RNN) have achieved high performance on a broad range of problems, and these algorithms can also classify images. However, deep learning works well for Euclidean data such as points, lines, and planes; CNN cannot be implemented on non-Euclidean data such as graphs and 3D objects (manifolds), and thus a neural network architecture that can learn from non-Euclidean data is required. In the proposed work, geometric deep learning is implemented on 3D image data represented as point clouds. The PointNet architecture works efficiently with point cloud data and has been used to classify Indian classical dance point cloud data into five dance forms, namely Bharatanatyam, Odissi, Kathak, Kathakali, and Yakshagana. Keywords Geometric deep learning · Point cloud data · Indian classical dance forms · Euclidean and non-Euclidean data
1 Introduction An amalgamation of many cultures can be found in India, including language, dance, music, religion, and food. Indian classical dance is one of the ancient arts, A. D. Naik (B) · M. Supriya Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India M. Supriya e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_7
which has unique features such as hand mudras, leg postures, and costumes, each of which differs between the classical dance forms. The Natya Shastra, one of the ancient literatures on dance, estimates the presence of dance in India between 500 BCE and 500 CE. There are 11 classical dance forms in India, namely Bharatanatyam and Bhagavatha Mela from Tamil Nadu, Kathakali from Kerala, Manipuri from Manipur, Kathak from Uttar Pradesh, Kuchipudi from Andhra Pradesh, Odissi from Odisha, Mohiniyattam from Kerala, Sattriya from Assam, Yakshagana from Karnataka and Chhau from Eastern India. Each Indian classical dance expresses the cultural richness of its state, and a person who is not aware of dance cannot identify the differences in dance postures and hand mudras. Anyone interested in learning the basics and technicalities of Indian classical dance forms can find many online sites that offer such training, and such learning can now happen through many digitized platforms. The proposed work focuses on one such approach: enabling learning from digital images represented in 3D format, which adds the third dimension to standard 2D images and enables higher learning accuracy. For classifying the images, machine learning [1] and deep learning [2] are the appropriate approaches. Many approaches have been proposed in the literature to classify Indian classical dance images; the main focus of such models is two-dimensional image data fed as input to a convolutional neural network (CNN) [3]. The current work classifies Indian classical dance forms represented as three-dimensional point cloud data using geometric deep learning. A convolutional neural network is only suitable for image data represented as Euclidean data, where convolution is done by sliding a filter over the input at every location, providing a highly accurate system. If the image is curved, it becomes a 3D object, and convolving over 3D shapes with a vector-like filter is not possible; geometric deep learning is the solution to this problem. Here, non-Euclidean data, namely 3D object point cloud data, is considered as the input. A set of points represented in 3D by (x, y, z) locations is called a point cloud. The 3D point cloud data of a dance form is fed to the network to classify the point cloud with a label, and the PointNet architecture is used for this classification. PointNet directly takes the 3D point cloud data without transforming it into other forms. Other implementations make use of mesh, volumetric and RGB(D) data; the advantage of using point cloud data among these is that it makes use of raw sensor data and is a canonical representation.
2 Related Works In recent years, many research works have been performed in the fields of image processing, machine learning, and deep learning. Ankita et al. have made an attempt to recognize Indian classical dance from videos [4]. Training has been done using a support vector machine (SVM); the work has been trained on a dataset consisting of 211 videos and yields an accuracy of 75.83%. A CNN architecture has been proposed to
classify Bharatanatyam mudras and poses in the work proposed in [5]. Here, the dataset consists of 200 different mudras and provides an accuracy of 89.92%. A pose descriptor based on the histogram of oriented optical flow (HOOF) [6] has also been used to represent dance video frames; an online dictionary learning technique is used to learn pose bases, and an SVM classifies the videos into three different dance forms: Bharatanatyam, Kathak, and Kathakali. An accuracy of 86.67% has been achieved by this approach. Many studies have been conducted on action recognition on three-dimensional objects using geometric methods. The work by Lui [7] has introduced special forms of data such as graphs and manifolds, also called non-Euclidean data. Human manifolds are used for action recognition: discriminating information between actions can be extracted by manifold charting. Singular value decomposition (SVD) is used for factorizing the data tensors, which are projected on a tangent space; after computing the tangent bundle's intrinsic distance, actions can be classified. The implementation of geometric deep learning on non-Euclidean data is explained in the work proposed in [8]. Deep learning does well at extracting features from Euclidean data, but since graphs and manifolds are non-Euclidean, the normal convolution approach cannot be applied directly; by generalizing the convolution operation, it can be achieved. Hence, the proposed model uses deep learning for non-Euclidean domains by making use of spatial-domain features. The three-dimensional surface data of an object can be represented using point clouds with x-, y-, z-coordinates, and a few works have implemented deep learning on point clouds. Instead of the normal CNN convolution technique, edge convolution is used in the point cloud world [9]. Edge convolution (EdgeConv) has many properties, such as incorporating local neighborhood information and learning global shape properties; the architecture is tested on the ModelNet40 and S3DIS 3D datasets. A graph convolutional neural network (GCNN) for 3D point cloud classification is implemented in the work proposed in [10]. GCNN is an extension of the traditional CNN in which the data is in graph form. In this work, a graph-CNN is used for classifying the 3D point cloud data of an object; the model combines localized graph convolutions with two types of graph pooling methods (to downsize the graph), so the graph-CNN can explore the point cloud data. This model achieves competitive performance on 3D object classification on the ModelNet dataset. Ernesto Brau et al. have implemented a deep CNN to estimate 3D human pose, learning from 2D joint annotations [11]. The architecture is traditional, but the output layer projects the predicted 3D joints onto 2D, and pose constraints are enforced using an independently trained network that learns a prior distribution over 3D poses. An approach has also been made to recognize human actions using human skeletal information [12], considering a skeletal angular representation; the scale of the actor is identified, and the correlation among different body parts is maintained. A neural network using point clouds has been developed by Charles et al. in [13]. This model, called PointNet, provides a unified architecture for a variety of applications such as object classification, semantic parsing, and segmentation. The main challenge addressed in this work is feeding the
unordered point cloud data to the network with features such as permutation invariance and rotation invariance. PointNet is tested on the ModelNet40 dataset [14, 15]. The literature summary is shown in Table 1. From the survey, it can be concluded that most works use two-dimensional Indian classical dance image data. Three-dimensional data will need to be trained for emerging future applications, and hence geometric deep learning is a possible option. Considering three-dimensional point cloud data with the PointNet architecture is more effective, as there is no need to convert the data to another form. Such a model can be used to recognize dance images and their varied forms more precisely, yielding high classification accuracy.
3 Proposed Methodology As described in the previous section, the model with the PointNet architecture uses the 3D point cloud data of Indian classical dance forms and classifies it into five labels, namely Bharatanatyam, Odissi, Kathak, Kathakali, and Yakshagana. The existing PointNet architecture is tested on the ModelNet dataset, which holds Object File Format (OFF) files that carry 3D point cloud data of different categories. Since ICD point cloud data is not available, it has to be generated following a certain procedure. Images are extracted from the Internet, and the point cloud dataset is generated using the 3D Builder and MeshLab software: 3D Builder is used to change a 2D image into a 3D Polygon File Format (PLY) file, and since the 3D data is required in Object File Format (OFF), the PLY data has to be converted into OFF format using MeshLab. The OFF data of all five dance forms is fed to the PointNet architecture to classify it into the different dance labels. As the 3D point cloud data of each dance form has a large number of points, it is better to downsample the point data to fewer points to increase performance; in this work, each dance point cloud is fixed to 2048 points (a loading and sampling sketch is given after this paragraph). The PointNet architecture is used in the proposed work to train and classify the 3D point cloud data of Indian classical dance (ICD) images. The ICD point cloud dataset used in this proposed model has five dance forms, each consisting of 40 point cloud OFF files, and each class is split into train and test folders in an 80:20 ratio: the training set contains 30 OFF files and the testing set contains 10 OFF files per class. Figure 1 shows a few samples from the ICD point cloud data in Object File Format (OFF). The PointNet architecture is fed with the ICD point cloud dataset, and the workflow of the proposed model is presented in Figs. 2 and 3. As described earlier, 3D point cloud data with x-, y-, z-coordinates (where z is the depth information) is fed to the PointNet model. Basically, PointNet consists of a multilayer perceptron (MLP), a max pooling layer and fully connected layers with ReLU and batch normalization. Figure 4 shows the basic architecture of PointNet.
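The following is a minimal sketch of loading an OFF file and sampling a fixed 2048 points per cloud, as described above; it assumes a standard OFF header, and the file path is hypothetical.

```python
# Hedged sketch: read the vertex list of an OFF file and sample 2048 points.
import numpy as np

def load_off_vertices(path):
    with open(path) as f:
        assert f.readline().strip() == "OFF"             # standard OFF header assumed
        n_verts, n_faces, _ = map(int, f.readline().split())
        pts = [list(map(float, f.readline().split()[:3])) for _ in range(n_verts)]
    return np.array(pts, dtype=np.float32)

def sample_points(points, n=2048):
    idx = np.random.choice(len(points), n, replace=len(points) < n)
    return points[idx]

cloud = sample_points(load_off_vertices("train/bharatanatyam_01.off"))  # hypothetical path
```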
Table 1 Literature summary

Papers | Methods used | Accuracy achieved | Observations
Reference [4] | DCNN, optical flow, SVM classifier | Dataset: ICD video frames; Bharatanatyam 65.78%, Manipuri 71.42% | Suitable for 2D image data, and accuracy is low
Reference [5] | CNN | Dataset: ICD offline and online images; 89.92% | Suitable for 2D image data
Reference [6] | Pose descriptor based on histogram of oriented optical flow (HOOF), SVM classifier | Dataset: ICD; 86.67% | Suitable for 2D image data
Reference [7] | Examines standard manifold charting and alternative chartings on special manifolds, particularly the special orthogonal group, Stiefel manifolds, and Grassmann manifolds | Datasets: Cambridge gesture, UMD Keck body gesture, and UCF sport | –
Reference [8] | Proposes a unified framework allowing CNN architectures to be generalized to non-Euclidean domains (graphs and manifolds) | Datasets: Cora, PubMed; MoNet 81.69% for Cora, 78.81% for PubMed | –
Reference [9] | Point cloud segmentation using the proposed neural network; EdgeConv used | Dataset: ShapeNet; 84.1% | Point cloud data are converted to graph form before being fed to the network
Reference [10] | Graph convolutional neural networks, graph signal processing, 3D point cloud, supervised learning, PointGCN | Dataset: ModelNet40; 89.51% | Point cloud data are converted to graph form before being fed to the network
Reference [11] | Deep CNN for 3D human pose estimation | Dataset: Human3.6M; walking 70.2%, jogging 79.7% | 3D human pose is learned from 2D joint annotations
Reference [12] | Dynamic time warping (DTW) applied as a template-matching solution for the action classification task; skeleton tracking | Dataset: Taiji; 80% | 3D skeletal joint information is fed to the network; the model does not consider surface data
Reference [13] | PointNet architecture uses 3D point cloud data for 3D classification and segmentation | Dataset: ModelNet; 89.2% | Unordered 3D point cloud data, with permutation-invariance and rotation-invariance properties, is fed to the network
Fig. 1 ICD point cloud dataset
The architecture resolves two challenges. The first is that an unordered point set is considered as input, which means the model must be invariant to all N! permutations of its points; a set of N orderless points, each represented by a D-dimensional vector, is called a point cloud. The second challenge is that the model must be invariant under geometric transformations. Hence, the input is aligned by a transformer network (T-Net), which transforms the input data points. A multilayer perceptron is shared across each input point, and each point is first embedded into a 64-dimensional space. A feature transform is then applied, and the points are converted into a 1024-dimensional embedding space. The next step is to aggregate all the points in this larger-dimensional space, which is done with the help of a max pooling function.
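A simplified Keras sketch of this shared-MLP and max-pool design is given below; the T-Net alignment blocks are omitted for brevity, and the layer widths follow the 64- and 1024-dimensional embeddings mentioned above, so this is an illustrative approximation rather than the authors' exact network.

```python
# Hedged PointNet-style classifier sketch (without T-Net).
import tensorflow as tf
from tensorflow.keras import layers

def shared_mlp(x, units):
    # A 1x1 convolution acts as an MLP shared across all input points
    x = layers.Conv1D(units, 1)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = tf.keras.Input(shape=(2048, 3))            # 2048 points, (x, y, z)
x = shared_mlp(inputs, 64)
x = shared_mlp(x, 64)
x = shared_mlp(x, 1024)                             # per-point 1024-d embedding
x = layers.GlobalMaxPooling1D()(x)                  # symmetric, order-invariant aggregation
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(5, activation="softmax")(x)  # five dance classes
model = tf.keras.Model(inputs, outputs)
```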
4 Implementation Below are the steps involved in implementing ICD 3D point cloud classification using the PointNet architecture:
1. Download the images of the five different dance forms from the Internet.
Fig. 2 PointNet architecture implementation on ICD point cloud dataset
Fig. 3 3D data format conversion and classifying ICD point cloud data
2. Convert each 2D image to 3D point cloud data (.ply file format); it has vertices and faces with x-, y-, z-coordinate values. A 2D image can be converted to a 3D PLY file using the 3D Builder software. The data format of the Polygon File Format (PLY) of ICD is shown in Figs. 5 and 6.
3. Using MeshLab, convert .ply to .off file format. Figure 7 shows the OFF file in MeshLab after conversion, and Fig. 8 shows the OFF file vertices and faces.
Fig. 4 Basic architecture of PointNet
Fig. 5 Sample PLY file of Yakshagana with 9012 vertices and 19,724 faces
Fig. 6 Ply file showing x, y, z coordinates
Fig. 7 Sample OFF file of Yakshagana with 9012 vertices and 19,724 faces
4. The data folder contains all five dance classes, with train and test folders containing the OFF files of each dance form.
5. Each OFF file contains a large number of points; only 2048 locations (points) are sampled.
6. Each point cloud is converted to a numpy array.
7. Set the number of points to sample to 2048 and the batch size to 16.
8. Feed the OFF data to the PointNet architecture and build the model. Each convolutional and fully connected layer consists of convolution, dense, batch normalization and ReLU activation; PointNet consists of two core components, a multilayer perceptron (MLP) and a transformer network for input and feature transformation.
9. Train the model with learning rate 0.001 and test the result (a minimal training sketch is given after this list); Table 2 shows the training and validation results over the epochs.
10. Visualize the result; Fig. 9 shows the prediction result.
5 Conclusion and Future Work The proposed work uses a geometric deep learning approach and the PointNet architecture to classify ICD 3D point cloud data. The proposed work achieved a training accuracy of 96.64% and a validation accuracy of 73% for 250 epochs. The model can classify
Fig. 8 OFF file
Table 2 Training and validation result of ICD point cloud data

Epochs | Train_loss | Valid_loss | sparse_categorical_accuracy | val_sparse_categorical_accuracy
1 | 2.0038 | 2.0868 | 0.4497 | 0.3933
20 | 2.1127 | 88.0075 | 0.4295 | 0.2921
50 | 1.3189 | 1.8251 | 0.7248 | 0.6966
100 | 1.1898 | 1.8205 | 0.7517 | 0.5843
150 | 0.8927 | 1.2007 | 0.8725 | 0.7865
200 | 0.7566 | 2.0571 | 0.8993 | 0.7191
250 | 0.6963 | 1.7391 | 0.9664 | 0.7303
more accurately by using surface data captured by 3D Light Detection and Ranging (LiDAR) laser scanners. This approach can be extended to classify all the dance forms available in India. In the present digital era, this approach will help students and dance gurus provide dance education in a more precise way, even when they are not able to visit the dance center. However, the training set has to be increased with more dance images with varied postures, and training and classification can also be performed on the varied postures and attires. The
Fig. 9 Snapshot of visualized result of ICD 3D point clouds predictions and the label
model can also be fine-tuned to yield a higher validation accuracy by increasing the number of epochs or adding more images for training and testing.
References 1. S. Pandey, M. Supriya, A. Shrivastava, Data classification using machine learning approach, in 3rd International Symposium on Intelligent System Technologies and Application, vol. 683 (2018), pp. 112–122 2. C. Sudarshana Tamuly, Jyotsna, J. Amudha, Deep learning model for image classification, in International Conference on Computational Vision and Bio Inspired Computing (2019) 3. C. Sudarshana Tamuly, Jyotsna, J. Amudha, Effective spam image classification using CNN and transfer learning, in 3rd International Conference on Computational Vision and Bio Inspired Computing (2019) 4. B. Ankita, B. Riya, S. Goutam, S. Pushkar, R. Balasubramanian, Indian dance form recognition from videos, in 13th International Conference on Signal-Image Technology & Internet-Based Systems (2017) 5. P.V.V. Kishore, K.V.V. Kumar, E. Kiran Kumar, A.S.C.S. Sastry, M. Teja Kiran, D. Anil Kumar, M.V.D. Prasad, Indian Classical Dance Action Identification and Classification with Convolutional Neural Networks (Hindawi, 2018)
6. S. Soumitra, P. Pulak, C. Bhabatosh, Indian classical dance classification by learning dance pose bases, in IEEE Workshop on the Applications of Computer Vision (2012) 7. Y.M. Lui, Tangent bundles on special manifolds for action recognition. IEEE Trans. Circuits Syst. Video Technol. 22(6), 930–942 (2012) 8. F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M. Bronstein, Geometric deep learning on graphs and manifolds using mixture model CNNs, in IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 5115–5124 9. Y. Wang, Y. Sun, Z. Liu, S.E. Sarma, Dynamic Graph CNN for Learning on Point Clouds (ResearchGate, 2018) 10. Y. Zhang, M. Rabbat, A graph-CNN for 3D point cloud classification, in IEEE International Conference on Acoustics, Speech and Signal Processing (2018) 11. E. Brau, J. Hao, 3D human pose estimation via deep learning from 2D annotations, in IEEE Fourth International Conference on 3D Vision (2016) 12. H.-M. Zhu, C.-M. Pun, Human action recognition with skeletal information from depth camera, in IEEE International Conference on Information and Automation (2013) 13. C.R. Qi, H. Su, K. Mo, L.J. Guibas, PointNet: deep learning on point sets for 3D classification and segmentation, in IEEE Conference on Computer Vision and Pattern Recognition (2017) 14. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, 3D ShapeNets: a deep representation for volumetric shapes, in IEEE Conference on Computer Vision and Pattern Recognition (2015) 15. ModelNet Dataset. Available: https://modelnet.cs.princeton.edu/
Fire Detection by Parallel Classification of Fire and Smoke Using Convolutional Neural Network A. Robert Singh, Suganya Athisayamani, S. Sankara Narayanan, and S. Dhanasekaran
Abstract Fire detection is considered a part of remote surveillance in domestic and industrial settings and in areas that are not approachable by humans, like deep forests. In this paper, a convolutional neural network (CNN) is used to detect fire by classifying both fire and smoke in videos. A sequence of 2D convolutional layers and max pool layers is used to convert the video frames into feature maps of lower rank. The neural network is trained with videos containing both fire and smoke, and videos with either fire or smoke or both are tested for fire detection with the FIRESENSE and other such open-source databases. The results show that the proposed method can classify fire, smoke and fire with smoke with recognition rates of up to 94%, 95% and 93%, respectively. Keywords CNN · Fire detection · Smoke detection · Classification
1 Introduction Vision processing is an emerging area that provides many efficient real-time applications [1, 2]. Fire detection is considered an open problem, and it is important to address the challenges associated with timely detection and control; an example is the recent Australian bush fire [3]. Today, deep forests are also monitored using surveillance cameras. Similarly, industries like offshore oil rigs are vulnerable to fire accidents that can lead to severe damage to lives and assets. On these occasions, fast and accurate fire detection is most helpful. A. Robert Singh · S. Dhanasekaran School of Computing, Kalasalingam Academy of Research and Education, Anand Nagar, Srivilliputhur, Tamil Nadu, India S. Athisayamani (B) School of Computing, Sastra Deemed to be University, Thanjavur, Tamil Nadu, India S. Sankara Narayanan Department of Computer Science and Engineering, VelTech Rangarajan Dr. Sahunthala R&D Institute of Science and Technology, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_8
Table 1 Sample frames for the three classes: Class 1 (fire only), Class 2 (smoke only) and Class 3 (fire and smoke)
There are three ways available for analysing and detecting fire: hardware-based sensor systems, basic image processing methods, and vision processing with evolutionary methods. In general, vision processing methods for fire detection work in two ways: localizing fire and localizing smoke. Most fire accidents start with smoke, in which case fire localization methods fail to detect the fire; on the other hand, fires due to electrical faults and oil leakages start with a small flame and blast to a big size, in which case smoke detecting methods fail to perform the task. These problems can be addressed by localizing both fire and smoke. This paper considers the important features for performing feature extraction of fire and smoke, which are RGB intensity values, regional entropy, dynamicity and irregularity. In this paper, these features are used to classify the fire and smoke video frames, and a parallel classification method is proposed to assign the three labels: fire, smoke and normal video frames. The CNN is designed to classify three classes of videos: fire only, smoke only and fire with smoke. Some sample video frames for the three classes, taken from the FIRESENSE database [4], are shown in Table 1.
2 Related Works A good number of studies are available on fire detection using hardware sensing, image processing and evolutionary methods. The hardware sensing methods analyse sensor data. Chen et al. [5] proposed a fire detection method with smoke and gas sensors that detects fire if the amount of smoke and gases like CO2 and CO crosses a threshold value; this takes more time in the case of a high flame with little smoke. Tang et al. [6] proposed another hardware-based remote alarm system to alert about fire accidents in remote areas by using a GSM unit; this method fails to detect smoke. Ono et al. [7] proposed a method to find fire in expressway tunnels, in which the fire flame region is automatically
segmented and used for training a neural network. The method proposes an optimal location for placing the CCTV camera to cover a maximum area for fire surveillance; it uses image processing methodologies to detect fire and, depending on the result, notifies the concerned officials. Celik et al. [8] proposed a color image processing method to segment the fire areas in the image. This method uses the YCbCr color model to detect fire pixels in the video: the Cb and Cr components are used to filter the pixels using thresholding. The method recorded a 31% false alarm rate, which is high for emergency remote surveillance. Marbach et al. [9] proposed an image processing based fire detection method for video frames in which the pixels are divided into three groups, active pixels, saturated pixels and fire pixels, which are used to find the fire pattern using luminance and chrominance sensitivity. Dhanujalakshmi et al. [10] proposed a remote fire detection method on a Raspberry Pi using image processing; it uses local thresholding to localize the fire pixels, and segmentation of fire in the video frame is used for notifying through WhatsApp. Li et al. [11] proposed CNN-based fire detection on images, using architectures like Faster R-CNN, R-FCN, single shot multibox detector (SSD) and YOLO v3; except for SSD, these architectures can detect either fire or smoke. All the methods discussed above are designed to detect either smoke or fire. In this paper, the problem of detecting both fire and smoke is addressed by having three classes in a CNN-based classification that can simultaneously detect smoke and fire.
3 Proposed CNN for Fire and Smoke Detection The preprocessing consists of segmentation of background and foreground in the video frames. The robust orthonormal subspace learning method [12] is used for background subtraction. Let V be the 4D matrix that denotes the video frames, and let B be the background matrix and F the foreground matrix, both of relatively lower order than V. The foreground and background images are calculated with the objective of a minimal coefficient factor, as given in Eq. (1):

min_{S,x,M} ‖x‖_{r−1} + α‖M‖₁   (1)

subject to S·x + M = F and B = S·x, where S = {S₁, S₂, …, S_k} ∈ R is the orthonormal subspace, x = {x₁, x₂, …, x_k} ∈ R, and k is the dimension of the subspace. This is the first step in the training and testing
Fig. 1 Architecture of the proposed CNN: input (256 × 256 × 3) → 2D convolutional layer → ReLU layer → 2D max pooling layer → fully connected layer → softmax layer → classification layer
processes. The proposed architecture has seven different layers, as shown in Fig. 1. The input layer resizes the incoming frame to 256 × 256 × 3. The convolutional layer is obtained by mapping two pixels from the input image into one pixel, where each output pixel is a weighted average of two neighbouring pixels. This function is given in Eq. (2):

f(x, y) = w × (I(x, y) + I(x, y + 1)) / 2   (2)
where f(x, y) is the pixel in the convolutional layer, w is the weight and I(x, y) is the pixel in the input layer in gray scale. Thus, the output of the convolutional layer will be of size 128 × 128 pixels. The dimensions of the sample in each convolutional layer are shown in Fig. 2. Feature maps are calculated by mapping the dynamic features along the temporal direction; the feature maps identified in the first 2D convolutional layer are shown in Fig. 3. These features are obtained with different weight values in the range [0, 1], and the same weight values are applied in the further convolutional layers. The ReLU layer is the linear rectifier layer, which is a part of the convolutional layer; it enhances the feature map generated by the convolutional layer. In this paper, white balancing of the greyscale image is applied, which assigns positive values to the white components and negative values to the black components; thus, the fire and smoke features are highlighted. The output of the ReLU layer with the first 2D convolutional layer is shown in Fig. 4.
Fig. 2 Conversion of the input image through convolutional layers and max pool layers: input layer (256 × 256 × 3) → convolutional layer 1 (128 × 128) with pooling layer 1 → convolutional layer 2 (64 × 64) with pooling layer 2 → two fully connected layers → class 1, class 2 and class 3 outputs
Fig. 3 a Input video frame, b feature maps in convolutional layer 1
Fig. 4 a Original video frame b one of the output feature maps of the ReLU layer
Each convolution layer is followed by a max pooling layer. Here, each frame is divided into a number of sub-blocks, and the maximum of each sub-block replaces it as a single value, as given in Eq. (3). For a block of size n × n, the max-pooled value at (x, y) is the maximum convolved pixel value g:

p(x, y) = max g(i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ n   (3)
Thus, the set of convolutional layer, ReLU layer and max pooling layer is repeatedly applied until a 1-D feature map is obtained. The final layer is the fully connected layer, which flattens the matrix into a 1-D vector; the size of the output of the fully connected layer is equal to the number of columns obtained at the final max pooling layer.
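A hedged Keras sketch of this seven-layer design is shown below; the paper's implementation is in MATLAB, and the filter counts here are assumptions, so this is an illustrative equivalent rather than the authors' exact network.

```python
# Hedged sketch: conv -> pool -> conv -> pool -> flatten -> dense -> softmax,
# mirroring the 256x256x3 -> 128x128 -> 64x64 progression described above.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(16, 3, padding="same", activation="relu", input_shape=(256, 256, 3)),
    layers.MaxPooling2D(),                     # 128 x 128
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),                     # 64 x 64
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),     # fire / smoke / fire with smoke
])
```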
4 Training and Testing The proposed architecture is referred to as smokiFi-CNN throughout the experiment. Three different datasets [4, 12, 13] are used for training and testing smokiFi-CNN, and the confusion matrix obtained for the classification is given in Fig. 5. Overall, 80% of the images in the three datasets were used for training and 20% were used for testing the CNN. The logarithmic scale of the learning rate is compared with the loss; the result shows that the local maximum is the point where CNN learning starts, and the global minimum is the point where the CNN starts to overfit, as shown in Fig. 6. Here, the local maximum is 0.91 and the global minimum is 0.21. The number of epochs is an important factor for training and testing; it represents the total number of times the dataset is passed forward and backward. The accuracy and loss obtained in both training and testing (validation) are compared against different numbers of epochs.
Fig. 5 Confusion matrix for smokiFi-CNN
Fig. 6 Comparison of learning rate versus loss
Fig. 7 Comparison of number of epochs and accuracy (training and validation accuracy, in %, plotted against the number of epochs)
Figure 7 shows the comparison of the number of epochs and accuracy. The training starts with no previous knowledge, so the training accuracy for 2 epochs is 45%, and it improves as the number of epochs increases; the accuracy stabilizes within a short range once learning reaches the converging stage. The validation accuracy starts from a higher level of about 60% and increases with the number of epochs.
5 Result Analysis and Discussion The proposed architecture is implemented using MATLAB 2020, and the combined datasets [4, 12] and [13] are used for training and testing. The video frames from 59 videos of the three classes, namely fire only, smoke only and fire with smoke, are labeled as 1, 2 and 3, respectively. The proposed method is compared with the state-of-the-art methods as shown in Table 2. The result shows that the proposed method is giving accuracy more than majority of the state-of-the-art methods for classifying fire. The two methods [14, 15] have higher accuracy than the proposed method. They classify only two categories say, fire and normal. The occurrence of smoke and combination of smoke and fire is not considered in these two methods.
Fig. 8 Comparison of number of epochs and loss (training and validation loss, in %, plotted against the number of epochs)
Table 2 Comparison of classification performance with state-of-the-art methods

Fire and smoke detection method | False positive (%) | False negative (%) | Accuracy (%)
Deep CNN [14] | 8.87 | 2.12 | 94.5
CE + MV + SV [16] | 11.67 | 0 | 93.55
De Lascio et al. [17] | 13.33 | 0 | 92.86
Habibugle et al. [18] | 5.88 | 14.29 | 90.32
Rafiee et al. (YUV color) [19] | 17.65 | 7.14 | 74.20
Celik et al. [20] | 29.41 | 0 | 83.87
Chen et al. [21] | 11.76 | 14.29 | 87.1
Arpit Jadon et al. [15] | 1.23 | 2.25 | 96.52
Khan Muhammad et al. [14] | 0 | 0.14 | 95.86
SmokiFi-CNN | 1.1 | 2.1 | 94.74
Figure 9 is the visual representation of the classification output for the video frames from the FIRESENSE database. Each output shows the classification values for the three classes. From the results, it is evident that the percentage of classification for fire, smoke and the combination of both is appropriate for the contents of the frame. Other than accuracy, the important metrics for classification evaluation are precision, recall and F1 score; they are calculated as given in Eqs. (4) through (6).
Fig. 9 Result of classification using CNN
Precision = True positive / (True positive + False positive)   (4)

Recall = True positive / (True positive + False negative)   (5)
Table 3 Comparison of training methods with the state-of-the-art methods

Metrics | GLNRGB [22] | ALEXRGB [22] | VGGNET CNN [23] | SmokiFi CNN
False positive | 0.54 | 0.68 | 1.50 | 1.56
False negative | 0.99 | 1.01 | 0.99 | 0.96
True positive | 0.61 | 0.72 | 0.92 | 0.89
Precision | 0.53 | 0.51 | 0.38 | 0.36
Recall | 0.38 | 0.42 | 0.48 | 0.48
F1-Score | 0.45 | 0.46 | 0.42 | 0.41
F1 score = 2 × (Precision × Recall) / (Precision + Recall)   (6)
These metrics are analysed for the proposed method against the state-of-the-art methods, as shown in Table 3. The results show that the proposed method achieves the minimum precision and the maximum recall values, which lead to a lower F1 score than the existing methods.
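For reference, Eqs. (4)-(6) can be computed directly from the counts; the snippet below applies them to the SmokiFi CNN column of Table 3 and reproduces its precision (about 0.36), recall (0.48) and F1 score (about 0.41).

```python
# Precision, recall and F1 from raw true/false positive and negative counts.
def precision_recall_f1(tp: float, fp: float, fn: float):
    precision = tp / (tp + fp)                           # Eq. (4)
    recall = tp / (tp + fn)                              # Eq. (5)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (6)
    return precision, recall, f1

print(precision_recall_f1(tp=0.89, fp=1.56, fn=0.96))    # SmokiFi CNN column of Table 3
```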
6 Conclusion This paper proposes a CNN architecture to classify videos into three classes: fire only, smoke only and fire with smoke. A reasonably large number of videos are used for training the architecture. The testing results show that the proposed architecture can classify the fire and smoke images with maximum accuracy and minimum F1-score. In future, the architecture will be improved by adding more layers to achieve better accuracy in classifying videos with both fire and smoke.
References 1. A.R. Singh, A. Suganya, Efficient tool for face detection and face recognition in color group photos, in 3rd International Conference on Electronics Computer Technology (Kanyakumari, 2011), pp. 263–265 2. A. Robert Singh, G. Sathana, S. Sathya Sheela, Remote theft identification using Raspberry Pi system based on motion detection. SSRG Int. J. Comput. Sci. Eng. (SSRG-IJCSE) 4(4), 21–23 (2017) 3. E. Nick, B. Andy, Z. Naaman, How Big are the Fires Burning in Australia? (Interactive Map) (Guardian Australia, 2020) 4. N. Grammalidis, K. Dimitropoulos, E. Cetin, FIRESENSE Database of Videos for Flame and Smoke Detection (Zenodo, 2017) 5. S.-J. Chen, D.C. Hovde, K.A. Peterson, A.W. Marshall, Fire detection using smoke and gas sensors. Fire Saf. J. 42(8), 507–515 (2007)
6. Z. Tang, W. Shuai, L. Jun, Remote alarm monitor system based on GSM and ARM. Procedia Eng. 15, 65–69 (2011) 7. T. Ono, H. Ishii, K. Kawamura et al., Application of neural network to analyses of CCD colour TV-camera image for the detection of car fires in expressway tunnels. Fire Saf. J. 41(4), 279–284 (2006) 8. T. Celik, K.-K. Ma, Computer vision based fire detection in color images, in Proceedings of the IEEE Conference on Soft Computing on Industrial Applications (SMCia '08) (2008), pp. 258–263 9. G. Marbach, M. Loepfe, T. Brupbacher, An image processing technique for fire detection in video images. Fire Saf. J. 41(4), 285–289 (2006) 10. R. Dhanujalakshmi, B. Divya, C. Divya@sandhiya, A. Robertsingh, Image processing based fire detection system using Raspberry Pi system. SSRG Int. J. Comput. Sci. Eng. 4(4), 18–20 (2017) 11. P. Li, W. Zhao, Image fire detection algorithms based on convolutional neural networks. Case Stud. Thermal Eng. 19 (2020) 12. X. Shu, F. Porikli, N. Ahuja, Robust orthonormal subspace learning: efficient recovery of corrupted low-rank matrices, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 3874–3881 13. D.Y. Chino, L.P. Avalhais, J.F. Rodrigues, A.J. Traina, BoWFire: detection of fire in still images by integrating pixel color and texture analysis, in 28th SIBGRAPI Conference on Graphics, Patterns and Images (2015), pp. 95–102 14. K. Muhammad, J. Ahmad, Z. Lv, P. Bellavista, P. Yang, S.W. Baik, Efficient deep CNN-based fire detection and localization in video surveillance applications. IEEE Trans. Syst. Man Cybern. Syst. 99, 1–16 (2019) 15. A. Jadon, M. Omama, A. Varshney, M.S. Ansari, R. Sharma, FireNet: a specialized lightweight fire & smoke detection model for real-time IoT applications. arXiv:1905.11922 (2019) 16. P. Foggia, A. Saggese, M. Vento, Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape and motion. IEEE Trans. Circuits Syst. Video Technol. 25(9), 1545–1556 (2015) 17. R. Di Lascio, A. Greco, A. Saggese, M. Vento, Improving fire detection reliability by a combination of video analytics, in Proceedings of the International Conference on Image Analysis and Recognition, Vilamoura (Springer, Cham, Switzerland, 2014) 18. Y.H. Habiboğlu, O. Günay, A.E. Çetin, Covariance matrix-based fire and flame detection method in video. Mach. Vis. Appl. 23, 1103–1113 (2012) 19. A. Rafiee, R. Dianat, M. Jamshidi, R. Tavakoli, S. Abbaspour, Fire and smoke detection using wavelet analysis and disorder characteristics, in Proceedings of the 2011 3rd International Conference on Computer Research and Development (ICCRD) (2011), pp. 262–265 20. T. Celik, H. Demirel, H. Ozkaramanli, M. Uyguroglu, Fire detection using statistical color model in video sequences. J. Vis. Commun. Image Represent. 18, 176–185 (2007) 21. T.H. Chen, P.H. Wu, Y.C. Chiou, An early fire-detection method based on image processing, in Proceedings of the International Conference on Image Processing (ICIP) (2014), pp. 1707–1710 22. A. Leibetseder, M.J. Primus, S. Petscharnig, K. Schoeffmann, Real-time image-based smoke detection in endoscopic videos, in Proceedings of the Thematic Workshops of ACM Multimedia (2017), pp. 296–304 23. P. Matlani, M. Shrivastava, Hybrid deep VGG-NET convolutional classifier for video smoke detection. Comput. Model. Eng. Sci. 119, 427–458 (2019)
A Split Key Unique Sudoku Steganography (SKUSS)-Based Reversible High Embedded Data Hiding Technique Utsav Kumar Malviya and Vivek Singh Rathore
Abstract Confidential data transfer, or secure communication, remains an essential need in this era of modern communication. This research work implements a new method named Split Key Unique Sudoku Steganography (SKUSS). It is a reversible data hiding method with high embedding capacity, where high embedding capacity signifies more data bits per pixel of the cover image while maintaining the naturalness of the cover image. For the implementation of SKUSS, a unique Sudoku is first generated from a user-defined 32-bit private key and a 32-bit public key that together form a 64-bit key; that Sudoku is then used for hiding data digits (ASCII codes) in the cover image pixels. The data can only be recovered with the same Sudoku, i.e., with the correct public and user keys. The method is based on a hybrid algorithm that applies the techniques of Sudoku and steganography to offer different security features to images transmitted between entities on the Internet. Based on the proposed method, the authenticity and integrity of the transmitted images can be verified in the spatial domain, in the encrypted domain, or in both domains. The work is implemented on MATLAB design and simulation tools. Keywords Public–private key · Sudoku · Steganography · Bitmap image file · Split Key Unique Sudoku Steganography (SKUSS) · Bit per pixels
1 Introduction In applications where data security is a bigger concern than bandwidth, steganography can be used instead of encryption. Steganography is a data hiding technique for secure data communication and is less suspicious than encryption. There are certain issues with steganography: for example, network monitoring systems will not flag steganographic files that carry secret data, so someone trying to steal secret data can wrap unflagged files inside other files and send them by simple email [1]. Another problem is that a lot of data has to be transmitted, which arouses
suspicion among intruders. Also, the same method cannot be used for all types of cover images [2]. In steganography, the time for hiding data should be low enough that it does not disturb the communication [3]. The challenge is to maintain the balance among robustness, imperceptibility, and capacity, as increasing one factor adversely affects the others; if the payload size increases, it bargains with the imperceptibility [4]. So the major problem is to maintain balance among these parameters, and Sudoku-based data hiding resolves the issues defined in [1–4] up to a certain level. A steganography method based on Sudoku was developed in 2008 [5]. Chang's method significantly improves embedding capacity by using Sudoku, because a Sudoku has a large number of possible solutions in comparison with other steganography methods like LSB hiding. In 2015, Nguyen [1] presented a Sudoku-based reversible data hiding method, where reversible signifies that, at the receiver end, both the secret data and the cover image are recovered. Nguyen's method [1] uses the Sudoku scheme for data hiding, which allows embedding more data bits while preserving good quality of the cover image at the receiver side; recovery of the stego image was the advantage of Nguyen's method [1] in comparison with Chang's method [4]. Later, in 2017, a Sudoku-based dual-cover hiding technique [3] was presented for higher security. The methods of Jan et al. [5] and Chang et al. [4] use a single cover image divided into two areas: an embedding area in which the secret data is hidden, and a non-embedding area holding a location map. Sarvan's method [6] resolves the issue of keeping the location map in the same image by using a separate image for the location map; the use of dual images allows very high security, but the bandwidth requirement also doubles. Later, in 2018, Cheng [2] developed a method for data hiding in an image using Sudoku, with division arithmetic and generalized exploiting modification direction (DA-GEMD) used for embedding the data into the cover image, where the pixels are selected using Sudoku. Chang's [5] and Nguyen's [7] methods use Sudoku with interpolation of the cover image for the location map and hide the data bits in the newly interpolated pixels; the issue with the interpolation technique is maintaining the naturalness of the original image, so a DA-GEMD-based location map [2, 8] is better at keeping the naturalness of the cover image. However, steganography using Sudoku and DA-GEMD is not reversible. This work defines a new method, SKUSS, for data hiding that uses Sudoku-based encoding to choose the digit replacement of the cover image with message data, which resolves the general steganography issues defined in [1–4]. The data hiding methods of [5–7] and [9] are not reversible, and the proposed SKUSS has the advantage of being a reversible data hiding method. The proposed SKUSS uses a single image for data hiding and hence requires less bandwidth than [6, 10]. Nguyen's [1] data hiding method uses a Sudoku itself as the key, which makes the key very large (a 648-bit key); the proposed SKUSS uses a 64-bit key, generates an unsolved Sudoku with the split-key method, and then solves the Sudoku with Pramberg's [11] Sudoku solver. This technique reduces the key size and payload, because with only a 64-bit key the same cover image can be used for the data and the location map while keeping high naturalness of the cover image. This work uses Jorgen Pramberg's [11] method for solving Sudoku; Jorgen [11] states that the generated Sudoku puzzle should have only one solution.
Table 1 below shows the literature summary with the outcomes of a few researchers' work.
Table 1 Literature summary

Authors | Year | Method | Outcomes
Chang et al. [4] | 2008 | Data hiding in cover image; a Sudoku solution is used to guide cover pixels | 1.5 bpp hidden in test image; PSNR of 44.5
Kumar [3] | 2017 | Dual-cover-image data hiding scheme where Sudoku is used as pixel finder | 0.06688 bpp hidden in test image; PSNR of 59.27
Cheng et al. [2] | 2017 | Division arithmetic and generalized exploiting modification direction (DA-GEMD) used for data hiding; Sudoku used for finding the pixel to be changed | 1.28 bpp hidden in Lena image; PSNR of 45.41
Nguyen et al. [1] | 2015 | A new reversible data hiding scheme using Sudoku-based encoding and LSB substitution | 1.2 bpp hidden in Lena image; PSNR of 41.03
2 Methodology The concept of Sudoku-based data hiding is not to make any changes to the original cover image pixels; the cover image is only encoded according to the secret data. Encoding the cover image instead of hiding data bits directly gives high robustness, imperceptibility, and capacity. Figure 1 explains the process flow of the data hiding process using the proposed SKUSS method. The work has four stages: first, create a unique unsolved Sudoku from the given 64-bit key using the proposed split-key method; second, solve the key-based newly developed Sudoku with Jorgen's Sudoku solver [11]; third, interpolate the cover image segments of size 3 × 3 to a size of 6 × 6; and last, hide the message data bits in the interpolated cover image using the decoding method of the solved Sudoku. The proposed Split Key Unique Sudoku Steganography (SKUSS) uses two split keys: one 8-BCD-digit (32-bit) public key [12] provided to the authorized user and one 8-BCD-digit (32-bit) private key for the authorized service provider; the combination of both keys develops the final 16-BCD-digit (64-bit) key. The final 16-digit Sudoku key is used to develop the unsolved Sudoku. This method of developing the unsolved Sudoku using split public and private keys makes the proposed method robust against intrusion. The key-based Sudoku is used for locating the data position, and the interpolated pixels of the cover are modified accordingly. Because the interpolated pixels change only slightly as per the data digits [13], and the data digits do not change the original cover image pixels, the naturalness of the cover image is sustained with high PSNR at the receiver; the cover image can also be extracted at the receiver side, so reversible steganography is achieved.
Fig. 1 Process flow of SKUSS: data embedding
2.1 Algorithm Step 1: At the transmitter side, the 8-digit public key (Eq. 1) is rotated (Eq. 2) and combined with the 8-digit private key using a XOR logic operation to produce the 8-digit Sudoku key (Eq. 3), which is further converted into base 9 (Eq. 4); base-9 conversion is required because a Sudoku can have only nine digits. With the 8-digit Sudoku key, nine different BCD numbers (t1–t9) are developed using the process shown in Eq. (6); notice that the numbers t1–t9 are not the same and are random. Using the fixed Sudoku format (Eq. 5) and the spaces for t1–t9, the unsolved Sudoku is created, which is unique because it was developed with the help of the public and private keys; different keys will create different unsolved Sudokus (a key-combination sketch is given after Eq. (4) below).
Key = K1 K2 K3 K4 K5 K6 K7 K8   (1)

NK = K2 K3 K4 K5 K6 K7 K8 K1   (2)
where K1, K2, …, K8 are the eight digits of the public key and NK is the key rotated left by one digit. A logical XOR operation between NK and the private key (PK) generates the main key MK, which is then converted to base 9:

MK = NK ⊕ PK   (3)

MKr = {(10 − r) − MK}   (4)
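A hedged Python sketch of this key combination is given below; the digit-wise XOR and the modulo-9 mapping are assumptions made for illustration, since the exact digit-derivation process of Eq. (6) is not reproduced in the text.

```python
# Hedged sketch of Step 1 (Eqs. 1-4): rotate the public key, XOR it digit-wise
# with the private key, and map the result to base-9 digits.
def sudoku_key(public_key: str, private_key: str) -> list[int]:
    nk = public_key[1:] + public_key[0]                       # Eq. (2): rotate left by one digit
    mk = [int(a) ^ int(b) for a, b in zip(nk, private_key)]   # Eq. (3), digit-wise XOR assumed
    return [d % 9 for d in mk]                                # base-9 digits in 0..8 (assumed mapping)

print(sudoku_key("52642847", "12345678"))                     # keys used in Sect. 3 of the paper
```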
Sud is a 9 × 9 matrix of BCD digits in which the positions of t1, t2, …, t9 are fixed and all remaining entries, denoted UN, are unknown:

Sud = [9 × 9 matrix with t1–t9 at their fixed positions and UN elsewhere]   (5)

The process below is used to find the unknown variables of Eq. (5) with the value of MK from Eq. (4):

[derivation of the nine digits t1–t9 from MK]   (6)
Step 2: The unknown BCD digits in the Sudoku matrix 'Sud' of Eq. (5) are solved using Jorgen's Sudoku solver. The solved Sudoku is represented by the variable 'Sud1' in Eq. (7). This solved Sudoku is derived from the 8-digit public and private keys, and different keys always produce a new unique solved Sudoku. The Sudoku also shows good avalanche behaviour: a one-digit change in the input key changes 75–90% of the digits of the developed Sudoku, so it behaves like a chaotic system. The Sudoku of Eq. (7) is then converted into a base-9 Sudoku (Eq. 8), and multiple copies of the solved base-9 Sudoku (9 × 9) are created so that the final matrix has 255 × 255 digits, as shown in Fig. 2.

Sud1 = [the solved 9 × 9 Sudoku, with entries t1–t9 fixed by the keys and the solved entries denoted U1–U58]   (7)

Fig. 2 255 × 255 unique Sudoku matrix (SudM), with rows and columns indexed 0–255
Convert the Sudoku of Eq. (7) into a base-9 Sudoku by subtracting one from each element of Sud1; base-9 conversion is required because a Sudoku can have only nine digits:

SudM = Sud1 − 1   (8)
Step 3: Isolate 3 × 3 segments of the cover image (Eq. 9) and interpolate them using Eqs. (10)–(12) to convert each 3 × 3 segment into a 6 × 6 segment; to maintain the naturalness of the cover image, four cover pixels are used to develop each newly interpolated pixel. The 6 × 6 interpolated segment of the cover image is shown in Eq. (13). Let 'img' be one 3 × 3 module of the cover image:

img = [r11 r12 r13; r21 r22 r23; r31 r32 r33]   (9)

rxni = (rxi + rx(i+1)) / 2   (10)

cxni = (rix + r(i+1)x) / 2   (11)

rcxni = ((rxni + rxn(i+1)) / 2 + (cxni + cxn(i+1)) / 2) / 2   (12)

where x is constant.

img1 = [r11 r1n1 r12 r1n2 r13 r1n3;
        c1n1 rc1n1 c1n2 rc1n2 c1n3 rc1n3;
        r21 r2n1 r22 r2n2 r23 r2n3;
        c2n1 rc2n1 c2n2 rc2n2 c2n3 rc2n3;
        r31 r3n1 r32 r3n2 r33 r3n3;
        c3n1 rc3n1 c3n2 rc3n2 c3n3 rc3n3]   (13)
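Under the neighbour-averaging reading of Eqs. (10)-(12) reconstructed above, the 3 × 3 → 6 × 6 expansion can be sketched as follows; edge handling is simplified, so this is an illustration rather than the authors' MATLAB code.

```python
# Hedged numpy sketch of Step 3: expand a 3x3 cover block to 6x6,
# keeping the original pixels untouched.
import numpy as np

def interpolate_block(b3):
    out = np.zeros((6, 6), dtype=np.float64)
    out[0::2, 0::2] = b3                                   # original pixels r_ij
    out[0::2, 1::2] = (b3 + np.roll(b3, -1, axis=1)) / 2   # row-interpolated pixels (Eq. 10)
    out[1::2, 0::2] = (b3 + np.roll(b3, -1, axis=0)) / 2   # column-interpolated pixels (Eq. 11)
    out[1::2, 1::2] = (out[0::2, 1::2] + np.roll(out[0::2, 1::2], -1, axis=0)) / 2  # diagonal pixels (Eq. 12, approximated)
    return out
```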
img1 in Eq. (13) is the 6 × 6 block of interpolated of the cover image. Step 4: This step explains Sudoku-based hiding of data digits into cover image. The unique public-private key-based solved Sudoku multiple copies matrix 255 × 255 as explain in step-1 shown in Fig. 2. Let input ASCII message data is Di and NDi is base 9 converted message data where i is the digit position. From the interpolated cover image ‘img1’ (Eq. 13), find the values of (r 11 and r 1n1 ) and in Fig. 2, Sudoku selects the position correspond to (r 11 , r 1n1 ) then selects nine values (4 forward, 4 backward, and from position (r 11 , r 1n1 ). For example, let (r 11 , r 1n1 ) selected element in Fig. 2 is t18, then chosen nine digits will be as Si = [U 42 U 43 t19 U 44 t18 U 38 U 39 U 40 U 41], As these nine digits are part of
a Sudoku, all nine digits will be different. Now, for the first data digit ND1, find its position (p) in Si, and in the interpolated cover image img1 (Eq. 13) replace the value at (r11, r1n1) with (r11, p). After the first data digit is encoded, again select (r11, c1n1) in the interpolated cover image, find the position (r11, c1n1) in the Fig. 2 Sudoku matrix SudM, take the second data digit ND2, similarly select nine values (4 forward, 4 backward, and the value at position (r11, c1n1)), find the position (p) of ND2 in Si, and replace the interpolated image value at (r11, c1n1) with (r11, p). Repeating this process for all data digits produces the stego-image, which is the Sudoku-based encoded version of the cover image, as shown in Eq. (14).

     | r11   Yn1    r12   Yn2    r13   Yn3   |
     | Yn4   rc1n1  Yn5   rc1n2  Yn6   rc1n3 |
g1 = | r21   Yn7    r22   Yn8    r23   Yn9   |   (14)
     | Yn10  rc2n1  Yn11  rc2n2  Yn12  rc2n3 |
     | r31   Yn13   r32   Yn14   r33   Yn15  |
     | Yn16  rc3n1  Yn17  rc3n2  Yn18  rc3n3 |
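A minimal Python sketch of the digit-embedding rule of Step 4 is given below; `sub_m` is the 255 × 255 reference matrix from Step 2, and both the row-wise orientation of the 9-cell window and the wrap-around at the matrix edges are assumptions, since the paper does not state how the 4-forward/4-backward selection behaves at the boundaries.

```python
import numpy as np

def embed_digit(sub_m, row, col, nd):
    """Hide one base-9 data digit nd (0-8) at reference position
    (row, col): take the 9-cell window (4 back, 4 forward, and the
    cell itself) along that row of sub_m, locate nd in it, and
    return the column index that replaces the interpolated pixel."""
    idx = np.arange(col - 4, col + 5) % sub_m.shape[1]  # assumed wrap-around
    window = sub_m[row, idx]      # 9 distinct digits (Sudoku row segment)
    p = int(idx[np.where(window == nd)[0][0]])
    return p                      # pixel value at (row, col) becomes p
```

Because each row of the tiled matrix repeats a Sudoku row with period 9, any 9 consecutive cells contain every base-9 digit exactly once, so nd is always found in the window.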
The whole idea of SKUSS is that no changes are made to the original information pixels of the cover image; only the newly generated pixel values produced by interpolation get modified, and small changes in these interpolated pixels do not significantly affect the quality of the image. The cover image pixels are only used to find a location in the 255 × 255 Sudoku matrix, and that location's pixel gets modified as per the data digit; it may be noted that the change in a pixel can be ±8 only (Fig. 3). At the receiver side, with the same 16 BCD digits (a 64-bit key) of public and private keys, the same Sudoku is generated as already explained in Step 1 and Step 2 of the algorithm. Reverse encoding of the 255 × 255 Sudoku matrix on the stego-image then reconstructs the base-9 form of the original data, which can be further converted into base-10 to recover the original ASCII message data. Also, simple decimation of the stego-image at the receiver side reconstructs the original cover image. As both the data and the cover image are reconstructed at the receiver side, the proposed SKUSS method is a reversible data hiding method.
3 Results and Discussion

The work is implemented using MATLAB and tested for different combinations of cover images and data sizes. One example of simulation is explained here with the eight-digit public key 52,642,847 and the eight-digit private key 12,345,678; the unsolved Sudoku and its solved Sudoku are developed and shown in Fig. 4. Data conversion into base-9 can be explained with an example: let the data be 'Ram'; its base-10 ASCII is {082, 097, 109} and its base-9 form is {101, 117, 131}. The base-9 data digits are used to modify the interpolated pixel values.
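A small Python check of the base-9 conversion used in this example:

```python
def to_base9(n):
    """Convert a non-negative integer to its base-9 digit string."""
    digits = ""
    while True:
        digits = str(n % 9) + digits
        n //= 9
        if n == 0:
            return digits

# ASCII codes of 'Ram' -> base-9, matching {101, 117, 131} in the text
print([to_base9(ord(ch)) for ch in "Ram"])   # ['101', '117', '131']
```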
A Split Key Unique Sudoku Steganography (SKUSS) Based …
115
Fig. 3 Process flow of SKUSS: Data Extraction
Fig. 4 a Unsolved Sudoku developed with a combination of the 8-digit public key and 8-digit private key b Sudoku after the solution
Figure 5 below shows the original Lena image, the interpolated Lena image, and the final stego Lena image. The image format used is a bitmap file. Bits per pixel (BPP) is the number of bits that can be hidden inside a pixel. The proposed method is not a direct data hiding method; here, the interpolated cover image is changed according to the base-9 data digits and the Sudoku encoding to develop the stego-image. The capacity of data that can be hidden by the proposed work can be
Fig. 5 a Original 512 × 512 pixels Lena image b interpolated Lena image of 1024 × 1024 pixels c stego Lena image of size 1024 × 1024 pixels
explained with a test image of Lena with 512 × 512 × 3 pixels. A total of 170 × 170 × 3 = 86,700 3 × 3 blocks can be developed from the test image, and after interpolation each 3 × 3 block is converted into a 6 × 6 block. Hence, the interpolated image will have {(170 × 170 × 6 × 6) + (170 × 6 × 2) + (170 × 2 × 6) + (2 × 2)} × 3 = 3,133,452 pixels. In one 6 × 6 block of the interpolated image, a total of 18 base-9 digits can be encoded; hence a maximum of 86,700 × 18 × 8 = 12,484,800 data bits can be hidden in the interpolated cover image.

Maximum BPP = (maximum number of data bits decoded at receiver) / (pixels in cover image) = 12,484,800 / 3,133,452 = 3.98
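The capacity arithmetic can be checked with a few lines of Python:

```python
blocks = 170 * 170 * 3                                    # 3x3 blocks in 512x512x3
pixels = ((170 * 170 * 36) + (170 * 6 * 2) + (170 * 2 * 6) + 4) * 3
bits = blocks * 18 * 8                                    # 18 base-9 digits per 6x6 block
print(blocks, pixels, bits, round(bits / pixels, 2))
# 86700 3133452 12484800 3.98
```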
The maximum BPP achievable in this work is 3.98; hence, message data of up to 49.75% of the cover image size can be hidden with the proposed SKUSS method. However, at the maximum BPP (i.e., 3.98) the PSNR obtained is only 28.87, which is very low and cannot be considered as maintaining image naturalness. Table 2 below gives the results obtained for different sizes of hidden and recovered message data with the proposed method; PSNR and MSE are computed between the cover and stego images. It may be noted that for a cover image of size 512 × 512 × 3, a total of 1,024,000 data digits can be fully recovered by the proposed method; to hide bigger data (i.e., more than 1,024,000 data digits), the size of the cover image must be increased. Table 3 below shows the analysis of embedding capacity (bits per pixel) against PSNR and a comparison with Nguyen's [1] data hiding method. The experimental results in Table 2 analyze the test image of Lena of size 512 × 512. From Table 3 and Fig. 6, it may be observed that the proposed work can hide 1.2 bits per pixel at an image naturalness PSNR of 47.82, whereas Nguyen [1] can hide only
Table 2 MSE and PSNR observed for different message sizes for the Lena cover image of 512 × 512 (PSNR and MSE are between the cover and stego image)

Number of message data digits hidden | PSNR | MSE | Bits per pixel | Message recovered
2000 | 74.31 | 0.0024 | 0.00051 | YES
4000 | 70.79 | 0.0055 | 0.01021 | YES
8000 | 65.01 | 0.0207 | 0.02042 | YES
16,000 | 62.15 | 0.0399 | 0.04084 | YES
32,000 | 58.65 | 0.0894 | 0.08169 | YES
64,000 | 55.84 | 0.1708 | 0.16339 | YES
128,000 | 53.83 | 0.2713 | 0.32679 | YES
256,000 | 51.35 | 0.4803 | 0.6535 | YES
512,000 | 47.18 | 1.2545 | 1.30718 | YES
1,024,000 | 43.19 | 3.1440 | 2.61436 | YES
Table 3 Analysis between BPP and PSNR and comparison with the Nguyen [1] data hiding scheme

Embedded capacity (BPP) | PSNR observed in SKUSS (proposed method) | PSNR observed in Nguyen [1]
0.5 | 52.07 | 47.81
0.8 | 50.98 | 44.65
1 | 50.09 | 42.57
1.2 | 47.82 | 41.09
Fig. 6 Comparison between the proposed SKUSS data hiding and the Nguyen [1] data hiding scheme (BPP vs. PSNR, plotted from Table 3)
Table 4 Comparative results

Method | Reported result | Proposed work result
Chang et al. [4] | 1.5 bpp hidden, PSNR 44.5 | 1.5 bpp hidden, PSNR 45.08
Kumar [3] | 0.06688 bpp, PSNR 59.27 | 0.06 bpp, PSNR 60.11
Cheng et al. [2] | 1.28 bpp hidden, PSNR 45.41 | 1.3 bpp hidden, PSNR 47.18
Nguyen et al. [1] | 1.2 bpp hidden, PSNR 41.03 | 1.2 bpp hidden, PSNR 47.82
0.5 bits per pixel at an image naturalness PSNR of 47.81. This shows that the proposed work has a high embedding capacity. From Table 4, it can be observed that the embedding capacity of the proposed work is better than that of the other methods.
4 Conclusion

This work presents a new method of data hiding named Split Key Unique Sudoku Steganography (SKUSS), where two split keys, an 8-digit public key and an 8-digit private key, are combined and a 16-digit steganography key is developed. An unsolved Sudoku is designed with the help of the steganography key, and that Sudoku is solved with a fast Sudoku solver algorithm. With the help of the solved Sudoku, the message data digits modify the interpolated pixels of the cover image and produce a stego-image in bitmap format. No message data is hidden directly; only pixels are modified according to the message. Hence robustness, naturalness, imperceptibility, and extent are maintained significantly. As only the newly generated pixels of the cover image get modified according to the data, the reconstruction of the original image at the receiver side becomes easy with simple decimation of the stego-image, and this work achieves reversible data hiding. High PSNR and low MSE are achieved for the same reason. The work is designed and simulated using the MATLAB tool. The proposed SKUSS can hide 1.2 bits per pixel at an image naturalness PSNR of 47.82, whereas Nguyen [1] can hide only 0.5 bits per pixel at an image naturalness PSNR of 47.81; hence, it may be concluded that the proposed work has a high embedding capacity. This work is tested with the bitmap file format only, and in the near future it can be implemented with other image file formats.
References
1. T.S. Nguyen, C.C. Chang, A reversible data hiding scheme based on the Sudoku technique. Displays 39, 109–116 (2015). https://doi.org/10.1016/j.displa.2015.10.003
2. J.C. Cheng, W.C. Kuo, S.R. Su, Data-hiding based on Sudoku and generalized exploiting modification direction. J. Electron. Sci. Technol. 16(2), 123–128 (2018)
3. M.V.S. Kumar, E. Mamatha, C.R. Reddy, V. Mukesh, R.D. Reddy, Data hiding with the dual based reversible image using the Sudoku technique, in IEEE International Conference on Advances in Computing, Communications, and Informatics (ICACCI) (2017), pp. 2166–2172. https://doi.org/10.1109/ICACCI.2017.8126166
4. C. Chang, Y. Chou, T.D. Kieu, An information hiding scheme using Sudoku, in IEEE International Conference on Innovative Computing Information and Control (2008), pp. 17–21. https://doi.org/10.1109/ICICIC.2008.149
5. S. Jana, A.K. Maji, R.K. Pal, A novel SPN-based video steganography scheme using Sudoku puzzle for secured data hiding. Innov. Syst. Softw. Eng. 15, 65–73 (2019). https://doi.org/10.1007/s11334-019-00324-8
6. P. Jorgen, Sudoku Solver and Generator (2010)
7. C. Chang, T. Nguyen, Y. Liu, A reversible data hiding scheme for image interpolation based on reference matrix, in IEEE International Workshop on Biometrics and Forensics (IWBF) (2017), pp. 1–6. https://doi.org/10.1109/IWBF.2017.7935098
8. C. Chang, C. Li, Reversible data hiding in JPEG images based on adjustable padding, in 5th International Workshop on Biometrics and Forensics (IWBF) (2017), pp. 1–6. https://doi.org/10.1109/IWBF.2017.7935083
9. Y. Lin, C. Wang, W. Chen, F. Lin, W. Lin, A novel data hiding algorithm for high dynamic range images. IEEE Trans. Multimedia 19(1), 196–211 (2017). https://doi.org/10.1109/TMM.2016.2605499
10. S. Dixit, A. Gaikwad, S. Gaikwad, S.A. Shanwad, Public key cryptography based lossless and reversible data hiding in encrypted images. Int. J. Eng. Sci. Comput. 6(4), 75–79 (2016). https://doi.org/10.4010/2016.822
11. S. Chakraborty, S.K. Bandyopadhyay, Steganography method based on data embedding by Sudoku solution matrix. Int. J. Eng. Sci. Invent. 2(7), 36–42 (2011)
12. R. Kohias, U. Maurer, Reasoning about public-key certification: on bindings between entities and public keys. IEEE J. Sel. Areas Commun. 18(4), 551–560 (2000). https://doi.org/10.1109/49.839931
13. V.M. Manikandan, V. Masilamani, Reversible data hiding scheme during encryption using machine learning. Int. Conf. Robot. Smart Manuf., Procedia Comput. Sci. 133, 348–356 (2018)
14. A. Haj, A.H. Nabi, Digital image security based on data hiding and cryptography, in International Conference on Information Management (ICIM) (2017), pp. 437–440. https://doi.org/10.1109/INFOMAN.2017.7950423
15. S. Rawal, Advanced encryption standard (AES) and its working. Int. Res. J. Eng. Technol. 3(8), 125–129 (2014)
Identification of Insomnia Based on Discrete Wavelet Transform Using Time Domain and Nonlinear Features P. Mamta and S. V. A. V. Prasad
Abstract Insomnia is a type of sleep disorder that affects both the psychological and mental state. Conventionally, clinicians diagnose insomnia with a clinical interview that is subjective and suffers from personal decision errors. The aim of this study is to identify insomnia subjects from normal subjects by using an electroencephalogram (EEG) signal taken from the publicly available CAP sleep database. The EEG signal is decomposed by applying discrete wavelet transform (DWT) to obtain different brain wave patterns, namely beta, alpha, theta, and delta waves. The time domain and nonlinear features are extracted from the Fp2–F4 EEG channel to form a feature vector. The performance of four different classification techniques comprising k-nearest neighbor (KNN), support vector machine (SVM), ensemble, and decision tree (DT) is evaluated by employing fivefold cross-validation. The DT classifier achieved a classification accuracy of 85%. Also, the results demonstrate the feasibility of the prefrontal channel EEG (Fp2–F4) for the identification of insomnia. Keywords Insomnia · Discrete wavelet transform · EEG · Time domain features · Nonlinear features
1 Introduction

Insomnia is related to sleep deprivation, which has the tendency to affect both the psychological and mental state. Individuals with mental issues are more prone to insomnia and other sleep disorders. Individuals with insomnia may have a ten-fold risk of developing psychiatric illness or depression compared to individuals who are good sleepers. Sleep issues are most common in patients with bipolar disorder, anxiety, and depression. About one-third of depressed individuals have insomnia symptoms. Approximately 10% of older patients and 40% of young depressed patients suffer

P. Mamta (B)
Department of EEE, G. Narayanamma Institute of Technology and Science, Hyderabad, India
S. V. A. V. Prasad
Department of EEE, Lingayas Vidyapeeth, Faridabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_11
from hyper-insomnia. Traditionally, clinicians treating patients with psychiatric disorders or depression also view insomnia as a significant symptom. Generally, the diagnosis of insomnia is based on the insomnia severity index (ISI) questionnaire, where physicians ask sleep-related questions about, for example, daytime sleepiness and wakefulness. Besides this, physicians also diagnose insomnia through objective measures using PSG recordings from subjects complaining of insomnia. Subjects spend one or two nights at the sleep center to record the PSG, which includes EEG, EOG, and EMG signals. EEG is a non-invasive diagnostic instrument that is used to record the potential difference of a signal by placing a 10–20 electrode system on the scalp. It is prevalently used in the diagnosis of depression, insomnia, epilepsy, and many other psychiatric anomalies. EEG has been established as a non-invasive procedure to study cognitive response [1, 2] and other disorders like insomnia, epilepsy, and depression [3–5]. Among all the types of cognitive data, EEG indicates emotional human brain activity in real time. Nauta et al. highlighted that the prefrontal cortex plays a crucial role in various aspects of the cognitive process [6]. The study in [7] demonstrated EEG signal classification of healthy groups, epilepsy groups, and groups with epileptic syndrome during seizure using a wavelet-based neural network classifier. In [8], the alertness level is distinguished by an error backpropagation neural network classifier using the power spectral density of DWT coefficients as input features. Furthermore, it has been shown that the wavelet transform decomposes the EEG signal into frequency sub-bands from which statistical features can be extracted [9]. The study in [10] analyzed normal and insomnia subjects from ECG and EEG signals and extracted linear and nonlinear features such as the largest Lyapunov exponent, sample entropy, and correlation dimension from the signals. The wavelet transform (WT) has proven to be an alternative tool to the Fourier transform (FT) for the study of non-stationary EEG signals [11, 12]. The author of [13] demonstrated graph spectral theory using a hypnogram; logistic regression was applied for the identification of insomnia and obtained accuracy, sensitivity, and specificity values of 81%, 87%, and 75%, respectively. In the study [14], nonlinear features of the EEG signal were used for the classification of insomnia subjects from healthy subjects, achieving a classification accuracy of 83% using a support vector machine. At present, many researchers have reported studies on automatic sleep stage classification [15] and sleep disorders [16] using different machine learning algorithms, but there are very few studies focusing on the identification of insomnia. Observational studies have demonstrated that insomnia is highly prevalent in depression patients, and insomnia is often considered a core symptom of depression [17]. This paper is a first step toward a more profound understanding of the relation among insomnia, the prefrontal cortex, and depression. To the best of the authors' knowledge, this is the first study to report that the prefrontal region and channel Fp2–F4 are feasible for the identification of insomnia using nonlinear and time domain features. The proposed method provides a significant performance increase in comparison with others.
In this paper, the single-channel (Fp2–F4) EEG signal based on the prefrontal cortex region is analyzed from insomnia patients and healthy subjects. The signals are decomposed using DWT, which yields four brain wave patterns: beta (β), alpha (α), theta (θ), and delta (δ). Then, time domain and nonlinear features are computed from the obtained brain wave patterns. Further, the computed features are applied to four different classification techniques, namely SVM, DT, KNN, and ensemble, using a fivefold cross-validation method to assess the classification accuracy. Herewith, the study aims to identify insomnia subjects from healthy subjects based on discrete wavelet transform (DWT) using single-channel EEG (Fp2–F4). In this work, it is shown that the methodology used in our study can identify insomnia using the prefrontal cortex region. This paper is organized as follows: Sect. 2 describes EEG data collection and Sect. 3 presents an overview of the methodologies employed in our work. In Sect. 4, results and discussions are elaborated, and the conclusion of the study is presented in Sect. 5.
2 EEG Data Collection

In this study, the EEG data is obtained from the publicly available datasets on PhysioNet, named CAP Sleep database version 1.0.0. The dataset was constructed by the sleep disorder center of the Ospedale Maggiore of Parma, Italy [18]. It has polysomnographic (PSG) recordings of 108 subjects comprising three EEG channels (F3 or F4, O1 or O2, and C4 or C3), EEG bipolar channels, two EOG channels, EMG signals, and ECG signals. In our proposed work, the Fp2–F4 EEG channel is selected for insomnia identification, and the experiment is carried out on five healthy subjects and seven insomnia patients with a sampling rate of 512 Hz. The number of samples drawn from the data is 51,200 for the first 10 s, followed by 1 min with six epochs. Also, the EEG signal is passed through an IIR notch filter to remove the power line interference (PLI) noise at 50 Hz. Sample EEG data segments of a healthy subject and an insomnia patient are shown in Fig. 1.
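A minimal Python sketch of the 50 Hz notch filtering step using SciPy is shown below; the quality factor Q = 30 is an assumed value, not stated in the paper.

```python
import numpy as np
from scipy import signal

fs = 512.0                                    # sampling rate of the recordings
b, a = signal.iirnotch(50.0, Q=30.0, fs=fs)   # 50 Hz power-line notch

t = np.arange(0, 10, 1 / fs)                  # 10 s synthetic EEG epoch
eeg = np.random.randn(t.size) + np.sin(2 * np.pi * 50 * t)
clean = signal.filtfilt(b, a, eeg)            # zero-phase filtering
```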
3 Methodology

The flow diagram of insomnia identification based on single-channel EEG (Fp2–F4) is shown in Fig. 2; it comprises three main parts: (i) discrete wavelet transform (DWT), (ii) feature extraction methods, and (iii) classification.
Fig. 1 Sample EEG signal of (i) insomnia subject and (ii) healthy subject
Fig. 2 Flow diagram for the insomnia identification
Fig. 3 The six-level decomposition of EEG signal using DWT
3.1 Discrete Wavelet Transform

Discrete wavelet transform (DWT) divides the EEG signal into two sets of coefficients at different frequencies: detail and approximation. The detail coefficient is the high-frequency component (high-pass filter), and the approximation coefficient is the low-frequency component (low-pass filter). In this work, DWT is applied to the EEG signals of healthy subjects and insomnia patients to obtain four different brain wave patterns comprising beta, alpha, theta, and delta waves. The EEG signal (S), with a sampling frequency of 512 Hz, is decomposed by applying six-level multiresolution decomposition with Daubechies 4 (db4) as the mother wavelet. According to the Nyquist sampling theorem [19], the corresponding frequency patterns are obtained as illustrated in Fig. 3. The detail coefficients at levels 4, 5, and 6 (D4, D5, and D6) represent three brain wave patterns: β (16–32 Hz), α (8–16 Hz), and θ (4–8 Hz) waves, respectively. The approximation coefficient at level 6 (A6) represents the δ wave (0–4 Hz).
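A minimal sketch of this decomposition with the PyWavelets library, mapping the coefficient levels to the four brain-wave bands described above (the random signal is a stand-in for a real EEG epoch):

```python
import numpy as np
import pywt

fs = 512                        # Hz
x = np.random.randn(fs * 10)    # stand-in for one 10 s EEG epoch

# Six-level DWT with the db4 mother wavelet:
# wavedec returns [A6, D6, D5, D4, D3, D2, D1]
coeffs = pywt.wavedec(x, 'db4', level=6)
A6, D6, D5, D4 = coeffs[0], coeffs[1], coeffs[2], coeffs[3]

bands = {
    'delta (0-4 Hz)': A6,    # approximation at level 6
    'theta (4-8 Hz)': D6,    # detail at level 6
    'alpha (8-16 Hz)': D5,   # detail at level 5
    'beta (16-32 Hz)': D4,   # detail at level 4
}
for name, c in bands.items():
    print(name, len(c))
```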
3.2 Feature Extraction

This study has utilized two feature extraction methods: (1) a time domain method and (2) a nonlinear approach. Statistical features and nonlinear
features are computed from the single-channel (Fp2–F4) EEG full-wave and from the β, α, θ, and δ waves. The extracted features are labeled and listed in Table 1.

Table 1 Features extracted from the Fp2–F4 channel EEG

Label | Features
f1 | Mean, variance, kurtosis, and skewness
f2 | Approximate entropy (ApEn)
f3 | Shannon entropy (SE)
f4 | Renyi's entropy (RE)
3.2.1 Time Domain Method

In the time domain method, the statistical parameters comprising mean, variance, skewness, and kurtosis are determined. These parameters are mostly known for their ability to describe the statistical moments of the EEG signal and are known as linear features. The expressions to calculate the mean, variance, skewness, and kurtosis are as follows:

Mean(μ) = (1/N) Σ_{i=1}^{N} yi   (1)

Variance(ϑ) = σ²   (2)

σ = sqrt( (1/(N − 1)) Σ_{i=1}^{N} |yi − μ|² )   (3)

where σ is the standard deviation,

skewness(ψ) = E(y − μ)³ / σ³   (4)

kurtosis(k) = E(y − μ)⁴ / σ⁴   (5)

where yi is the sample and N is the number of samples.
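These four moments (feature set f1) can be computed directly, for example with SciPy:

```python
import numpy as np
from scipy import stats

def f1_features(y):
    """Mean, variance, skewness, and kurtosis of one sub-band (Eqs. 1-5)."""
    y = np.asarray(y, dtype=float)
    return {
        'mean': np.mean(y),
        'variance': np.var(y, ddof=1),                 # 1/(N-1) as in Eq. (3)
        'skewness': stats.skew(y),
        'kurtosis': stats.kurtosis(y, fisher=False),   # E(y-mu)^4 / sigma^4
    }

print(f1_features(np.random.randn(5120)))
```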
3.2.2 Nonlinear Methods

Nonlinear methods can capture the chaotic behavior and sudden changes in the EEG signal caused by biological events developing in the brain. In this study, the nonlinear
methods used are approximate entropy (ApEn), Shannon entropy (SE), and Renyi's entropy (RE). The extracted features are listed in Table 1.

Approximate Entropy. ApEn was introduced by Pincus to address the complexity and irregularity of a time series [20]. ApEn is a statistical method employed to quantify the irregularity of the signal: the more complex the signal, the higher the ApEn value. The following expressions compute approximate entropy. Let the initial signal be y(1), y(2), …, y(N), where N is the number of samples of the signal, and let

Y(i) = [y(i), y(i + 1), y(i + 2), …, y(i + m − 1)], 1 ≤ i ≤ N − m + 1

where Y(i) represents the embedded time series of the signal and m is the embedding dimension. Approximate entropy is calculated by

ApEn(m, r, N) = ψ^m(r) − ψ^{m+1}(r)   (6)

where

ψ^m(r) = (1/(N − m + 1)) Σ_{i=1}^{N−m+1} ln(Ci^m(r))   (7)

To compute Ci^m(r), the distance between two elements Y(i) and Y(j) is compared with a threshold value r. In this work, the threshold value is chosen as r = 0.25 × sd (the standard deviation of the signal). Ci^m(r) for each i, i = 1, 2, 3, …, N − m + 1 is defined as:

Ci^m(r) = (number of d[Y(i), Y(j)] ≤ r) / (N − m + 1)   (8)

where Ci^m(r) represents the correlation integral for a given time series; it measures the information generated in a chaotic system with a threshold value r for m dimensions.
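A compact NumPy implementation of Eqs. (6)–(8), using the paper's threshold r = 0.25 × sd and the Chebyshev distance commonly used for d[Y(i), Y(j)]:

```python
import numpy as np

def apen(x, m=2, r=None):
    """Approximate entropy of a 1-D signal (Eqs. 6-8)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.25 * np.std(x)          # threshold as chosen in the paper
    def phi(m):
        # embed the series into m-dimensional vectors Y(i)
        emb = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between all pairs of vectors
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.sum(d <= r, axis=1) / (n - m + 1)   # Eq. (8)
        return np.mean(np.log(c))                  # Eq. (7)
    return phi(m) - phi(m + 1)                     # Eq. (6)

print(apen(np.random.randn(300)))   # short demo signal
```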
Shannon Entropy. Shannon entropy evaluates the irregularity of the signal [21]. The Shannon entropy of an EEG signal Y is defined as:

SE(Y) = − Σ_{i=1}^{n} Pi log_b Pi   (9)

where Pi is the probability distribution, and b is the logarithmic base, i.e., b = 2 for bits.
Renyi's Entropy. Renyi's entropy (RE) is a measure used for computing the spectral complexity of a time series [22]. This work has considered the α value and the logarithm base as 2. It is defined as:

RE(α) = (1/(1 − α)) log Σ_{i=1}^{n} Pi^α, α ≥ 0 and α ≠ 1   (10)

where the quantity Pi is the probability distribution of the EEG signal with n bins and α is the Renyi entropy order.
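A minimal sketch of Eqs. (9) and (10) from a histogram estimate of Pi (the 16-bin histogram here is an assumed choice):

```python
import numpy as np

def shannon_renyi(y, bins=16, alpha=2.0):
    """Shannon entropy (Eq. 9) and Renyi entropy (Eq. 10, alpha != 1)
    from a histogram estimate of the probability distribution Pi."""
    counts, _ = np.histogram(y, bins=bins)
    p = counts[counts > 0] / counts.sum()
    se = -np.sum(p * np.log2(p))
    re = np.log2(np.sum(p ** alpha)) / (1.0 - alpha)
    return se, re

print(shannon_renyi(np.random.randn(5120)))
```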
3.3 Classification Techniques

The proposed research work has considered four different classifiers: k-nearest neighbor (KNN) [23], support vector machine (SVM), ensemble [24], and decision tree (DT) [25]. These classifiers are used to evaluate and compare the classification performance by employing fivefold cross-validation, and also to demonstrate the most suitable classifier for the identification of insomnia. SVM has been employed in the field of depression differentiation [26], and KNN is used in medical informatics, including the diagnosis of stress and epilepsy [27]. The classification performance of the classifiers is evaluated by calculating sensitivity (Se), specificity (Sp), and accuracy (Acc), defined as follows [28]:

Acc = (number of correct decisions) / (total number of cases)   (11)

Se = True positive / (True positive + False negative)   (12)

Sp = True negative / (True negative + False positive)   (13)
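A minimal scikit-learn sketch of this evaluation protocol; the feature matrix X and labels y are assumed to come from the f1–f4 extraction above, and a random forest stands in for the unspecified ensemble classifier.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X = np.random.randn(60, 20)        # stand-in feature vectors
y = np.random.randint(0, 2, 60)    # 0 = healthy, 1 = insomnia

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    'DT': DecisionTreeClassifier(random_state=0),
    'SVM': SVC(),
    'KNN': KNeighborsClassifier(),
    'Ensemble': RandomForestClassifier(random_state=0),  # one ensemble choice
}
for name, clf in models.items():
    acc = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    print(name, acc.mean())
```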
4 Results and Discussion

This section presents a detailed evaluation of our results. In this study, all data preprocessing and parameter analysis have been implemented using MATLAB software. The features f1, f2, f3, and f4 are extracted from the full-wave EEG signal and the brain-wave patterns comprising the β, α, θ, and δ waves, which are obtained by employing DWT with six-level decomposition on the Fp2–F4 channel EEG signal. The extracted features form a feature set. Further, PCA is utilized to reduce the dimensionality of the feature set (Table 2).
Table 2 Classification performance of features with different classifiers

Features | Classifiers | Accuracy (%) | Sensitivity (%) | Specificity (%)
f1 | DT | 78.3 | 83 | 73.3
f1 | SVM | 65 | 43.3 | 86.7
f1 | KNN | 78.3 | 83.3 | 73.3
f1 | Ensemble | 78.3 | 83.3 | 73.3
f2 | DT | 60 | 53.3 | 66.7
f2 | SVM | 63.3 | 63.3 | 63
f2 | KNN | 73 | 83.3 | 63.3
f2 | Ensemble | 63.3 | 76.6 | 50
f3 | DT | 76.7 | 83 | 70
f3 | SVM | 68.3 | 73 | 63
f3 | KNN | 66.7 | 67 | 67
f3 | Ensemble | 75 | 83 | 67
f4 | DT | 63.3 | 77 | 50
f4 | SVM | 66.7 | 53 | 80
f4 | KNN | 68.3 | 73 | 63
f4 | Ensemble | 78.3 | 83 | 73
The PCA step also improves the performance of the classifiers for a high-dimensional feature set. The reduced feature set is fed to the four different classifiers KNN, SVM, ensemble, and DT to identify the insomnia patients' EEG signals from those of the normal subjects by employing a fivefold cross-validation technique. The classification performance obtained for the individual features f1, f2, f3, and f4 with the four classifiers is given in Table 2, and Fig. 4 depicts the comparison of the different classifiers based on these features. The study showed that f1 and f4 obtained a classification accuracy of 78.3%, while f3 achieved a classification accuracy of 76.7%, less than f1 and f4. The accuracy obtained by f2 is the lowest, with a value of 73%. Since the individual features do not provide good accuracy, the selected features are combined and applied to the four classifiers to evaluate the classification performance. The selected feature combinations f3–f4, f2–f3–f4, f1–f2–f3–f4, and f1–f2 are listed in Table 3. The combinations f3–f4 and f2–f3–f4 have not shown accuracy improvement over the individual feature sets, whereas f1–f2–f3–f4 and f1–f2 achieved accuracies of 80% and 85%, respectively, as shown in Fig. 5. The study demonstrates that choosing a feature combination for the identification of insomnia is reasonable. It also indicates that the Fp2–F4 EEG channel is suitable and efficient in identifying the EEG signal of insomnia patients. Our outcomes are compared with previously reported findings on the identification of insomnia in Table 4, and the classification accuracy of the proposed method is compared with the others based on channel selection in Fig. 6. Though
Fig. 4 Classification accuracy based on individual features

Table 3 Classification performance of selected feature combinations with different classifiers

Features | Classifiers | Accuracy (%) | Sensitivity (%) | Specificity (%)
f3–f4 | DT | 76.7 | 77 | 77
f3–f4 | SVM | 68.3 | 73 | 63
f3–f4 | KNN | 76.7 | 63 | 70
f3–f4 | Ensemble | 75 | 70 | 83
f2–f3–f4 | DT | 73.3 | 70 | 66.7
f2–f3–f4 | SVM | 68.3 | 66.7 | 70
f2–f3–f4 | KNN | 71.7 | 73.3 | 70
f2–f3–f4 | Ensemble | 78.3 | 80 | 66.7
f1–f2–f3–f4 | DT | 80 | 70 | 90
f1–f2–f3–f4 | SVM | 65 | 43.3 | 86.7
f1–f2–f3–f4 | KNN | 78.3 | 76.7 | 80
f1–f2–f3–f4 | Ensemble | 78.3 | 76.7 | 80
f1–f2 | DT | 85 | 76.7 | 93.3
f1–f2 | SVM | 63 | 43.3 | 83.3
f1–f2 | KNN | 81.7 | 80 | 83.3
f1–f2 | Ensemble | 81.7 | 80 | 83
Fig. 5 Classification accuracy based on selected features

Table 4 Comparison of study

References | Channel | Features | Validation | Classifiers | Accuracy (%)
[13] | C4–A1 | Linear features | Leave-one-out | Logistic regression | 81
[14] | C3–A2 | Nonlinear features | 50–50% | SVM | 83
Our study | Fp2–F4 | DWT (linear and nonlinear features) | Fivefold cross-validation | DT | 85
Fig. 6 Comparison of classification accuracy based on channel selection (Fp2–F4, proposed study: 85%; C3–A2: 83%; C4–A1: 81%)
the referred studies showed good classification accuracy, two main differences are to be highlighted in our work. Firstly, the channel selection of the EEG. Secondly, we extracted both linear features from time domain analysis and nonlinear features that represent the nonlinearity of the signal.
5 Conclusion

Insomnia is most common in people suffering from depression, stress, and anxiety. In this work, the EEG signals of insomnia and healthy subjects are measured from the Fp2–F4 channel, which represents the prefrontal region. The time domain and nonlinear features are computed from the detail and approximation coefficients obtained by using DWT. The selected features are combined and fed to four different classifiers, of which DT demonstrates the highest classification accuracy of about 85%. Overall, the identification of insomnia using the DWT technique produces good results with the feature combination of time domain features and ApEn. The results also suggest that the channel Fp2–F4 is feasible for identifying and analyzing the insomnia EEG signal. The overall results are very satisfactory and fare well in comparison with others.
References
1. A.S. Gevins, G.M. Zeitlin, C.D. Yingling, et al., EEG patterns during cognitive tasks. I. Methodology and analysis of complex behaviors. Electroencephal. Clin. Neurophys. 47, 693–703 (1979)
2. F. Fan, Y. Li, Y. Qiu, Y. Zh., Use of ANN and complexity measures in cognitive EEG discrimination, in 27th IEEE Annual Conference on Engineering in Medicine and Biology, Shanghai, China (2005), pp. 4638–4641
3. R.R. Rosa, M.H. Bonnet, Reported chronic insomnia is independent of poor sleep as measured by electroencephalography. Psychos. Med. 62(4), 474–482 (2000)
4. H. Adeli, S. Ghosh-Dastidar, N. Dadmehr, A wavelet-chaos methodology for analysis of EEGs and EEG subbands to detect seizure and epilepsy. IEEE Trans. Biomed. Eng. 54(2), 205–211 (2007)
5. R. Tibodeau, R.S. Jorgensen, S. Kim, Depression, anxiety, and resting frontal EEG asymmetry: a meta-analytic review. J. Abnormal Psychol. 115(4), 715–729 (2006)
6. W.J.H. Nauta, The problem of the frontal lobe: a reinterpretation. J. Psychiatric Res. 8(3–4), 167–187 (1971)
7. I. Omerhodzic, S. Avdakovic, A. Nuhanovic, K. Dizdarevic, Energy distribution of EEG signals: EEG signal wavelet-neural network classifier. Int. J. Biol. Life Sci. 6(4), 210–215 (2010)
8. M. Kemal Kiymik, M. Akin, A. Subasi, Automatic recognition of alertness level by using wavelet transform and artificial neural network. J. Neurosci. Methods 139, 231–240 (2004)
9. A. Subasi, Automatic recognition of alertness level from EEG by using neural network and wavelet coefficients. Expert Syst. Appl. 701–711 (2005)
10. H. Abdullah, T. Penzel, D. Cvetkovic, Detection of insomnia from EEG and ECG, in IFMBE Proceedings, 15th International Conference on Biomedical Engineering, vol. 43 (2014), pp. 687–690
11. I. Clark, R. Biscay, M. Echeverria, T. Virues, Multiresolution decomposition of nonstationary EEG signals: a preliminary study. Comput. Biol. Med. 25(4), 373–382 (1995)
12. D.P. Subha, P.K. Joseph, U.R. Acharya, C.M. Lim, EEG signal analysis: a survey. J. Med. Syst. 34(2), 195–212 (2010)
13. R. Chaparro-Vargas, B. Ahmed, N. Wessel, T. Penzel, D. Cvetkovic, Insomnia characterization: from hypnogram to graph spectral theory. IEEE Trans. Biomed. Eng. 63(10), 2211–2219 (2016)
14. H. Abdullah, C.R. Patti, C. Dissanyaka, T. Penzel, D. Cvetkovic, Support vector machine classification of EEG nonlinear features for primary insomnia, in Proceedings of the International Conference for Innovation in Biomedical Engineering and Life Sciences (2018), pp. 161–164
15. K. Chen, C. Zhang, J. Ma, G. Wang, J. Zhang, Sleep staging from single-channel EEG with multi-scale feature and contextual information. Sleep Breath. 23(4), 1159–1167 (2019)
16. S. Fallmann, L. Chen, Computational sleep behavior analysis: a survey. IEEE Access 7, 142421–142440 (2019)
17. S. Kaya, C. McCabe, What role does the prefrontal cortex play in the processing of negative and positive stimuli in adolescent depression? Brain Sci. 9, 104 (2019)
18. A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.Ch. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals, 215–220 (2003)
19. H. Ocak, Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy. Expert Syst. Appl. 36(2), 2027–2036 (2009)
20. S.M. Pincus, Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 88, 2297–2301 (1991)
21. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)
22. P. Grassberger, T. Schreiber, C. Schaffrath, Nonlinear time sequence analysis. Int. J. Bifurc. Chaos 1, 521–547 (1991)
23. S. Shakya, Analysis of artificial intelligence based image classification techniques. J. Innov. Image Process. (JIIP) 2(01), 44–54 (2020)
24. C. Kuo, G. Chen, A short-time insomnia detection system based on sleep EOG with RCMSE analysis. IEEE Access 8, 69763–69773 (2020)
25. S.F. Liang, Y.H. Shih, P.Y. Chen, C.E. Kuo, Development of a human-computer collaborative sleep scoring system for polysomnography recordings. PLoS ONE 14(7) (2019)
26. O. Faust, P.C.A. Ang, S.D. Puthankattil, P.K. Joseph, Depression diagnosis support system based on EEG signal entropies. J. Mech. Med. Biol. 14(3) (2014)
27. J.S. Wang, C.W. Lin, Y.T.C. Yang, A k-nearest-neighbor classifier with heart rate variability feature-based transformation algorithm for driving stress recognition. Neurocomputing 116, 136–143 (2013)
28. A. Baratloo, M. Hosseini, A. Negida, G. El Ashal, Part 1: simple definition and calculation of accuracy, sensitivity and specificity. Emergency 3(2), 48–49 (2015)
Transfer Learning Techniques for Skin Cancer Classification Mirya Robin, Jisha John, and Aswathy Ravikumar
Abstract Increased usage of cosmetics, pollution, and radiation will always result in skin-related diseases, and skin cancer has become a common disease. Many features are available to help in the process of identifying skin cancer, and deep learning algorithms have been successfully applied to this task. Various transfer learning strategies can also be applied. This paper attempts to compare various transfer learning approaches and the degree of accuracy obtained by using these models. The analysis is made by training the system with the details obtained from images available in the database; the current image is then tested to find whether it is malignant or not. The images from a Kaggle dataset have been used for this study. Pre-trained models such as InceptionV3, ResNet50, and MobileNet are used along with their extracted pre-trained weights in this research work. Keywords Skin cancer · Deep learning · MobileNet · ResNet50 · Prediction
1 Introduction

Skin cancer is considered a rapidly spreading disease in this era due to the prevailing atmospheric conditions. There are approximately 5.4 million cases in the USA alone each year. According to recent reports, during 2008–2018 there was a 53% increase in new melanoma cases annually. There is also an expected rise in the mortality rate of this disease in the next decade. If the disease is diagnosed in the later stages, the rate of survival is less than 14%, but if the skin cancer is detected

M. Robin (B) · J. John · A. Ravikumar
Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Kerala, India
J. John e-mail: [email protected]
A. Ravikumar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_12
at early stages, the rate of survival is approximately 97%. Hence, early detection of skin cancer is lifesaving. Dermatologists usually perform diagnosis manually. Skin specialists follow a series of traditional steps for diagnosis, which include observation of suspected lesions with the naked eye, then dermoscopy (magnifying lesions microscopically), and finally a biopsy. The above steps require time, there is a chance the patient may progress to later stages in the meantime, and accurate diagnosis is highly dependent on the skill of the clinician. Even the best dermatologists may have only 80% accuracy in correctly diagnosing cancer. The availability of skilled dermatologists is also lacking globally in public healthcare. Hence, in order to diagnose skin cancer in a fast and accurate manner and to solve the above problems, extensive computer image analysis algorithms have been developed. Most algorithmic solutions are parametric and require the data to be normally distributed; since the available data is diverse and its nature cannot be controlled, such methods can be inefficient for diagnosing the cancer. Non-parametric solutions, by contrast, do not require the data to be normally distributed. In this work, various pre-trained deep learning models are used to train on and analyze skin cancer. In this paper, we address different pre-trained learning methods for developing a deep neural network model for image classification to identify skin cancer based on skin lesions [1].
A. Skin Cancer
Melanoma: Melanoma is a malignancy of melanocytes, special cells located in the outer epidermis of the skin. Melanoma can be observed by the human eye as it develops in the epidermis. Its harmful effects can be reduced by early detection and treatment: an excision operation can cure melanoma if it is detected at an early stage, but the main threat faced in treatment is the high rate of false negatives for malignant melanoma. Figures 1 and 2 [2] show benign and malignant images from the dataset. Female patients usually have melanoma on the lower limbs and male patients have it on the back, but it can also be found, more rarely, on other organs containing melanocytes, such as the mouth and eye [3].
Basal-Cell Carcinoma: There are at least 3 to 4 million cases of basal-cell carcinoma (BCC) in the USA annually. It arises from the deepest layer of the epidermis. BCCs usually look like red patches or open sores. Cases of BCC spreading are very rare, but people who have had BCCs are prone to develop them again in their lifetime.
2 Literature Survey

Numerous research efforts have been made on skin cancer detection based on image analysis. In 2018, the International Skin Imaging Collaboration (ISIC) hosted a challenge contest in skin cancer detection which became the de facto standard. Mobile apps that could detect skin cancer were also devised. Different classification algorithms have been used to obtain better classification accuracy. The convolutional neural network (CNN) structure was introduced by Fukushima [4] and later LeCun [5],
Fig. 1 Benign image [2]
Fig. 2 Malignant image [2]
which led to a boom in this kind of analysis. CNNs basically mimic the human visual cognition system and are considered to be the best method for image classification. A major breakthrough came from Esteva et al., who used a pre-trained GoogLeNet Inception V3 CNN model on 129,450 clinical skin cancer images, including 3,374 dermatoscopic images [1]. The classification accuracy reported was 85.5%.
In 2018, Haenssle et al. [6] utilized a deep convolutional neural network to classify a binary diagnostic category of dermatoscopy melanocytic images, and reported 86.6% sensitivity and specificity for classification. A multiclass classification using ECOC SVM and deep learning CNN was developed by Dorj et al. [7]. The approach was to use ECOC SVM with pre-trained AlexNet Deep Learning CNN and classify multi-class data. An average accuracy of 95.1% is reported in this work [1] (Table 1).
3 Proposed System

The proposed system is based on the transfer learning concept for the classification of skin cancer lesions as malignant or benign. Here, the model is built based on information obtained from pre-trained models, which are trained on huge datasets whose learned patterns are saved; transfer learning helps to make use of this previously learned information. These pre-trained models have been extensively trained on big datasets and are widely used to solve problems with similar datasets. In this work, mainly three pre-trained models, ResNet50, MobileNet, and InceptionV3, are used with their pre-trained weights for the classification. The dataset used for the work is from Kaggle, with half of the skin lesion images being pathology confirmed. The dataset consists of 3000 lesion images, which includes 1799 benign images and 1500 malignant images. Image resizing refers to the scaling of images. Scaling comes in handy in much image processing: it reduces the number of pixels of an image, which has several advantages, for example, reducing the training time of a neural network, since more pixels in an image means more input nodes, which in turn increases the complexity of the model. It also helps in zooming into images, and images need to be resized at times as per standard size requirements. In image processing applications like image recognition and classification, the most commonly used neural network is the convolutional neural network (CNN). The CNN is used in highly demanding fields like self-driving cars, medical diagnosis, and robotics. A CNN is a supervised ANN trained using a labeled dataset; it automatically learns the relation between classes and the hidden feature relations. The main components are the hidden layers and fully connected layers, and the main steps are convolution, pooling, and the final fully connected layer. The proposed model consists of the following steps: data collection, image resizing, model setup, use of the trained models for classification, and finally the testing phase and evaluation of the model, as shown in Fig. 3. Transfer learning is highly effective because it is simple to apply to a specific problem, better performance can be obtained with less training time, the need for labeled data is less critical, and it is versatile. Figure 4 shows sample benign skin lesion images from the Kaggle dataset. The Kaggle dataset is split based on the 80–20 rule into training and testing phases. In the
Table 1 Comparison of existing techniques

Method: Image-wise supervised learning (ISL) and multi-scale super pixel-based cellular automata (MSCA) for segmentation of skin lesion [1]
Advantages: • More capable of segmenting skin lesions of varying contrast and size • Better segmentation performance • Compared to other methods, more reliable and accurate
Disadvantages: • Lower segmentation accuracy due to the larger super pixels responsible for more discriminative features that are crucial for lesion detection • Only a small part of the lesion area is represented by smaller super pixel scales

Method: K-mean and PSO technique for detection of skin cancer [2]
Advantages: • Image quality and increased detection accuracy are obtained by making use of k-means with PSO • Enhanced detection and quality of the image obtained
Disadvantages: • When the dataset is large or complex, the performance of the PSO clustering method is found to degrade • Prior knowledge for the initial selection of the number of clusters in k-means is required

Method: Skin lesion segmentation using the artificial bee colony algorithm [3]
Advantages: • Compared to other methods it is faster, simple to implement, flexible, and has fewer parameters • Higher performance
Disadvantages: • Fails to get accurate segmentation • Accuracy is less due to lack of contrast between skin lesion and background

Method: Computer-aided diagnosis of melanoma skin cancer [4]
Advantages: • User-friendly and robust tool for automated diagnostics of skin cancer • More useful for rural areas where medical experts may not be available • Increases diagnosis accuracy as well as speed
Disadvantages: • The system includes challenges in input data collection, preprocessing, processing, and system assessment

Method: Convolutional neural network (CNN) [5]
Advantages: • CNNs are considered the best image classification and analysis model • CNN is incorporated into deep learning for improved accuracy • Automated feature extraction is provided • The number of steps is drastically reduced by using convolution on patches of adjacent pixels
Disadvantages: • Reproducibility is troublesome due to the usage of non-public datasets for training and testing
Fig. 3 Proposed method
second step, image augmentation is done using zooming, shearing, and flipping operations to increase the dataset. The third step is the loading of the pre-trained models; to avoid overfitting, dropout and normalization (batch normalization) are applied. The neural network has two dense layers with 64 and 2 neurons, respectively, with a softmax activation function on the final layer. The loss function used here is binary cross entropy. The model was trained with 20 epochs, varying the learning rate and batch size.
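A minimal Keras sketch of this setup is given below; the paper reports a MATLAB implementation, so this Python/Keras version, the 224 × 224 input size, and the augmentation parameter values are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation: zoom, shear, flip (parameter values assumed)
train_gen = ImageDataGenerator(rescale=1 / 255., zoom_range=0.2,
                               shear_range=0.2, horizontal_flip=True)

# Frozen pre-trained backbone plus the small head described in the text
base = MobileNet(weights='imagenet', include_top=False,
                 pooling='avg', input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.BatchNormalization(),
    layers.Dropout(0.5),                     # against overfitting
    layers.Dense(64, activation='relu'),
    layers.Dense(2, activation='softmax'),   # benign vs. malignant
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss='binary_crossentropy',    # loss named in the text
              metrics=['accuracy'])
model.summary()
```

The same head can be placed on ResNet50 or InceptionV3 by swapping the `base` model.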
4 Result and Analysis

The transfer learning concept is used here for the classification of the skin lesions. The main advantage of transfer learning is that there is always a starting point rather than building the model from scratch. In this method, previously learned models are leveraged; these models are already trained on a big dataset and their weights are frozen for easy progress. In this work, mainly three pre-trained networks are used: MobileNet, Inception V3, and ResNet. The first step was to analyze all the images in the training set and calculate the bottleneck, which refers to the layer preceding the last fully connected deep neural network layer.
Fig. 4 Sample skin lesion benign image
A. Inception-v3 Model
The Inception V3 model is a 42-layer deep neural net, trained here with a learning rate of 0.0001 and the Adam optimizer; it uses both asymmetric and symmetric basic building blocks consisting of convolutional layers, pooling layers with max-pool and average-pool functions, dropout layers, and a final fully connected neural network layer. Figure 5 shows the accuracy and loss graph and Fig. 6 shows the confusion matrix when the skin lesion dataset was trained and classified by using the Inception V3 model.
B. ResNet Model
The residual network (ResNet) model is a pre-trained neural net used in image- and vision-based problems; ResNet won the ImageNet challenge. The ResNet used here has 50 layers and an input image size of 224 × 224 pixels. It is basically a CNN with multiple stacked layers, which helps to analyze low-, medium-, and high-level features. Figure 7 shows the accuracy and loss in training using ResNet50. Figure 8
Fig. 5 Accuracy and loss graph
Fig. 6 Confusion matrix
Fig. 7 Accuracy and loss graph
Fig. 8 Confusion matrix
shows the confusion matrix when the skin lesion dataset was trained and classified by using the ResNet model.
C. MobileNet Model
MobileNet was proposed by Google, and the main integral part of this network is the filter used for depthwise separation, which helps to implement the pointwise convolution; in this model, it is applied to each input channel separately. Both depthwise and pointwise convolutions are applied to the training dataset, the optimum parameters are considered, and the lightweight neural net is developed. MobileNet showed a good learning rate and accuracy on the training dataset; Fig. 9 shows the accuracy and loss in training using MobileNet, and Fig. 10 shows the confusion matrix when the skin lesion dataset was trained and classified by using the MobileNet model (Table 2).

Fig. 9 Accuracy and loss graph
Fig. 10 Confusion matrix
Table 2 Comparison of models

Models | Sensitivity | Specificity | Precision | Negative predictive value | False positive rate | False negative rate | Accuracy | F1 score
Inception v3 | 0.6667 | 0.4000 | 0.5263 | 0.5455 | 0.6000 | 0.3333 | 0.5333 | 0.5882
MobileNet | 0.7333 | 0.5000 | 0.5946 | 0.6522 | 0.5000 | 0.5667 | 0.7177 | 0.6567
ResNet50 | 0.8667 | 0.0667 | 0.4815 | 0.3333 | 0.9333 | 0.1333 | 0.6967 | 0.6190

This paper has compared three different models for classification of benign and malignant images in the given skin cancer images. Inception, ResNet, and MobileNet
are the convolutional neural networks used for the image classification task. All of the models showed a similar and statistically significant performance. Among the CNN models discussed in this paper, the ResNet model achieved the best results on our dataset with a sensitivity of 86.67%, followed by MobileNet with 73.33% and Inception v3 with 66.67%. On the other hand, MobileNet achieved 71.77% accuracy, ResNet achieved 69.67%, and Inception v3 achieved 53.33%.
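The per-model metrics in Table 2 are all derived from a confusion matrix; a small sketch of the formulas involved, with purely hypothetical counts:

```python
def metrics(tp, fn, tn, fp):
    """Derive the Table 2 style metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)                    # sensitivity (recall)
    spec = tn / (tn + fp)                    # specificity
    prec = tp / (tp + fp)                    # precision
    npv = tn / (tn + fn)                     # negative predictive value
    acc = (tp + tn) / (tp + fn + tn + fp)    # accuracy
    f1 = 2 * prec * sens / (prec + sens)     # F1 score
    return sens, spec, prec, npv, 1 - spec, 1 - sens, acc, f1

# hypothetical counts, for illustration only
print(metrics(tp=40, fn=10, tn=35, fp=15))
```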
5 Conclusion and Future Scope

In this work, skin cancer was classified based on skin lesions confirmed by pathology. Transfer learning-based models were trained to classify each lesion as either malignant or benign cancer. From the results, the output compares well with dermatologist assessments, and better results can be obtained by fine-tuning the network. In this work, a preprocessing step was not applied since the dataset was highly unbalanced, but in the future, preprocessing steps can be included to make the system more accurate. The model can easily be made available through a Web-based platform, or even as an API, for assisting dermatologists. In the future, more dermoscopy images can be added to the training dataset to make it better and more efficient, and datasets of different age groups and categories can be included to make the model more diverse in nature. Metadata of the images can be included for prediction to make it even more efficient. In the future, a personalized system can be developed based on the patient's medical history and other personal information.
References
1. M.A. Kadampur, S. Al Riyaee, Skin cancer detection: applying a deep learning based model driven architecture in the cloud for classifying dermal cell images, in Informatics in Medicine Unlocked (2020)
2. Kaggle Dataset, Skin Cancer: Malignant vs. Benign. Processed skin cancer pictures of the ISIC Archive
3. J. Lemon, S. Kockara, T. Halic, M. Mete, Density-based parallel skin lesion border detection with webCL. BMC Bioinform. (2015)
4. K. Fukushima, Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Networks 1(2), 119–130 (1988)
5. Y. LeCun, B.E. Boser, J.S. Denker, D. Henderson, R.E. Howard, W.E. Hubbard, L.D. Jackel, Handwritten digit recognition with a back-propagation network, in Advances in Neural Information Processing Systems (1990), pp. 396–404
6. H.A. Haenssle, C. Fink, R. Schneiderbauer, F. Toberer, T. Buhl, A. Blum, A. Kalloo, A. Ben Hadj Hassen, L. Thomas, A. Enk, L. Uhlmann, Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists
7. U.-O. Dorj, K.-K. Lee, J.-Y. Choi, M.J.M.T. Lee, The skin cancer classification using deep convolutional neural networks. Multimedia Tools Appl., pp. 1–16 (2018)
8. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Particle Swarm Optimization Based on Random Walk Rajesh Misra and Kumar Sankar Ray
Abstract Particle swarm optimization (PSO) has undergone several modifications since it was first proposed by J. Kennedy and R. Eberhart in 1995, and there are many variants of PSO to date. These variants often confuse researchers about the improvement, applicability, and novelty of each. As a result, in 2007 Daniel Bratton and James Kennedy defined a standard for particle swarm optimization, an extension of the original algorithm with the latest improvements. This standardized PSO, often known as SPSO, fails to achieve the required performance on non-separable and asymmetrical functions when tested against baseline benchmark functions. Here, a new algorithm is proposed that modifies the canonical PSO algorithm by introducing the concepts of random walk and particle reinitialization. The inherent nature of the random walk balances the "exploration" and "exploitation" properties of the search and successfully handles the problem of trapping in local optima, which most algorithms fail to handle. This newly proposed variant clearly outperforms other well-known algorithms in the category where the velocity term is eliminated; it also achieves better performance when tested on the benchmark functions where SPSO fails to perform well. Keywords Random walk · Constrained biased random walk · Canonical PSO · Gaussian distribution
1 Introduction

Among the metaheuristic algorithms, PSO is one of the most promising methods. Particle swarm optimization is mostly influenced by the concept of particles' interaction and their quick movements from one place to another in a large search region. Particle

R. Misra (B) · K. S. Ray
Electronics and Communication Science Unit, Indian Statistical Institute, Kolkata, India
K. S. Ray e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_13
position and velocity are two important components of PSO algorithm. As per social behavior of particles, velocities are changed and the particles get new positions by applying the velocity on current position. In 1998, Y. Shi and R. Eberhart proposed canonical PSO by adding inertia weight to the original PSO [1, 2], Pbest and Gbest are clearly explained in their papers. A very common limitation imposed by almost every researcher is that PSO weather it is canonical or original or some other variants stuck into local optima. Though it is a matter of debate whether researchers use proper topology as it is rightly pointed out by Blackwell and Kennedy in 2015 [3]. Though original PSO has two components velocity and position, in 2003, Kennedy proposed a new idea by eliminating velocity term named as bare bone PSO (BBPSO) [4]. BBPSO becomes much simpler method and is very easy to understand while keeping the same benefit intact with original PSO. Lot of variations over original or canonical PSO are developed over the last three decades, so much that it creates confusion which approaches to use and where. To overcome this critical issue, Bratton and Kennedy proposed a standard for particle swarm optimization, known as SPSO [5]. In 2011, SPSO got major improvement with adaptive random topology and rotational invariance, known as SPSO11 by Clerc [6]. It is recommended that any new PSO variant should be better than existing SPSO11 in terms of experiment on a difficult enough non-biased benchmark. In 2013, Mauricio Zambrano-Bigiarini et al. set a baseline for future PSO improvement by conducting experiment on a complex benchmark test suite [7]. When SPSO11 tested against baseline benchmark functions though it shows very good performance in unimodal and some of the multimodal functions, but for non-separable and asymmetrical functions it fails to achieve required performance. So at this juncture, SPSO11 which is the latest PSO according to the baseline benchmark function performance is employed. There is very less number of research focus on BBPSO than original/canonical (classical PSO). Our main focus is to develop a new algorithm which is “velocity free” and well performed with the latest benchmark defined in [7]. Shakya et al. [8] combine discrete and binary version of PSO by proposing bivelocity PSO algorithm for solving multicast routing problems. Haoxiang et al. [9] use modified ACO method to solve routing protocol for vehicular network. Our approach is to apply random walk in PSO particle movement. First, particle positions are initialized randomly over the search space. Then, rather than calculating velocity of each particle, probability functions between all particles are computed. Our chosen probability function is influenced by the work done by Blanchard and Volchenkov [10] which is based on biased random walks on undirected graphs. Utilizing this probability, a random coin toss is performed to decide which way corresponding particle should move. This random coin toss procedure is the basis of randomization of random walk method. In our algorithm, this random coin toss operation is like, if it is HEAD then “exploit,” if it is TAIL then “explore.” Thus, in both the cases, the benefit of searching the entire search space has been achieved for building more solution region (exploration) and if any particle finds a promising solution, then move toward that solution (exploitation). 
Most well-known algorithms fail to find the balance between "exploration" and "exploitation"; they get trapped in local optima and perform poorly, whereas our algorithm handles this
property efficiently. When experiments are performed on various multimodal and unimodal functions as suggested in [7], the results are extremely satisfactory: they not only outperform other PSO variants in which velocity is not used but are also very much comparable to SPSO11. The paper is organized as follows: simple and biased random walks are discussed in Sect. 2. The proposed random walk PSO algorithm is detailed in Sect. 3. Experimental results are shown in Sect. 4. Section 5 draws the conclusion.
2 Random Walk

2.1 Simple Random Walk

Karl Pearson first introduced the phrase random walk in 1905 [11]. Here, particle movement is based on random choices, and the expected progression of the particle after n steps is the root-mean-square distance [12]. The simple random walk approach is discussed in [13] and can be explained algorithmically as follows.
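A minimal 1-D sketch of such a walk (Python; the function name and parameters are illustrative, not the authors' listing):

```python
import random

def simple_random_walk(steps):
    """Unbiased 1-D random walk: a fair coin decides every unit step."""
    position = 0
    for _ in range(steps):
        if random.random() < 0.5:   # HEAD: move one step left
            position -= 1
        else:                       # TAIL: move one step right
            position += 1
    return position                 # expected |position| grows like sqrt(steps)
```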
In the above procedure, particle movement is uniform and the random process (i.e., the coin toss) is unbiased. There is another version of the random walk, called the biased random walk, which is explained below.
2.2 Biased Random Walk

In the simple random walk, the movement of particles is governed by equal probabilities, so after n random steps a particle stays close to its starting position. In the biased random walk, by contrast, a particle jumps from the current position to the next with unequal probability. As an example, consider a biased coin with an 80% chance of HEAD and a 20% chance of TAIL. If HEAD indicates a move to the left and TAIL a move to the right, then the particle obviously moves left
more than right. After n steps, the particle's expected position lies well to the left of the start. The biased random walk [13] is explained algorithmically as follows.
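A matching sketch with an unfair coin (again with illustrative names):

```python
import random

def biased_random_walk(steps, p_head=0.8):
    """Biased 1-D random walk: HEAD (probability p_head) moves left, TAIL right."""
    position = 0
    for _ in range(steps):
        position += -1 if random.random() < p_head else 1
    return position   # drifts left at roughly (2 * p_head - 1) steps per toss
```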
The above biased random walk procedure is based on an uneven coin toss that yields HEAD (or TAIL) most of the time, so the particles' movement is predominantly to the left or to the right. As a result, a particle does not roam around its starting position but essentially moves between regions, while each individual step remains of uniform size. Yet another version of the random walk exists, called the constrained biased random walk [13], which algorithmically is as follows.
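A sketch of this variant; the constraint chosen here (clamping the walker to a bounded region) is an assumption made for illustration, and [13] gives the authors' exact formulation:

```python
import random

def constrained_biased_random_walk(steps, p_head=0.8, lower=-10, upper=10):
    """Biased walk whose positions are constrained to [lower, upper]."""
    position = 0
    for _ in range(steps):
        position += -1 if random.random() < p_head else 1
        position = max(lower, min(upper, position))  # assumed constraint: clamp
    return position
```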
3 Proposed Random Walk-Based PSO

3.1 General Idea

Our proposed approach is based on performing a constrained biased random walk of particles over the PSO topology. Let us first discuss the general framework of our approach, beginning with the construction of the particle swarm optimization graph. This is discussed
in Sect. 3.2. Then, the probability for every particle is computed; this requires the attribute values and weight parameters. The attribute value α_i is computed for each particle i, as explained in Sect. 3.3. Next, each particle i computes a weight parameter A_ij with respect to every other particle j, as discussed in Sect. 3.4. Once A_ij is available, the probability function P^α_ij is computed for every particle with respect to all other particles using α_i and A_ij; this computation is discussed in Sect. 3.5. After the probabilities have been computed, each particle moves in a direction decided by a random coin toss. By direction we mean that a specific particle is chosen as the target particle (P_targ), and our particle P_i jumps to the neighborhood of P_targ. The procedure for choosing P_targ utilizes the probability P^α_ij mentioned earlier: a random number r, acting as the coin, is drawn; if r ≤ min(P^α_ij), then any particle is chosen at random as the possible target particle P_targ; otherwise, the particle with the highest probability is chosen as P_targ. After P_targ is chosen, a displacement k^d is calculated and added to the current position of the particle. This is summarized in Algorithm 4.
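Since Algorithm 4 is given only in outline here, the following sketch captures one random-walk move per particle (illustrative names; `step` stands for the k^d displacement of Sect. 3.7):

```python
import random

def rwpso_move(positions, prob, step):
    """One iteration of the random-walk move: a coin toss picks explore/exploit.

    positions: current particle positions; prob[i][j]: probability P^alpha_ij;
    step(p_i, p_targ): new position of p_i displaced toward p_targ by k^d.
    """
    new_positions = []
    for i, p_i in enumerate(positions):
        r = random.random()                # the random coin
        if r <= min(prob[i]):              # TAIL: explore with a random target
            targ = random.randrange(len(positions))
        else:                              # HEAD: exploit the most probable target
            targ = max(range(len(positions)), key=lambda j: prob[i][j])
        new_positions.append(step(p_i, positions[targ]))
    return new_positions
```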
Fig. 1 PSO graph with 5 randomly initialized nodes: p1(−2, 4), p2(5, 5), p3(8, −1), p4(4, −6) and p5(−4, −3)
3.2 PSO Graph Construction G(P, E)

In the particle swarm optimization graph G(P, E), each particle is a node p_i ∈ P, and the Euclidean distance between p_j and p_i defines an edge e_ij ∈ E. The entire PSO graph construction is discussed in [13]. The weight of each edge is the Euclidean distance between p_i(x_i, y_i) and p_j(x_j, y_j):

$$d\big(p_i(x_i, y_i),\, p_j(x_j, y_j)\big) = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$
Let us explain this PSO graph construction with a simple example. Imagine 5 particles randomly distributed in the search domain, where 2D is considered for simplicity. Suppose the particles are P1(−2, 4), P2(5, 5), P3(8, −1), P4(4, −6) and P5(−4, −3), as shown in Fig. 1. Each particle is connected to every other particle by an edge.
3.3 Attribute Computation (α_i)

Attribute values are computed because they are used in the probability computation of a later phase. A concept of rank is used in the attribute computation; a detailed discussion is available in [13]. Algorithmically, the attribute computation method is as follows.
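A sketch of the rank idea (assuming minimization, i.e., lower fitness is better; names are illustrative):

```python
def attribute_values(particles, fitness):
    """Rank-based attribute alpha_i: the fittest particle receives the highest
    rank (alpha = N) and the least fit receives 1."""
    n = len(particles)
    order = sorted(range(n), key=lambda i: fitness(particles[i]))  # best first
    alpha = [0] * n
    for rank, i in enumerate(order):
        alpha[i] = n - rank
    return alpha
```

With fitness taken as the distance to the origin, this reproduces the ranks of Fig. 2 (p1 → 5, p5 → 4, p2 → 3, p4 → 2, p3 → 1).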
There are various optimization functions. For simplicity, an example is considered in which particles close to the (0, 0) coordinate are taken to have the best fitness, as shown in Fig. 2; this fitness choice is for illustration only. As per Fig. 2, p1 achieves the maximum rank α = 5; similarly, the other particles receive their respective ranks from the rank vector rank(5), as there are 5 particles in our example.

Fig. 2 The same graph with attribute values α: p1 α = 5, p5 α = 4, p2 α = 3, p4 α = 2, p3 α = 1
Fig. 3 PSO graph with A_ij [13]: A11 = 1, A21 = 7.07, A31 = 11.18, A41 = 11.66, A51 = 7.28
3.4 Weight Parameter Calculation (A_ij)

The weight parameter computation is given as Algorithm 6; a detailed discussion is available in [13] (Fig. 3).
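Matching the values shown in Fig. 3, A_ij appears to be the Euclidean distance between particles i and j with the self-weight set to 1; the following sketch reflects that reading (an assumption, since Algorithm 6 itself is detailed in [13]):

```python
import math

def weight_parameters(particles):
    """Weight parameters A_ij: pairwise Euclidean distances, self-weight 1."""
    n = len(particles)
    a = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                (xi, yi), (xj, yj) = particles[i], particles[j]
                a[i][j] = math.hypot(xj - xi, yj - yi)
    return a
```

For the five particles of Fig. 1 this gives A21 ≈ 7.07, A31 ≈ 11.18, A41 ≈ 11.66 and A51 ≈ 7.28, as in Fig. 3.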
3.5 Probability Computation (P^α_ij)

Now all the parameters are available for computing the probability of each possible path from node p_j to node p_i. This probability guides where the position
Fig. 4 The same graph considered for the probability calculation P^α_ij: P11 = 5%, P21 = 23%, P31 = 12%, P41 = 25%, P51 = 32%
of the particle p_j will be. The probability function P^α_ij from node j to node i is

$$P_{ij}^{\alpha} = \frac{\alpha_i A_{ij}}{\sum_{k} \alpha_k A_{kj}} \qquad (1)$$
The inspiration for this function comes from the work of Blanchard et al. on "Fair and Biased Random Walks on Undirected Graphs and Related Entropies." Such probability functions are used effectively in various graphs, such as network and social graphs, where a biased random walk can be applied; one notable work in this area is by Gómez-Gardeñes and Latora [14]. Figure 4 shows the graph after the probability calculation. Once the calculation is done, each path from P1 is assigned a probability value that reflects the chance of selecting that path, and the particle at the end of the chosen path becomes the target particle P_targ. The probability calculation for the above graph is as follows:
$$P_{11} = \frac{5 \times 1}{1 \times 11.18 + 2 \times 11.66 + 3 \times 7.07 + 4 \times 7.28 + 5 \times 1} = \frac{5}{89.83} = 0.05$$

So, P11 = 5%. Similarly, for the other 4 nodes we get

$$P_{21} = \frac{3 \times 7.07}{89.83} = \frac{21.21}{89.83} = 0.23 = 23\%$$

$$P_{31} = \frac{1 \times 11.18}{89.83} = \frac{11.18}{89.83} = 0.12 = 12\%$$

$$P_{41} = \frac{2 \times 11.66}{89.83} = \frac{23.32}{89.83} = 0.25 = 25\%$$

$$P_{51} = \frac{4 \times 7.28}{89.83} = \frac{29.12}{89.83} = 0.32 = 32\%$$
Based on the above calculations, particle p5 receives the highest probability and p1 the lowest.
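A short sketch of Eq. (1) that reproduces this worked example (values read from Figs. 2 and 3; small differences come from rounding):

```python
def probabilities(alpha, a_col):
    """P_ij^alpha of Eq. (1) for a fixed source node j."""
    denom = sum(al * aij for al, aij in zip(alpha, a_col))
    return [al * aij / denom for al, aij in zip(alpha, a_col)]

# Worked example of Sect. 3.5 (source node j = p1):
alpha = [5, 3, 1, 2, 4]                    # ranks of p1..p5 from Fig. 2
a_col = [1.0, 7.07, 11.18, 11.66, 7.28]    # A_11..A_51 from Fig. 3
print([round(p, 2) for p in probabilities(alpha, a_col)])
# [0.06, 0.24, 0.12, 0.26, 0.32], i.e., the 5%, 23%, 12%, 25%, 32% of the text
```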
3.6 Target Particle Selection (P_targ)

Before going into detail about P_targ selection, let us first recall what happens in canonical PSO, where velocity and position are calculated as follows:

$$V_i(t+1) = w\,V_i(t) + C_1 R_1 \big(P_{LB}(t) - X_i(t)\big) + C_2 R_2 \big(P_{GB}(t) - X_i(t)\big) \qquad (2)$$

$$X_i(t+1) = X_i(t) + V_i(t+1) \qquad (3)$$
where V_i(t + 1) is the velocity of particle i at time (t + 1) and X_i(t + 1) is its position at time (t + 1). The inertia weight w was introduced by Shi and Eberhart in "Parameter selection in particle swarm optimization"; this variant is called canonical PSO. The inertia weight w tries to balance the exploration and exploitation capability of the particles, but it still fails to position particles so that they are not trapped in local optima. The question, then, is whether one can get rid of velocity and still find the next position. The answer was given by Kennedy in his velocity-free bare bones PSO (BBPSO), where the next position of a particle is determined by a Gaussian distribution over the particles in the search space. But the Gaussian version is not as good as the canonical version: bare bones PSO suffers from premature convergence, or converges to a point that is neither a global nor a local optimum. This is where our P_targ comes into play. Particle p_i probabilistically finds a strong candidate among the other particles, one giving the best solution at that iteration; it is marked as P_targ, and p_i tries to jump in that direction. The next question is how P_targ is chosen and why it is appropriate. In the α_i calculation, the particle nearest to the solution gets the maximum value, and in the P_ij computation the product α_i × A_ij is largest for the particle worth selecting; that is why the particle holding the highest probability value becomes P_targ. P_targ provides a local search ability with strong exploitative power; for separable functions, this exploitative nature helps particles converge toward the solution very fast. Following the random walk concept, a random process such as picking a card, tossing a coin or rolling dice decides the path of the particle; similarly, a random number is chosen in our approach to decide the path of the moving particle p_i. If p_i fails to select the most probable P_targ, it randomly selects some other particle as P_targ and tries to jump toward that position. In this way, p_i explores more of the search region and successfully avoids being trapped in local optima. This flexibility of the random
Fig. 5 Node p5, holding the highest probability (P51 = 32%), is selected as P_targ
walk helps the particle solve multimodal and complex optimization functions, as these functions need more exploration than exploitation. In Fig. 5, the P_targ particle has been chosen.
3.7 New Position Computation (P_new)

After P_targ is chosen, p_i jumps in that direction, but the jump depends on the random coin toss. The new position of particle p_i is computed as

$$p_{new}^{d} = p_i^{d} + k^{d} \qquad (4)$$
where p^d_new is the new position of the particle p^d_i and k^d is the step count; d is the number of dimensions, which depends on the application. k^d is calculated as follows:
$$P(s, n) = \binom{n}{(n+s)/2}\, p^{(n+s)/2}\, (1-p)^{(n-s)/2} \qquad (5)$$
Here s is the net number of steps and n the number of coin tosses. As the probability distribution P(s, n) is no longer an even function of s, the particle is expected to drift toward P_targ at a steady rate of (2p − 1) steps per coin toss. Applying the simplified Stirling formula to the above biased random walk result, it is found that
$$\log P(s, n) \approx -\frac{n}{2}\left[\left(1 + \frac{s}{n}\right)\log\frac{1 + s/n}{2p} + \left(1 - \frac{s}{n}\right)\log\frac{1 - s/n}{2(1-p)}\right] \qquad (6)$$

By simplifying the above equation, we get
$$P(s, n) = \frac{e^{-(s-\bar{s})^{2}/(8p(1-p)n)}}{\big(8\pi p(1-p)n\big)^{1/2}} \qquad (7)$$

So finally, k^d will be

$$k^{d} = \frac{e^{-(s-\bar{s})^{2}/(8p(1-p))}}{\big(8\pi p(1-p)\,P(s, n)\big)^{1/2}} \qquad (8)$$
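Taken literally, Eqs. (7) and (8) evaluate as in the following sketch, with s̄ = (2p − 1)n the expected drift (names are illustrative):

```python
import math

def step_count(s, n, p):
    """Direct transcription of Eqs. (7)-(8) for the step count k^d."""
    s_bar = (2 * p - 1) * n                       # expected drift toward p_targ
    var = 8 * p * (1 - p) * n
    p_sn = math.exp(-(s - s_bar) ** 2 / var) / math.sqrt(math.pi * var)   # Eq. (7)
    return math.exp(-(s - s_bar) ** 2 / (8 * p * (1 - p))) / math.sqrt(
        8 * math.pi * p * (1 - p) * p_sn)         # Eq. (8)
```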
3.8 Particle Reinitialization

Although our algorithm balances local and global search well, a reinitialization method is incorporated as a fail-safe. The method is simple: a counter checks whether all particles are trapped in a local optimum, and if no further improvement is seen, the particles are reinitialized. The trick is that particles are reinitialized not over the entire search space but within the "so-far-best-achieved" region: since all particles have already searched the whole area, there is no point in starting from the same region all over again. Our experimental results show the enormous benefit of this approach.
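A sketch of this fail-safe (the region bounds and the stagnation test are illustrative assumptions):

```python
import random

def reinitialize(best_region, dim, swarm_size):
    """Redraw all particles inside the so-far-best-achieved region
    (low/high bounds per dimension) instead of the full search space."""
    low, high = best_region
    return [[random.uniform(low[d], high[d]) for d in range(dim)]
            for _ in range(swarm_size)]
```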
4 Experimental Setting

4.1 Benchmark Functions

The performance of our random walk PSO (RWPSO) algorithm is benchmarked against the set of functions defined for CEC 2013 in "Problem definitions and evaluation criteria for the CEC-2013 special session and competition on real-parameter optimization" [15]. This set was chosen because it is the latest benchmark suite for optimization and has been tested with SPSO11, for which the authors established a baseline for future PSO improvements. Three PSO variants, SPSO11, BBPSO and RWPSO (our proposed algorithm), are considered in the comparison.
Fig. 6 Sphere function: log10(error) versus iteration (0–5000) for SPSO11, BBPSO and RWPSO
Fig. 7 Rotated discus function: log10(error) versus iteration (0–5000) for SPSO11, BBPSO and RWPSO
Five functions are considered from [15]: "sphere function" and "rotated discus function" from the unimodal set, "rotated Rosenbrock's function" and "Rastrigin's function" from the basic multimodal set, and "composition function 1 (n = 5, Rotated)" from the composition set. For detailed definitions of these functions, please refer to [15]. In our work, the population size is maintained at 40; all experimental details and settings are taken from [15].
Figures 6, 7, 8, 9 and 10 show graphs for the 5 optimization functions. The vertical axis represents the error on a log10 scale, and the horizontal axis represents the iteration; the curves plot the function error values [f(x) − f(x*)] against the number of iterations. For example, in the sphere function graph (Fig. 6), SPSO11 takes more than 5000 iterations to reach error value 0, and BBPSO takes far more iterations than SPSO11 while still failing to reach error value 0, whereas RWPSO reaches 0 error in approximately 4500 iterations. These graphs help identify which algorithm's
Fig. 8 Rotated Rosenbrock's function: log10(error) versus iteration (0–5000) for SPSO11, BBPSO and RWPSO

Fig. 9 Rastrigin's function: log10(error) versus iteration (0–5000) for SPSO11, BBPSO and RWPSO

Fig. 10 Composition function 1 (n = 5, Rotated): log10(error) versus iteration (0–5000) for SPSO11, BBPSO and RWPSO
convergence is faster than the others'. From these 5 graphs, it is concluded that our proposed RWPSO is far better than BBPSO and very competitive with SPSO11 in terms of convergence. Table 1 summarizes all 5 test function results. In the first row, the sphere function is run with 10, 30 and 50 dimensions, and the three algorithms report their mean and standard deviation for each dimension over 51 runs. Observed closely, RWPSO clearly outperforms BBPSO and is very close to SPSO11; the consecutive rows show the results for the other functions.
5 Conclusion

This paper combines a random walk approach with a random coin toss operation and a particle reinitialization method, which together prove the strength of the RWPSO algorithm in finding the solution region much faster. The algorithm also proves its capability to reach optimal solutions on various difficult functions. Here, results for 5 optimization functions from 3 categories are presented; more functions from [15] have been tested and will be reported elsewhere.
Table 1 Summary statistics for the 10, 30 and 50 dimensional cases over fifty-one runs (Mean ± Std)

| Test function | Function category | Dimension | SPSO11 [6] | BBPSO [4] | RWPSO |
|---|---|---|---|---|---|
| Sphere function | Unimodal | 10 | −1.400E+03 ± 0.00E+00 | 5.20E+03 ± 2.67E+03 | −1.400E+03 ± 0.00E+00 |
| | | 30 | −1.400E+03 ± 1.875E−13 | 6.19E+05 ± 1.23E+02 | −1.400E+03 ± 0.78E−15 |
| | | 50 | −1.400E+03 ± 3.183E−13 | 9.41E+05 ± 5.28E+06 | −1.400E+03 ± 2.78E−08 |
| Rotated discus function | Unimodal | 10 | −1.100E+03 ± 4.556E+03 | 5.20E+03 ± 2.67E+03 | −1.630E+03 ± 4.122E+03 |
| | | 30 | −1.100E+03 ± 6.702E+03 | 5.39E+05 ± 3.43E+04 | −1.630E+03 ± 3.78E+05 |
| | | 50 | −1.100E+03 ± 8.717E+03 | 6.27E+06 ± 3.23E+06 | −1.633E+03 ± 5.26E+05 |
| Rotated Rosenbrock's function | Multimodal | 10 | −9.000E+02 ± 4.974E+00 | −14.32E+03 ± 5.67E+01 | −10.521E+03 ± 4.114E+01 |
| | | 30 | −9.000E+02 ± 2.825E+01 | −15.16E+04 ± 5.13E+02 | −10.931E+03 ± 3.25E+05 |
| | | 50 | −9.000E+02 ± 2.405E+01 | −15.23E+04 ± 5.23E+06 | −10.284E+03 ± 3.66E+04 |
| Rastrigin's function | Multimodal | 10 | −4.000E+02 ± 5.658E+00 | −6.62E+01 ± 7.24E+01 | −4.125E+02 ± 4.213E+00 |
| | | 30 | −4.000E+02 ± 2.740E+01 | −6.162E+02 ± 5.13E+01 | −4.931E+01 ± 2.25E+01 |
| | | 50 | −4.000E+02 ± 4.183E+01 | −8.23E+03 ± 5.73E+04 | −4.42E+02 ± 3.89E+01 |
| Composition function 1 (n = 5, Rotated) | Composition | 10 | 7.000E+02 ± 3.042E+02 | 12.73E+01 ± 2.66E+02 | 8.332E+02 ± 1.342E+00 |
| | | 30 | 7.000E+02 ± 6.796E+01 | −8.342E+02 ± 2.56E+01 | −8.13E+01 ± 5.25E+01 |
| | | 50 | 7.000E+02 ± 0.000E+00 | 12.23E+03 ± 5.73E+04 | 7.82E+01 ± 1.89E+00 |
References

1. Y. Shi, R. Eberhart, A modified particle swarm optimizer, in 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360), Anchorage, AK, USA (1998), pp. 69–73. https://doi.org/10.1109/ICEC.1998.699146
2. J. Kennedy, R. Eberhart, Particle swarm optimization, in Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4, Perth, WA, Australia (1995), pp. 1942–1948. https://doi.org/10.1109/ICNN.1995.488968
3. T. Blackwell, J. Kennedy, Impact of communication topology in particle swarm optimization. IEEE Trans. Evol. Comput. 23(4), 689–702 (2019). https://doi.org/10.1109/TEVC.2018.2880894
4. J. Kennedy, Bare bones particle swarms, in Proceedings of the 2003 IEEE Swarm Intelligence Symposium, SIS'03 (Cat. No.03EX706), Indianapolis, IN, USA (2003), pp. 80–87. https://doi.org/10.1109/SIS.2003.1202251
5. D. Bratton, J. Kennedy, Defining a standard for particle swarm optimization, in 2007 IEEE Swarm Intelligence Symposium, Honolulu, HI (2007), pp. 120–127. https://doi.org/10.1109/SIS.2007.368035
6. M. Clerc, Standard particle swarm optimisation (2012). hal-00764996
7. M. Zambrano-Bigiarini, M. Clerc, R. Rojas, Standard particle swarm optimisation 2011 at CEC-2013: a baseline for future PSO improvements, in 2013 IEEE Congress on Evolutionary Computation, Cancun (2013), pp. 2337–2344. https://doi.org/10.1109/CEC.2013.6557848
8. S. Shakya, L.N. Pulchowk, A novel bi-velocity particle swarm optimization scheme for multicast routing problem. IRO J. Sustain. Wireless Syst. 02, 50–58 (2020)
9. W. Haoxiang, S. Smys, QoS enhanced routing protocols for vehicular network using soft computing technique. J. Soft Comput. Paradigm (JSCP) 1(2), 91–102 (2019)
10. P. Blanchard, D. Volchenkov, Fair and biased random walks on undirected graphs and related entropies, in Towards an Information Theory of Complex Networks, ed. by M. Dehmer, F. Emmert-Streib, A. Mehler (2011)
11. K. Pearson, The problem of the random walk. Nature 72, 294 (1905). https://doi.org/10.1038/072294b0
12. E.W. Weisstein, Random walk, 1-dimensional. https://mathworld.wolfram.com/RandomWalk1-Dimensional.html (2002)
13. R. Misra, K.S. Ray, A modification of particle swarm optimization using random walk. arXiv:1711.10401v2 [cs.AI] (2017). https://arxiv.org/abs/1711.10401v2
14. J. Gómez-Gardeñes, V. Latora, Entropy rate of diffusion processes on complex networks. Phys. Rev. E 78, 065102 (2008). https://doi.org/10.1103/PhysRevE.78.065102
15. J.J. Liang, B.-Y. Qu, P.N. Suganthan, A.G. Hernández-Diaz, Problem definitions and evaluation criteria for the CEC 2013 special session and competition on real-parameter optimization, Technical Report 201212, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China, and Nanyang Technological University, Singapore (Jan 2013). https://www.ntu.edu.sg/home/EPNSugan/indexfiles/CEC2013/CEC2013.htm. Last accessed 12 Feb 2013
Signal Processing Algorithms Based on Evolutionary Optimization Techniques in the BCI: A Review Ravichander Janapati, Vishwas Dalal, N. Govardhan, and Rakesh Sengupta
Abstract Brain–computer interfaces (BCIs) collect, analyze and transform brain signals into commands linked to desired tasks on target equipment. The feature extraction process provides an alternative interpretation of the acquired signal, enabling a collection of BCI actions to be performed more effectively. A pre-processing phase involving re-referencing of electrodes, deterioration, normalization, size reduction, removal of artifacts, etc., is often applied before feature extraction. This paper examines the classification of features and the application of evolutionary techniques to BCI. Its implications can help academicians, researchers and scientists in this domain to quickly understand the previous work in this field. These algorithms can be used in various applications such as the classification of motor imagery tasks, filter banks, etc.

Keywords Brain–computer interface · Feature extraction · Evolutionary technique · Motor imagery · Classification · Signal processing algorithms
1 Introduction

The most rapidly growing theme in neuroimaging is the brain–computer interface (BCI) [1]. Its purpose is to translate biological signals derived from different areas of the brain for different applications, including automation, communications and entertainment [2, 3]. Its most common use is in the recovery of people affected by paralysis, amyotrophic lateral sclerosis (ALS), leg impairment and so on [4, 5]. The basic modules of a BCI include signal pretreatment, feature extraction and
Fig. 1 BCI-based Web interface navigation
classification. The classified results produce the control signals needed to drive an assistive device. The high dimensionality of features and the selection of specific features are a key focus of BCI studies, so that classes can be discriminated as strongly as possible [6]. It has also been found that classifier accuracy is significantly reduced by the large number of irrelevant entries in the feature vector; in this context, an additional module based on feature selection is introduced after pre-processing. PCA [7], SVD [8] and ICA [9] are widely used feature selection techniques, and MI EEG signal detection has been performed with SVM and PSO. In traditional classifiers, the kernel parameters are typically selected according to empirical evidence, overlooking the significance of tuning them to enhance classifier output. A PSO algorithm has been applied to improve this by choosing the best kernel and penalty parameters, thereby enhancing classification accuracy. A combination of PSO and SVM has been used to diagnose sleep apnea and other sleep disorders [10] and to refine the SVM for EMG signal classification [11]. Figure 1 shows the BCI interface.
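A minimal sketch of that idea: PSO searching over the SVM penalty C and RBF width gamma, with cross-validated accuracy as fitness. It assumes scikit-learn and NumPy; the coefficients and search box are illustrative, not those of [10, 11]:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def pso_tune_svm(X, y, n_particles=10, iters=20, seed=0):
    """PSO over (log10 C, log10 gamma); fitness = 5-fold CV accuracy."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-2.0, -4.0]), np.array([3.0, 1.0])   # log10 search box
    pos = rng.uniform(lo, hi, (n_particles, 2))
    vel = np.zeros_like(pos)
    def fitness(p):
        return cross_val_score(SVC(C=10 ** p[0], gamma=10 ** p[1]), X, y, cv=5).mean()
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    g = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p) for p in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmax()].copy()
    return 10 ** g[0], 10 ** g[1]   # tuned (C, gamma)
```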
2 Literature Survey

To survey the literature on EEG-based BCIs, we explored IEEE Xplore, Web of Science, Scopus, ScienceDirect and Google Scholar. Among the many articles that met our search criteria, we first excluded those that had nothing to do with BCI research. The articles were then screened on title, abstract and number of citations to include only studies involving evolutionary-based BCI for signal processing
applications. This screening exercise yielded 21 articles, listed in Table 1, which presents the diverse evolutionary-based BCI signal processing applications. Reputed journals from the last decade, i.e., 2010–2020, were investigated, considering parameters such as the algorithm used, the signal processing function and the application. Table 1 shows the evolutionary algorithms used in BCI signal processing applications.
3 Literature Review and Discussion

One study discusses a revised version of the PSO algorithm, M-PSO, to improve feature selection for EEG and ECoG signals. It covers four BIAs and feature extraction to search for relevant features that provide information to BCI systems, with the aim of increasing classification precision. A current optimal solution was developed that uses the SF to control parameters and to determine reliable metrics for the best features; the authors used an SVM classifier alongside the proposed algorithm to compare outcomes, and the proposed algorithm performed best on the majority of the EEG and ECoG datasets across all aspects. The findings show that it can be used to pick fewer electrodes, which is substantially helpful in building a BCI system [12]. In another line of work, the gain and bandwidth of the filter are designed and adjusted from SSVEP characteristics while also integrating harmonic SSVEP responses. Not only does this technique improve accuracy, it also increases the number of obtainable commands by facilitating the use of stimulus frequencies that evoke weak SSVEP responses; the findings indicate that the bio-inspired architecture can be expanded to include additional SSVEP features (e.g., the time-domain waveform) for future SSVEP-based BCIs [13]. Another article provides a novel feature selection technique based on a clustering algorithm, validated on a dataset using power spectral density as the feature and artificial bee colony as the clustering algorithm. The results validate the argument that when the dataset is condensed to part of its original size, retaining mostly the pertinent features, an increase in accuracy is observed; at the same time, the computational complexity is reduced [14]. Selecting the EEG channels used to establish the training predictor affects classifier output: findings on a real dataset compare two metaheuristic techniques, with a Bayesian predictor as the classifier, against a random set of EEG channels, and per the empirical findings this method greatly improves the precision of the training predictor [15]. With bio-inspired optimization for feature selection and classifier enhancement to boost the precision of MI-BCI, a new CSP\AM-BA-SVM approach has been suggested, including optimal time-interval selection for each subject. The features are extracted from the EEG signal with the common spatial pattern (CSP), and binary CSP is extended to multi-class problems using a one-vs-one approach. This method applies the bat optimization algorithm (BA) and the hybrid
Table 1 Evolutionary-based algorithms used in BCI

| Reference No. | Algorithm | Signal processing function | Application |
|---|---|---|---|
| [12] | Ant colony optimization (ACO), genetic algorithm (GA), cuckoo search algorithm (CSA) and modified particle swarm optimization (M-PSO) | Feature selection | SVM classifier |
| [13] | Artificial bee colony (ABC) cluster algorithm | Feature selection for motor imagery EEG data | Classification |
| [14] | Bio-inspired filter banks (BIFB) | Frequency detection method | Filter |
| [15] | Genetic algorithm and simulated annealing | Choose the input features of a classifier | Classifier |
| [16] | Hybrid attractor metagene (AM) algorithm along with the bat optimization algorithm (BA) | Choose the most discriminant CSP features and optimize SVM parameters | Classifier optimization |
| [17] | Bio-inspired filter banks (BIFBs) | Feature extraction stage | Filter banks |
| [7] | A modified genetic algorithm (GA) wrapped around a support vector machine (SVM) classifier | Classification | Classification |
| [13] | Artificial bee colony (ABC) cluster algorithm | Reduce the features | Classification |
| [18] | Hybrid GA-PSO-based K-means clustering technique | Feature selection, blind source separation | Classification of two-class motor imagery tasks |
| [19] | Quantum-behaved particle swarm optimization | Feature extraction | Classification |
| [20] | Multilevel hybrid PSO-Bayesian linear discriminant analysis | Channel selection and feature selection | Classification |
| [21] | CPSO-based TWSVM classifier combined with CSP | Classification | Classification of MI electroencephalography (EEG) |
| [22] | PSO by linear discriminant analysis | Selecting optimal channels | Classification accuracy |
| [23] | Classic algorithm and the culling algorithm | Reduce the huge space of features extracted from raw electroencephalography | Feature selection |
| [24] | Modified particle swarm optimization (PSO) | Feature selection | Classification accuracy |
| [25] | PSO-RBFN | Feature selection | Classifier |
| [26] | Incremental quantum particle swarm optimization (IQPSO) algorithm | Classification | Classification |
| [27] | Particle swarm optimization (PSO)-based neural network (NN) | Projecting the features into a neural network | IoT applications |
| [28] | Ring topology-based particle swarm optimization (RTPSO) algorithm | Feature extraction | Classifiers |
| [29] | Choquet integral along with PSO algorithm | Classifier | MI recognition |
attractor metagene (AM) algorithm to pick the most discriminating CSP attributes and optimize the SVM parameters [16]. Another article develops a new SSVEP detection framework that takes advantage of SSVEPs' inherent characteristics. In the feature extraction phase, the BIFBs capture frequency specificity, subject specificity and harmonic SSVEP responses, and boost class separability. The proposed approach is tested on two publicly available datasets and outperforms several known detection algorithms. The BIFBs show particular promise in the high-frequency band, where the SNR is low, so the approach not only improves the ITR of an SSVEP-based BCI but may also improve comfort owing to reduced visual fatigue. The outcome shows the potential of bio-inspired design, and further SSVEP features may be included in future work [17]. Blind source separation of the EEG signals in particular contributes to improved classification precision through spectral energy conversion. In addition, the use of specific feature subsets instead of the full feature set can be helpful even to advanced classifiers like the support vector machine. Since searching for feature subsets increases the risk that the classifier will overfit the samples used to train the BCI, a range of methods exist to minimize this risk and can be evaluated during relevant feature searches. Feature selection is a promising line of inquiry for signal processing in BCIs, because subject-specific features can be derived off-line to obtain optimal online output [7]. Further research aims to enhance classification accuracy while reducing the redundant and irrelevant features of a dataset, implementing artificial bee colony (ABC) cluster optimization to reduce attributes and achieve the corresponding precision [13]. A K-means clustering method based on hybrid GA-PSO was employed to
differentiate between two classes of motor imagery (MI) tasks. The proposed hybrid GA-PSO clustering, built on K-means segmentation, was shown to outperform genetic algorithm (GA)- and particle swarm optimization (PSO)-based K-means clustering in terms of accuracy and execution time. The performance appraisal is derived on TFRs, and event-related desynchronization (ERD) and event-related synchronization (ERS) are characterized [7]. Another work develops a new evolutionary optimization algorithm for optimizing the detection channels, incorporating common spatial patterns for feature extraction and support vector machines for identification, with quantum-behaved particle swarm optimization. Experimental findings demonstrated that, relative to the common spatial pattern method using all channels of the raw datasets, the new binary quantum-behaved particle swarm optimization method surpassed the other three common spatial pattern methodologies, with a substantial decrease in classification error and in the number of channels; a motor imagery-based brain-computer system can be enhanced considerably by this technique [13]. An effective signal processing architecture for channel and feature selection based on particle swarm optimization (PSO) has also been proposed: modified Stockwell transforms are used for feature extraction, and multilevel PSO-Bayesian linear discriminant analysis is implemented for optimization and classification. The findings demonstrate that the channel selection scheme speeds up convergence to the global optimum and reduces training time. As the proposed framework can boost classification efficiency, effectively decrease the number of features and significantly reduce test time, it can serve as guidance for real-time BCI application analysis [18]. A new CPSO-based TWSVM classification system, in conjunction with CSP feature extraction, has been suggested for the identification of MI events. An adaptive reduction method was adopted to improve the signal-to-noise ratio, and CSP was used for multi-channel EEG signal generation. The TWSVM parameters are critical for the classification outcome of the MI BCI system; the TWSVM classifier also achieved the fastest total CPU time, and the analysis showed a slight improvement of CPSO TWSVM over the PSO TWSVM system. An extensive BCI database analysis revealed that the CPSO TWSVM classifier performs better overall than various machine learning approaches commonly used in established MI recognition research [19]. Other authors propose enhancing CSP with the PSO optimization technique: optimal channels are selected among all channels by linear discriminant analysis, and the classification accuracy of CSP is compared with that of CSP with PSO. Rather than using all channels, an optimal channel selection approach via BPSO is suggested; with the proposed procedure, fewer channels achieved greater precision. In addition, the fitness function is modified because each evaluation uses the same system enhancement configuration [20]. The GA for feature selection in the BCI domain has been combined with simulated annealing, so
it performs feature selection in an unsupervised way. This mode is extremely significant in the BCI domain because recalibration (which has to be carried out after a few thousand trials) is then not supervised. Moreover, the method is not limited to BCI systems; in fact, the optimization can be used in any investigation where a limited number of features must be chosen from a much bigger initial set [21]. To categorize subjects' emotional states based on EEG signals, researchers presented an enhanced algorithm merged with feature selection to develop an online brain-computer interface (BCI) emotion detection application. Various feature attributes were identified from the time domain, the frequency domain and the time-frequency domain, and a multi-level linearly decreasing inertia weight (MLDW) modified particle swarm optimization (PSO) approach was used for feature selection; the MLDW method can optimize the inertia weight reduction effectively, after which the emotion types are classified by the vector classifier [22]. One study addresses the implementations of different classification algorithms, including the selected PSO-RBFN classifier. Three well-known clustering algorithms are compared with one another by their ability to minimize the loss function, to measure the effect of clustering on RBFN classification results. It can be inferred that the better clusters obtained with the PSO algorithm help the RBFN enhance its classification efficiency; relative to two other classifiers, PSO-RBFN can meet or exceed FFSVC and IFFSVC on most datasets. The extendability and easy implementation and training of a PSO-RBFN classifier for a supervised learning problem make it desirable for near real-time EEG classification applications [23]. Other research introduces the incremental IQPSO method for the gradual classification of the EEG data stream. IQPSO constructs an effective classifier as a set of rules, based on a conceptual symbolic depiction of experience, for increased comprehensibility; the suggested algorithm profits simultaneously from incremental training and quantum-inspired improvements. IQPSO's output was compared with ten other classifiers on two EEG datasets for BCI systems, and the findings indicate that IQPSO is superior to the other classifiers in accuracy, precision and recall, with acceptable time consumption for online learning [24]. An advanced neural network (NN) based on particle swarm optimization (PSO) has been proposed to provide a clear bridge between BCI devices and the IoT. In the experiments, a system architecture collected EEG data, selected features using particle swarm optimization and then fed the attributes into a neural network for training; the experimental results showed that the PSO-based NN methodology is feasible, with 98.9% accuracy in the classification of motor imagery (MI) activities [25]. Another article suggests an R3PSO methodology with two classifier approaches for measuring performance, to be utilized in EEG motor imagery classification to determine parameters.
In order to evaluate the effectiveness of the studied
classification models, FFNN, SS-SVM, SMO-SVM and QP-SVM are used as classifiers with tenfold cross-validation and holdout processes; the feature extraction process uses statistical measures of certain wavelet coefficients [26]. Another study establishes an advanced and efficient BCI framework for MI detection that uses SBCSP, LDA and a fusion method; the maximum AUC was achieved using a quadratic kernel with PSO as the fusion-level prediction model. Using a portable EEG collector, the Mindo 4S, and the proposed algorithms, functional BCIs could be built in real-world situations [27]. Finally, results on three mental task classifications indicate greater FPSOCM-ANN accuracy than multiclass classifiers (SVM, LDA and linear perceptron) across both classes and time windows; accuracy improves with longer time windows, with the best accuracy at a window of 7 s [28].
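A recurring primitive across many of the surveyed studies is a binary PSO over channel or feature masks. The following is a minimal sketch of that pattern, assuming NumPy and a user-supplied fitness (e.g., cross-validated classifier accuracy); the coefficients and the sigmoid transfer function are common choices, not taken from any single cited paper:

```python
import numpy as np

def binary_pso_select(fitness, n_features, n_particles=20, iters=50, seed=0):
    """Binary PSO: each particle is a 0/1 mask; a sigmoid of the velocity
    gives the probability of keeping a channel/feature."""
    rng = np.random.default_rng(seed)
    pos = (rng.random((n_particles, n_features)) < 0.5).astype(float)
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(m) for m in pos])
    g = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, n_features))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        prob = 1.0 / (1.0 + np.exp(-vel))              # sigmoid transfer function
        pos = (rng.random(pos.shape) < prob).astype(float)
        f = np.array([fitness(m) for m in pos])
        better = f > pbest_f
        pbest[better], pbest_f[better] = pos[better], f[better]
        g = pbest[pbest_f.argmax()].copy()
    return g.astype(bool)    # mask of selected channels/features
```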
4 Conclusion

This paper has examined some of the more recent studies that employ EA-based methods to enhance the analysis of EEG signals. It has concentrated on (1) dimensionality reduction by feature or electrode reduction, (2) supporting the classifier learning process and (3) the benefits of EA-based methods in traditional ensemble approaches. Such a hybrid system can exploit EA-based features to optimize the ensemble's required parameters and boost overall performance; to the best of our knowledge, such a system has not yet been used in EEG and BCI research. The paper has surveyed EA-based methods, such as decomposition and filtering, that minimize the size of the feature or electrode set. This is a multi-objective problem in which it is beneficial to minimize the size of the feature or electrode sets while preserving classification accuracy.
References

1. R. Abiri et al., A comprehensive review of EEG-based brain–computer interface paradigms. J. Neural Eng. 16(1), 011001 (2019)
2. S. Bhattacharyya et al., Interval type-2 fuzzy logic based multiclass ANFIS algorithm for real-time EEG based movement control of a robot arm. Rob. Autonomous Syst. 68, 104–115 (2015)
3. K. Takahashi, T. Nakauke, M. Hashimoto, Remarks on hands-free manipulation system using bio-potential signals from simple brain-computer interface, in 2006 IEEE International Conference on Systems, Man and Cybernetics, vol. 2 (IEEE, New York, 2006)
4. V. Suma, Computer vision for human-machine interaction-review. J. Trends Comput. Sci. Smart Technol. (TCSST) 1(02), 131–139 (2019)
5. V. Bindhu, An enhanced safety system for auto mode E-vehicles through mind wave feedback. J. Inform. Technol. 2(03), 144–150 (2020)
6. J. Atkinson, D. Campos, Improving BCI-based emotion recognition by combining EEG feature selection and kernel classifiers. Expert Syst. Appl. 47, 35–41 (2016)
7. D.A. Peterson et al., Feature selection and blind source separation in an EEG-based brain-computer interface. EURASIP J. Adv. Sign. Process. 19, 218613 (2005)
8. A.F. Cabrera, D. Farina, K. Dremstrup, Comparison of feature selection and classification methods for a brain–computer interface driven by non-motor imagery. Med. Biol. Eng. Comput. 48(2), 123–132 (2010)
9. K. Li et al., Single trial independent component analysis for P300 BCI system, in 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE, New York, 2009)
10. Y. Maali, A. Al-Jumaily, A novel partially connected cooperative parallel PSO-SVM algorithm: study based on sleep apnea detection, in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '12) (IEEE, Brisbane, Australia, June 2012), pp. 1–8
11. A. Subasi, Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. Comput. Biol. Med. 43(5), 576–586 (2013)
12. O.P. Idowu, P. Fang, G. Li, Bio-inspired algorithms for optimal feature selection in motor imagery-based brain-computer interface, in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (IEEE, New York, 2020)
13. P. Rakshit et al., Artificial bee colony based feature selection for motor imagery EEG data, in Proceedings of Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012) (Springer, India, 2013)
14. A.F. Demir, H. Arslan, I. Uysal, Bio-inspired filter banks for SSVEP-based brain-computer interfaces, in 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) (IEEE, New York, 2016)
15. S. Basterrech et al., Nature-inspired algorithms for selecting EEG sources for motor imagery based BCI, in International Conference on Artificial Intelligence and Soft Computing (Springer, Cham, 2015)
16. S. Selim et al., A CSP\AM-BA-SVM approach for motor imagery BCI system. IEEE Access 6, 49192–49208 (2018)
17. A.F. Demir, H. Arslan, I. Uysal, Bio-inspired filter banks for frequency recognition of SSVEP-based brain–computer interfaces. IEEE Access 7, 160295–160303 (2019)
18. P. Tiwari, S. Ghosh, R.K. Sinha, Classification of two class motor imagery tasks using hybrid GA-PSO based K-means clustering. Comput. Intell. Neurosci. 2015 (2015)
19. L. Zhang, Q. Wei, Channel selection in motor imaginary-based brain-computer interfaces: a particle swarm optimization algorithm. J. Integrat. Neurosci. 18(2), 141–152 (2019)
20. Y. Qi et al., Channel and feature selection for a motor imagery-based BCI system using multilevel particle swarm optimization. Comput. Intell. Neurosci. 2020 (2020)
21. L. Duan et al., Recognition of motor imagery tasks for BCI using CSP and chaotic PSO twin SVM. J. China Univer. Posts Telecommun. 24(3), 83–90 (2017)
22. J.-Y. Kim et al., A binary PSO-based optimal EEG channel selection method for a motor imagery based BCI system, in International Conference on Hybrid Information Technology (Springer, Berlin, Heidelberg, 2012)
23. I. Rejer, Genetic algorithms for feature selection for brain–computer interface. Int. J. Pattern Recognit. Artif. Intell. 29(05), 1559008 (2015)
24. Z. Li et al., Enhancing BCI-based emotion recognition using an improved particle swarm optimization for feature selection. Sensors 20(11), 3028 (2020)
25. E. Cinar, F. Sahin, New classification techniques for electroencephalogram (EEG) signals and a real-time EEG control of a robot. Neural Comput. Appl. 22(1), 29–39 (2013)
26. K. Hassani, W.-S. Lee, An incremental framework for classification of EEG signals using quantum particle swarm optimization, in 2014 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA) (IEEE, New York, 2014)
27. O.P. Idowu et al., Efficient classification of motor imagery using particle swarm optimization-based neural network for IoT applications, in 2020 IEEE International Workshop on Metrology for Industry 4.0 & IoT (IEEE, New York, 2020)
28. H. Mirvaziri, Z.S. Mobarakeh, Improvement of EEG-based motor imagery classification using ring topology-based particle swarm optimization. Biomed. Sign. Process. Control 32, 69–75 (2017)
29. T.-Y. Hsieh et al., Developing a novel multi-fusion brain-computer interface (BCI) system with particle swarm optimization for motor imagery task, in 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (IEEE, New York, 2015)
30. S.-L. Wu et al., Fuzzy integral with particle swarm optimization for a motor-imagery-based brain–computer interface. IEEE Trans. Fuzzy Syst. 25(1), 21–28 (2016)
Cancelation of 50 and 60 Hz Power-Line Interference from Electrocardiogram Using Square-Root Cubature Kalman Filter Roshan M. Bodile and T. V. K. Hanumantha Rao
Abstract The heart is the most vital organ in the human body; the systematic time-varying signal generated by its electrical activity is called the electrocardiogram (ECG). However, the acquisition process adds a significant amount of unwanted artifacts to the clean ECG, among which power-line interference (PLI) is the most common. Therefore, in this paper, a dynamical model combined with a square-root cubature Kalman filter (SR-CKF) is proposed to remove 50 and 60 Hz PLI from the ECG. The SR-CKF is tested on an arrhythmia database with input signal-to-noise ratios (SNR) of −10 to 10 dB. The denoising results show that the SR-CKF performs better in terms of keeping the original content or diagnostic information of the ECG, lower mean square error (MSE), and higher output SNR compared with the discrete wavelet transform (DWT) and the notch filter.

Keywords Electrocardiogram · Dynamical model · SR-CKF · Discrete wavelet transform · Power-line interference · Notch filter
1 Introduction

The electrical activity produced at the atria and then the ventricles generates the PQRST waves, and these wave sequences are vital for clinical purposes. Heart activity can be fast, slow, abnormal or normal, which makes the ECG signal clinically important: detecting any abnormality in rhythm or heart rate at an early stage can save a person's life. The heart signal is weak, and during acquisition, noise contamination such as electromyogram, motion noise, baseline wander, power-line interference (PLI), etc. masks the clean ECG signal. Of these artifacts, the
PLI most commonly corrupts the ECG signal, so eliminating the PLI present in ECG recordings is essential to recover the original morphology of the ECG. An overview of the related literature is presented here. In early work, a digital filtering methodology [1] was used to remove PLI by designing a bandpass filter with a cut-off frequency of 50 Hz. Similarly, the use of fewer taps in a finite impulse response (FIR) filter [2, 3] has been suggested for PLI elimination; the presented FIR technique claims to be more effective than a traditional FIR filter for attenuating PLI. The notch filter [4, 5] is another classical method for removing PLI from noisy ECG; however, designing a high quality factor is challenging, and such filters are susceptible to artifacts and can quickly go from stable to unstable. Hence, adaptive notch filters, which can adapt according to the selected frequency and the noise or signal content, are desirable, and the literature suggests various adaptive notch filters that manipulate the quality factor adaptively. Apart from these methodologies, decomposition-based techniques [6–9] such as the DWT and the wavelet packet transform are popular for removing 60 Hz PLI, although they are not suitable under high noise contamination; the DWT with adaptive dual thresholding has also been suggested [10]. Gradient-based adaptive filters with and without reference signals [11–13] have been used to eliminate PLI from the ECG signal, but these filters require some samples after the starting transient before they follow the original morphology of the ECG. According to the electricity standard, the variation in the PLI frequency is trivial, and hence the PLI variation in ECG processing is also negligible. Therefore, methods based on tracking and cancelation of PLI using the Kalman filter, smoother and extended Kalman filter [14–19] are found to be more flexible and stable; a Kalman filter with a dynamical model is well suited to eliminating either 50 or 60 Hz PLI and its harmonics in a noisy environment. In this paper, both 50 and 60 Hz PLI corrupt the ECG signal over an input SNR range of −10 to 10 dB. The dynamical model in [14] is flexible and does not require any prior assumptions; it merely depends on the quality factor, the ratio of the noise covariances. This model is therefore combined with the SR-CKF [20], which is a numerically more stable method. The work also focuses on high noise contamination masking ambulatory arrhythmia ECG recordings. The remainder of the paper is organized as follows: Sect. 2 briefly describes the dynamical model and the proposed method with its time update and measurement steps; data details, the quantitative assessment and the 50 and 60 Hz filtering results are provided in Sect. 3; the last section concludes the work.
2 Proposed Method

2.1 Dynamical Model

Single-tone PLI contamination can be modeled as a sinusoid with random phase and amplitude:
$$x_n = A\cos(2\pi n f_0/f_s + \phi) \qquad (1)$$
where A, n, f_0, f_s and φ are the amplitude, time index, PLI frequency, sampling frequency and phase, respectively. After adding a model error parameter η_n and some trigonometric manipulation [14], Eq. (1) can be expressed as

$$x_{n+1} + x_{n-1} = 2\cos(2\pi f_0/f_s)\, x_n + \eta_n \qquad (2)$$
PLI noise does not exhibit rapid fluctuations in phase or amplitude; nevertheless, the addition of η_n makes the model more flexible, and hence it is desirable to include η_n in the SR-CKF approach. The clean ECG corrupted by PLI can be regarded as a mixture of PLI, clean ECG and other unwanted noise or signals; that is,

$$y_n = w_n + x_n \qquad (3)$$
where x_n is the PLI and w_n is a zero-mean arbitrary term representing all signals and noises apart from the PLI. This neglects the fact that w_n can contain biosignals and other noises, since this work considers only 50 and 60 Hz PLI. Tracking the PLI becomes possible when Eqs. (2) and (3) are converted into state-space form [14] for the SR-CKF approach:
$$\mathbf{x}_{n+1} = C\,\mathbf{x}_n + d\,\eta_n, \qquad y_n = a^{T}\mathbf{x}_n + w_n \qquad (4)$$

where

$$C = \begin{bmatrix} 2\cos(2\pi f_0/f_s) & -1 \\ 1 & 0 \end{bmatrix}, \quad d = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad a = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{x}_n = \begin{bmatrix} x_n \\ x_{n-1} \end{bmatrix}.$$

The model in Eq. (4) is now ready to be applied to noisy ECG using the SR-CKF approach. After tracking the PLI, the filtered ECG is obtained by simply subtracting the tracked PLI from the noisy ECG. The block diagram of the ECG denoising scheme is shown in Fig. 1.
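A small sketch of this model setup, assuming NumPy (function name illustrative):

```python
import numpy as np

def pli_model(f0=50.0, fs=360.0):
    """State-space matrices of Eq. (4) for a single-tone PLI
    (f0 = 50 or 60 Hz; fs = 360 Hz for the MIT-BIH records used here)."""
    c = np.array([[2.0 * np.cos(2.0 * np.pi * f0 / fs), -1.0],
                  [1.0, 0.0]])
    d = np.array([[1.0], [0.0]])   # model error eta_n drives only the newest sample
    a = np.array([1.0, 0.0])       # y_n = a^T x_n + w_n observes the current PLI sample
    return c, d, a
```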
Fig. 1 ECG denoising scheme: the noisy ECG enters the SR-CKF, which tracks the PLI; subtracting the tracked PLI yields the denoised ECG

2.2 Square-Root Cubature Kalman Filter

Let us consider a multi-dimensional integral [20] of the form

$$I(f) = \int_{\mathbb{R}^n} f(x)\, e^{-x^{T}x}\, \mathrm{d}x \qquad (5)$$

where f(x) is an arbitrary function and R^n is the domain of integration. The integral in Eq. (5) is hard to calculate in general form; therefore, using the cubature rule, it is approximated by a set of points and corresponding weights [21]:

$$I(f) \approx \sum_{j=1}^{m} \omega_j\, f(\zeta_j) \qquad (6)$$

where ζ_j and ω_j are the jth cubature point and its corresponding weight. Here, the third-degree cubature rule is considered, and hence the total number of support points equals 2 × (state dimension). Figure 2 shows the cubature points and their corresponding weights for a two-dimensional system. After this approximation, the steps required for the SR-CKF [20–22] are as follows.

Fig. 2 Cubature points and their corresponding weights
Time update

• Compute the cubature points (CP), for j = 1, 2, …, m with m = 2 × (state dimension n):

$$X_j(k-1|k-1) = S(k-1|k-1)\,J(j) + \hat{x}(k-1|k-1) \qquad (7)$$

where

$$J(j) = \begin{cases} \sqrt{n}\,[1]_j, & j = 1, \dots, n \\ -\sqrt{n}\,[1]_{j-n}, & j = n+1, \dots, 2n \end{cases}$$

• After calculation of the CP, evaluate the propagated CP:
$$X_j^{*}(k|k-1) = g\big(k-1,\, X_j(k-1|k-1)\big), \quad j = 1, 2, \dots, m \qquad (8)$$
x(k|k − 1) =
S(k|k − 1) = T ri χ · (k|k − 1), S Q (k − 1) , ·
where χ (k|k − 1) =
√1 m
X1· (k|k − 1) − x(k|k − 1)... Xm· (k|k − 1) − x(k|k − 1)
(9) (10)
, j = 1, . . . , m.
Measurement • Estimate CP. X j (k − 1|k − 1) = S(k − 1|k − 1)J ( j) + x(k − 1|k − 1), j = 1, . . . , m.
(11)
• After calculation of CP, evaluate the propagated CP.
X j·· (k − 1|k − 1) = γ k, X j (k − 1|k − 1) , j = 1, 2, . . . , 2 ∗ state dimension (12) • Now, compute the prior measurement and its square-root of the covariance matrix: 1 j·· X (k − 1|k − 1), m = 2∗, state dimension, ∗ m j=1 m
z(k|k − 1) =
(13)
$S_{zz}(k|k-1) = \mathrm{Tri}\!\left(\left[Z(k|k-1),\; S_R(k-1)\right]\right) \quad (14)$

where $Z(k|k-1) = \frac{1}{\sqrt{m}}\left[X_1^{**}(k|k-1) - z(k|k-1), \ldots, X_m^{**}(k|k-1) - z(k|k-1)\right]$, $j = 1, \ldots, m$.

• Calculate the cross-covariance matrix:

$S_{xz}(k|k-1) = X(k|k-1)\,Z^{T}(k|k-1) \quad (15)$

where $X(k|k-1) = \frac{1}{\sqrt{m}}\left[X_1^{*}(k|k-1) - x(k|k-1), \ldots, X_m^{*}(k|k-1) - x(k|k-1)\right]$, $j = 1, \ldots, m$.

• Now, estimate the Kalman gain:

$G(k) = \left(S_{xz}(k|k-1)/S_{zz}^{T}(k|k-1)\right)/S_{zz}(k|k-1) \quad (16)$

• Finally, calculate the posterior state and the square root of its covariance matrix:

$x(k|k) = x(k|k-1) + G(k)\left(z(k) - z(k|k-1)\right) \quad (17)$

$S(k|k) = \mathrm{Tri}\!\left(\left[X(k|k-1) - G(k)Z(k|k-1),\; G(k)S_R(k-1)\right]\right) \quad (18)$
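To make these steps concrete, the following compact Python sketch (an illustrative reimplementation under assumed noise levels and initialization, not the authors' MATLAB code) runs the time and measurement updates of Eqs. (7)–(18) on the linear PLI model of Eq. (4), implements Tri(·) with a QR factorization, and recovers the ECG by subtracting the tracked PLI:

```python
import numpy as np

def tri(m_):
    # Tri(.): QR-based triangularization; returns lower-triangular S with S S^T = m_ m_^T
    r = np.linalg.qr(m_.T, mode="r")
    return r.T

def srckf_step(x, S, z, C, a, d, sq, sr):
    n = x.size
    m = 2 * n
    J = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])    # cubature directions
    # Time update, Eqs. (7)-(10)
    Xp = C @ (S @ J + x[:, None])                          # propagate CP (linear g)
    x_pr = Xp.mean(axis=1)
    chi = (Xp - x_pr[:, None]) / np.sqrt(m)
    S_pr = tri(np.hstack([chi, sq * d[:, None]]))
    # Measurement update, Eqs. (11)-(18)
    X = S_pr @ J + x_pr[:, None]
    Z = (a @ X)[None, :]                                   # propagated measurement CP
    z_pr = Z.mean()
    Zc = (Z - z_pr) / np.sqrt(m)
    Szz = tri(np.hstack([Zc, [[sr]]]))
    Xc = (X - x_pr[:, None]) / np.sqrt(m)
    G = (Xc @ Zc.T) / (Szz @ Szz.T)                        # Kalman gain, Eq. (16)
    x_new = x_pr + (G * (z - z_pr)).ravel()                # Eq. (17)
    S_new = tri(np.hstack([Xc - G @ Zc, sr * G]))          # Eq. (18)
    return x_new, S_new

# Assumed test setup: a slow sine as an ECG stand-in plus 60 Hz PLI
fs, f0 = 360.0, 60.0
t = np.arange(2000) / fs
ecg = 0.8 * np.sin(2 * np.pi * 1.2 * t)
noisy = ecg + 0.5 * np.sin(2 * np.pi * f0 * t + 0.3)

C = np.array([[2 * np.cos(2 * np.pi * f0 / fs), -1.0], [1.0, 0.0]])
a, d = np.array([1.0, 0.0]), np.array([1.0, 0.0])
x, S = np.zeros(2), 0.1 * np.eye(2)
tracked = np.empty_like(noisy)
for k, zk in enumerate(noisy):
    x, S = srckf_step(x, S, zk, C, a, d, sq=1e-4, sr=0.3)
    tracked[k] = x[0]
denoised = noisy - tracked                                 # subtract the tracked PLI
```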
3 Results and Discussion

This section summarizes and discusses the main findings of the work. The performance of the proposed approach is tested on the publicly available MIT-BIH arrhythmia database [23], and ECG records 101, 103, and 217 are considered. The sampling frequency of these ECGs is 360 Hz, and 3600 samples are assessed for filtering and comparison purposes. The PLI present in an ECG can be either 50 or 60 Hz, as it varies from country to country; therefore, this work considers both 50 and 60 Hz PLI noise. The performance of the SR-CKF scheme is measured using the output SNR [24],

$\text{Output SNR} = 10\log_{10}\!\left(\frac{\sum_{l=1}^{L} s^{2}(l)}{\sum_{l=1}^{L} \{s(l) - \hat{s}(l)\}^{2}}\right) \quad (19)$

and the MSE,
$\mathrm{MSE} = \frac{1}{L}\sum_{l=1}^{L}\left(s(l) - \hat{s}(l)\right)^{2} \quad (20)$
where s(l) and ŝ(l) are the clean ECG and the filtered ECG, respectively. Here, the results of the proposed method are compared with a notch filter (infinite impulse response) and DWT-based methods. The notch filter and DWT-based techniques are very competitive at removing PLI under noisy conditions; hence, the SR-CKF framework is compared with them. The simulations of the proposed and comparative methods are performed in MATLAB.
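As a minimal illustration (a Python sketch rather than the authors' MATLAB implementation), Eqs. (19) and (20) can be computed as:

```python
import numpy as np

def output_snr_db(s, s_hat):
    # Eq. (19): clean-signal energy over residual-error energy, expressed in dB
    return 10 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

def mse(s, s_hat):
    # Eq. (20): mean squared error over the L assessed samples
    return np.mean((s - s_hat) ** 2)
```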
3.1 60 Hz PLI Denoising Results

The filtering results are compared quantitatively using the output SNR and qualitatively using the reconstructed signal after denoising. A visual representation of this denoising result can be seen in Fig. 3. Figure 3a is a clean ECG; after adding 60 Hz PLI noise to ECG record 217 at −10 dB input SNR, the noisy signal is depicted in Fig. 3b. All the filtering results are displayed in Fig. 3c–e. The DWT method at low SNR clearly shows the presence of 60 Hz PLI noise, even though DWT is a powerful denoising tool. On the other hand, the second comparative method, the notch filter, removes the 60 Hz PLI noise from the noisy ECG; however, from Fig. 3d, it is also observed that there is a reduction in R-peak amplitude for the given samples. The filtering result produced by the proposed method is depicted in Fig. 3e. It is observed from Fig. 3e that the proposed method removed the 60 Hz PLI noise from the ECG and also followed the morphology of the ECG signal very closely. The results obtained after denoising show that this method performs better at keeping the original content or diagnostic information of the ECG compared to the DWT and notch filter. To further examine this result, the output SNR values for each method were analyzed quantitatively. From Table 1, it is observed that the comparative methods yield less output SNR than the proposed approach. Quantitatively, the SR-CKF yields an average output SNR of 23.67–23.80 dB, the notch filter provides 15.79–15.81 dB, and the DWT gives 12.60–22.69 dB. Moreover, the MSE performance also indicates that the SR-CKF achieved a minuscule difference between the estimated and clean ECG signals, as shown in Fig. 4.
3.2 50 Hz PLI Denoising Results

In the previous section, the 60 Hz PLI noise removal results were shown; this section considers the removal of 50 Hz noise from noisy ECG. Figure 5a is a clean ECG; after adding 50 Hz PLI noise to ECG record 103 at 0 dB input SNR, the noisy signal is depicted in Fig. 5b. Based on the results in Fig. 5 and Table 2, the following observations are made.
Fig. 3 60 Hz removal denoising results for ECG 217 at −10 dB. a Clean ECG, b noisy ECG, c DWT, d notch filter, and e proposed
The DWT method in Fig. 5c at 0 dB SNR clearly shows the presence of 50 Hz PLI noise; even though the PLI is mild, it can still alter the diagnostic information in the ECG. On the other hand, the notch filter method nearly removes the 50 Hz PLI noise from the noisy ECG, but from Fig. 5d, it is also observed that there is a reduction in R-peak amplitude for the given ECG record 103. We can see from Fig. 5e that the SR-CKF removed the 50 Hz PLI noise from the ECG and also followed the morphology of the ECG signal very closely. Likewise, the results obtained after denoising show that the SR-CKF method performs better at keeping the original content or diagnostic information of the ECG compared to the DWT and notch filter. Finally, quantitative results are compared in terms of the output SNR; the SR-CKF technique exhibits excellent performance in almost all cases. Quantitatively, the SR-CKF yields an average output SNR of 21.85–21.86 dB, the notch filter provides 14.62–14.64 dB, and the DWT gives 10.19–22.06 dB. Moreover, the MSE performance also indicates that the SR-CKF achieved a minuscule difference between the estimated and clean ECG signals, as shown in Fig. 6.
Table 1 Output SNR for ECG signals 101, 103, and 217

Method       | Input SNR (dB) | 101   | 103   | 217   | Avg.
DWT          | −10            | 12.64 | 11.30 | 13.87 | 12.60
DWT          | −5             | 15.63 | 15.35 | 18.92 | 16.64
DWT          | 0              | 19.09 | 18.36 | 21.98 | 19.81
DWT          | 5              | 20.57 | 20.15 | 22.67 | 21.13
DWT          | 10             | 22.48 | 21.99 | 23.60 | 22.69
Notch filter | −10            | 16.14 | 12.90 | 18.32 | 15.79
Notch filter | −5             | 16.15 | 12.91 | 18.33 | 15.80
Notch filter | 0              | 16.16 | 12.91 | 18.33 | 15.80
Notch filter | 5              | 16.17 | 12.92 | 18.33 | 15.81
Notch filter | 10             | 16.17 | 12.92 | 18.33 | 15.81
Proposed     | −10            | 23.10 | 23.74 | 24.17 | 23.67
Proposed     | −5             | 23.27 | 23.76 | 24.21 | 23.75
Proposed     | 0              | 23.34 | 23.77 | 24.23 | 23.78
Proposed     | 5              | 23.37 | 23.77 | 24.24 | 23.79
Proposed     | 10             | 23.39 | 23.76 | 24.25 | 23.80
Fig. 4 Average MSE for different ECGs (in case of 60 Hz PLI removal condition)
Fig. 5 50 Hz removal denoising results for ECG 103 at 0 dB. a Clean ECG, b noisy ECG, c DWT, d notch filter, and e proposed
4 Conclusion

In this paper, a dynamical model combined with the SR-CKF is proposed for eliminating 50 and 60 Hz PLI from ECG. The dynamical model is based on the PLI (50 or 60 Hz), and the SR-CKF tracks the PLI and removes it from the noisy ECG so as to preserve the clinical information available in the ECG signal. The SR-CKF is tested on an arrhythmia database (ECG records 101, 103, and 217) with input SNRs of −10 to 10 dB. The results obtained after denoising show that the SR-CKF performs better in terms of keeping the original content or diagnostic information of the ECG, attaining minimum MSE, and achieving higher output SNR (21.85–21.86 dB for 50 Hz and 23.67–23.80 dB for 60 Hz) compared to the DWT and notch filter.
Table 2 Output SNR for ECG signals 101, 103, and 217

Method       | Input SNR (dB) | 101   | 103   | 217   | Avg.
DWT          | −10            | 5.08  | 12.58 | 12.92 | 10.19
DWT          | −5             | 10.60 | 16.97 | 15.37 | 14.31
DWT          | 0              | 13.46 | 20.75 | 17.16 | 17.12
DWT          | 5              | 18.70 | 21.30 | 20.37 | 20.12
DWT          | 10             | 21.71 | 23.18 | 21.29 | 22.06
Notch filter | −10            | 14.94 | 11.62 | 17.29 | 14.62
Notch filter | −5             | 14.96 | 11.63 | 17.30 | 14.63
Notch filter | 0              | 14.97 | 11.63 | 17.30 | 14.63
Notch filter | 5              | 14.97 | 11.64 | 17.30 | 14.64
Notch filter | 10             | 14.97 | 11.64 | 17.30 | 14.64
Proposed     | −10            | 21.53 | 22.06 | 21.95 | 21.85
Proposed     | −5             | 21.60 | 22.10 | 21.90 | 21.87
Proposed     | 0              | 21.62 | 22.12 | 21.87 | 21.87
Proposed     | 5              | 21.62 | 22.13 | 21.85 | 21.87
Proposed     | 10             | 21.62 | 22.13 | 21.84 | 21.86
Fig. 6 Average MSE for different ECGs (in case of 50 Hz PLI removal condition)
References

1. M. Kunt, H. Rey, A. Ligtenberg, Preprocessing of electrocardiograms by digital techniques. Sign. Process 4, 215–222 (1982)
2. J.A. Van Alste, T.S. Schilder, Removal of base-line wander and power-line interference from the ECG by an efficient FIR filter with a reduced number of taps. IEEE Trans. Biomed. Eng. 12, 1052–1060 (1985)
3. R. Warlar, C. Eswaran, Integer coefficient bandpass filter for the simultaneous removal of baseline wander, 50 and 100 Hz interference from the ECG. Med. Biol. Eng. Comput. 29, 333–336 (1991)
4. P. Tichavsky, A. Nehorai, Comparative study of four adaptive frequency trackers. IEEE Trans. Sign. Process. 45, 1473–1484 (1997)
5. M. Sedlacek, J. Blaska, Low uncertainty power-line frequency estimation for distorted and noisy harmonic signals. Measurement 35, 97–107 (2004)
6. S. Poornachandra, N. Kumaravel, A novel method for the elimination of power line frequency in ECG signal using hyper shrinkage function. Digit. Sign. Proc. 18, 116–126 (2008)
7. Z. German-Sallo, ECG signal baseline wander removal using wavelet analysis, in International Conference on Advancements of Medicine and Health Care through Technology (Springer, Berlin, Heidelberg, 2011), pp. 190–193
8. B. El, O. Latif, R.K. Elmansouri et al., ECG signal performance de-noising assessment based on threshold tuning of dual tree wavelet transform. Biomed. Eng. Online 16, 1–18 (2017)
9. D.L. Donoho, De-noising by soft thresholding. IEEE Trans. Inf. Theory 41, 613–627 (1995)
10. W. Jenkal, R. Latif, A.D. Ahmed Toumanari, O. El B'charri, F.M.R. Maoulainine, An efficient algorithm of ECG signal denoising using the adaptive dual threshold filter and the discrete wavelet transform. Biocybern. Biomed. Eng. 36, 499–508 (2016)
11. N. Razzaq, S.A.A. Sheikh, M. Salman, T. Zaidi, An intelligent adaptive filter for elimination of power line interference from high resolution electrocardiogram. IEEE Access 4, 1676–1688 (2016)
12. Q. Wang, X. Gu, J. Lin, Adaptive notch filter design under multiple identical bandwidths. AEU—Int. J. Electron. Commun. 82, 202–210 (2017)
13. J. Lin, X. Sun, J. Wu, S.C. Chan, W. Xu, Removal of power line interference in EEG signals with spike noise based on robust adaptive filter, in IEEE Region 10 Conference, TENCON, Singapore, pp. 2707–2710 (2016)
14. R. Sameni, A linear Kalman notch filter for power-line interference cancellation, in The 16th CSI International Symposium on Artificial Intelligence and Signal Processing, pp. 604–610 (2012)
15. P. Dash, R. Jena, G. Panda, A. Routray, An extended complex Kalman filter for frequency measurement of distorted signals. IEEE Trans. Instrum. Meas. 49, 746–753 (2000)
16. A. Routray, A. Pradhan, K. Rao, A novel Kalman filter for frequency estimation of distorted signals in power systems. IEEE Trans. Instrum. Meas. 51, 469–479 (2002)
17. L.D. Avendano-Valencia, L.E. Avenda, J. Ferrero, C.G. Castellanos-Dominguez, Improvement of an extended Kalman filter power line interference suppressor for ECG signals, in Computers in Cardiology, pp. 553–556 (2007)
18. L.D. Avendano-Valencia et al., Reduction of power line interference on ECG signals using Kalman filtering and delta operator (2007)
19. G. Warmerdam, R. Vullings, L. Schmitt, J. Van Laar, J. Bergmans, A fixed-lag Kalman smoother to filter power line interference in electrocardiogram recordings. IEEE Trans. Biomed. Eng. 64, 1852–1861 (2017)
20. I. Arasaratnam, S. Haykin, Cubature Kalman filters. IEEE Trans. Autom. Control 54, 1254–1268 (2009)
21. D. Jianmin, S. Hui, L. Dan, H. Yu, Square root cubature Kalman filter-Kalman filter algorithm for intelligent vehicle position estimate. Proc. Eng. 137, 267–276 (2016)
22. L. Xi, Q. Hua, Z. Jihong, Y. Pengcheng, Maximum correntropy square-root cubature Kalman filter with application to SINS/GPS integrated systems. ISA Trans. 80, 195–202 (2018)
23. A.L. Goldberger, L.A.N. Amaral, L. Glass et al., PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, 215–220 (2000)
24. K. Kærgaard, S.H. Jensen, S. Puthusserypady, A comprehensive performance analysis of EEMD-BLMS and DWT-NN hybrid algorithms for ECG denoising. Biomed. Signal Process. Control 25, 178–187 (2016)
A Comprehensive Study on the Arithmetic Operations in DNA Computing V. Sudha and K. S. Easwarakumar
Abstract The computer has become a part of human life, and fast computation is vital. Addition, subtraction, multiplication, and division are the fundamental mathematical operations. Most of the arithmetic operations performed on a computer are realized using these basic operations. Among the four, addition and subtraction operations form the basis, as the other two operations can be realized by using these procedures. This paper focuses on the algorithms proposed for implementing addition and subtraction operations in DNA computing. Keywords DNA computing · Arithmetic operations · Representation · Reusability · Biological operations
1 Introduction

Computing is the process of performing certain operations with the help of a computer. DNA computing is a form of molecular computing, replacing silicon with DNA. DNA computing makes DNA the computational medium, providing solutions for problems that cannot be solved through conventional architectures. Adleman [1] initiated DNA computing by solving an instance of the Hamiltonian path problem (HPP). He got this idea by mapping the finite control of the Turing machine to the polymerase enzyme: the polymerase enzyme can read a base and find its complement, much like the finite control reads a character and replaces it with some other character. With this exciting thought, he tried solving an instance of the HPP and succeeded, which opened the path for DNA computing. Adleman felt that Watson–Crick pairing [2], polymerase, ligases, nucleases, gel electrophoresis, and
DNA synthesis [3] are the essential tools for computing a problem in DNA. Adleman used linear self-assembly for solving the problem. In mathematics, all the advanced concepts are built upon the basic arithmetic operations, which deal with numbers of all types and include addition, subtraction, multiplication, and division. Most of the arithmetic operations performed on a computer are realized using these basic operations; among the four, addition and subtraction form the basis, as the other two can be realized using them. The success of a computing model depends on the implementation of these basic operations, so in any model of computing, arithmetic operations need to be implemented efficiently. When arithmetic operations are to be performed on given operands, these operands must be represented in a form acceptable to the respective system. Hence, the following parameters must be taken into consideration when proposing an algorithm for performing arithmetic operations: the representation of the operands, the logic used for performing the operation, the biological operations used for realizing that logic, etc. This paper discusses various existing algorithms proposed for performing basic arithmetic operations.
2 Existing System

DNA computing is an emerging type of molecular computing; hence, it is important to find efficient algorithms for these operations. In DNA computing, a problem is solved by developing an algorithm that is then proved either experimentally or theoretically. There are many algorithms proposed in the literature for realizing these operations, and there are also a few insertion–deletion systems [4] for the same. In this section, we discuss the methodology, pros, and cons of these algorithms.
2.1 Number Representation

In DNA computing, the pieces of information must be represented in terms of the four bases: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). Hence, to implement any operation in DNA computing, the inputs must be encoded using DNA bases. The different representations used in the literature are discussed in this subsection.

Guarnieri et al. [5] proposed a procedure for the addition of two binary numbers. A bit can take two possible values, 0 and 1, and procedures are proposed to represent these two possible values. Following this, the encoding of a binary number is the encoding of each bit value in the number.
Gupta et al. [6] proposed the logic for performing arithmetic and logic operations. In the proposed method, the DNA strands are designed in such a way that the output of one operation can be the input of another operation. As operations such as NAND, NOR, and OR are basic for all arithmetic operations, they are implemented first. These gates take two inputs and produce one output; the first number is known as the input, while the second is known as the operand. To perform the above-stated operations in DNA computing, the bit values 0 and 1 are represented as nucleotides. An instance of the encoding is given below for reference:

input strand: 1–AU; 0–UA
operand strand: 1: AT or TP; 0: TA or PT

i.e., 0 and 1 in the operand strand are given two possible representations.

Frank Qiu and Mi [7] used insertion, deletion, and substitution operations for performing Boolean operations. To apply these operations on the input, it must first be encoded using DNA bases. It is known that, for any Boolean operation, two operands are passed as input, and at least one of them must be encoded using DNA bases. Let $X = X_n X_{n-1} \ldots X_1$ and $Y = Y_n Y_{n-1} \ldots Y_1$ be the two operands. The operand X is represented in DNA form using the following notation:

$L_n - X_n - R_n - L_{n-1} - X_{n-1} - R_{n-1} - \cdots - L_1 - X_1 - R_1$

where $L_i$ and $R_i$ represent the left and right locators for the bit $X_i$.

Barua and Misra [8] proposed the following methodology for representing a binary number in DNA computing. Given a binary number, all positions with bit value 1 are initially collected, resulting in a collection of decimal numbers. Next, each decimal number in the collection is given a unique DNA representation. Thus, a binary number is represented by a set of DNA strands.

Fujiwara et al. [9] used DNA operations such as merge, copy, detect, separate, cleavage, annealing, and denaturation for performing logic and arithmetic operations with DNA strands. They encoded each binary number using the following representation:

$X = \{A_0, A_1, \ldots, A_{n-1}, B_0, B_1, \ldots, B_{m-1}, C_0, C_1, D_0, D_1, 1, 0, ]\}$

In the above representation, $A_0, A_1, \ldots, A_{n-1}$ represent addresses of binary numbers, $B_0, B_1, \ldots, B_{m-1}$ represent the bit positions, and $C_0, C_1$ and $D_0, D_1$ are the specified symbols cut by a cleavage operation. For instance, if the address and bit position of a bit are i and j, then the corresponding single-strand representation is

$S_{i,j} = D_1 A_i B_j C_0 C_1 V D_0$
where the value of V ∈ {0, 1}. Brun [10] represented numbers as a set of tiles. To add two binary numbers, tiles are designed for all possible combinations; for instance, when a bit of a binary number needs to be added, tiles can be designed for the combinations (0,0), (0,1), (1,0), and (1,1). The tiles are designed such that there exists a unique matching for each tile, and the number of matching sides should be equal to or greater than τ, where τ represents the temperature in the tiling system. A similar procedure is used for implementing the multiplication operation. Wang and Huang [11] used two types of DNA strands for number representation: one for representing the given input decimal number and the other for the output strand. The decimal number, say n, is represented by a single DNA strand containing n segments; for instance, the decimal number 5 can be represented by a DNA strand containing five fragments, namely 1, 2, 3, 4, and 5, while the output strand is designed with log₂(n) fragments. A molecular beacon is used for implementing the multiplication operation. Both the face and place value of the numbers are important for performing a valid arithmetic operation; Sudha and Easwarakumar [12] used both values in their representation. In this representation, a given number of any base value is directly converted into a string made up of DNA bases. This representation enables arithmetic operations to be performed easily.
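To make these representation ideas concrete, the following Python sketch (written for illustration here, not taken from the cited papers; the digit-to-base mapping is an assumption) shows a direct base-4 digit encoding in the spirit of [12] and the collection of 1-bit positions used by [8]:

```python
# Hypothetical mapping chosen for illustration; the cited papers define their own encodings.
BASE4_TO_DNA = {0: 'A', 1: 'C', 2: 'G', 3: 'T'}

def encode_number(number, base=4):
    """Direct digit-to-base conversion in the spirit of [12] (illustrative only)."""
    digits = []
    while number:
        digits.append(number % base)
        number //= base
    digits = digits[::-1] or [0]
    return "".join(BASE4_TO_DNA[d] for d in digits)

def one_positions(bits):
    """Barua-Misra-style step [8]: collect the positions that hold bit value 1."""
    return [i for i, b in enumerate(reversed(bits)) if b == "1"]

print(encode_number(27))         # 27 = (123) in base 4 -> "CGT"
print(one_positions("10110"))    # -> [1, 2, 4]
```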
2.2 Operation Implementation

Frank Qiu and Mi [7] implemented Boolean operations for binary numbers by designing a DNA strand for each bit present in the input. The designed DNA strand includes a complementary strand that contains the result. For instance, for performing the AND operation, the strand for the bit $X_i$ is designed as

$L_i - X_i - (X_i \wedge Y_i) - R_i$

In the above representation, the bit $X_i$ can take the value either 0 or 1. Depending upon the value it takes, the DNA strand looks as shown below:

1. $X_i = 0: \; L_i - 0 - (0 \wedge Y_i) - R_i$
2. $X_i = 1: \; L_i - 1 - (1 \wedge Y_i) - R_i$
When strands designed in the above manner are placed in a test tube, annealing takes place, which results in a strand with a looping structure. Since a strand with a looping structure is the required one, the remaining strands are removed from the test tube, which gives the necessary result. Like the above AND operation, other Boolean and binary operations can be implemented. To add two numbers, say a and b, Fujiwara et al. [9] applied the following operations over the strands designed using the above representation:
1. For each j (0 ≤ j ≤ m − 1), compute $x_j = a_j \oplus b_j$ and $y_j = a_j \wedge b_j$.
2. For each j (0 ≤ j ≤ m − 1), compute $p_j = x_j \wedge y_j$.
3. For each j (1 ≤ j ≤ m − 1), set $c_j = 1$ if $y_{j-1} = 1$ or there exists k(…

… > TH = 1  (5)
Step v: Repeat Step iii and Step iv until all the pixels of the whole image have been checked against the threshold (TH) value.

(b) K-means
The K-means algorithm [14] is an iterative method used to partition an image into clusters, assigning each data point to the cluster whose center is nearest; the center is the average of the data points in the group. The K-means algorithm is fast and runs on large datasets.

Algorithm: Let D = {d1, d2, d3, …, dn} be the set of data points and C = {c1, c2, …, cc} be the set of cluster centers.
Step i: Select the cluster centers randomly.
Step ii: Compute the distance between each data point and the cluster centers.
Step iii: Assign each data point to the cluster center at minimum distance from it.
Step iv: Recompute each cluster center, where 'ci' is the mean of the data points in that cluster.
Step v: Recompute the distance between each data point and the new cluster centers.
Step vi: Repeat from Step iii until all the data points are assigned to a stable cluster.
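A minimal NumPy sketch of these steps (illustrative only; the value of k, the iteration cap, and the empty-cluster handling are assumptions) is given below:

```python
import numpy as np

def kmeans(data, k, iters=100, seed=0):
    """Minimal K-means: data is an (n_points, n_features) array."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]  # Step i
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        # Steps ii-iii: assign each point to the nearest center
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step iv: recompute each center as the mean of its assigned points
        new_centers = np.array([data[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # assignments are stable
            break
        centers = new_centers
    return labels, centers

# Toy usage on 1-D pixel intensities (values assumed)
rng = np.random.default_rng(1)
pix = np.concatenate([rng.normal(50, 4, 300), rng.normal(220, 4, 300)]).reshape(-1, 1)
labels, centers = kmeans(pix, k=2)
```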
(c) Fuzzy C-means
The FCM algorithm assigns membership to each data point on the basis of the distance between the cluster center and the data point: the nearer the data point is to a cluster center, the higher its membership for that center. Evidently, the memberships of each data point sum to 1. The membership values and cluster centers are updated with the formulas below [15]. It gives better results than the K-means algorithm.

Algorithm: Let F = {f1, f2, f3, …, fn} be the set of data points and C = {c1, c2, c3, …, cc} be the set of cluster centers.
Step i: Choose the 'c' cluster centers randomly.
Step ii: Calculate the fuzzy membership μab using:

$\mu_{ab} = 1 \Big/ \sum_{k=1}^{c} \left(d_{ab}/d_{ak}\right)^{2/(m-1)} \quad (7)$
Step iii: Calculate the fuzzy centers 'cb' using:

$C_b = \sum_{a=1}^{n} (\mu_{ab})^{m} x_a \Big/ \sum_{a=1}^{n} (\mu_{ab})^{m}, \quad \forall\, b = 1, 2, \ldots, c \quad (8)$
Step iv: Repeat Step ii and Step iii until the minimum 'R' value is attained, or

$\left\| U^{(t+1)} - U^{(t)} \right\| < \beta \quad (9)$

where 't' denotes the iteration step, 'β' denotes the termination criterion in [0, 1], $U = (\mu_{ab})_{n \times c}$ denotes the fuzzy membership matrix, and 'R' denotes the objective function.
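A compact NumPy sketch of this update loop (an illustration of Eqs. (7)–(9) with an assumed fuzzifier m = 2 and termination threshold, not the authors' implementation) is:

```python
import numpy as np

def fcm(data, c, m=2.0, iters=100, beta=1e-5, seed=0):
    """Minimal fuzzy C-means following Eqs. (7)-(9)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(data), c))
    u /= u.sum(axis=1, keepdims=True)               # memberships sum to 1
    centers = None
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ data) / um.sum(axis=0)[:, None]   # Eq. (8)
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Eq. (7): ratio of distances to every center, summed over k
        u_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(u_new - u).max() < beta:          # Eq. (9) stopping rule
            u = u_new
            break
        u = u_new
    return u, centers

# Toy usage on 1-D pixel intensities (values assumed)
rng = np.random.default_rng(1)
intensities = np.concatenate([rng.normal(40, 5, 500), rng.normal(200, 5, 500)])
u, centers = fcm(intensities.reshape(-1, 1), c=2)
labels = u.argmax(axis=1)                           # defuzzified cluster assignment
```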
4 Results and Discussion

The dataset for this segmentation method was obtained from the online repository DIARETDB1. The database contains 95 input images for the experiment, with a resolution of 93 × 71 at 24-bit depth, in which 5 are unaffected by NPDR and 90 are NPDR affected. These retinal input images are rescaled to 512 × 512 dimensions, as depicted in Fig. 2, to maintain uniformity; this resizing helps in proper visibility of images on screens of different devices. The hard exudate is a bright lesion formed by the eye disease DR. This type of lesion exists in the first type of DR, called NPDR; it is a sign of leaking blood vessels that drop pale, fatty deposits (protein lipids) on the retina. Segmentation of exudates is an essential factor in DR diagnosis to obstruct disease severity. This segmentation process undergoes binarization, which needs green channel extraction (it gives good results for bright pixels, so bright lesion detection becomes easier); then, the unwanted noise is removed by applying the median filter, and the CLA histogram equalization method makes the regions of hard exudate more prominent.

Fig. 2 Fundus input image, resized image
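A minimal OpenCV sketch of this preprocessing pipeline (illustrative only; the file name, kernel size, CLAHE parameters, and threshold value are assumptions) is:

```python
import cv2

img = cv2.imread("fundus.png")                       # hypothetical input file
img = cv2.resize(img, (512, 512))                    # rescale for uniformity
green = img[:, :, 1]                                 # green channel (OpenCV stores BGR)
filtered = cv2.medianBlur(green, 5)                  # median filter removes noise
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(filtered)                     # contrast-limited equalization
_, binary = cv2.threshold(enhanced, 230, 255, cv2.THRESH_BINARY)  # bright lesions
```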
Fig. 3 Exudates segmentation (for each of three sample images: resized image, green channel, median filter, K-means, K-means + binary thresholding, FCM, and FCM + binary thresholding)
Further, the hard exudate is segmented using two different segmentation algorithms, as shown in Fig. 3. The first algorithm is K-means, where similar data points in an image are clustered; the grouping is done by assigning each data point to the nearest center, so the result is largely determined by the initial random assignment of data points. Binary thresholding is then applied, which segments the hard-exudate lesion from the clustered image depending on the pixel value. Unlike conventional thresholding algorithms, K-means decreases intra-cluster variance, but the algorithm can settle in a local minimum and yields dissimilar results across different executions. The second algorithm is fuzzy C-means, which also decreases intra-cluster variance and can settle in a local minimum; in FCM, however, the results depend on the initial choice of weights, so for a fixed initialization the algorithm yields the same results across executions. Among these two methods, the FCM method gave better results than the K-means method and achieved an accuracy of 95.05%.
5 Performance Analysis

The evaluation of the proposed segmentation techniques is done using validation measures: mean-squared error (mse), structural similarity index (ssim), sensitivity (Sn), specificity (Sp), and accuracy (Acc). The mean-squared error validates the prominence of the segmented image; its values must be non-negative and close to zero.
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(M_i - \hat{M}_i\right)^{2} \quad (10)$
Here, n is the number of data points, $M_i$ is the observed value, and $\hat{M}_i$ is the predicted value. The structural similarity index measure is a validation method for assessing the perceptual quality of the segmented image. It measures the similarity between two images based on luminance, contrast, and structure, and it improves on the traditional mean-squared error measure.

$S(A, B) = [l(A, B)]^{\alpha} \cdot [c(A, B)]^{\beta} \cdot [s(A, B)]^{\gamma} \quad (11)$
Here, the weights α, β, γ are set to 1. Sensitivity evaluates the percentage of positive rates (e.g., the count of retinal images which are properly recognized as NPDR affected):

$\text{Sensitivity} = \frac{tp}{tp + fn} \times 100 \quad (12)$
Specificity evaluates the percentage of negative rates (e.g., the count of retinal images which are properly recognized as NPDR unaffected):

$\text{Specificity} = \frac{tn}{tn + fp} \times 100 \quad (13)$
Accuracy is evaluated as the sum of the true positive and true negative counts (the correctly recognized NPDR affected and unaffected retinal images) divided by the count of images in the dataset:

$\text{Accuracy} = \frac{tp + tn}{tp + fp + tn + fn} \times 100 \quad (14)$
where tp, fp, tn, and fn represent true positive, false positive, true negative, and false negative. The validation of the proposed segmentation algorithms is evaluated with mse and ssim, which are formulated in Eqs. (10) and (11); the values obtained are 0.52 and 0.15 for K-means and 0.59 and 0.23 for FCM. The performance analysis of hard-exudate segmentation using K-means with binary thresholding gives 78.07% sensitivity, 67.57% specificity, and 83.21% accuracy, while FCM with binary thresholding gives 97.92% sensitivity, 28.57% specificity, and 95.05% accuracy, as formulated in Eqs. (12), (13), and (14). The evaluated results are presented in Table 1 and Fig. 4, which show that the proposed FCM method gives the better result.
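A compact sketch of these measures (using the standard confusion-matrix definitions assumed above) is:

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Sn, Sp, Acc from binary label arrays, per Eqs. (12)-(14)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    sn = 100.0 * tp / (tp + fn)                      # sensitivity, Eq. (12)
    sp = 100.0 * tn / (tn + fp)                      # specificity, Eq. (13)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)    # accuracy, Eq. (14)
    return sn, sp, acc
```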
Fig. 4 Comparison chart of exudates segmentation
Table 1 Performance measures of exudates segmentation

S. No | Method                               | mse  | ssim | Sn (%) | Sp (%) | Acc (%)
1     | K-means + Binary thresholding        | 0.52 | 0.15 | 78.07  | 67.57  | 83.21
2     | Fuzzy C-means + Binary thresholding  | 0.59 | 0.23 | 97.92  | 28.57  | 95.05
6 Conclusion

The segmentation of hard exudates has been executed with the FCM method. The implementation begins by resizing the images for standardization. The resized images are then preprocessed using green channel extraction and a median filter, which enhances the foreground against the background. For segmenting the hard exudates, two algorithms, namely K-means and FCM, have been applied; among these, FCM performed better, with an accuracy of 95.05%. In future work, segmentation of DR can be improved by reducing false positives to achieve higher precision. Also, the proposed FCM method depends on the initial choice of weights, and different initializations lead to different results; hence, there is a need for a statistically formalized method with maximization.
References

1. A. Pattanashetty, S. Nandyal, Diabetic retinopathy detection using image processing: a survey. Int. J. Comput. Sci. Network, pp. 661–666 (2016)
2. R. Shalini, S. Sasikala, A survey on detection of diabetic retinopathy, pp. 626–630 (2018). https://doi.org/10.1109/I-SMAC.2018.8653694
3. N.G. Ranamuka, R. Gayan, N. Meegama, Detection of hard exudates from diabetic retinopathy images using fuzzy logic. IET Image Process, pp. 121–130 (2012)
4. S.W. Franklin, S.E. Rajan, Diagnosis of diabetic retinopathy by employing image processing technique to detect exudates in retinal images. IET Image Process, pp. 601–609 (2013)
5. J.S. Lachure, A.V. Deorankar, S. Lachure, Automatic diabetic retinopathy using morphological operations. Int. J. Comput. Appl., pp. 22–24 (2015)
6. A. Elbalaoui, M. Fakir, Exudates detection in fundus images using meanshift segmentation and adaptive thresholding, in Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization (2018)
7. A.L. Pal, S. Prabhu, N. Sampathila, Detection of abnormal features in digital fundus image using morphological approach for classification of diabetic retinopathy. Int. J. Innov. Res. Comput. Commun. Eng., pp. 901–909 (2015)
8. P. Hosanna Princye, V. Vijayakumari, Detection of exudates and feature extraction of retinal images using fuzzy clustering method. IET Publications, pp. 388–394
9. J. Dileep, P. Manohar, Automatic detection of exudate in diabetic retinopathy using K-clustering algorithm. Int. J. Recent Innov. Trends Comput. Commun., pp. 2878–2882 (2015)
10. A. Sopharak, B. Uyyanonvara, S. Barman, Automatic exudate detection from non-dilated diabetic retinopathy retinal images using fuzzy C-means clustering. Sensors, pp. 2148–2161 (2009). www.mdpi.com/journal/sensors
11. https://in.mathworks.com/help/vision/ug/interpolation-methods.html
12. https://www.sciencedirect.com/topics/engineering/median-filtering
13. https://en.wikipedia.org/wiki/Thresholding_(image_processing)
14. https://sites.google.com/site/dataclusteringalgorithms/k-means-clustering-algorithm
15. https://sites.google.com/site/dataclusteringalgorithms/fuzzy-c-means-clustering-algorithm
A Secured System for Tele Cardiovascular Disease Monitoring Azmi Shawkat Abdulbaqi, Saif Al-din M. Najim, Shokhan M. Al-barizinji, and Ismail Yusuf Panessai
Abstract Electrocardiogram (ECG) signals play an indispensable role in interpreting the heart's activity in the form of electrical signals to diagnose different types of cardiac problems. These vital signals should also be transmitted safely, avoiding data loss or noise that may hinder illness detection. As ECG signals are observed at a high-dimensional scale, they should be compressed to enable accurate handling and transportation. A method of lossless compression called Huffman-based discrete cosine transform (DCT) is presented in this manuscript for achieving efficient transmission of ECG data. DCT and inverse discrete cosine transform (IDCT) are suggested for improving data privacy and lowering the data complexity. This manuscript concentrates on achieving a high accuracy ratio in the reconstruction upon compression and transportation of the original data (OrDa), without any failure and with the lowest computational time. In the first stages, preprocessing and sampling are performed to eliminate the noise before transmission of the OrDa. The DCT-based Huffman quantization approach achieved great performance measures of distortion percentage (PRD), signal-to-noise ratio (SNR), quality score (QS), and compression ratio (CR) when compared with current approaches on different data transformations. Keywords DCT · Electrocardiogram (ECG) · Huffman coding (HufCod)
1 Introduction

All over the world, ECG signals are utilized to track heart activity and analyze different heart diseases; they demonstrate the findings that reflect changes in heart activity. ECG signals can be utilized to diagnose heart problems, infarcts, and infections, and the ECG measures the activity of the heart muscle. Such signals require broad space to store and transmit. To rectify this issue, ECG data compression (DaCo) plays a significant role in efficiency and requires minimal storage space. Based on the recommended smart e-health system, IoT is utilized to provide this scheme [1]. An IoT device collects and processes health data such as blood pressure, oxygen and blood sugar levels, weight, and ECGs. In such IoT apps, the management of huge sensor data is one of the major difficulties, and information protection for IoT is now extremely complex and challenging, as menaces are increasingly harder to identify [2]. Different techniques are utilized to compress ECG signals, such as wavelet compression, filter-bank compression, ANN prediction, matching-pursuit compression, and clustering-based compression. Wavelet compression removes the least significant coefficients from the output signal but keeps full memory space. The filter-bank approach works more efficiently than the other wavelet transformations; however, it is not accurate, as there are chances of filtering out necessary values. Moreover, those compression methods have augmented the consumption time and costs [3]. An optimized lossless data decomposition approach was utilized to improve the reliability of ECG signal transmission. The recommended algorithm works effectively for signal compression without information loss; the lossless algorithm is tested and found to achieve a high CR compared to other methods. This manuscript inspects both the DCT and the IDCT. Figure 1 displays the initial ECG signal acquisition, a method of electrophysiological monitoring of the heart's electrical activity [4, 5], and Fig. 2 shows the recommended system.
2 System Goals

The main goals of this manuscript are to achieve the highest CR of the input ECG signals without information loss, to ensure the security and privacy of data in the IoMT app, and to improve the lossless decomposition representation of the ECG signal.
3 Manuscript Organization

The first section presents the ECG signal compression apps. The second section reviews the lossless ECG signal literature. The third section includes the recommended methodology. The fourth section explains the measured performance analysis.
Fig. 1 Electrocardiogram system with Holter machine [6]

Fig. 2 Recommended system: ECG signal acquisition, amplification, and noise removal (pre-processing); sampling; DCT compression and Huffman encoding (compression); transmission; inverse Huffman decoding and inverse DCT (reconstruction); and ECG signal evaluation, diagnosis, and specialist decision
The last section presents the conclusion covering the recommended method.
4 Literature Review

The developments in the area of communication and electronics engineering have led to the utilization of big data (BiDa) in approximately all apps. The DaCo method plays a critical role in the development of information technologies that help manage BiDa in a smart way. Manuscript [3] shows a new algorithm for ECG signal compression based on the adaptive Fourier decomposition (AFD). In each decomposition phase, it employs the Nevanlinna factorization and the maximal selection to obtain a rapid convergence ratio with high fidelity. The algorithm was applied to ECG signals and obtained the best findings. Manuscript [7] developed an effective procedure for QRS complex detection based on a simple structural analysis of the temporary structure of ECG signals. In this procedure, the ECGs are prepared in several phases, including feature detection, feature analysis, and noise removal. The gathered feature set includes most of the ECG information while needing low data storage capacity, which yields a lossy compressed form of the ECG signals. Manuscript [8] concentrates on adaptive linear prediction associated with the adaptive coding scheme family called Golomb-Rice coding (GRC) to perform lossless data compression. The study does not utilize any RAM for data storage. The VLSI design has been implemented utilizing the TSMC 0.18 µm technique and indicates a general improvement in output parameters compared to other implementations. In manuscript [5], an original method for ECG signal compression on low-power ECG sensor nodes is improved to take advantage of sparse ECG signals. The successful compression efficiency of the suggested technique offers a feasible solution for the design of wireless ECG (WSN) sensor nodes to attain ultra-low power consumption; a novel DaCo and power-reduction plan for WSN-enabled IoT is recommended. Manuscript [9] introduced a system that utilizes lossless and lossy approaches with compressed data to permit a mixed transmission mode, to support adaptive data-rate selection, and to save wireless broadcast power. In the applied approach, the ECG data is compressed first utilizing a lossy compression method with a higher CR. Using entropy coding, the error between the OrDa and the DeDa is stored, which enables lossless recovery of the specific information when required. Manuscript [10] describes an effective data compression algorithm for tele-cardiac-patient monitoring inside the agricultural zone, based on DCT and two mixed encoding approaches. This method presents an attractive CR with small PRD values.
5 The Proposed Work

5.1 Pre-processing

Preprocessing is critical for the ECG dataset; the signals are obtained without an accurate representation from the heart, and certain samples go missing (are lost). The ECG data is extremely noisy and the ECG signals are weak, so we need to separate the desirable signals from the ECG data [9]. This step focuses on the elimination of the undesirable signal for further analysis. Here, the signal values are stored in numerical format [11].
5.2 Sampling

The principal goal of sampling is to decrease the processing time and, additionally, the cost. The data sampling is determined with different measurements to improve the implementation of the recommended method. A sample of the data is selected for efficient transmission without any data loss [12].
5.3 DCT

The DCT is a transformation approach: it transforms a time-series (TS) signal into its frequency components. The DCT is utilized for reducing the ECG dataset and for efficient feature extraction [13]. For correlated signals, the DCT provides high energy compaction: most transform coefficients are zero or minimal while a few coefficients are strong, so the most important information in the ECG is compressed into the first coefficients [14, 15]. To calculate the DCT, the symmetric and asymmetric properties of the coefficients are utilized, and Y(u) is given in Eq. (1). The DCT coefficients can be represented by [16]:

$Y(u) = \sqrt{\frac{2}{N}}\,\alpha(u) \sum_{x=0}^{N-1} f(x)\cos\!\left(\frac{\pi(2x+1)u}{2N}\right) \quad (1)$
where u = 0, 1, 2, 3, …, N − 1. The IDCT is determined utilizing the symmetric and asymmetric properties of the coefficients; α(u) is shown in Eq. (2) and is given by [17]:
$\alpha(u) = \begin{cases} \dfrac{1}{\sqrt{2}}, & u = 0 \\ 1, & u > 0 \end{cases} \quad (2)$

where u = 0, 1, 2, 3, …, N − 1.
5.3.1 DCT Coefficients

The standard ECG dataset is composed of ECG signals. A DCT matrix was utilized to compress the signals, from which the DCT coefficients are obtained [18].
5.4 Compression

5.4.1 Lossless Compression

By utilizing the lossless compression approach, a high CR becomes attainable; without it, the highest CR cannot be achieved [19]. The main benefits of lossless compression are that no information is lost and that the original file is recovered exactly upon decompression [20].
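As a quick self-contained illustration of this lossless property (using Python's built-in zlib as a stand-in codec, not the paper's DCT–Huffman pipeline):

```python
import zlib

raw = b"ECGSAMPLE" * 400              # stand-in for a quantized ECG byte stream
compressed = zlib.compress(raw, level=9)
assert zlib.decompress(compressed) == raw          # bit-exact recovery
print(len(raw) / len(compressed))                  # compression ratio (cf. Sect. 6.2)
```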
5.5 Huffman Coding (HufCod)

HufCod is utilized to attain 20–90% compression effectiveness. The compression method involves encoding the message bits into a binary code format. After the encoding process, decoding is done by tracing the Huffman tree from the root according to the specified bit sequences [21]. The key benefit is operation at low computational complexity by replacing each character with a variable-length code based on the character's relative frequency [22].
5.5.1 ECG DaCo

In HufCod, the combined data must be quantized over the overall data file in this process [23].
5.5.2 Huffman Encoding

The number of distinct data values is reduced through HufCod. The input to this algorithm is the symbol sequence that requires encoding. Tracing the symbols is an essential step, and every symbol location carries a number index [24]. The encoding relies on the frequencies of the symbols and creates an unreadable data format [25].
5.5.3 Huffman Decoding

The decoding processes a fixed bit sequence and assigns the symbols back. Each character was substituted by a variable-length code based on that character's relative frequency in the text, minimizing the code's average length [26].
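A minimal Python sketch of Huffman encoding and decoding (illustrative; the sample symbol stream is an assumption) is:

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    """Build a prefix-code table from symbol frequencies (minimal sketch)."""
    heap = [[w, i, [s, ""]] for i, (s, w) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                                # degenerate single-symbol case
        return {heap[0][2][0]: "0"}
    count = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]                   # left branch
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]                   # right branch
        count += 1
        heapq.heappush(heap, [lo[0] + hi[0], count] + lo[2:] + hi[2:])
    return {s: code for s, code in heap[0][2:]}

quantized = [3, 3, 3, 1, 2, 2, 0, 3]                  # e.g. quantized DCT coefficients
table = huffman_table(quantized)
bits = "".join(table[s] for s in quantized)           # encoding

inverse, cur, decoded = {v: k for k, v in table.items()}, "", []
for b in bits:                                        # decoding traces the prefix codes
    cur += b
    if cur in inverse:
        decoded.append(inverse[cur])
        cur = ""
assert decoded == quantized                           # lossless round trip
```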
5.6 Applying IDCT

The IDCT receives the transform coefficients Y(u) as input and transforms them back into the TS f(x). The IDCT applies to any set of coefficients and retransforms them into the time domain. The DCT and IDCT involve very intense computation [27, 28].

$f(x) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} \alpha(u)\, Y(u) \cos\!\left(\frac{\pi(2x+1)u}{2N}\right) \quad (3)$
6 Analysis of Performance

This section discusses the detailed performance review in terms of CR, SNR, PRD, QS, and MSE.
6.1 Comparison-Enabling Dataset Description

The ECG data collection was taken from the MIT-BIH arrhythmia database (source: https://archive.physionet.org/cgi-bin/atm/ATM) and is utilized to implement our proposed compression algorithms [29]. The ECG recordings were performed on different subjects. The ECG dataset compression was utilized to address restrictions related to security risks and resource limitations [30].
6.2 Compression Ratio (CR)

The ratio of DaCo is known as the compression strength. The measure utilizes the DCT algorithm to reduce the data size; it also reflects the dataset complexity [31].

$\mathrm{CR} = \frac{\text{Uncompressed Signal Size}}{\text{Compressed Signal Size}}$
Table 3 indicates the contrast of the compression degree between the current methods and the recommended method with respect to CR. We can conclude from Table 3 that the recommended method resulted in the highest value, 25.74, compared with the other current CR values of 11.6, 14.9, 14.3, 15.1, 5.65, 16, and 7.8.
6.3 Signal-to-Noise Ratio (SNR) and Distortion Percentage (PRD)

SNR calculates the ratio of the overall useful signal power to the power of the distorting noise [32]:

$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\sum_{n=0}^{N-1}\left(X(n) - \mathrm{mean}(X)\right)^{2}}{\sum_{n=0}^{N-1}\left(X(n) - Y(n)\right)^{2}}\right) \quad (4)$
PRD measures the distortion between the source and the distorted (reconstructed) ECG waveforms. PRD can be represented by the following equation [33]:

$\mathrm{PRD} = \sqrt{\frac{\sum_{n=1}^{N}\left(x(n) - \hat{x}(n)\right)^{2}}{\sum_{n=1}^{N} x^{2}(n)}} \times 100 \quad (5)$

where x(n) refers to the original ECG signal and x̂(n) to the reconstructed signal.
6.4 The Quality Score (QS)

The QS is obtained from the aforementioned CR and PRD percentages. QS is a critical performance measure that helps account for the errors occurring in reconstruction when selecting the appropriate compression operation. For instance, when a lossy compression operation is performed, increased QS values may imply robustness of the adopted compression method [34].
$\mathrm{QS} = \frac{\mathrm{CR}}{\mathrm{PRD}} \quad (6)$

Fig. 3 Original and reconstructed ECG signals based on the recommended method with corresponding error signals
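These three measures can be sketched in Python as follows (illustrative helper functions, not the authors' code):

```python
import numpy as np

def snr_db(x, x_rec):
    """Eq. (4): SNR of the reconstruction, in dB."""
    return 10 * np.log10(np.sum((x - x.mean()) ** 2) / np.sum((x - x_rec) ** 2))

def prd(x, x_rec):
    """Eq. (5): percentage root-mean-square difference."""
    return 100 * np.sqrt(np.sum((x - x_rec) ** 2) / np.sum(x ** 2))

def qs(cr, prd_value):
    """Eq. (6): quality score, CR over PRD."""
    return cr / prd_value
```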
The findings are shown in graphical format in Fig. 4 to convey the exact efficacy rate obtained by the proposed approach. This clearly shows that, compared with current methods, the recommended system gives a high CR and consistency, resulting in a simple data transfer rate, with near-zero error values when compared with other methods. It indicates that the recommended approach has a highly optimized encryption and decryption framework with a powerful firewall and a robust decryption system [35, 36] (Fig. 3). From the tabulation, it can be concluded that the PRD value given by the proposed system, 1.91, compares favorably with the values produced by current methods, namely 5.3, 5.83, 2.43, 2.5, 3.63, 1.973, 2.29, and 1.911, respectively.
6.5 Mean Squared Error (MSE)

MSE calculates the average error, which is the variation between the calculated values and the estimated values [37]. MSE also tests estimation efficiency and offers an accepted response. The formula in Eq. (7) aids in measuring MSE.
Fig. 4 Comparison between the proposed method and the literature based on CR, PRD, and QS
Table 1 A brief of overall performance: performance summary of the recommended technique based on Rec #117

Performance metrics | Realized values
CR  | 25.74
QS  | 13.44
SNR | 52.78
PRD | 1.91
MSE | 0.2
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2} \quad (7)$
The MSE values for the various current methods and the suggested method are compared. The suggested approach gives an MSE value of 0.2, which is the lowest error value compared with the other methods; the lowest error value improves model accuracy and provides added reliability [38]. Based on Table 2, optimum values were obtained by the proposed method, namely CR = 25.74%, PRD = 1.91%, and QS = 13.44% (Table 1).
7 Conclusion

In order to improve the CR in e-health systems, a lossless compression method was introduced. The simple recommended approach reduces the need for memory space and network bandwidth. The DCT function is clearly characterized by the CR. Security and privacy are the two most important requirements on which IoMT
Table 2 Comparison of recommended algorithm performance with an actual lossless algorithm

ECG record no. | CR (%) | PRD (%) | QS (%) | SNR (dB)
100 | 19.85 | 3.28 | 6.05 | 52.77
101 | 17.95 | 3.60 | 4.99 | 55.70
102 | 21.35 | 4.18 | 5.11 | 51.65
103 | 19.33 | 4.09 | 4.72 | 59.75
104 | 21.16 | 5.30 | 3.99 | 53.58
105 | 20.02 | 4.57 | 4.38 | 57.78
106 | 18.66 | 4.92 | 3.79 | 57.48
107 | 21.94 | 5.58 | 3.93 | 56.99
108 | 23.18 | 4.71 | 4.92 | 49.96
109 | 22.36 | 5.01 | 4.47 | 56.16
111 | 21.17 | 5.58 | 3.79 | 52.58
112 | 22.91 | 2.07 | 11.08 | 47.31
113 | 20.81 | 4.53 | 4.60 | 60.23
114 | 22.48 | 4.17 | 5.39 | 51.09
115 | 22.25 | 3.13 | 7.11 | 55.33
116 | 19.47 | 2.89 | 6.75 | 57.75
117 | 25.74 | 1.91 | 13.44 | 52.78
118 | 23.83 | 3.31 | 7.19 | 48.85
119 | 26.01 | 4.41 | 5.89 | 51.00
121 | 32.47 | 2.36 | 13.73 | 53.14
122 | 21.71 | 2.39 | 9.10 | 54.93
123 | 24.38 | 2.69 | 9.06 | 50.92
124 | 27.12 | 2.20 | 12.34 | 57.97
200 | 22.64 | 7.64 | 2.96 | 50.04
201 | 16.04 | 3.71 | 4.33 | 60.18
202 | 24.73 | 4.72 | 5.24 | 58.11
203 | 19.37 | 7.14 | 2.71 | 51.88
205 | 19.68 | 3.04 | 6.48 | 53.05
207 | 28.23 | 6.15 | 4.59 | 54.75
208 | 21.36 | 7.83 | 2.73 | 50.64
209 | 15.38 | 4.61 | 3.33 | 56.70
210 | 21.87 | 5.92 | 3.70 | 53.28
212 | 16.77 | 5.52 | 3.04 | 55.51
213 | 15.61 | 4.04 | 3.86 | 60.41
214 | 24.15 | 5.74 | 4.21 | 55.14
215 | 20.18 | 8.65 | 2.33 | 46.92
217 | 22.48 | 4.95 | 4.54 | 59.55
219 | 23.35 | 4.80 | 4.86 | 49.33
220 | 20.80 | 3.20 | 6.49 | 53.22
221 | 22.14 | 5.36 | 4.13 | 54.94
222 | 21.66 | 5.49 | 3.95 | 51.70
223 | 21.00 | 2.73 | 7.70 | 61.04
228 | 27.12 | 8.41 | 3.22 | 47.73
230 | 18.52 | 5.28 | 3.51 | 56.13
231 | 19.61 | 4.89 | 4.01 | 56.48
232 | 17.09 | 5.24 | 3.26 | 44.62
233 | 19.74 | 6.68 | 2.95 | 53.37
234 | 19.10 | 4.51 | 4.24 | 59.62
Average | 21.56 | 4.65 | 5.38 | 54.17
apps depend. A high accuracy was obtained, as confirmed by the evaluation results, as well as high security, utilizing the HufCod approach, which contains encryption and decryption (EncDec). In the findings section, different measurements, including CR, MSE, and SNR, were compared with various existing techniques. The recommended future work includes the development of apps for real-time compression processing in order to obtain the least computation time.
References 1. M. Elgendi, Less is more in biosignal analysis: compressed data could open the door to faster and better diagnosis. Diseases (2018) 2. S. Kalaivani, C. Tharini, Analysis and modification of rice Golomb coding lossless compression algorithm for wireless sensor networks. J. Theor. Appl. Inform. Technol. 96(12), 3802–3814 (2018) 3. C. Tan, L. Zhang, H.-T. Wu, A novelBlaschke unwinding adaptiveFourier-decompositionbased signal compression algorithm with application on ECG signals. IEEE J. Biomed. Health Inform. 23(2), 672–682 (2019) 4. A. Burguera, Fast QRS detection and ECG compression based on signal structural analysis (2019) 5. H. Huang, S. Hu, Y. Sun, ECG signal compression for low-power sensor nodes using sparse frequency spectrum features, in IEEE Biomedical Circuits and Systems Conference (BioCAS) (2018) 6. https://www.drugs.com/cg/heart-palpitations-in-adolescents.html 7. A. Burguera, Fast QRS detection and ECG compression based on signal structural analysis. IEEE J. Biomed. Health Inform. 23(1), 123–131 (2019) 8. A.S. Abdulbaqi, I.Y. Panessai, Designing and ımplementation of a biomedical module for vital signals measurements based on embedded system. Int. J. Adv. Sci. Technol. (IJAST) 29(3), 3866–3877 (2020)
A Secured System for Tele Cardiovascular …
221
9. C.J. Deepu, C.-H. Heng, Y. Lian, A hybrid data compression scheme for power reduction in wireless sensors for IoT. IEEE Trans. Biomed. Circuits Syst. 11(2), 245–254 (2017) 10. C.K. Jha, M.H. Kolekar, ECG data compression algorithm for telemonitoring of cardiac patients. Int. J. Telemed. Clin. Pract. 2(1), 31–41 (2017) 11. A.S. Abdulbaqi et al., Recruitment Internet of Things For Medical Condition Assessment: Electrocardiogram Signal Surveillance, Special Issue, AUS Journal, (Institute of Architecture and Urbanism, University of Austral de Chile, 2019), pp. 434–440 12. T.-H. Tsai, W.-T. Kuo, An efficient ECG lossless compression system for embedded platforms with telemedicine applications. IEEE (2018) 13. A.E. Hassanien, M. Kilany, E.H. Houssein, Combining Support Vector Machine and Elephant Herding Optimization for Cardiac Arrhythmias. arXiv:1806.08242v1[eee.SP], June 20, 2018 14. J. Dogra, M. Sood, S. Jain, N. Prashar, Segmentation of magnetic resonance images of brain using thresholding techniques,in 4th IEEE International Conference on signal processing and control (ISPCC 2017), Jaypee University of Information technology, Waknaghat, Solan, H.P, India, pp. 311–315, September 21–23, 2017 15. N. Prashar, S. Jain, M. Sood, J. Dogra, Review of biomedical system for high performance applications ,4th IEEE International Conference on signal processing and control (ISPCC 2017), Jaypee University of Information technology, Waknaghat, Solan, H.P, India, pp 300–304, September 21–23, 2017 16. A. Dhiman, A. Singh, S. Dubey, S. Jain, Design of lead II ECG waveform and classification performance for morphological features using different classifiers on lead II. Res. J. Pharmaceut. Biol. Chem. Sci. (RJPBCS) 7(4), 1226–1231 (2016) 17. B. Pandey, R.B. Mishra, An integrated intelligent computing method for the detection and interpretation of ECG based cardiac diseases. Int. J. Knowl. Eng. SoftData Paradigms 2, 182– 203 (2010) 18. A.S. Abdulbaqi, S.A.M. Najim, R.H. Mahdi, Robust multichannel EEG signals compression model based on hybridization technique. Int. J. Eng. Technol. 7(4), 3402–3405 (2018) 19. S. Kalaivani, I. Shahnaz, S.R. Shirin, C. Tharini, Real-time ECG acquisition and detection of ˙ anomalies, in Artificial Intelligence and Evolutionary Computations in Engineering Systems, ed. S.S. Dash, M.A. Bhaskar, B.K. Panigrahi, S. Das (Springer, Berlin, 2016) 20. J. Uthayakumar, T. Venkattaraman, P. Dhayachelvan, A survey on data compression techniques: from the perspective of data quality, coding schemes, data types, and applications. J. King Saud Univ.- Comput. Inform. Sci. (2018) 21. R. Gupta, S. Singh, K. Garg, S. Jain, Indigenous design of electronic circuit for electrocardiograph . Int. J. Innov. Res. Sci. Eng. Technol. 3(5), 12138–12145 (2014) 22. C.C. Chiu, T.H. Lin, B.Y. Liau, Using correlation coefficient in ECG waveforms for arrhythmia detection. Biomed. Eng. Appl. Basis Commun. 17, 147–152 (2005) 23. S. Jain, Classification of protein kinase B using discrete wavelet transform. Int. J. Inform. Technol. 10(2), 211–216 (2018) 24. N. Alajlan, Y. Bazi, F. Melgani, S. Malek, M.A. Bencherif, Detection of premature ventricular contraction arrhythmias in electrocardiogram signals with kernel methods. SIViP 8(5), 931–942 (2014) 25. Y. Hirai, T. Matsuoka, S. Tani, S. Isami, K. Tatsumi, M. Ueda, T. Kamata, A biomedical sensor system with stochastic A/D conversion and error correction by machine learning. IEEE Access 7, 21990–22001 (2019) 26. Ö. 
Yildirim, A novel wavelet sequence based on a deep bidirectional LSTM network model for ECG signal classification. Comput. Biol. Med. 96, 189–202 (2018) 27. A. Diker, D. Avci, E. Avci, M. Gedikpinar, A new technique for ECG signal classification genetic algorithm Wavelet Kernel extreme learning machine. Optik 180, 46–55 (2019) 28. J. Zhang, Z. Gu, Z.L. Yu, Y. Li, Energy-efficient ECG compression on wireless biosensors via minimal coherence sensing and weighted l1 minimization reconstruction. IEEE J. Biomed. Health Inform. 19(2), 520–528 (2015) 29. A. Singh, S. Dandapat, Block sparsity-based joint compressed sensing recovery of multichannel ECG signals. Healthcare Technol. Lett. 4(2), 50–56 (2017)
30. A. Singh, S. Dandapat, Exploiting multi-scale signal information in joint compressed sensing recovery of multi-channel ECG signals. Biomed. Signal Process. Control 29, 53–66 (2016)
31. H. Mamaghanian, G. Ansaloni, D. Atienza, P. Vandergheynst, Power-efficient joint compressed sensing of multi-lead ECG signals, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014 (2014), pp. 4409–4412
32. S. Kumar, B. Deka, S. Datta, Block-sparsity based compressed sensing for multichannel ECG reconstruction, in Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science, vol. 11942, ed. by B. Deka, P. Maji, S. Mitra, D. Bhattacharyya, P. Bora, S. Pal (Springer, Cham, 2019)
33. S. Eftekharifar, T.Y. Rezaii, S. Beheshti, S. Daneshvar, Block sparse multi-lead ECG compression exploiting between-lead collaboration. IET Sig. Process. (2018)
34. A. Sharma, A. Polley, S.B. Lee, S. Narayanan, W. Li, T. Sculley, S. Ramaswamy, A sub-60-µA multimodal smart biosensing SoC with >80-dB SNR, 35-µA photoplethysmography signal chain. IEEE J. Solid-State Circuits 52(4), 1021–1033 (2017)
35. Z. Zhang, J. Li, Q. Zhang, K. Wu, N. Ning, Q. Yu, A dynamic tracking algorithm based SAR ADC in bio-related applications. IEEE Access 6, 62166–62173 (2018)
36. M.K. Adimulam, M.B. Srinivas, A 1.0 V, 9.84 fJ/c-s FOM reconfigurable hybrid SAR-sigma delta ADC for signal processing applications. Analog Integr. Circ. Sig. Process 99(2), 261–276 (2019)
37. X. Zhang, Y. Lian, A 300-mV 220-nW event-driven ADC with real-time QRS detection for wearable ECG sensors. IEEE Trans. Biomed. Circuits Syst. 8(6), 834–843 (2014)
38. Y. Hou, J. Qu, Z. Tian, M. Atef, K. Yousef, Y. Lian, G. Wang, A 61-nW level-crossing ADC with adaptive sampling for biomedical applications. IEEE Trans. Circuits Syst. II Express Briefs 66(1), 56–60 (2019)
Anomaly Detection in Real-Time Surveillance Videos Using Deep Learning Aswathy K. Cherian and E. Poovammal
Abstract Real-time events are fast and occur at highly dynamic moments, so the key challenge is identifying anomalous incidents correctly; the methods and techniques employed must be quick enough to support control and other responses to such events. In the proposed method, anomalies are detected in surveillance videos using multiple instance learning, with the I3D network used for feature extraction. The extracted features are then fed to a deep neural network that classifies the videos as anomalous or normal. The investigated dataset comprises 128 h of video, of which roughly ten percent are different realistic anomaly videos. The proposed approach attains an AUC of 81.16 and is well suited to real-world anomaly recognition in surveillance videos. Keywords Video surveillance · Anomaly detection · Multiple instance learning · I3D algorithm · Deep learning
1 Introduction Monitoring real-world events simultaneously at many places is becoming genuinely challenging due to the high volume of data. Recently, image processing has gained momentum as a means of providing societal benefit, and automatic, fast detection is essential to trace events without human error. Wei et al. [1] used a convolutional neural network (CNN) over scenes and objects to account for the spatial and temporal aspects of video anomalies. Wang et al. [2] developed and evaluated an application that detects abnormal vehicle behavior in live-stream videos. Ahmadi et al. [3] used sequential topic modeling, through sparse topical coding, for traffic-scene abnormalities. An optimized CNN and genetic algorithm A. K. Cherian (B) · E. Poovammal Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, Chennai, India e-mail: [email protected] E. Poovammal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_19
were used for anomaly detection in surveillance videos, and the presented ML technique was shown to be helpful for detecting video anomalies [4, 5]. A multistage pipeline approach with an incrementally trained Bayesian CNN has also been used to detect video anomalies. Mehta et al. [6] built a deep learning model for detecting gun- and fire-related violence that runs at 45 fps, with a reported accuracy of around 87–90%. Ullah et al. [7] used a pre-trained CNN to extract spatiotemporal features from a series of frames using a multilayer, bidirectional memory. Pawar and Attar [8] investigated frame-level anomaly detection using a two-dimensional convolutional auto-encoder and a radial basis function on two real-world datasets. Tang et al. [9] integrated reconstruction and prediction for detecting anomalies and demonstrated operation at 30 fps. Iqbal et al. [10] used Canny edge detection with the Hough transform on high-resolution cloud videos for anomaly detection. Kisan et al. [11] developed a model that uses changes in the motion vector to detect real-world anomalies, and CNN-based ML methods have been used to detect anomalies arising from weather and galaxy visual changes [12, 13]. As the literature shows, plenty of approaches have been used to detect anomalies in surveillance videos. Recently, auto-encoder-based models trained on normal videos have been adopted [14, 15], in which anomalies are flagged through the reconstruction loss. In the proposed model, both normal and anomalous videos are predicted even though the training data is only weakly labeled.
2 Dataset Our proposed method uses the videos of Sultani et al. [16], one of the largest datasets available for real-world anomaly detection. It contains around 1900 real-world surveillance videos that address different scenarios of public life. The total length of the videos is around 128 h, covering 13 different types of events: vandalism, shoplifting, fighting, explosion, assault, arrest, arson, burglary, stealing, robbery, shooting, abuse, and accidents. Sample screenshots of anomaly videos are shown in Fig. 1, which presents the 13 anomaly classes plus a normal video, with four frames of each from the dataset. The first set shows an example of abuse, followed by arson; the accident anomaly depicts a car hit by a bus. The last frames represent a normal video with no anomalies: frames of a mall with people moving in and out. The total dataset is divided into two sets, one for training and the other for testing. The training set consists of around 800 normal and 810 anomalous videos, whereas the testing set contains 150 normal and 140 anomalous videos. These videos are unlabeled raw data [17]. The number of videos of each event distributed for training and testing is given in Table 1.
Fig. 1 Anomalies from the training and testing videos (sample sets) [16]

Table 1 Distribution of videos for testing and training of anomalies

Anomaly       Training set   Testing set
Abuse         48             50
Fighting      45             50
Arrest        45             50
Accidents     127            150
Arson         41             50
Robbery       145            150
Assault       47             50
Shooting      27             50
Burglary      87             100
Shoplifting   29             50
Explosion     29             50
Stealing      95             100
Vandalism     45             50
3 Proposed Research Methodology The diagrammatic representation of the proposed method is shown in Fig. 2. During training, each input video is first divided into a fixed number of segments, say 32. These segments, also known as instances, are separated into positive and negative sets based on the MIL ranking loss function. The anomalous (positive) and normal (negative) instances are passed to the I3D network to extract features. A classifier is then used to predict whether a video is anomalous or normal.
3.1 Multiple Instance Learning The videos of the given dataset are raw and unlabeled, and labeling such a huge dataset is tedious, time-consuming, and practically infeasible by hand. For this purpose, we adopt multiple instance learning (MIL): in the proposed deep MIL framework, each video is treated as a set (a bag) and small segments of the video are treated as instances of that set. MIL assumes that videos do not carry accurate temporal annotations; unlike state-of-the-art frameworks, it does not require the exact location of the anomaly in a video, only a video-level label indicating the presence of an anomaly. A video that contains no anomaly is labeled (grouped) as negative, and a video that contains an anomaly is labeled as positive. The negative set contains the negative instances n1, n2, ..., nm, and the positive set contains the positive instances p1, p2, ..., pm. These instances are non-overlapping segments of the training video; normally, 32 instances are taken from each video. For each set, the optimized objective function over the maximum-scored instance is given in Eq. (1):
$$\min_{w}\ \frac{1}{z}\sum_{j=1}^{z}\max\!\Big(0,\ 1-Y_{B_j}\big(\max_{i\in B_j}(w\cdot\phi(x_i))-b\big)\Big)+\frac{1}{2}\lVert w\rVert^{2} \qquad (1)$$
Fig. 2 Schematic representation of the proposed detection of anomalies [16]
where the total number of sets is given by z, the classifier to be learned is w, the set-level label is denoted by $Y_{B_j}$, and $\phi(x)$ denotes the feature representation of an image patch or a video segment.
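To make the objective concrete, the following is a minimal NumPy sketch of Eq. (1); the set/bag structure, the feature map phi, and the instance format are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mil_svm_objective(w, b, bags, labels, phi):
    """Eq. (1): hinge loss on the maximum-scored instance of each set (bag),
    plus an L2 regularizer on the classifier w.

    bags[j]  : list of instances (video segments) in set j
    labels[j]: set-level label Y_Bj in {-1, +1}
    phi      : feature map returning a vector for one instance
    """
    z = len(bags)
    hinge = 0.0
    for B_j, y in zip(bags, labels):
        s = max(np.dot(w, phi(x)) for x in B_j)  # max-scored instance in set j
        hinge += max(0.0, 1.0 - y * (s - b))
    return hinge / z + 0.5 * np.dot(w, w)
```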
3.2 Deep MIL Ranking Model Treating anomaly detection as a regression problem, the score of anomalous video segments is assumed to be higher than that of normal segments, i.e., f(V_n) < f(V_a), where V_a and V_n represent anomalous and normal video segments and f(V_a), f(V_n) are their corresponding ranking scores. Since only video-level annotations are available during training, this segment-level assumption cannot be enforced directly; in such situations, we use the multiple instance ranking objective:

$$\max_{i\in B_a} f\!\left(V_a^{i}\right) \;>\; \max_{i\in B_n} f\!\left(V_n^{i}\right) \qquad (2)$$
where the max is taken over all video segments in each set. Ranking is enforced only on the two instances with the maximum score in the positive and negative sets. The highest-scored segment in the negative set is the one that looks most similar to an anomalous segment while actually being a normal instance; such hard negative instances are the main source of false alarms in anomaly detection. By using Eq. (2), we push the maximum-scored positive and negative instances far apart in terms of anomaly score.
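The hinged form of this ranking objective can be sketched as follows; this is a minimal NumPy illustration of Eq. (2) turned into a trainable loss, not the exact training code, and the margin of 1 is an assumption carried over from the usual formulation.

```python
import numpy as np

def mil_ranking_loss(scores_positive, scores_negative, margin=1.0):
    """Hinged MIL ranking loss between one anomalous (positive) set and one
    normal (negative) set of per-segment anomaly scores (e.g. 32 per video).
    Only the maximum-scored instance of each set enters the loss, pushing the
    two maxima apart as required by Eq. (2)."""
    hardest_positive = np.max(scores_positive)  # most anomalous-looking segment
    hardest_negative = np.max(scores_negative)  # hard normal segment (false-alarm risk)
    return max(0.0, margin - hardest_positive + hardest_negative)

# Example with random segment scores for two 32-segment videos
rng = np.random.default_rng(0)
print(mil_ranking_loss(rng.random(32), rng.random(32)))
```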
3.3 I3D Algorithm Feature extraction and classification are done using the I3D network and a three-layered fully connected (FC) neural network. Before extracting features, all videos in the dataset are resized/trimmed to 240 × 320 pixels at a frame rate of 30 frames per second. The C3D network utilizes a single 3D network, whereas I3D uses two 3D streams that work like a pre-trained 2D network inflated into 3D. The architecture of I3D is shown in Fig. 3. C3D tends to overfit due to the unavailability of huge video datasets and the complicated structure of its 3D convolutional kernels. The proposed method therefore utilizes I3D (Fig. 3) [18], the Inception 3D CNN. In the I3D structure, the 2D convolutional kernels are inflated/expanded to 3D [19]; that is, all the square filters (N × N) are converted to cubic filters (N × N × N) [20, 21]. The beginning of the network uses asymmetric filters for max-pooling, which helps preserve time while pooling over the spatial dimensions; it also analyzes the spatial information at various positions and produces an averaged result. I3D concentrates on growing the network wider instead of deeper, and overfitting is avoided by training the model on the video dataset, which improves its accuracy.
Fig. 3 Architecture of I3D
The 1 × 1 × 1 convolutions reduce the number of input channels fed to the 3 × 3 × 3 convolutions, thus reducing the computational cost.
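The inflation of square filters into cubic ones can be illustrated with a short sketch; this follows the bootstrapping idea of [19] (repeat the pre-trained 2D kernel along time and rescale) and is an illustrative reconstruction, not the authors' code.

```python
import numpy as np

def inflate_2d_kernel(w2d, t):
    """Inflate a pre-trained 2D kernel of shape (kh, kw, c_in, c_out) into a
    3D kernel of shape (t, kh, kw, c_in, c_out) by repeating it t times along
    the temporal axis and dividing by t, so that a video of identical frames
    yields the same activations as the original 2D network."""
    return np.repeat(w2d[np.newaxis, ...], t, axis=0) / t

square = np.random.randn(3, 3, 64, 128)   # an N x N filter from a 2D network
cubic = inflate_2d_kernel(square, t=3)    # the corresponding N x N x N filter
```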
3.4 Classification The features extracted with the I3D network are fed to a three-layered fully connected neural network. The first fully connected layer has 512 neurons with a sigmoid activation function; the second layer contains 32 units, followed by a final layer of 1 unit, and a 60% dropout is applied after each layer [22]. The last layer uses the rectified linear unit (ReLU) as its activation function [23], and the initial learning rate was 0.0001 with the Adagrad optimizer.
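A minimal Keras sketch of this classifier head, following the layer sizes, dropout rate, optimizer, and learning rate stated above, is given below; the input dimension of the I3D feature vector and the hinge-style placeholder loss are assumptions, since the actual training uses the MIL ranking loss of Sect. 3.2.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# One I3D feature vector per video segment; 1024 dimensions is an assumption.
model = models.Sequential([
    layers.Input(shape=(1024,)),
    layers.Dense(512, activation="sigmoid"),  # first FC layer, 512 neurons
    layers.Dropout(0.6),                      # 60% dropout after each layer
    layers.Dense(32),
    layers.Dropout(0.6),
    layers.Dense(1, activation="relu"),       # segment-level anomaly score
])
model.compile(optimizer=optimizers.Adagrad(learning_rate=0.0001),
              loss="hinge")  # stand-in; training actually uses the MIL ranking loss
model.summary()
```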
4 Discussion Our approach is validated against the anomaly detection methods described in the literature, and the comparative results are discussed here. Table 2 provides the area under the curve (AUC) of the proposed method and other state-of-the-art methods, while Table 3 shows the quantitative accuracy of the I3D and C3D feature extraction techniques.

Table 2 Comparison of the present results with the literature

Method                 AUC
Binary classifier      50.0
Sultani et al. [16]    75.02
Hasan et al. [15]      50.6
Lu et al. [24]         65.51
The proposed method    81.16

Table 3 Recognition accuracy of the C3D and I3D feature extraction techniques

Method     C3D    I3D
Accuracy   23.0   25.9
5 Conclusions and Future Works The proposed approach is found to be deterministic in real-time situations, so the investigated algorithm is well suited to anomaly detection in surveillance videos. Monitoring real-world actions is certainly challenging to quantify over space and time. The presented method achieves an AUC of 81.16, and the results are helpful for the classification of real-world anomalies in surveillance videos. Future work would introduce an algorithm that extracts features from the videos with more precision, which in turn would improve the accuracy with which the videos are classified. The classifier also does not provide correct output for blurred or noisy images; fine-tuning of the classifier's parameters is expected in this respect.
References
1. H. Wei, K. Li, H. Li, Y. Lyu, X. Hu, Detecting video anomaly with a stacked convolutional LSTM framework, in Lecture Notes in Computer Science, vol. 11754 (2019), pp. 330–342
2. C. Wang, A. Musaev, P. Sheinidashtegol, T. Atkison, Towards detection of abnormal vehicle behavior using traffic cameras, in Lecture Notes in Computer Science, vol. 11514 (2019), pp. 125–136
3. P. Ahmadi, E.P. Moradian, I. Gholampour, Sequential topic modeling for efficient analysis of traffic scenes, in 9th International Symposium on Telecommunication: With Emphasis on Information and Communication Technology, IST 2018 (2019)
4. D. Thakur, R. Kaur, An optimized CNN based real world anomaly detection in surveillance videos. Int. J. Innov. Technol. Explor. Eng. 8(9 Special Issue), 465–473 (2019)
5. A. Joshi, V.P. Namboodiri, Unsupervised synthesis of anomalies in videos: transforming the normal, in Proceedings of the International Joint Conference on Neural Networks (2019)
6. P. Mehta, A. Kumar, S. Bhattacharjee, Fire and gun violence based anomaly detection system using deep neural networks, in Proceedings of the International Conference on Electronics and Sustainable Communication Systems, ICESC 2020 (2020)
7. W. Ullah, A. Ullah, I.U. Haq, K. Muhammad, M. Sajjad, S.W. Baik, CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimedia Tools Appl. (2020)
8. K. Pawar, V. Attar, Deep learning-based intelligent surveillance model for detection of anomalous activities from videos. Int. J. Comput. Vis. Rob. 10(4), 289–311 (2020)
9. Y. Tang, L. Zhao, S. Zhang, C. Gong, G. Li, J. Yang, Integrating prediction and reconstruction for anomaly detection. Pattern Recogn. Lett. 129, 123–130 (2020)
10. B. Iqbal, W. Iqbal, N. Khan, A. Mahmood, A. Erradi, Canny edge detection and Hough transform for high resolution video streams using Hadoop and Spark. Cluster Comput. 23(1), 397–408 (2020)
11. S. Kisan, B. Sahu, A. Jena, S.N. Mohanty, Detection of violence in videos using hybrid machine learning techniques. Int. J. Adv. Sci. Technol. 29(3), 5386–5392 (2020)
12. A.K. Cherian, A. Rai, V. Jain, Flight trajectory prediction for air traffic management. J. Crit. Rev. 7(6), 412–416 (2020)
13. A.K. Cherian, P. Kumar, P.S.K. Reddy, E. Poovammal, Detecting bars in galaxies using convolutional neural networks. J. Crit. Rev. 7(6), 189–194 (2020)
14. D. Xu, E. Ricci, Y. Yan, J. Song, N. Sebe, Learning deep representations of appearance and motion for anomalous event detection, in BMVC (2015)
15. M. Hasan, J. Choi, J. Neumann, A.K. Roy-Chowdhury, L.S. Davis, Learning temporal regularity in video sequences, in CVPR (June 2016)
16. W. Sultani, C. Chen, M. Shah, Real-world anomaly detection in surveillance videos, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2018)
17. https://www.dropbox.com/sh/75v5ehq4cdg5g5g/AABvnJSwZI7zXb8_myBA0CLHa?dl=0
18. J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. (2011)
19. J. Carreira, A. Zisserman, Quo vadis, action recognition? A new model and the kinetics dataset, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 6299–6308
20. S. Xie, C. Sun, J. Huang, Z. Tu, K. Murphy, Rethinking spatiotemporal feature learning: speed-accuracy trade-offs in video classification, in Proceedings of the European Conference on Computer Vision (ECCV) (2018), pp. 305–321
21. X. Wang et al., I3D-LSTM: a new model for human action recognition, in IOP Conference Series: Materials Science and Engineering (2019)
22. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. (2014)
23. V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel (2010)
24. C. Lu, J. Shi, J. Jia, Abnormal event detection at 150 FPS in MATLAB, in ICCV (2013)
Convolutional Neural Network-Based Approach for Potholes Detection on Indian Roads Noviya Balasubramanian, J. Dharneeshkar, Varshini Balamurugan, A. R. Poornima, Muktha Rajan, and R. Karthika
Abstract In developing countries like India, the soaring count of potholes on roads is a major concern, as their presence can cause accidents; it is therefore imperative to detect them to ensure the safety of road users. Many research works have been carried out to detect them easily, but most are uneconomical. Using deep learning algorithms to detect potholes has become popular in recent times, and since convolutional neural networks are very effective at identifying objects, this approach has been adopted. A 4000-image dataset is created by applying data augmentation techniques to an existing 1500-image dataset, and the created dataset is trained using the FasterR-CNN, SSD, YOLOv3 tiny, and YOLOv4 tiny algorithms. Since there is always a trade-off between latency and accuracy in object detection, different architectures are employed, and the final results are compared based on mAP. SSD with MobileNetv2 and YOLOv4 tiny outperform the other methods, with precisions of 76% and 76.4%, respectively. Keywords Pothole detection · R-CNN · SSD · Data augmentation · YOLO tiny · Image processing
1 Introduction Potholes are noticeable structural failures in the surface of a road, caused by traffic and bad weather, and they have become a common nuisance on Indian roadways. Not only do they make roads look rundown and unsightly, they also pose a threat to the safety of everyone who travels on these roads: they damage vehicles and cause accidents and severe injuries to anyone involved. Potholes are dangerous for pedestrians, bicyclists, and road workers as well; anyone who uses the road could be injured. According to most N. Balasubramanian · J. Dharneeshkar · V. Balamurugan · A. R. Poornima · M. Rajan · R. Karthika (B) Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_20
recent figures from a few state governments, around 30 deaths happen each day on Indian roads because of potholes [1]. The death rate in 2017 increased by more than 50% over the 2016 toll, to 3597 cases annually [2]. To curtail the number of accidents and related losses, it is crucial to detect these potholes accurately and repair them. Detecting potholes manually is not preferred, since it is exorbitantly expensive and time-consuming. Therefore, a plethora of research has been carried out to evolve technology that can expose potholes, which would be a milestone in improving the effectiveness of surveys and the quality of pavement through prior inspection and speedy response. One of the many reasons for such road accidents is human error, since drivers are unable to see potholes and take impulsive decisions. To overcome this, advanced driver-assistance systems (ADAS) aid drivers in detecting hindrances and potential risks beforehand. ADAS is also responsible for regulating the control, balance, and mobility of the vehicle during precarious situations, and it has improved car and road safety through a protected, cautious human–machine interface. Technologies that alert the driver to potential risks, apply safeguards, and even take control of the vehicle during a snag have been fabricated to improve safety and steer clear of road fatalities [3]. Vibration-based methods, three-dimensional (3D) reconstruction-based methods, and vision-based methods are the prevailing methods for pothole detection [2]. Each has its own downside, such as high cost or low accuracy; comparing the three, vision-based methods are superior in all aspects. 3D reconstruction needs cost-intensive laser scanners, whereas the vibration-based method is unreliable on certain vibration-producing surfaces [4]. Besides, the detection of potholes is comparatively challenging because of their arbitrary shape and complex geometry. Our paper proposes a vision-based detection method based on convolutional neural networks (CNN) using the single-shot multibox detector (SSD), FasterR-CNN, YOLOv3 tiny, and YOLOv4 tiny algorithms. The paper is organized as follows: Sect. 2 describes related work in object detection; Sect. 3 introduces the methodology; Sect. 4 describes the dataset, the data augmentation carried out, and the four proposed algorithms; Sect. 5 details the experimental setup; Sect. 6 analyses the results; and Sect. 7 presents the conclusion and future work.
2 Related Work In recent times, a surfeit of research has been carried out to evolve technology that can expose potholes using deep learning methods, and CNNs are one of the cardinal deep learning methods for object detection. Krizhevsky et al. [5] trained a deep neural network with the ImageNet architecture, and the results conveyed that CNNs are efficacious for large datasets under supervised learning; thus, the convolutional neural network (CNN) approach is adopted for detecting objects. Bhatia et al. [2] used pothole and non-pothole thermal images as input to CNN-based ResNet models for detecting potholes.
Table 1 Comparison of existing methodologies

Method    Description
CNN       Efficacious for large datasets using supervised learning, but inference time is high
FR-CNN    Gives reasonable precision but is not preferred for real-time object detection
YOLO      Efficient for real-time usage and faster than traditional CNN algorithms; the speed comes with a trade-off in accuracy
SSD       Higher accuracy compared to YOLO; performance is poor for minuscule objects
To subdue the inference time of CNNs, a more substantial gain was obtained with the inception of the region-based convolutional neural network (R-CNN) [6]. Following the emergence of R-CNN, plenty of enhanced models have been propounded, including FasterR-CNN, SSD, YOLO, etc. The FasterR-CNN model comprises two modules: the first, the region proposal network (RPN), is a fully convolutional network that fosters object proposals, which are passed on to the next module; the Fast R-CNN detector then distinguishes the proposals from the previous module and returns a bounding box around the object [7]. Chebrolu et al. [8] used a deep learning model to spot pedestrians at all times of day; the work was done using FasterR-CNN, and the technique achieved reasonable precision. Gokul et al. [9] compared the performance of FasterR-CNN and YOLO for spotting traffic lights: FasterR-CNN with Inceptionv2 and ResNet-101 outperformed YOLO, but for real-time usage YOLO performed better. YOLO has more advantages compared to other object detection algorithms; unlike other models, it uses the complete image for training and directly enhances prediction performance [10]. An altered DL model for spotting vehicles on roads during congested periods was implemented by Haritha and Thangavel [11], who also compared the implemented model with YOLOv2 and with multiscale ConvNets. Dharneeshkar et al. [12] generated a dataset and trained different versions of YOLO on it to detect potholes with reasonable accuracy; YOLOv3 tiny showed the highest precision, 0.76, against 0.69 for YOLOv3 and 0.46 for YOLOv2. Though YOLO is faster than FasterR-CNN, SSD runs faster than YOLO and is also more accurate. Liu et al. [13] proposed the single-shot multibox detector, a rapid single-shot object detector for various classes that uses a feed-forward neural network to detect objects in images. The main idea of SSD is to apply small convolutional filters to feature maps to forecast scores for each category and box offsets for a predetermined set of default bounding boxes; the results showed that SSD has good accuracy in contrast to other object detection algorithms. Silvister et al. [14] compared SSD and deep neural networks (DNN) and concluded that SSD gives fast real-time detection with better accuracy due to its tolerance in aspect ratio, unlike other DNNs.
Each algorithm has its own merits and demerits, as shown in Table 1. CNN is efficient for large datasets but slow; FR-CNN is faster than plain CNN but is not preferred for real-time usage; YOLO is fast and can be used in real time, detecting potholes with better accuracy than traditional CNN algorithms; and SSD gives better accuracy than the other algorithms at good speed. To overcome the demerits of each algorithm, we have trained the dataset using FR-CNN, YOLO, and SSD, and the proposed work detects potholes with good accuracy.
3 Methodology In order to carry out pothole detection, an available dataset [12] of 1500 pothole images is augmented and apportioned into two sets: 80% for training (train set) and 20% for testing (test set). The training set is used as input to the proposed CNN algorithms, namely FasterR-CNN, SSD, YOLOv3 tiny, and YOLOv4 tiny, and the trained models are validated with the test set. Figure 1 presents a schematic flow of the proposed experiment.
Fig. 1 Block schematic of proposed experimentation process
4 Dataset Potholes in our country are unique and different from potholes seen elsewhere in the world; because of this, a dataset of 1500 images was created [12]. The authors created this dataset by driving in and around Coimbatore. The dataset is a miscellany of pothole images at different angles, distances, lighting, and climatic conditions, captured from the dashboard of a car. The images are of good standard and are resized to 1024 × 768; some are shown in Fig. 2. Additionally, data augmentation is performed on this 1500-image dataset to increase its size, and annotation is performed using LabelImg. Data Augmentation: Data augmentation is a technique that expands a dataset by creating new data from existing data. It helps us increase the size of the dataset and introduce variety without collecting new data, and the neural network treats the augmented images as distinct images. Data augmentation helps
Fig. 2 Images from the [12] dataset
reduce overfitting [2]. Hence, a few amendments have been made to the available data [12] to obtain more data, and various augmentation techniques have been carried out to convert the dataset into a larger one, improving training and preventing overfitting. The data augmentation carried out includes: (i) Rotating the image (Fig. 3a): the image of a pothole after rotation still looks like a pothole, as if it were taken from a different gradient [2]; we have rotated the image clockwise by 30°. (ii) Horizontal flipping (Fig. 3b): this leaves the dimensions of the layer and the pixel information unchanged; a horizontal flip is used for images that appear almost identical when flipped horizontally. (iii) Adding random noise (Fig. 3c): Gaussian noise with zero mean is added to the image; it has data points at all frequencies and distorts the high-frequency features, and adding the right quantity of noise improves learning capability [15].
Fig. 3 Some images after data augmentation
(iv) Blurring the image (Fig. 3d): blurring an image involves taking neighbouring pixels and averaging them [2]; this reduces detail and makes the image blurry, increasing the dataset size and improving the training of the model. (v) Translation of the image: translation means moving the image in a particular direction; it is very useful because objects can be found anywhere in the image, which forces the convolutional neural network to search everywhere and makes training better. By carrying out the above data augmentation techniques, the 1500-image dataset is converted into a 4000-image dataset, and object detection algorithms are then used to detect the potholes; a minimal sketch of these operations is given below.
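The following OpenCV/NumPy sketch illustrates the five augmentations; only the 30° clockwise rotation comes from the text, while the noise level, blur kernel, and translation offsets are illustrative assumptions.

```python
import cv2
import numpy as np

def augment(img):
    """Return the five augmented variants described above for one image."""
    h, w = img.shape[:2]
    # (i) rotate clockwise by 30 degrees about the image centre
    M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), -30, 1.0)
    rotated = cv2.warpAffine(img, M_rot, (w, h))
    # (ii) horizontal flip
    flipped = cv2.flip(img, 1)
    # (iii) zero-mean Gaussian noise (sigma = 10 is illustrative)
    noise = np.random.normal(0, 10, img.shape)
    noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # (iv) blur by averaging neighbouring pixels
    blurred = cv2.blur(img, (5, 5))
    # (v) translate by (tx, ty) pixels (offsets are illustrative)
    M_tr = np.float32([[1, 0, 30], [0, 1, 20]])
    translated = cv2.warpAffine(img, M_tr, (w, h))
    return rotated, flipped, noisy, blurred, translated
```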
4.1 Proposed CNN Models The train set obtained after splitting is passed to different CNN-based feature extraction architectures, and four different algorithms, FR-CNN, SSD, YOLOv3 tiny, and YOLOv4 tiny, are trained. Faster Region-based Convolutional Neural Network (FR-CNN): A FasterR-CNN object detection network consists of a feature extraction network followed by two subnetworks. Usually, a pre-trained CNN such as ResNet-50 acts as the feature extractor. The first of the two subnetworks that follow is the region proposal network (RPN), which generates proposals indicating the areas of the image in which an object is likely to exist; the next subnetwork is trained to predict the class to which the object belongs, as depicted in Fig. 4. Here, FasterR-CNN with ResNet50 is used, where ResNet refers to residual neural networks. Residual networks are preferred because they use skip connections over layers, which helps prevent the vanishing gradient problem; this process continues until the weights from the previous layers are learned [7].
Fig. 4 FasterR-CNN architecture [7]
Fig. 5 SSD architecture [13]
Single-Shot Multibox Detector (SSD): As shown in Fig. 5, the SSD approach is a simple, single deep neural network that eliminates all pre-processing and encapsulates both localization and detection in a single forward sweep of the network, making SSD easy to train and simple to consolidate into systems that need detection. The base convolution is derived from an existing image classification architecture, MobileNet-version2, a lightweight deep CNN that uses depthwise and pointwise separable convolutions and provides the lower-level feature maps. The formula below computes the scale of the default boxes for each of the fm feature maps used for prediction, where the minimum scale (lowest layer) is 0.2 and the maximum scale (highest layer) is 0.9:

$$\text{Scale}_k = \text{Scale}_{\min} + \frac{\text{Scale}_{\max}-\text{Scale}_{\min}}{fm-1}\,(k-1), \qquad k \in [1, fm] \qquad (1)$$
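A small helper evaluating Eq. (1) looks as follows; fm = 6 is an illustrative choice, not a value stated in the text.

```python
def default_box_scales(fm=6, scale_min=0.2, scale_max=0.9):
    """Scale of the default boxes for each of the fm feature maps, per Eq. (1)."""
    return [scale_min + (scale_max - scale_min) * (k - 1) / (fm - 1)
            for k in range(1, fm + 1)]

print(default_box_scales())  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```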
Therefore, SSD with MobileNetv2 carries out less computation than the typical convolution backbone (VGG-16), with only a small reduction in accuracy. During the auxiliary convolutions, the top of the base network incorporates predictions from a collection of default bounding boxes tied to the respective feature map cells. The prediction convolutions locate and identify objects in these feature maps by generating a set of fixed-size bounding boxes over different scales to match the shape of the object, together with scores for the existence of each object category in the respective default box [13]. You Only Look Once (YOLO): YOLO [16, 17] is a real-time object detection system that recognizes various objects in a single enclosure, as illustrated in Fig. 6; moreover, it identifies objects more rapidly and precisely than other recognition systems. FasterR-CNN has proved to be accurate but is comparatively slow, so YOLO was built to boost speed and obtain super real-time performance. Since it is completely based on convolutional neural networks, it divides a particular image into regions and predicts the bounding boxes and probabilities of every region. In YOLO, bounding box and class predictions are made concurrently, which makes YOLO different from other conventional systems.
Fig. 6 YOLO architecture [10]
The probability of an object being in a bounding box is known as the confidence score, which can be calculated as:

$$C = P(\text{object}) \times \text{IoU}^{\text{truth}}_{\text{pred}} \qquad (2)$$
The term IoU refers to intersection over union; an IoU value close to 1 indicates that the predicted bounding box is near the ground truth (a small helper computing it is sketched below). YOLOv3 tiny is a simplified version of YOLOv3 with fewer convolutional layers, so its running speed is increased remarkably with only a minuscule reduction in accuracy. YOLOv3 tiny uses pooling layers for feature extraction with the Darknet-53 architecture; its convolutional layer structure still follows YOLOv3, but with reduced dimensions of the convolutional layers [18]. Similarly, YOLOv4 tiny is a simplified version of YOLOv4, an object detector typically pre-trained on ImageNet classification. Feature extraction in YOLOv4 tiny is done by encapsulating a few convolutional and max-pooling layers from YOLOv4 in a single layer, and based on different parameters, CSPDarknet53 is chosen as the network backbone of the YOLOv4 architecture.
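A minimal helper computing the IoU used above is sketched below, assuming boxes in (x1, y1, x2, y2) corner format.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142...
```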
5 Experiment The experiment was set up in Google Colaboratory, a free cloud service by Google Research based on Jupyter notebooks, with free-of-charge access to robust GPUs (Nvidia K80s, T4s, P4s, and P100s) [19]. The experiment is executed using the modern CNN algorithms FasterR-CNN, SSD, YOLOv3 tiny, and YOLOv4 tiny. Each architecture has its own trade-off between latency and accuracy, so to obtain a nominal compromise between mean average precision and execution time, different architecture models are chosen and trained. The TensorFlow framework [20] is used for training the models. For FasterR-CNN, ResNet50, a pre-trained weight of 50
convolutional layers, is imported, and for SSD, MobileNetv2, a pre-trained weight of 53 convolutional layers, is imported. For YOLOv3 tiny and YOLOv4 tiny, the reduced configurations of Darknet-53 and CSPDarknet53, respectively, provide the pre-trained convolutional weights. The number of classes is changed to 1, as we have only one class, i.e. pothole. Due to the limited graphics capability of Colab, the batch size is fixed at 6. After amending these changes in the configuration, training is started. For every 200 iterations, the weight file is updated, and the metric values, i.e. average loss and mAP, are plotted using TensorBoard. Once the loss graph stabilizes, training can be stopped by interrupting the runtime.
6 Result Loss is a penalty for an incorrect prediction; it is monitored to enhance the algorithm, i.e. to find out whether the model is overfit, a good fit, or underfit. In Fig. 7, the loss drops to a stable plateau, indicating a good fit. The exponentially declining average-loss curves while training SSD with MobileNetv2, FasterR-CNN with ResNet-50, and YOLOv4 tiny are shown in Fig. 7a–c, respectively. The amount of distortion in each graph depends primarily on the batch size used; here, the batch sizes are 6, 16, and 64 for SSD, FR-CNN, and YOLO, fixed according to the capability of the GPU. The model is trained until the graph stabilizes at a constant value, and the last updated weight file is selected for evaluation. While loss is used to monitor the model's performance during training, mean average precision (mAP) is used to quantify the model's prediction accuracy, computed as:

$$\text{Average Precision} = \sum_{j=1}^{n} \text{Precision}(j) \times \Delta\text{Recall}(j) \qquad (3)$$
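Eq. (3) can be evaluated directly from precision/recall pairs taken at successive score thresholds; the sketch below is an illustrative implementation, not the evaluation script actually used (which is part of the TensorFlow/Darknet tooling).

```python
def average_precision(precisions, recalls):
    """Eq. (3): AP as the sum of Precision(j) times the change in Recall(j),
    where j indexes successive detection-score thresholds."""
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - previous_recall)
        previous_recall = r
    return ap

# Example with three thresholds
print(average_precision([1.0, 0.8, 0.6], [0.2, 0.5, 1.0]))  # 0.74
```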
Here, the mean average precision at an intersection over union (IoU) threshold of 50% (mAP@50IoU) is examined. Intersection over union evaluates how much of the predicted bounding box overlaps the ground-truth box; the higher the IoU, the more likely the object is inside the predicted box. The results after training on the 4000-image dataset with the different architectures are reported in Table 2. The FasterR-CNN with ResNet50 model is trained for 90,000 steps and attains a mAP@50IoU of 74%, as shown in Fig. 8a, while the SSD with MobileNetv2 model is trained for 60,000 steps and attains a mAP@50IoU of 76%, as shown in Fig. 8b. Similarly, YOLOv3 tiny and YOLOv4 tiny are trained using Darknet, and the mAP@50IoU of these models is 68% and 76.4%, respectively. The mAP@50IoU values for the different algorithms are summarized in Table 2. FasterR-CNN with ResNet50 theoretically shows good accuracy, but the accuracy is reduced with 50 convolutional layers.
Fig. 7 Number of iterations versus average loss: a SSD, b FasterR-CNN, c YOLOv4 tiny
Table 2 mAP comparison of different algorithms

Model                    mAP@50 (%)
FasterR-CNN + ResNet50   74
SSD + MobileNetv2        76
YOLOv3 tiny              68
YOLOv4 tiny              76.4
YOLOv3 tiny is a fast object detector, yet its accuracy is low compared to detectors like SSD and FR-CNN. Nevertheless, SSD with MobileNetv2 provides a better speed-accuracy trade-off than the other object detectors, as it uses different aspect ratios at a reasonable speed, while YOLOv4 tiny is a fast detector with almost the same mAP value as SSD. Hence, Fig. 8 and Table 2 manifest that SSD and YOLOv4 tiny perform better than the other object detection algorithms, with 76% and 76.4% mAP, respectively. Figure 9a–d shows sample detection results.
Fig. 8 Number of iterations versus mAP values: a FasterR-CNN, b SSD
Fig. 9 a–d Sample detection results
7 Conclusion Detecting potholes using object detection algorithms will be of great use, as it would prevent road accidents. For this purpose, we have developed a model to detect potholes using convolutional neural networks (CNN). In this work, four different algorithms are proposed and evaluated against each other: FasterR-CNN, SSD, YOLOv3 tiny, and YOLOv4 tiny, and the 1500-image dataset is transformed into a 4000-image dataset using data augmentation. The configuration used for FasterR-CNN is ResNet50 and for SSD is MobileNetv2; for YOLOv3 tiny and YOLOv4 tiny, the reduced configurations of Darknet-53 and CSPDarknet53, respectively, provide the pre-trained convolutional weights. Each technique has its own assets and liabilities along different pathways, with SSD with MobileNetv2 and YOLOv4 tiny transcending the other architectures in terms of accuracy. Further, the proposed method can be used by integrating a live front-view camera with ADAS to detect potholes and alert the driver. It can be implemented in real time using a front-view camera attached to the bonnet of a car: the camera starts recording when the car starts moving and divides the video into individual frames, which are fed to the CNN model that detects the presence of a pothole and warns the driver. Since SSD and YOLO have proven to be among the fastest object detection algorithms, with accuracies of 76% and 76.4% respectively, potholes can be detected even at varying car speeds. The proposed approach confronts some limitations too: potholes may go undetected for reasons such as water-covered potholes, poor lighting conditions, and high vehicle speed, and potholes can be incorrectly predicted given shadows and the variety of shapes a pothole can take. Thus, to overcome such limitations and predict the presence of potholes more accurately, it is vital to use multiple cameras and append more features favourable to the suggested model. This can also be integrated with an application that records the positions of potholes and indicates them on Google Maps, suggesting paths with comparatively fewer potholes.
References
1. T. Kim, S.-K. Ryu, Review and analysis of pothole detection methods. J. Emerg. Trends Comput. Inform. Sci. 5(8), 603–608 (2014)
2. Y. Bhatia et al., Convolutional neural networks based potholes detection using thermal imaging. J. King Saud Univ.–Comput. Inform. Sci. (2019)
3. F. Jiménez et al., Advanced driver assistance system for road environments to improve safety and efficiency. Transp. Res. Proc. 14, 2245–2254 (2016)
4. A. Akagic, E. Buza, S. Omanovic, Pothole detection: an efficient vision based method using RGB color space image segmentation, in 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (IEEE, New York, 2017)
5. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks. Adv. Neural Inform. Process. Syst. (2012)
6. R.L. Galvez et al., Object detection using convolutional neural networks, in TENCON 2018–2018 IEEE Region 10 Conference (IEEE, New York, 2018)
7. S. Tu et al., Passion fruit detection and counting based on multiple scale FasterR-CNN using RGB-D images. Precis. Agric., pp. 1–20 (2020)
8. K.N.R. Chebrolu, P.N. Kumar, Deep learning based pedestrian detection at all light conditions, in 2019 International Conference on Communication and Signal Processing (ICCSP) (IEEE, New York, 2019)
9. R. Gokul et al., A comparative study between state-of-the-art object detectors for traffic light detection, in 2020 International Conference on Emerging Trends in Information Technology and Engineering (IEEE, New York, 2020)
10. J. Redmon et al., You only look once: unified, real-time object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
11. H. Haritha, S.K. Thangavel, A modified deep learning architecture for vehicle detection in traffic monitoring system. Int. J. Comput. Appl., pp. 1–10 (2019)
12. J. Dharneeshkar et al., Deep learning based detection of potholes in Indian roads using YOLO, in 2020 International Conference on Inventive Computation Technologies (ICICT) (IEEE, New York, 2020)
13. W. Liu et al., SSD: single shot multibox detector, in European Conference on Computer Vision (Springer, Cham, 2016)
14. S. Silvister et al., Deep learning approach to detect potholes in real-time using smartphone, in 2019 IEEE Pune Section International Conference (PuneCon) (IEEE, New York, 2019)
15. C. Shorten, T.M. Khoshgoftaar, A survey on image data augmentation for deep learning. J. Big Data 6(1), 60 (2019)
16. Y. Lu, L. Zhang, W. Xie, YOLO-compact: an efficient YOLO network for single category real-time object detection, in 2020 Chinese Control and Decision Conference (CCDC) (IEEE, New York, 2020)
17. R. Zhang et al., An algorithm for obstacle detection based on YOLO and light field camera, in 2018 12th International Conference on Sensing Technology (ICST) (IEEE, New York, 2018)
18. P. Adarsh, P. Rathi, M. Kumar, YOLO v3-tiny: object detection and recognition using one stage improved model, in 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS) (IEEE, New York, 2020)
19. T. Carneiro et al., Performance analysis of Google Colaboratory as a tool for accelerating deep learning applications. IEEE Access 6 (2018)
20. M. Abadi et al., TensorFlow: large-scale machine learning on heterogeneous distributed systems (2016). arXiv preprint arXiv:1603.04467
An Efficient Algorithm to Identify Best Detector and Descriptor Pair for Image Classification Using Bag of Visual Words R. Karthika and Latha Parameswaran
Abstract Object identification and classification are important applications in the computer vision and image processing domain. In this work, an attempt has been made to analyze various detectors and descriptors for extracting the best features and using bag of visual words (BoVW) for object classification. Experiments have been conducted on three significantly different datasets: COIL-100, CALTECH-101, and WANG. Features extracted from various detectors and descriptors have been used for classifying the images based on the objects they contain. The best detector–descriptor pair (FAST-SIFT) and the optimum cluster size have been empirically determined to achieve high classification accuracy, and the results obtained are used for performing further object retrieval. Keywords Bag of visual words · Detectors · Descriptors
1 Introduction Object classification is an important problem in computer vision, as indicated by the amount of research on it. The goal is to develop an image classification system that reduces the amount of manual supervision required as well as the computational cost of learning the parameters for classification; the need is to optimize between efficiency and performance. These characteristics are crucial for enabling classifiers to function in real-world applications. Research in vision-based object classification has been growing for many years, with techniques ranging from fully manual to automated. R. Karthika (B) Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] L. Parameswaran Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_21
Automatic classification of images is challenging and computationally expensive, especially in the presence of occlusion, background clutter, intra-class variation, and different pose and intensity conditions. Global features are not able to address these challenges; hence, local invariant features were developed. The bag of visual words (BoVW) is widely used for keypoint-based representations: similar keypoints are grouped to form clusters, the number of clusters denotes the size of the vocabulary (which can vary), and an image is thus represented in BoVW as a set of keypoints called visual words. Image classification has many applications, including security (specifically face recognition), searching images, censoring images, and robotics.
2 Literature Survey Many researchers have contributed to feature detection and description on digital images; some of the significant work is presented here. The local features of images are identified using boundaries, regions, and points, of which point-based feature extraction takes place in two steps: (1) keypoint detection and (2) feature descriptor generation. Harris et al. [1] identified keypoints based on two major directions specified by the eigenvalues of the second-order matrix; the Harris corner detector locates corners by computing the first-order derivatives in the matrix. Lindeberg et al. [2] proposed that image structures can be represented in a scale space at different levels of an image and developed a detector for blob-like features; they built the scale space representation by smoothing high-resolution images with Gaussian kernels of different sizes. Features from accelerated segment test (FAST), which increases the computational speed required for corner detection, was proposed by Trajković et al. [3]. It is based on the properties exhibited by corners, which characterize changes in image intensity, and a multigrid approach is used to compute the change in arbitrary directions; the improved speed and suppression of false corners are due to the multigrid technique, and the features share a common property with the neurons of the temporal cortex used in primate vision. Difference of Gaussian (DoG) is used because it accelerates computation. A different approach to interest point detection was proposed by Mikolajczyk et al. [4], yielding interest points that are invariant to affine and scale transformations. Their detector builds on the interest points extracted by the Harris detector, the characteristic scale given by a local extremum over local structures, and an affine shape representation of a point's neighborhood; their scale detector enumerates a multi-scale representation of the detected interest points at which a local measure is maximum over scale. Lowe et al. [5, 6] used the scale-invariant feature transform (SIFT) for extracting unique invariant features from images, which are used to match different views of an object; the features extracted are invariant to image scale and rotation.
Ke et al. [7] improved SIFT by adding principal component analysis (PCA); the PCA-based local descriptors are more distinctive and unique, and PCA-SIFT yields a smaller vector than the one used in SIFT. Mortensen et al. [8] presented a feature descriptor that supplements SIFT with a vector adding curvilinear shape information from a larger region, which reduces mismatches when two or more local descriptors are similar; it provides a concrete method for matching 2D non-rigid transformations at a global scale, establishing a vector with two parts: a SIFT descriptor and a global texture vector. Color provides vital information in matching and object description, and Abdel-Hakim et al. [9] enhanced the SIFT descriptors so that they can be used in a color-invariant space. A descriptor called speeded up robust features (SURF) was proposed by Bay et al. [10] to improve speed; using integral images and existing descriptors reduces the complexity of the method. Each extremum point is split into numerous 4 × 4 subregions, a Haar wavelet response is computed for each subregion, and every keypoint is described by a 64-dimensional feature vector. Morel et al. [11] introduced ASIFT (Affine SIFT), which simulates a set of sample views of an image obtained by varying the two camera-axis orientation parameters, the longitude and latitude angles; this was done because SIFT cannot perform to its capacity on images containing affine changes, and ASIFT thereby covers all six parameters of the affine transform. Matas et al. [12] proposed maximally stable extremal regions (MSER) as a method of blob detection in images, finding correspondences between image elements from various images; the main contribution is the introduction of new regions called extremal regions. Leutenegger et al. [13] developed the BRISK detector, which is both scale and rotation invariant; speed is improved by using the AGAST corner detector without compromising detection performance, and for scale invariance BRISK performs non-maxima suppression and interpolation when detecting keypoints. Rosten et al. [14] developed the FAST-n detector, which is related to a wedge-model style of detector; the model is used to train a decision tree classifier that is in turn applied to various images, and FAST-ER further improves the repeatability of the detector. Patch-based image representation is divided into two types, dense sampling and interest points: in dense sampling, patches of fixed shape and size are mounted on a regular grid, while dense interest points [10] cover the entire image and a spatial relation between features is computed. A combination of both local features and sampling schemes was proposed by Tamaki et al. [15], where DoG and sampling schemes are used to extract features. The kernel-based recognition method of [16] computes geometric correspondence using the pyramid matching scheme introduced by Grauman et al. [17]; in this technique, the image is subdivided and histograms of all the local features are computed at increasingly fine resolutions. In Sivic et al. [18], matches are pre-computed based on descriptors, and inverted file systems and document rankings are used; the results obtained show
that retrieval is immediate, returning a set of key frames ranked in the manner followed by Google. Senthilkumar et al. [19] used SURF to classify the logos of vehicles. A method for object recognition based on regions of interest (ROI) and an optimal bag of words was proposed by Li et al. [20], where a Gaussian mixture model is used to generate the visual codebook. Bosch et al. [21] applied the spatial pyramid concept to develop an image signature known as the pyramid histogram of oriented gradients (PHOG); the representation is found to be effective in facial emotion recognition and facial-component-based bag of words. Karthika et al. [22] developed a face recognition system invariant to pose and orientation using Gabor wavelet features. Visual vocabularies of varying sizes with SIFT and SURF were evaluated by Schaeffer et al. [23], comparing vocabularies constructed using K-means clustering and self-organizing maps in terms of accuracy on their own dataset. The performance of two keypoint descriptors in the context of pedestrian detection is discussed by Kottman et al. [24], where FREAK achieved maximum accuracy using the bag of words model. Based on the literature survey, it is observed that many researchers have concentrated on developing detectors and descriptors to extract features from images; this proposed work aims to develop a technique to find the best detector and descriptor combination for image classification.
3 Proposed Algorithm The proposed algorithm to find the best detector–descriptor combination is explained here. The notations used are as follows:

Set of keypoints K = {K_1, K_2, K_3, ..., K_n}
Feature vector F_v = {F_11, F_12, ..., F_1m}, {F_21, F_22, ..., F_2m}, ..., {F_n1, F_n2, ..., F_nm}

Algorithm for image classification using bag of visual words:
For every detector i ∈ {BRISK, DENSE, MSER, FAST, GRID, SIFT, SURF}
For every descriptor j ∈ {SIFT, SURF}
Extract the keypoints (K) of all the images in (I) using the detector i Obtain the descriptor feature vector F v for each key point in K using the descriptor j For each cluster size C = 10, 20, 40, … maximum cluster size
An Efficient Algorithm to Identify Best Detector …
249
Fig. 1 Block diagram representation of BoVW algorithm
(a)
(b)
(c)
Fig. 2 a Keypoint detection using SIFT detector. b Keypoint detection using FAST detector. c Keypoint detection using BRISK detector
a.
Input feature descriptor (F v) are clustered for the set of images (I) using the K means clustering to construct the vocabulary of visual words {V = V 1, V 2, V 3 . . . V w}
b.
Perform a histogram plot for the occurrence of vocabulary of Words (V ) using the feature descriptor (F v )
4. 5.
Apply SVM classifier using RBF kernel. Analyze results to identify best detector–descriptor pair for image classification.
Figure 1 shows different stages in the BoVW algorithm. Figure 2a–c shows the keypoint detection using different detectors.
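To make the pipeline concrete, the following is a minimal Python sketch of steps 1–3 of the algorithm, using OpenCV for keypoint detection/description and scikit-learn for K-means clustering. The helper names and the choice of SIFT for both roles are illustrative assumptions, not the authors' exact implementation; other detectors (e.g., cv2.FastFeatureDetector_create()) can be substituted in the detector role.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

detector = cv2.SIFT_create()     # detector i (SIFT here; FAST, BRISK, ... also work)
descriptor = cv2.SIFT_create()   # descriptor j (SIFT or SURF)

def extract_descriptors(images):
    """Steps 1-2: detect keypoints K and compute feature vectors Fv."""
    all_descs = []
    for img in images:
        keypoints = detector.detect(img, None)
        _, descs = descriptor.compute(img, keypoints)
        if descs is not None:
            all_descs.append(descs)
    return all_descs

def build_vocabulary(all_descs, cluster_size):
    """Step 3a: K-means clustering of Fv into a vocabulary of visual words."""
    stacked = np.vstack(all_descs)
    return KMeans(n_clusters=cluster_size, n_init=10).fit(stacked)

def encode_image(descs, kmeans, cluster_size):
    """Step 3b: histogram of visual-word occurrences for one image."""
    words = kmeans.predict(descs)
    hist, _ = np.histogram(words, bins=np.arange(cluster_size + 1))
    return hist / max(hist.sum(), 1)  # normalized BoVW histogram
```

Looping this sketch over each detector, descriptor and cluster size C reproduces the grid search that steps 4 and 5 then evaluate.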
3.1 Contributions of This Work

Detector and descriptor: There have been several benchmark studies on detectors and descriptors. In this work, we have extensively studied and extracted image features using all possible detector and descriptor combinations for classifying images, and demonstrated the effect on classification accuracy. An efficient way to select the right combination of detector and descriptor is presented.
Nature of images: Most authors have tested their algorithms on images of a particular type or nature, whereas in this work we have experimented on three different datasets with varying backgrounds and large inter-class variations of objects.

Cluster size: In bag-of-visual-words construction, identifying the best cluster size for the different detectors and descriptors is addressed.
3.2 Evaluation Metrics

A confusion matrix is constructed with four essential parameters: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). These four parameters allow a more detailed analysis of the classification system. We have selected two parameters, the true positive rate and the accuracy of each detector and descriptor combination, for analysis and decision making. True positive rate (TPR) is the proportion of positive cases that were correctly identified:

TPR = TP/P = TP/(TP + FN)    (1)
Accuracy (ACC) is the proportion of the total number of predictions that were correct:

ACC = (TP + TN)/(TP + TN + FP + FN)    (2)
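As a quick sanity check, Eqs. (1) and (2) can be computed directly from the four confusion-matrix counts; the small Python functions below are our illustration, not code from the paper.

```python
def tpr(tp, fn):
    # Eq. (1): proportion of positive cases correctly identified
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    # Eq. (2): proportion of all predictions that are correct
    return (tp + tn) / (tp + tn + fp + fn)

print(tpr(45, 5))               # 0.9
print(accuracy(45, 40, 10, 5))  # 0.85
```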
4 Experimental Results

For this experimentation, three datasets, namely COIL-100 [25], CALTECH-101 [26] and WANG [27], have been used. Testing has been done using the identified detectors and descriptors while varying the cluster size of the bag of visual words. The first dataset used is the Columbia Object Image Library COIL-100 dataset [25]. It contains 100 objects placed on a motorized turntable against a black background. The table is rotated from 0° to 360° at intervals of 5°, giving 72 images per object; hence, this database contains 7200 images of the 100 objects. For our experiment, we have used 57 samples per class for training and 15 samples per class for testing, and all 100 classes were trained and tested. The second dataset used in this experimentation is CALTECH-101 [26], which contains 101 objects with varied backgrounds. Each object category consists of 31–800
images. In this experiment, ten classes were chosen randomly from CALTECH-101 for object classification; 70 images were used for training and 30 images for testing. The third dataset used in the experiment is the WANG dataset [27], which is a subset of 1000 images taken from the Corel stock photo database. These images have been manually selected to form ten classes of 100 images each. From this, 80 samples were used for training and 20 samples for testing. The training has been done using a support vector machine (SVM) with the RBF kernel.
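For illustration, the classification stage might look as follows in scikit-learn, with the BoVW histograms stood in by random data; the RBF kernel follows the text, while gamma and the other hyperparameters are assumptions, since the paper does not report them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Stand-in BoVW histograms; in the actual pipeline these come from step 3b.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((570, 160)), rng.integers(0, 10, 570)
X_test, y_test = rng.random((150, 160)), rng.integers(0, 10, 150)

clf = SVC(kernel="rbf", gamma="scale")  # RBF kernel as stated; gamma assumed
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```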
4.1 Inference Based on Accuracy of Classification

The following observations have been made based on the datasets from Table 1.

The most accurate classifications for the WANG dataset, which contains 1000 images in ten classes, occur when the FAST detector is used with SIFT/SURF as the descriptor. On the other hand, the GRID detector produces the most inaccurate results of the chosen detectors; its most accurate results are achieved with a minimum cluster size of 160. The BRISK, MSER and SIFT detectors produce slightly better results, while the accuracy of the DENSE detector is almost on par with the FAST detector for the WANG dataset.

The COIL-100 dataset, which contains 7200 images of 100 classes, is classified with high accuracy by all of the given detector and descriptor combinations and across cluster sizes. The FAST detector produces the most accurate results with both SIFT and SURF descriptors, with the DENSE detector once again coming in a close second. The accuracy of the detectors peaks at a cluster size of 320, showing no noticeable increase with a further increase in cluster size.

The CALTECH-101 dataset, containing 1000 images in ten classes, is classified best by the FAST detector with SIFT/SURF as the descriptor. The BRISK and DENSE detectors used in combination with the SIFT/SURF descriptor produce poor results, as does the MSER detector when used with the SURF descriptor. The SIFT/SURF detector produces its most accurate results with a minimum cluster size of 320/160, respectively.

The following observations have been made based on the detectors from Table 1.

The accuracy is highest for the SIFT/SURF detector in the case of the COIL-100 dataset, with both SIFT and SURF descriptors and with a minimum cluster size of 160. For the WANG dataset, the SIFT/SURF detector produces higher accuracy in combination with the SIFT descriptor and with a maximum cluster size of 320. The GRID detector produces better accuracy for the COIL dataset with a minimum cluster size of 160; while classifying the CALTECH-101 dataset, the GRID detector performed better in combination with the SIFT descriptor and with a maximum cluster size of 640. As with the other detectors, the DENSE detector produced better accuracy for the COIL dataset, with a minimum cluster size of 160. Classification by the DENSE detector of the WANG dataset is appreciable when used in combination
Table 1 Accuracy of classification on the three datasets (best value, with the corresponding cluster size in parentheses)

| Dataset | Descriptor | BRISK | DENSE | FAST | GRID | MSER | SIFT/SURF* |
|---------|------------|-------|-------|------|------|------|------------|
| WANG | SIFT | 0.90 (1280) | 0.94 (640) | 0.94 (1280) | 0.89 (160) | 0.92 (640) | 0.91 (320) |
| WANG | SURF | 0.88 (640) | 0.92 (160) | 0.93 (320) | 0.86 (160) | 0.88 (160) | 0.90 (320) |
| COIL | SIFT | 0.99 (320) | 0.99 (160) | 0.99 (320) | 0.99 (160) | 0.99 (160) | 0.99 (160) |
| COIL | SURF | 0.99 (320) | 0.99 (160) | 0.99 (320) | 0.99 (160) | 0.99 (160) | 0.99 (160) |
| CALTECH | SIFT | 0.89 (640) | 0.89 (320) | 0.92 (1280) | 0.91 (640) | 0.91 (640) | 0.91 (320) |
| CALTECH | SURF | 0.88 (160) | 0.88 (320) | 0.91 (640) | 0.89 (160) | 0.88 (1280) | 0.91 (160) |

* Denotes SIFT as a detector and descriptor and SURF as a detector and descriptor
with the SIFT descriptor and with a maximum cluster size of 640. Despite being less accurate than the other detectors, the BRISK detector produces better results when classifying the COIL dataset with the SIFT descriptor. While classifying the CALTECH-101 and WANG datasets, the BRISK detector produces better results in combination with the SIFT descriptor and with cluster sizes of 640 and 1280, respectively. The FAST detector consistently shows better accuracy across the three datasets when compared to the other detectors; when classifying the COIL-100 dataset, its highest accuracy is obtained with a minimum cluster size of 320. When classifying the WANG and CALTECH-101 datasets, the FAST detector produces its best results in both cases with the SIFT descriptor and with maximum cluster sizes of 1280. The MSER detector produces results of middling accuracy compared to the other detectors, with its own best performance coming from classifying the COIL dataset with a minimum cluster size of 160. It shows better results with the CALTECH-101 dataset when used with the SIFT descriptor and with a maximum cluster size of 1280.
4.2 Inference Based on TPR of Classification

The following observations have been made based on the datasets from Table 2.

As far as the classification of the WANG dataset is concerned, the FAST detector gives the highest TPR, with the DENSE detector showing the second highest. Using SURF as the descriptor, the DENSE detector shows a high TPR at a cluster size as small as 160. The GRID detector performs poorly when classifying the WANG dataset regardless of the descriptor used.

The COIL dataset is classified with high TPR by all the detectors, with the FAST and GRID detectors performing best and worst, respectively. The SURF descriptor produces a low TPR for this dataset when paired with the GRID detector. The SIFT/SURF detector–descriptor combination performs well with this dataset, achieving the second highest TPR after the FAST detector.

The FAST detector has the highest TPR of all the detectors when classifying the CALTECH dataset. The SIFT descriptor shows a low TPR when used with the BRISK detector, and the SURF descriptor performs poorly in combination with the MSER detector. When SURF is used as both detector and descriptor, it has a TPR almost as high as that of the FAST detector and achieves this at a cluster size as low as 160.

The following observations are made based on the detectors.

The BRISK detector shows excellent TPR in classifying the COIL dataset with SIFT as the descriptor, at a cluster size of 1280. For both CALTECH and WANG, the BRISK detector achieved a good TPR at a cluster size as low as 160 when the SURF descriptor is used. The DENSE detector performs well in general, producing its highest TPR with the COIL dataset and the SIFT descriptor. It shows high TPRs at cluster sizes as
Table 2 True positive rate for classification on the three datasets (best value, with the corresponding cluster size in parentheses)

| Dataset | Descriptor | BRISK | DENSE | FAST | GRID | MSER | SIFT/SURF* |
|---------|------------|-------|-------|------|------|------|------------|
| WANG | SIFT | 0.52 (1280) | 0.66 (640) | 0.70 (1280) | 0.47 (160) | 0.60 (640) | 0.55 (320) |
| WANG | SURF | 0.41 (160) | 0.62 (160) | 0.65 (320) | 0.27 (160) | 0.55 (640) | 0.52 (320) |
| COIL | SIFT | 0.53 (1280) | 0.77 (160) | 0.84 (320) | 0.70 (320) | 0.72 (640) | 0.82 (320) |
| COIL | SURF | 0.46 (320) | 0.73 (160) | 0.84 (320) | 0.25 (320) | 0.67 (320) | 0.82 (640) |
| CALTECH | SIFT | 0.40 (640) | 0.47 (320) | 0.62 (1280) | 0.52 (640) | 0.53 (640) | 0.53 (320) |
| CALTECH | SURF | 0.42 (160) | 0.43 (320) | 0.55 (640) | 0.45 (160) | 0.40 (1280) | 0.53 (160) |

* Denotes SIFT as a detector and descriptor and SURF as a detector and descriptor
low as 160 for both the COIL and CALTECH datasets. For the WANG dataset, the DENSE detector has a high TPR for cluster sizes up to 640 when used alongside the SIFT descriptor.

The FAST detector had the highest TPR of all the detectors, performing best with the COIL dataset; it hits its highest TPR for this dataset at cluster sizes starting from 320. When used alongside the SIFT descriptor for the WANG and CALTECH datasets, the FAST detector had a high TPR with cluster sizes of up to 1280.

The GRID detector had its best and poorest performances with the COIL dataset, when used alongside the SIFT and SURF descriptors, respectively. The GRID detector produces low results for the WANG dataset in combination with the SURF descriptor. The GRID detector works better with the SIFT descriptor, showing a high TPR at a cluster size as low as 160 with the WANG dataset and as high as 640 with the CALTECH dataset.

When classifying the COIL dataset, the MSER detector produced its highest TPR when used with the SIFT descriptor; with the SURF descriptor, it reached its highest TPR at a cluster size starting from 320. With the CALTECH dataset, the MSER detector showed a high TPR with a cluster size up to 1280 when used with the SURF descriptor.

When classifying the COIL dataset, the SIFT/SURF detector shows the same TPR with both SIFT and SURF descriptors. It shows its highest TPR for the WANG dataset at a cluster size of 320 irrespective of the descriptor. For the CALTECH dataset, the SURF detector achieved its highest TPR with the SURF descriptor and with a cluster size as low as 160.
5 Discussion

In all three datasets, the SIFT descriptor performs better than the SURF descriptor. With the SIFT descriptor, the features are invariant to image rotation and scaling and partially invariant to illumination changes. The unique interest points are identified using a difference-of-Gaussians function, which leads to invariance in scale and orientation. Keypoints are selected based on their stability, and the cascade filtering approach reduces the cost of extracting the features.

The COIL dataset performed well on all the evaluation metric parameters because its images have no background clutter, so all the detectors are able to classify the objects under different rotations.

In all three datasets, the FAST detector outperformed all the other detectors. If a set of contiguous pixels on a circle of 16 pixels around a candidate point is darker or brighter than the candidate beyond a threshold, the region is defined as a corner, and such points lie on distinctive high-contrast regions. Corners are very important features because of their ability to show two-dimensional change, and the FAST detector prioritizes the detection of corners over edges while providing the advantage of good computational efficiency.
Hence, it may be inferred that FAST could be the best detector and SIFT the best descriptor to extract features from images. Even though the cluster size varies, there is no great change in the accuracy achieved. These experiments have been conducted by varying the cluster size from 10 to 2560 in factors of 2 (10, 20, 40, 80, 160, 320, 640, 1280, 2560). It is observed that when the cluster size is 1280, the evaluation metric parameters converge; these values are reached at larger or equal cluster sizes when SIFT is used as the descriptor compared with the SURF descriptor. Yang et al. [28], using a compression network that encodes a SIFT-based object histogram, achieved an accuracy of 95% on the COIL dataset. On COIL-100 with eight views, an accuracy of 92.5% was achieved using the nearest prime simplicial complex approach [29]. An accuracy of about 99% has been achieved here by using the FAST detector with the SIFT/SURF descriptor combination.
6 Conclusion

Expressing images using features is significant for many applications, including object recognition and retrieval. Many researchers have developed feature detectors and descriptors for various purposes, but there has been little work on finding the best detector and descriptor to classify images of various types. This research work has attempted to identify the best feature detector–descriptor combination to represent a given image so that it is useful for image classification. We have achieved an accuracy of 99% on the COIL dataset. As an extension of this work, the approach can be used for recognizing objects in a given image and retrieving images based on similarity. These feature descriptors and detectors can also be designed as filters in a convolutional neural network to extract features as required, and hence can serve as a basis for deep learning.
References
1. C. Harris, M. Stephens, A combined corner and edge detector, in Alvey Vision Conference, vol. 15, No. 50 (1988), pp. 10–5244
2. T. Lindeberg, Feature detection with automatic scale selection. Int. J. Comput. Vision 30(2), 79–116 (1998)
3. M. Trajković, M. Hedley, Fast corner detection. Image Vis. Comput. 16(2), 75–87 (1998)
4. K. Mikolajczyk, C. Schmid, Scale & affine invariant interest point detectors. Int. J. Comput. Vision 60(1), 63–86 (2004)
5. D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
6. D.G. Lowe, Object recognition from local scale-invariant features, in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2 (IEEE, 1999), pp. 1150–1157
7. Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2 (IEEE, 2004), pp. II–II
8. E.N. Mortensen, H. Deng, L. Shapiro, A SIFT descriptor with global context, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1 (IEEE, 2005), pp. 184–190
9. A.E. Abdel-Hakim, A.A. Farag, CSIFT: a SIFT descriptor with color invariant characteristics, in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2 (IEEE, 2006), pp. 1978–1983
10. H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)
11. J.M. Morel, G. Yu, ASIFT: a new framework for fully affine invariant image comparison. SIAM J. Imaging Sci. 2(2), 438–469 (2009)
12. J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)
13. S. Leutenegger, M. Chli, R.Y. Siegwart, BRISK: binary robust invariant scalable keypoints, in Computer Vision (ICCV), 2011 IEEE International Conference on (IEEE, 2011), pp. 2548–2555
14. E. Rosten, R. Porter, T. Drummond, Faster and better: a machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 105–119 (2010)
15. T. Tamaki, J. Yoshimuta, M. Kawakami, B. Raytchev, K. Kaneda, S. Yoshida, Y. Takemura, K. Onji, R. Miyaki, S. Tanaka, Computer-aided colorectal tumor classification in NBI endoscopy using local features. Med. Image Anal. 17(1), 78–100 (2013)
16. A. Leonardis, H. Bischof, A. Pinz, Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7–13, 2006, Proceedings, Part IV, in Conference proceedings ECCV (2006), p. 333
17. K. Grauman, T. Darrell, The pyramid match kernel: discriminative classification with sets of image features, in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2 (IEEE, 2005), pp. 1458–1465
18. J. Sivic, A. Zisserman, Video Google: a text retrieval approach to object matching in videos, in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2003), p. 1470
19. T. Senthilkumar, S.N. Sivanandam, Logo classification of vehicles using SURF based on low detailed feature recognition. Int. J. Comput. Appl. 3, 5–7 (2013)
20. W. Li, P. Dong, B. Xiao, L. Zhou, Object recognition based on the region of interest and optimal bag of words model. Neurocomputing 172, 271–280 (2016)
21. A. Bosch, A. Zisserman, X. Munoz, Representing shape with a spatial pyramid kernel, in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (ACM, 2007), pp. 401–408
22. R. Karthika, L. Parameswaran, Study of Gabor wavelet for face recognition invariant to pose and orientation, in Proceedings of the International Conference on Soft Computing Systems (Springer, New Delhi, 2016), pp. 501–509
23. C. Schaeffer, A Comparison of Keypoint Descriptors in the Context of Pedestrian Detection: FREAK vs. SURF vs. BRISK (2013), p. 12
24. M. Kottman, Performance evaluation of visual vocabularies for object recognition. Inform. Sci. Technol. Bulletin ACM Slovakia 4(2) (2012)
25. www.cs.columbia.edu/CAVE/software/softlib/coil-100.php
26. www.vision.caltech.edu/Image_Datasets/Caltech101/
27. https://wang.ist.psu.edu/docs/related/
28. A.Y. Yang, et al., Multiple-view object recognition in smart camera networks, in Distributed Video Sensor Networks (Springer, London, 2011), pp. 55–68
29. J. Zhang, Z. Xie, S.Z. Li, Nearest prime simplicial complex for object recognition (2011). arXiv:1106.0987
GUI-Based Alzheimer’s Disease Screening System Using Deep Convolutional Neural Network Himanshu Pant, Manoj Chandra Lohani, Janmejay Pant, and Prachi Petshali
Abstract Brain is the prime and most complex organ of the nervous system in all osseous and boneless organisms. Generally, it is positioned close to the sensory organs in the head. The brain is made up of trillions of connections called synapses that connect and communicate with more than 100 billion cells in the human body. When brain cells degenerate and begin to die, Alzheimer's disease may come into existence. Alzheimer's disease (AD) is a progressive disorder that destroys brain nerves. This disorder leads to continuous memory loss, which adversely affects mental health and daily routines. Alzheimer's disease also remains one of the main causes of dementia, so early and accurate detection and diagnosis of Alzheimer's disease remains a most significant research area. Normally, doctors identify and observe AD from the visualization of brain magnetic resonance imaging (BMRI) with the naked eye. To resolve these challenges, the authors have proposed and designed a graphical user interface-based AD screening system that selects, detects, and predicts Alzheimer's disease classes by using deep convolutional neural networks (DCNN). For this purpose, brain magnetic resonance imaging (BMRI) has been used. These MRI images are categorized into four classes: mild demented, moderate demented, very mild demented, and non-demented Alzheimer's disease. In this paper, deep convolutional neural networks (DCNN) offer a sound solution for numerous disease-control tasks in neurological disorders with high precision and validation accuracy. The GUI-based screening system identifies the healthy or non-demented class accurately so that anyone can easily diagnose the correct type of Alzheimer's disease.

Keywords Alzheimer's disease · Neurological disorder · MRI · Convolutional neural network · Disease screening system
H. Pant (B) · M. C. Lohani · J. Pant · P. Petshali Graphic Era Hill University Bhimtal, Nainital, Uttarakhand, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_22
1 Introduction

Brain tissue damage is the main cause of Alzheimer's disease. It has a certain progressive pattern that shrinks the size of the hippocampus and medullary wrapper of the brain and also expands the ventricles [1]. When the hippocampus shrinks, the spatial memory and episodic memory, which are functions of this part of the brain, are damaged. These two memories link the connections between the brain and the rest of the body; neurons can then no longer communicate with other cells via synapses. It is estimated that in developing countries the prevalence of Alzheimer's disease is around 5% after 65 years of age and approximately 30% after 85 years. It is expected that around 65 crore people will be detected with AD by 2050 [2].

Alzheimer's is the most common cause of dementia, a general term for neurological brain disorder and cognitive and short-term memory loss. Due to this impairment, it imposes a severe impact on the performance of routine life tasks such as writing, speaking, and reading. Till now there is no cure for AD, but patients can reduce the disease symptoms through early-stage diagnosis. Brain magnetic resonance imaging (BMRI) is one of the most appropriate tools to diagnose AD [3] and is common practice for Alzheimer's disease diagnosis in clinical research. When a human brain disorder starts, all the indicators of this disease grow slowly with the passage of time.

Alzheimer's disease (AD) detection and classification is a challenging task because the brain MRI report of an Alzheimer's disease patient may appear normal. Prompt detection of Alzheimer's disease can help in proper diagnosis and hence might prevent brain tissue damage. Early detection of Alzheimer's disease is carried out from a collection of brain MRI report data along with standard healthy MRI reports of older people. Magnetic resonance imaging (MRI) provides the feasibility to study pathological brain tissue changes that are associated with AD in vivo [4].

There are four major stages in Alzheimer's disease: non-demented, mild demented, very mild demented, and moderate demented Alzheimer's disease. Alzheimer's disease (AD) is often not accurately detected until a patient reaches the moderate demented stage, but proper treatment and protection of brain tissue in the early stage can prevent the critical condition of this disease. Early Alzheimer's disease diagnosis is carried out by clinical experts by means of powerful magnetic resonance imaging techniques, since the changes are impossible to detect with the naked eye. It is necessary to identify and classify the infected MRI images from the group of demented and non-demented Alzheimer's MRI data in the early stage. But due to the lack of early detection, diagnosis, and screening systems, patients suffer huge losses every year [5]. Therefore, for the common person, an accurate, fast, automated, precise, and proficient disease screening system is required.

Numerous valuable research works have been done to diagnose and classify different stages of Alzheimer's disease [6]. Recently, in image classification and
computer vision, deep learning-based convolutional neural network (CNN) algorithms have been applied widely [7]. A joint hippocampal segmentation and pathological score regression model using brain MRI scans was proposed based on a multi-task deep learning (MDL) method [8]. To extract features from three-dimensional clinical brain MRI images for classification, deep 3D CNNs were used [9]. A brain MRI dataset requires enormous computational resources and a huge amount of labeled data to train a deep convolutional neural network. Usually, the size of available brain MRI datasets is very small compared with the datasets typically used in computer vision for disease diagnosis; due to this smaller amount of data, learning the large number of trainable parameters in a deep convolutional neural network becomes a major challenge [10].

In this research, the authors have considered four types of demented Alzheimer's disease, including the non-demented class, and propose a graphical user interface screening system based on a deep convolutional neural network for Alzheimer's disease (AD) identification. This paper also proposes a disease detection and estimation model that mitigates imprecise and erroneous manual disease diagnosis. The proposed screening model improves the training and validation accuracy, viability, and proficiency of the algorithm. This paper recommends a graphical user interface-based screening system which detects the various types of Alzheimer's disease from brain MRI and classifies them into four different Alzheimer's demented classes. Medical experts professionally analyze brain MRI report data and then distinguish the category of Alzheimer's disease present in the MRI report. This paper draws on numerous computer vision techniques to detect and classify four different sets of demented Alzheimer's disease from brain MRI images. A three-block baseline convolutional neural network (BCNN) is applied for the same.

The primary objective of this work was the collection of sufficient brain MRI (BMRI) image data from various existing open-source datasets. This collected MRI dataset is further divided into four different categories and labels. The work flow of the proposed GUI-based Alzheimer's disease screening system architecture is shown in Fig. 1.

This research paper follows the given structure: In Sect. 1, the authors describe the theoretical foundations and related works for the proposed Alzheimer's disease screening model. Dataset description and data preprocessing are provided under materials and methods in Sect. 2. Section 3 delivers the result-oriented experimental performance on the AD image dataset using a deep convolutional neural network. In Sect. 4, the authors propose a graphical user interface-based disease screening model to predict the correct category of AD using feature extraction with hidden layers in the CNN model. The conclusion of the proposed screening model and the future scope of this research are given in Sect. 5.
Fig. 1 Alzheimer’s disease detection system design
2 Methodology

2.1 Research Design

In this proposed methodology, the neural network model depends on three prime steps. The first step is to collect the brain MRI data and perform preprocessing and augmentation on it. The second phase is to extract features from the input MRI images, and the third step is to classify and detect one of the four demented AD classes. The various steps performed by the disease detection model are described in the work flow architecture shown in Fig. 2. This paper establishes a baseline convolutional neural network approach based on the VGG16 architecture for AD classification and prediction. The experimental dataset is based on brain magnetic resonance imaging.
2.2 Dataset Descriptions

Precise data collection for research purposes plays a stimulating and significant role in deep convolutional neural networks. For this purpose, the Alzheimer's disease dataset was acquired from an authorized open-source database that is publicly available for disease detection competitions [11]. The dataset is categorized into four different types of demented AD classes, which form the four labels used to classify and predict Alzheimer's disease. The final dataset consists of four classes of AD comprising 6423 brain MRI images. The distribution of the MRI images between the training and testing phases is described in Fig. 3. For the disease detection task, the brain MRI image dataset is randomly split into 80% for training and 20% for testing. The proposed work used four sets of demented brain MRI images, namely mild demented AD, moderate demented AD, non-demented AD, and very mild demented AD.
Fig. 2 Work flow architecture for AD detection
Fig. 3 Distribution of different AD MRI images in classes
2.3 Data Preprocessing

The collected brain MRI reports are initially un-annotated and unlabeled, with four directories corresponding to the four types of AD. The authors performed the labeling and annotation of all the images of the four categories by their directory names, using the words "MILD," "MOD," "NonD," and "VMD," respectively. Initially, the collected MRI images of the four types are in RGB color channels, and they have diverse measurements, sizes, and shapes. It is difficult to extract features from a group of images of different sizes, so all brain MRI images must be reshaped to the same size, form, and dimensions during image processing. Annotation is applied to all the brain images for classification
Fig. 4 Sample images for MRI demented AD: mild demented, non-demented, moderate demented, and very mild demented
and detection. The Python Keras API and TensorFlow interface are used to bring all images to the same standard size of 200 × 200 pixels [12]. Image edge enhancement and noise reduction can be done by using image filtering techniques from image processing; this improves the accuracy of the model after preprocessing the training images. A digital image filter for noise reduction can be applied by taking the convolution of the image with a small kernel of size n × n (where n is any odd number). Some standard fixed-size sample images of all four categories are shown in Fig. 4.

The ImageDataGenerator class and the flow_from_directory API of Keras are used to separate the four brain MRI classes into respective directories. This API generates discrete train and test folders within the same directory, in which a subdirectory for each of the four classes is formed [13].

The experimental performance on the brain MRI Alzheimer's disease dataset was assessed by applying a three-block VGG-16-based baseline convolutional neural network model. This VGG-16-oriented model classifies and identifies the four types of demented AD from the collection of brain MRI report data. The authors propose a deep learning-based image classification technique that processes brain MRI data and recognizes disease spots in the MRI.
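As a hedged illustration of this preprocessing pipeline, the Keras sketch below resizes every MRI to 200 × 200 and builds the train/test iterators with flow_from_directory; the directory names, pixel rescaling and batch size are our assumptions based on the description above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255.0)  # pixel normalization assumed

train_it = datagen.flow_from_directory(
    "dataset/train",            # subfolders: MILD, MOD, NonD, VMD
    target_size=(200, 200),     # standard size stated in the text
    class_mode="categorical",
    batch_size=64,              # batch size is an assumption
)
test_it = datagen.flow_from_directory(
    "dataset/test",
    target_size=(200, 200),
    class_mode="categorical",
    batch_size=64,
)
```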
In this disease detection screening system, the experimental setup uses 3 × 3 convolutional filters. These filters learn the image features and different portions of an image; the filter matrix slides from left to right and from top to bottom. Each convolutional filter is followed by a max pooling layer, which computes the maximum value of each patch of a feature map. A convolutional filter and a max pooling layer together form a single convolutional block; as the number of filters in the network increases, the number and size of the convolutional blocks increase, and these blocks are repeated throughout the network. The CNN kernel handles the border pixels of an input image with the help of a padding technique.

The authors were motivated by the VGG-16 architecture and applied the classification technique on the brain MRI dataset for the four types of demented Alzheimer's disease (AD). The disease screening system accomplishes a classification task and forecasts the type of AD based on brain MRI image data. After the linear computation, every layer is activated with a nonlinear activation function; in this proposed baseline convolutional neural network, the rectified linear unit (ReLU) function is used. Uniform weights in the range (−limit, +limit) are initialized by using the "he_uniform" weight initializer to set the initial random weights of the Keras model layers. For detecting Alzheimer's disease (AD) using the baseline CNN, the total trainable and non-trainable parameters are described in Fig. 5; this sequential model has 10,334,021 parameters in total.

Stochastic gradient descent is an iterative optimization algorithm that starts from a stochastic (random) point on the cost function and moves down its slope in steps until it reaches the lowest point [14]. The authors used a conservative learning rate of 0.001 and a momentum of 0.9 to train this sequential model. The brain MRI images are trained using the train iterator, and the dataset is validated by applying the test iterator of Keras to fit the disease screening system for classification and prediction. 81 steps are used in each of the 200 epochs for training and testing the deep convolutional sequential neural model. The number of steps per epoch can
Fig. 5 Sequential model parameters and neural network layers
be calculated by dividing the total number of training (or testing) brain MRI AD images across all four categories by the chosen batch size [15].
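A minimal sketch of such a three-block VGG-style baseline model is shown below, reusing the train_it/test_it iterators from the preprocessing sketch. The 3 × 3 kernels, ReLU activations, he_uniform initialization, SGD settings (learning rate 0.001, momentum 0.9), 200 epochs and four-way softmax follow the text; the filter widths and dense-layer size are assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

model = Sequential([
    # Block 1
    Conv2D(32, (3, 3), activation="relu", kernel_initializer="he_uniform",
           padding="same", input_shape=(200, 200, 3)),
    MaxPooling2D((2, 2)),
    # Block 2
    Conv2D(64, (3, 3), activation="relu", kernel_initializer="he_uniform",
           padding="same"),
    MaxPooling2D((2, 2)),
    # Block 3
    Conv2D(128, (3, 3), activation="relu", kernel_initializer="he_uniform",
           padding="same"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu", kernel_initializer="he_uniform"),
    Dense(4, activation="softmax"),  # four demented AD classes
])

model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])

# steps per epoch = number of images / batch size (81 steps in the paper)
model.fit(train_it, steps_per_epoch=len(train_it), epochs=200,
          validation_data=test_it, validation_steps=len(test_it))
```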
3 Results

The VGG-16-based baseline sequential Alzheimer's disease detection screening system is evaluated by validating on unseen testing data after training on the MRI dataset. This performance is evaluated by measuring validation accuracy with respect to training accuracy. The validation accuracy is calculated by applying the three-block VGG-16 architecture, which is an extended version of the one- and two-block architectures, obtained by adding convolutional and pooling layers to the one-block and two-block VGG models [16].
3.1 Research Findings

In this proposed architecture, the size of the dataset is comparatively small for training and testing the model, so it is necessary to augment the dataset from the existing one without changing the MRI image content. This modification is done using the image data augmentation technique of image processing, which applies numerous operations on the existing brain MRI dataset such as cropping, padding, image transformations (rotation, scaling, and translation), and flipping to generate new data [17]. It is also performed by shifting the existing images randomly in either the X-direction or Y-direction of the plane.

After applying augmentation to the existing image dataset, the performance of this baseline convolutional classification model is represented by a mathematical arrangement of rows and columns, i.e., a matrix known as the classification matrix or confusion matrix [18]. On the basis of the confusion matrix, the model's classification report can be described in terms of precision, recall, and F1 score [19]:

Precision = (True Positive)/(True Positive + False Positive)    (1)

Recall = (True Positive)/(True Positive + False Negative)    (2)

F1 Score = 2 * (Precision * Recall)/(Precision + Recall)    (3)
Table 1 Multiclass confusion matrix and classification report
The multiclass confusion matrix and classification report for AD detection and classification of this proposed screening system are described in Table 1. To achieve excellent performance of the three-block sequential model, hyperparameter fine-tuning was performed [20]. The model achieves 97.16% training accuracy and 94.91% validation accuracy over 200 epochs with 81 steps per epoch, as shown in Fig. 6. Once this sequential model is trained on the input images, there may be a chance of overfitting or underfitting; this can be monitored by examining the classification accuracy and cross-entropy loss with respect to each epoch in Fig. 7. This classification accuracy is much better than other traditional machine learning and convolutional models applied to Alzheimer's disease detection and classification [21].
Fig. 6 Training and validation classification accuracy
Fig. 7 Classification model performance and cross-entropy loss with respect to number of epoch
3.2 GUI-Based Alzheimer's Disease Screening System

Once the proposed convolutional model is developed, it must be deployed to the end-user. The different types of demented Alzheimer's disease can be selected and detected accurately using a web-based graphical user interface application. This web application for the disease screening system is developed using the Keras model and the Flask API. The screening system is trained with the sequential convolutional model, and a graphical user interface is then built for AD type classification. This graphical user interface-based screening system is organized into three submodules. These modules are:

• Alzheimer's disease home page
• Brain MRI random image selection dialog box
• Predicted demented AD detection page
3.2.1 Alzheimer's Disease Home Page
After developing the baseline sequential model for training and validation, the authors created an HTML web page for AD detection using the Flask API to interact with the training and testing databases [22]. In the code, the home page is set using the name index.html, as shown in Fig. 8. The Flask framework has an inbuilt lightweight web server; hence, it requires minimal configuration to execute [23]. The Flask server is managed internally.
Fig. 8 Web-based AD screening system home page
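A minimal sketch of such a Flask deployment is given below; the model file name, routes, class labels and preprocessing are our assumptions, since the paper shows only the rendered pages, not the code. The predict route anticipates the selection and prediction steps described in Sects. 3.2.2 and 3.2.3.

```python
import numpy as np
from flask import Flask, render_template, request
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

app = Flask(__name__)
model = load_model("ad_screening_model.h5")  # hypothetical file name
CLASSES = ["MildDemented", "ModerateDemented", "NonDemented", "VeryMildDemented"]

@app.route("/")
def index():
    # Home page, as in Fig. 8
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    f = request.files["mri"]                 # MRI chosen in the selection dialog
    f.save("upload.png")
    img = image.load_img("upload.png", target_size=(200, 200))
    x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
    probs = model.predict(x)[0]
    return CLASSES[int(np.argmax(probs))]    # predicted demented AD class

if __name__ == "__main__":
    app.run(debug=True)
```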
3.2.2 Brain MRI Random Image Selection Dialog Box
Once the home page GUI for the AD screening system is developed, the end-user can select any brain MRI image randomly from the four described categories, or select a particular unseen testing image from the MRI demented dataset, as described in Fig. 9.
3.2.3 Predict Desired Demented AD Detection Page
After selecting a random brain MRI image from one of the four demented testing classes or from an unknown source, the disease screening model must predict the correct demented class, so that doctors or patients can easily identify whether the target disease is mild demented, moderate demented, non-demented, or very mild demented Alzheimer's disease. The proposed screening system achieves approximately 98% training accuracy. The prediction for the selected image, showing the respective AD type, is shown in Fig. 10.
Fig. 9 Web-based AD screening system select MRI page
Fig. 10 Web-based AD predict class page
4 Conclusion

This research paper applied a VGG-16-oriented baseline convolutional neural network to select, detect, and predict Alzheimer's disease from four different
demented classes. Different researchers are working on early detection of Alzheimer's disease to prevent dementia with the help of different brain MRI images. A web-based graphical user interface is used to interact with and select brain MRI images directly, so that the end-user can easily predict the type of demented Alzheimer's disease class. The authors described a three-block CNN architecture to train and fit the screening system. Apart from training and fitting the model, the authors also developed a graphical user interface-based AD screening system using Python Flask and the Keras API in a real-world setting. The estimated neural network model and GUI-based disease screening detector classify and predict demented Alzheimer's disease in the early phase. The proposed model achieved a training classification accuracy on brain MRI images of 97.16% and a validation accuracy of 94.906%. The response time for disease detection of the proposed model is 0.07421755790710449 ms. The proposed architecture detects and classifies different types of demented AD and also performs outstandingly compared with other classification algorithms by means of various statistical measures.
5 Future Scopes

The validation accuracy of the disease screening system can be improved by using a large-scale training MRI dataset. The accuracy of AD prediction and classification can be improved by applying numerous pre-defined image classification models such as VGG 16, VGG 19, AlexNet, and ImageNet. Along with these algorithms, the authors will apply other machine learning and computer vision algorithms for different disease detection tasks in the future. Disease prediction and detection from infected MRI images can also be performed for other brain-related diseases. Moreover, in the future, an Android-based mobile application for medical disease detection will be developed so that any layman can easily check for disease and self-diagnose on the basis of mobile and web applications.
References
1. S. Sarraf, J. Anderson, G. Tofighi, DeepAD: Alzheimer's disease classification via deep convolutional neural networks using MRI and fMRI, p. 070441 (2016)
2. R. Brookmeyer, E. Johnson, K. Ziegler-Graham, H.M. Arrighi, Forecasting the global burden of Alzheimer's disease. Alzheimer's Dement 3(3), 186–191 (2007)
3. P. Vemuri, C.R. Jack Jr., Role of structural MRI in Alzheimer's disease. Alzheimers Res Ther 2, 23 (2010)
4. M. Ewers, R.A. Sperling, W.E. Klunk, M.W. Weiner, H. Hampel, Neuroimaging markers for the prediction and early diagnosis of Alzheimer's disease dementia. Trends Neurosci. 34(8), 430–442 (2011). https://doi.org/10.1016/j.tins.2011.05.005
5. W.W. Chen, X. Zhang, W.J. Huang, Role of physical exercise in Alzheimer's disease. Biomed Rep. 4(4), 403–407 (2016). https://doi.org/10.3892/br.2016.607
6. L. Feng, J. Li, J. Yu, et al., Prevention of Alzheimer's disease in Chinese populations: status, challenges and directions. J. Prev. Alzheimers Dis. 5, 90–94 (2018). https://doi.org/10.14283/jpad.2018.14
7. J. Wang, B.J. Gu, C.L. Masters, Y.J. Wang, A systemic view of Alzheimer disease—insights from amyloid-β metabolism beyond the brain [published correction appears]. Nat. Rev. Neurol. 13(10), 612–623 (2017). https://doi.org/10.1038/nrneurol.2017.111
8. J. Cao, J. Hou, J. Ping, D. Cai, Advances in developing novel therapeutic strategies for Alzheimer's disease. Mol. Neurodegener. 13(1), 64 (2018). Published 2018 Dec 12. https://doi.org/10.1186/s13024-018-0299-8
9. E. Hosseini-Asl, M. Ghazal, A. Mahmoud, et al., Alzheimer's disease diagnostics by a 3D deeply supervised adaptable convolutional network. Front. Biosci. (Landmark Ed) 23, 584–596 (2018). https://doi.org/10.2741/4606
10. K. Ebrahimi, M. Jourkesh, S. Sadigh-Eteghad, et al., Effects of physical activity on brain energy biomarkers in Alzheimer's diseases. Diseases 8(2), 18 (2020). https://doi.org/10.3390/diseases8020018
11. Brain MRI dataset. Available at: https://www.kaggle.com
12. J. Islam, Y. Zhang, Brain MRI analysis for Alzheimer's disease diagnosis using an ensemble system of deep convolutional neural networks. Brain Inf. 5, 2 (2018). https://doi.org/10.1186/s40708-018-0080-3
13. H. Pant, M.C. Lohani, A. Bhatt, J. Pant, A. Joshi, Soil quality analysis and fertility assessment to improve the prediction accuracy using machine learning approach. Int. J. Adv. Sci. Technol. 29(3), 10032 (2020). https://sersc.org/journals/index.php/IJAST/article/view/27039
14. Stochastic Gradient Descent, https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31
15. T. Brosch, R. Tam, A.D.N. Initiative, et al., Manifold learning of brain MRIs by deep learning, in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2013), pp. 633–640
16. R. Chelghoum et al., Transfer learning using convolutional neural network architectures for brain tumor classification from MRI images. Artif. Intell. Appl. Innov. 583, 189–200 (2020). https://doi.org/10.1007/978-3-030-49161-1_17
17. S. Deepak, P.M. Ameer, Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 111, 103345 (2019). https://doi.org/10.1016/j.compbiomed.2019.103345
18. J. Islam, Y. Zhang, A novel deep learning based multi-class classification method for Alzheimer's disease detection using brain MRI data, in International Conference on Brain Informatics (Springer, 2017), pp. 213–222. https://doi.org/10.1007/978-3-319-70772-3_20
19. H. Pant, et al., Impact of physico-chemical properties for soils type classification of OAK using different machine learning techniques. Int. J. Comput. Appl. (0975–8887) 177(17) (2019)
20. Z.N.K. Swati et al., Brain tumor classification for MR images using transfer learning and fine-tuning. Comput. Med. Imaging Graph. 75, 34–46 (2019). https://doi.org/10.1016/j.compmedimag.2019.05.001
21. K. Oh, Y. Chung, K.W. Kim et al., Classification and visualization of Alzheimer's disease using volumetric convolutional neural network and transfer learning. Sci. Rep. 9, 18150 (2019). https://doi.org/10.1038/s41598-019-54548-6
22. All the myths you need to know about Alzheimer's disease. Available at: https://solmeglas.com/wp-content/uploads/2019/07/alzheimers-disease-presenilin-protein-1.jpg
23. PDE-9 inhibitors: Potential therapeutics for the treatment of Alzheimer's disease. Available at: https://medium.com/@Innoplexus/pde-9-inhibitors-potential-therapeutics-for-the-treatment-of-alzheimers-disease-c3866c2e12b6
Performance Analysis of Different Deep Learning Architectures for COVID-19 X-Ray Classification K. S. Varshaa, R. Karthika, and J. Aravinth
Abstract A chest radiograph is a projection radiograph of the chest that has been used to diagnose disorders affecting the chest, its contents and structures in the vicinity. The chest X-ray of a pneumonia-affected COVID-19 patient differs from a healthy person's chest X-ray. Differentiating between them is difficult for the untrained human eye, but deep learning networks can learn to distinguish these differences. This paper analyses the performance of seven different models: Xception, VGG-16, ResNet-101-V2, ResNet-50-V2, MobileNet-V2, DenseNet-121 and Inception-ResNet-V2, when differentiating between COVID-19 and normal chest X-rays. The results of the experiments on the COVID-19 chest X-ray dataset show that the Xception model performed best, followed by Inception-ResNet-V2. Keywords COVID-19 · Pneumonia · Artificial intelligence · Chest X-ray · Deep learning architectures
1 Introduction

COVID-19 is a respiratory illness that has taken more than half a million lives since it started spreading around the world. It is caused by SARS-CoV-2, which is a type of coronavirus, and it is highly contagious among human beings. In adults who were hospitalized, the fatality rate of the disease ranged from 4 to 11%; the overall fatality rate is estimated to be between 2 and 3% [1]. It is spread through droplets from an infected person to a non-infected person. The infected person does not need to be symptomatic; they can be asymptomatic as well.

K. S. Varshaa · R. Karthika (B) · J. Aravinth Department of Electronics and Communication Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India e-mail: [email protected] J. Aravinth e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_23
The usual symptoms of COVID-19 are fatigue, fever, myalgia, dry cough, dyspnoea and anorexia. Pneumonia can also manifest within three weeks of the infection; in fact, COVID-19 is mainly a respiratory disease. Pneumonia can be diagnosed with the help of chest X-rays [2]. However, radiologists are needed to read these X-rays, and due to the exhaustion of hospital resources in many places, radiologists are not always readily available. Artificial intelligence can be a solution to this problem: it is fast and time-efficient. Deep learning is an emerging but significant field in artificial intelligence [3]. In this paper, we compare different deep learning models capable of differentiating between COVID-19 positive and negative X-rays. The models we use are Xception, VGG-16, ResNet-101-V2, ResNet-50-V2, MobileNet-V2, DenseNet-121 and Inception-ResNet-V2.

The remainder of the paper is organized as follows. Section 2 describes related work; Sect. 3 presents the proposed models; Sect. 4 describes results and analysis; and Sect. 5 provides the conclusion.
2 Related Work

In [4], a deep neural network in combination with a support vector machine (SVM) was proposed. The SVM was preferred over a purely deep learning classifier because the latter requires a large dataset for training. Features of the input image were extracted by the convolutional layers and fed into the SVM, which classifies the input into one of three categories: COVID-19, pneumonia or normal. In [5], a convolutional neural network in combination with a modified AlexNet was presented; a pre-processed X-ray and CT scan dataset is fed into this network, which classifies the images as either COVID-19 or normal.

In [6], a technique called COVID-ResNet is proposed. It involves fine-tuning a pre-trained ResNet-50 model, which reduces training time and increases the performance of the model. It is a three-step technique consisting of progressive resizing of input images through dimensions 128 × 128 × 3, 224 × 224 × 3 and 229 × 229 × 3, with the network fine-tuned at each stage. COVID-ResNet also employs automatic learning rate selection. In [7], a completely automated tuberculosis identification system is proposed: template matching based on a two-dimensional Gaussian model is used to identify tuberculosis-positive candidates, and image enhancement involving a Hessian matrix is used for feature extraction and cavity segmentation.

In [8], the use of a fuzzy colour technique during pre-processing was proposed. Two COVID-19 datasets were stacked together and used for the experiment, and the stacked dataset was trained with two neural networks, SqueezeNet and MobileNetV2. The social mimic optimization method is used to process the features extracted by the models; efficient features are extracted from the images and combined, and an SVM is used for the classification of the features. In [9], the use of multiple convolutional neural networks (CNNs) to evaluate the input image (chest X-ray) was proposed. The
network is trained to look for any abnormalities in the input image and classifies the X-ray as either normal or abnormal; a method to interpret the obtained results was also proposed. In [10], the performance of VGGNet and CNN for classifying the Fashion MNIST dataset was reviewed and the metrics were compared. [11] proposes and compares the performance of different deep learning networks in biometric security, specifically on raw ECG data. [12] proposes a decision support system based on image retrieval to help a physician diagnose breast cancer.
3 Proposed Models

3.1 VGG16

This model replaces the large kernels of traditional convolutional networks with smaller 3 × 3 kernels. This increases model performance because deep stacks of small kernels with nonlinear layers in between can learn more complex features. VGG16 has multiple convolutional layers, max-pooling layers and dense layers: a sequence of convolutional layers is followed by three dense layers, with softmax as the final layer. The hidden layers use the ReLU activation function. With every pooling layer, the network width increases by a factor of 2, up to a maximum width of 512 [13].
3.2 ResNet-50-V2/ResNet-101-V2

Deeper networks are harder to train. As the depth of the network increases, model accuracy begins to saturate and then degrades rapidly, and overfitting is not the cause of this deterioration: simply stacking more layers increases the training error. However, deeper models produce better results in image recognition. Residual networks, or ResNets, are one solution to the degradation problem. They introduce shortcut connections between the layers, which perform an identity mapping that is added to the outputs of the stacked layers. The deep residual network is easier to optimize and gains accuracy from its significantly increased depth [14]. The 50/101/152-layer ResNet architectures have higher accuracy than the 34-layer ResNet architecture.

ResNet-V2 is an improved version of the ResNet-V1 architecture. ResNet-V2 performs batch normalization and ReLU activation on the input before convolution, whereas in ResNet-V1, convolution comes first and is followed by batch normalization and ReLU activation. ResNet-V1 has a second ReLU activation at the end of its residual unit, but ResNet-V2 does not. ResNet-V2 improves generalization and makes training easier [15].
3.3 DenseNet-121

In DenseNet architectures, there are feed-forward connections between the layers of the network. In regular CNNs, each layer is attached only to the layers directly before and after it; in a dense convolutional network with L layers, there are L(L + 1)/2 direct connections. The activation maps of all preceding layers are concatenated and given as the input to each layer. In deep neural networks, the information given at the input can vanish before it reaches the end of the network; this is called the vanishing gradient problem, and it can be reduced by using the DenseNet architecture. DenseNet strengthens the flow of information from the start to the end of the network: each layer receives the outputs of every preceding layer and has access to the features they extracted. This encourages feature reuse and also decreases the number of parameters in the network. ResNet has similar layer connections, but ResNet sums up the feature maps it receives from the preceding layers while DenseNet concatenates them, as contrasted in the sketch below. The DenseNet layers are also narrower. The strength of the architecture lies in feature reuse, resulting in a parameter-efficient, easily trainable model; due to the efficient use of parameters, DenseNet is less likely to overfit [16].
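The sketch below contrasts the two connection styles in the Keras functional API: a ResNet-style block sums the shortcut with the stacked layers' output, while a DenseNet-style layer concatenates its new feature maps onto its input. Layer widths are illustrative only.

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # ResNet-style: identity shortcut summed with the block output
    # (assumes x already has `filters` channels so the shapes match).
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.ReLU()(layers.Add()([x, y]))

def dense_layer(x, growth_rate=32):
    # DenseNet-style: new feature maps concatenated onto the input,
    # so every later layer can reuse them.
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same")(y)
    return layers.Concatenate()([x, y])
```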
3.4 MobileNet-V2

The fundamental block of this architecture is a residual bottleneck depth-separable convolution. Traditional convolutional layers are more computationally expensive than depthwise separable convolution. MobileNet-V2 performs 3 × 3 depthwise separable convolution, which decreases the computational cost with only a small decrease in accuracy. The initial layer is a full convolution layer with 32 filters, followed by residual bottleneck layers. ReLU6 is the activation function for the network. Along with decreasing the computational cost, the MobileNet architecture also decreases the memory needed for the operation [17].
3.5 Inception-ResNet-V2

Residual connections in Inception networks have been found to increase the performance of the model slightly, and they also decrease the training time. Inception-ResNet-V2 is therefore a combination of the Inception and ResNet architectures. In Inception-ResNet-V2, the filter concatenation stage is replaced with residual connections. A 1 × 1 convolutional layer follows each inception block to compensate for the dimensionality reduction produced by the block. The inputs to the regular layers undergo batch normalization, but the input to the summation
layers does not. Inception architectures with residual connections have better recall and a shorter training period [18].
3.6 Xception

Xception is an "extreme" version of the Inception architecture. In Inception networks, convolutions are factorized into multiple blocks. Xception is similar to depthwise separable convolution, but with some notable differences. Depthwise separable convolution first performs channel-wise spatial convolution and then a 1 × 1 convolution, whereas Xception performs the 1 × 1 convolution followed by the spatial convolution. After the operation, there is a ReLU nonlinearity in Xception, which is not the case in depthwise separable convolution. The Xception architecture contains 36 convolutional layers, which extract features from the input and are organized into 14 modules. With the exception of the first and last modules, all modules have linear residual connections [19].
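The two factorization orders described above can be contrasted in a short sketch (tf.keras assumed; filter counts are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable(x, filters):
    # Channel-wise spatial convolution first, then 1x1 pointwise convolution.
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    return layers.Conv2D(filters, 1)(x)

def xception_order(x, filters):
    # Xception order: 1x1 pointwise convolution first, then spatial convolution.
    x = layers.Conv2D(filters, 1)(x)
    return layers.DepthwiseConv2D(3, padding="same")(x)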
4 Results and Discussion

4.1 Dataset

This dataset was created by university researchers (Qatar University and the University of Dhaka) with the help of medical doctors and other collaborators from Pakistan and Malaysia. This database contains Normal, COVID-19, and Viral Pneumonia affected chest X-rays. There are 219 COVID-19 positive images, 1341 normal images and 1345 viral pneumonia images. For our experiment, we have used 219 COVID-19 and 600 normal chest X-ray images (Fig. 1).
4.2 Metrics

We have used four metrics, namely accuracy, precision, F1-score and recall.
Accuracy: It is the ratio of correct predictions out of all the predictions. In highly unbalanced datasets, the accuracy score is misleading. In our dataset, we have 219 COVID-19 images and 600 normal images; for roughly every COVID-19 image, there are nearly three normal images, making it an unbalanced dataset. Hence, we use other metrics as well.

Accuracy = (TP + TN)/(TP + TN + FP + FN)    (1)
Fig. 1 a and c are normal chest X-rays. b and d are COVID-19 chest X-rays
(or) Accuracy = (No. of correct predictions)/(Total number of predictions)
Precision: It is the ratio of True Positives (TP) to the sum of True Positives (TP) and False Positives (FP). Precision represents the percentage of correct positive predictions out of all positive predictions.

Precision = TP/(TP + FP)    (2)
It essentially calculates the accuracy of the minority class. But it does not take into consideration the number of False Negatives (FN). Therefore, we require another metric.
Recall: It is the ratio of True Positives (TP) to the sum of True Positives (TP) and False Negatives (FN). Recall represents the percentage of correct positive predictions out of all possible positive predictions.

Recall = TP/(TP + FN)    (3)
A model can have high recall while having low precision and vice versa. As such, looking at recall and precision as isolated metrics fails to give us the complete picture. To rectify this, we use the F1-score metric.
F1-score: F1-score encapsulates the properties of both the precision and recall metrics. It combines precision and recall using the formula given below.

F1-score = (2 ∗ Precision ∗ Recall)/(Precision + Recall)    (4)
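For concreteness, the four metrics can be computed from the confusion-matrix counts as below. The test-set sizes of 74 COVID-19 and 172 normal images are from Sect. 4.4, but the example counts (70 detected cases, 3 false alarms) are hypothetical.

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)          # Eq. (1)
    precision = tp / (tp + fp)                          # Eq. (2)
    recall = tp / (tp + fn)                             # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (4)
    return accuracy, precision, recall, f1

# Hypothetical example: 70 of 74 COVID-19 images detected, 3 of 172 normals misflagged.
print(metrics(tp=70, tn=169, fp=3, fn=4))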
4.3 Experimental Analysis

Given below are the results of our experiment with the seven models Xception, VGG16, ResNet-101-V2, ResNet-50-V2, MobileNet-V2, DenseNet-121 and Inception-ResNet-V2 when trained and tested with the COVID-19 chest X-ray dataset (Figs. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12).
Fig. 2 Accuracy comparison for different models trained using COVID-19 dataset
Fig. 3 Recall comparison for different models trained using COVID-19 dataset
Fig. 4 Precision comparison for different models trained using COVID-19 dataset
Fig. 5 F1-score comparison for different models trained using COVID-19 dataset
4.4 Inference

The sizes of the training set and test set are 573 and 246, respectively. The training set contains 145 COVID-19 and 428 normal images. The test set contains 74 COVID-19 and 172 normal images. The batch size is 16. The learning rate is 0.001, and it is
Fig. 6 a Accuracy versus epochs. b Loss versus epochs for Xception architecture
Fig. 7 a Accuracy versus epochs. b Loss versus epochs for Inception-ResNet-V2 architecture
Fig. 8 a Accuracy versus epochs. b Loss versus epochs for ResNet-101-V2 architecture
Fig. 9 a Accuracy versus epochs. b Loss versus epochs for DenseNet-121 architecture
Fig. 10 a Accuracy versus epochs. b Loss versus epochs for MobileNet-V2 architecture
Fig. 11 a Accuracy versus epochs. b Loss versus epochs for ResNet-50-V2 architecture
constant for the first 8 epochs. After that, it reduces by 10% for every epoch. The size of the input images is 224 × 224.
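This schedule can be expressed, for example, as a Keras callback; the paper does not name its framework's scheduler, so the following is only a sketch of the stated policy.

# Constant 0.001 for the first 8 epochs, then a 10% reduction every epoch.
import tensorflow as tf

def schedule(epoch, lr):
    return 0.001 if epoch < 8 else lr * 0.9

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
# model.fit(train_images, train_labels, epochs=15, batch_size=16,
#           callbacks=[lr_callback])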
Fig. 12 a Accuracy versus epochs. b Loss versus epochs for VGG16 architecture
Looking at the results obtained above, it can be inferred that the Xception and Inception-ResNet-V2 architectures have the highest accuracy scores. The architectures with slightly lower accuracy scores are DenseNet-121 and ResNet-101-V2. ResNet-50-V2 and VGG16 had the lowest accuracy scores among all the trained models. Considering the recall metric, Xception and DenseNet-121 performed the best; a model with a high recall percentage generates fewer false negatives. With respect to the precision metric, Xception, ResNet-101-V2 and Inception-ResNet-V2 have the highest scores; a model with a high precision percentage generates fewer false positives. Since this is a highly unbalanced dataset, F1-score is the most important performance metric. Considering F1-score, Xception, ResNet-101-V2 and Inception-ResNet-V2 have the highest scores, closely followed by DenseNet-121. ResNet-50-V2 and VGG-16 had by far the lowest scores. Xception was able to achieve its high accuracy and F1-score with only 8 epochs, and Inception-ResNet-V2 achieved comparable scores with 8 epochs as well; the other models required 15 or more epochs to achieve the accuracy and F1-scores noted above. VGG16 exhibited the worst performance of the seven architectures; notably, it has the least number of layers (19 layers) among all the models and no residual connections. ResNet-50-V2 exhibited slightly better performance than VGG16; it too has a smaller number of layers (190 layers) compared to the other models. It is also worth noting that Inception-ResNet-V2 has 780 layers while the best performing model, Xception, has only 132 layers.
5 Conclusion

COVID-19 is a global pandemic that is straining hospital resources in many countries. Automating the reading of chest radiographs is a small step towards alleviating the strain. Given adequate training, computers can achieve high accuracy in reading chest radiographs. A performance study of a few state-of-the-art deep learning models is presented in this paper. From our experimentation, we observed that the Xception model achieves the highest accuracy and F1-score despite the small size of the dataset. It had a 0.81% accuracy improvement over Inception-ResNet-V2 and ResNet-101-V2, 1.22% over DenseNet-121, 1.66% over MobileNet-V2 and 2.44% over both ResNet-50-V2 and VGG-16. It had a 1.37% F1-score improvement over Inception-ResNet-V2 and ResNet-101-V2, 1.99% over DenseNet-121, 2.74% over MobileNet-V2, 4.1% over ResNet-50-V2 and 4.05% over VGG-16. It was also trained in the least time, with the least number of epochs. It is therefore the best model for this work. In the future, with improvement in the quality and quantity of the datasets for various respiratory diseases including but not limited to COVID-19, we can automate the process of reading radiography results to a large extent.
References

1. T. Singhal, A review of coronavirus disease-2019 (COVID-19). Indian J. Pediatr. 87, 281–286 (2020)
2. World Health Organization, Pneumonia Vaccine Trial Investigators' Group, Standardization of interpretation of chest radiographs for the diagnosis of pneumonia in children. World Health Organization (2001)
3. R. Ramachandran, D.C. Rajeev, S.G. Krishnan, P. Subathra, Deep learning—an overview. Int. J. Appl. Eng. Res. 10, 25433–25448 (2015)
4. B. Sethy, P.K. Ratha, S.K. Biswas, Detection of Coronavirus Disease (COVID-19) Based on Deep Features and Support Vector Machine. Preprints (2020)
5. H.S. Maghdid, A.T. Asaad, K.Z. Ghafoor, A.S. Sadiq, M.K. Khan, Diagnosing COVID-19 Pneumonia from X-Ray and CT Images Using Deep Learning and Transfer Learning Algorithms (2020)
6. M. Farooq, A. Hafeez, COVID-ResNet: A Deep Learning Framework for Screening of COVID19 from Radiographs (2020)
7. T. Xu, I. Cheng, M. Mandal, Automated cavity detection of infectious pulmonary tuberculosis in chest radiographs, in Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2011), pp. 5178–5181
8. M. Togacar, B. Ergen, Z. Comert, COVID-19 detection using deep learning models to exploit social mimic optimization and structured chest X-ray images using fuzzy color and stacking approaches. Comput. Biol. Med. 121 (2020)
9. P.N. Kieu, H.S. Tran, T.H. Le, T. Le, T.T. Nguyen, Applying multi-CNNs model for detecting abnormal problem on chest x-ray images, in 10th International Conference on Knowledge and Systems Engineering (KSE) (2018), pp. 300–305
10. B. Saiharsha, B. Diwakar, R. Karthika, M. Ganesan, Evaluating performance of deep learning architectures for image classification, in 5th International Conference on Communication and Electronics Systems (ICCES) (2020), pp. 917–922
11. Y. Muhammed, J. Aravinth, CNN based off-the-person ECG biometrics, in International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) (2019), pp. 217–221
12. K.K. Vijayan, Retrieval driven classification for mammographic masses, in Proceedings of the 2019 IEEE International Conference on Communication and Signal Processing (ICCSP 2019) (2019), pp. 725–729
13. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition (2014)
14. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778
15. K. He, X. Zhang, S. Ren, J. Sun, Identity mappings in deep residual networks, in European Conference on Computer Vision (2016), pp. 630–645
16. G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 2261–2269
17. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.C. Chen, MobileNetV2: inverted residuals and linear bottlenecks, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 4510–4520
18. C. Szegedy, S. Ioffe, V. Vanhoucke, Inception-v4, Inception-ResNet and the impact of residual connections on learning. Thirty-First AAAI Conf. Artif. Intell. 131, 262–263 (2016)
19. F. Chollet, Xception: deep learning with depthwise separable convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 1251–1258
Random Grid-Based Visual Cryptography for Grayscale and Colour Images on a Many-Core System
M. Raviraja Holla and Alwyn R. Pais
Abstract The traditional visual cryptography (VC) is a technique that encrypts a secret image into several shares with minimum computation, so that no computation is needed for decryption. Conventional random grid-based VC is a different cryptosystem that generates an encoded grid one pixel at a time, based on the pixels of the original image, using a single master grid. These encrypted grids are of the same size as the original image, unlike in traditional VC, and no other matrices are required to generate them. Despite its simplicity and flexibility, this scheme is inefficient for real-time applications because its efficiency drops for large image sizes. Thus, it is necessary to exploit the currently prevalent many-core computing power to elevate this cryptosystem to better efficiency. This paper proposes a novel (2, 2) random grid-based VC that exploits a many-core system's computational capability, applied to grayscale and colour images. For more efficiency, this approach uses Compute Unified Device Architecture (CUDA) constant memory. This approach finds significance from the perspective of efficiency demand and the rapid growth in many-core computing. Experimental results prove that the proposed method on a many-core system outperforms the normal random grid-based VC with an improved speedup.
Keywords Many-core · Image · Random grid · Cryptography · Shares · Speedup · CUDA
M. Raviraja Holla (B) · A. R. Pais Information Security Research Lab. Department of Computer Science and Engg., National Institute of Technology Karnataka, Surathkal 575025, India e-mail: [email protected] A. R. Pais e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_25
1 Introduction

In the early days of cryptography, little attention was paid to future utility, and it was believed that stronger security required more computation. Now, information is not just text: pictures, sound and video are all accepted as information, and such information must be made safely available elsewhere in real time. The image is a precious, medium-sized form of information. Conventional cryptography with extensive computation may not be effective in achieving the required efficiency; therefore, image cryptography was started with the objective of little processing during encryption for such medium-sized information. Image security models have new characteristics and capabilities [8]. The visual cryptography that Naor and Shamir [16] engendered continues to appeal to researchers, and the field now has a large corpus of literature [4]. The variants of this scheme cost additional storage overhead for the basis matrices and the larger shares. The dealers also transmit shadows and hence consume additional traffic bandwidth. Moreover, this scheme requires cumbersome effort in designing appropriate basis matrices [2]. Besides, the image encryption proposed in [9] has attracted many researchers over the last two decades. The inventions based on this method are generally called random grid (RG)-based techniques. Investigators came to the view that the RG models remedy the drawbacks of [16]. The first proposed refinement, in [17], is an extension of the technique proposed in [9] to colour images. But innovations in image encryption, while effective, do not place much emphasis on efficiency, even with multicore central processing units (CPUs). These CPU-based models are not fit for real-time applications, and the sequential model even leads to resource under-utilization [18]. Computing technology has evolved from multicore to many-core processing units. Many-core processing units are throughput-optimized devices, in contrast with the latency-optimized multicore processing units. Although the processing elements in a many-core device are simple, their massive quantity is conducive to data-parallel tasks, and each such processing unit has multitask capability. The general-purpose graphics processing unit (GPGPU), or simply the graphics processing unit (GPU), is a data-parallel many-core system optimized for throughput. This paper reformulates the RG image encryption proposed in [17] to leverage the computing power of the GPGPU. Such reformulations are novel approaches in research [7, 19]. Security brings value to information: the faster the safety, the higher its value.
2 Background and Motivation

2.1 Efficiency Considerations in VC

The unique developments and features in VC models have driven their efficiency demand. The computational complexity of VC is of two kinds.
VC systems with more complex operations form one category, so the investigators in [1, 3, 5, 10, 20] used mechanisms to simplify the complexity. The other category involves heavy computation because of the sheer volume of data: VC methods that allow sharing of more than one secret image require more computation on enormous data [8]. In traditional VCs, the pixel expansion grows exponentially as the number of participants increases [12]. The premise of image encryption is that there should be less computation in encryption and no computation in decryption, but some systems also include computation in decryption [22]; in such situations, efficiency cannot be underestimated. VC schemes require efficient design and implementation in cloud-based applications [15]. A generalized general access structure (GGAS) VC approach in [24] focuses on efficiency while recovering the secrecy, particularly in real-time applications. The scheme proposed in [21] reduces the pixel expansion problem and also provides a flexible sharing solution; to achieve these objectives, the random grid, the XOR operation for colour pixels and GGAS are combined. Increasing the visual quality of the cryptosystem [2, 6, 23] also highlights the need for efficiency. Similarly, the need for rapid communication [11] indirectly puts demand on efficiency.
2.2 Trends in Many-Core Systems and CUDA as a Platform

Many-core systems are available in fused and discrete architectures [14]. Multicore and many-core systems integrated into a single chip form the fused design; connecting the two systems on separate chips through Peripheral Component Interconnect Express (PCIe) is the discrete architecture [25]. Table 1 shows future trends in multicore and many-core computing. The cryptosystem therefore needs to be upgraded to make better use of the new technology, and the system needs to be restructured to expose parallel work. The novelty of this paper lies in increasing the efficiency of the RG cryptosystem with optimum resource utilization using a discrete GPGPU architecture.
Table 1 Future trends in multicore and many-core systems [14]

Hardware attribute                Multicore system     Many-core system
1. Number of transistors          Crosses 10 billion   8 billion
2. Number of cores                Crosses 60           3072
3. Size of LLC                    96 MB                2048 KB
4. 3D integrated circuit          Exists               Exists
5. Interconnectivity bandwidth    5–12 times the PCIe-Gen3 bandwidth with NVLink
3 Random Grid-Based VC for Grayscale and Colour Images

This section explains the primary random grid models for the grayscale and colour images. Three basic grayscale random grid models presented in [9] generate the pixels of an encrypted (encoded) random grid E2 from a master random grid E1, based on the pixel values in the binary-converted plain image I. Equations (1), (2) and (3) represent these three models, where [p, q] indicates the pixel position and Ē1[p, q] denotes the complement of E1[p, q]. E1 is preset to have a 50% probability of zeros and ones. E1 and E2, when overlaid on each other, reveal the plain image; otherwise, one cannot discern any secrecy from the grids.

E2[p, q] = E1[p, q] if I[p, q] = 0, and E2[p, q] = Ē1[p, q] otherwise    (1)

E2[p, q] = E1[p, q] if I[p, q] = 0, and E2[p, q] = random(0 or 1) otherwise    (2)

E2[p, q] = random(0 or 1) if I[p, q] = 0, and E2[p, q] = Ē1[p, q] otherwise    (3)
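A compact NumPy sketch of the three encodings is given below. The complement cases follow the random-grid formulation of [9]; the array shapes and the OR-based stacking are illustrative assumptions.

import numpy as np

rng = np.random.default_rng()

def encode(I, model=1):
    """I: binary secret image (0 = transparent/white, 1 = opaque/black)."""
    E1 = rng.integers(0, 2, size=I.shape)  # master grid, 50% zeros and ones
    R = rng.integers(0, 2, size=I.shape)   # independent random pixels
    if model == 1:                          # Eq. (1)
        E2 = np.where(I == 0, E1, 1 - E1)
    elif model == 2:                        # Eq. (2)
        E2 = np.where(I == 0, E1, R)
    else:                                   # Eq. (3)
        E2 = np.where(I == 0, R, 1 - E1)
    return E1, E2

I = rng.integers(0, 2, size=(256, 256))
E1, E2 = encode(I)
stacked = E1 | E2  # stacking two transparencies behaves as a pixel-wise OR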
The research in [17] is an extension of the RG model in [9] to colour images. There are three primary independent colours in the subtractive colour model: cyan (C), magenta (M) and yellow (Y). Other colours can be considered linear combinations of these basic colours. A pixel in the given image is thus decomposed into its three equivalent monochromatic grey levels. These three coloured-grey level images are converted to coloured halftone images; the halftone equivalent of a grayscale image saves memory while preserving quality [13]. Three submaster random grids are generated, and three subencoded random grids are generated using the technique proposed in [9]. Then the corresponding colour components of the subgrids are mixed to obtain the two coloured master and encoded random grids. When these two random grids are superimposed, the colour secret image is revealed.
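The colour pipeline can be summarized in a short sketch; RGB values in [0, 1] and the simple C = 1 − R style decomposition are assumptions for illustration.

import numpy as np

def cmy_decompose(rgb):
    """Subtractive model: C = 1 - R, M = 1 - G, Y = 1 - B."""
    cmy = 1.0 - rgb
    return cmy[..., 0], cmy[..., 1], cmy[..., 2]

# Each C, M, Y plane is halftoned to a binary image and encoded with its own
# submaster grid; the three pairs of subgrids are finally mixed into the two
# coloured master and encoded grids.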
4 Performance of Sequential Random Grid-Based VC for Grayscale and Colour Images

The performance of the sequential (2, 2) random grid-based VC is studied using a system with a dual-core Intel Core i3-2370M processor and 4 GB RAM. The model is tested with grayscale and colour images of different sizes. Table 2 lists the execution times for grayscale images of four different resolutions.
Table 2 Execution time for the grayscale image

Image size     T1 (s)   T2 (s)   Total (s)
256 × 256      0.893    0.327    1.220
512 × 512      3.260    1.436    4.696
720 × 576      4.812    2.386    7.198
1024 × 1024    11.300   5.163    16.463
Note T1—time for halftoning, T2—time for generating share

Table 3 Execution time for the colour image

Image size     T0 (s)   T1 (s)    T2 (s)   Total (s)
256 × 256      0.561    2.5897    0.936    4.0867
512 × 512      2.251    9.454     3.825    15.53
720 × 576      3.252    13.9548   6.014    23.3308
1024 × 1024    8.304    32.77     15.36    56.437
Note T0—time for decomposing, T1—time for halftoning, and T2—time for generating share.
T1 is the execution time for generating a halftone image in seconds. The execution time for creating two shares is labelled T2. The last column, Total, is the total execution time. The data transfer between RAM and secondary storage is not considered in obtaining these times; hence, they are only the processing times when the images are available in RAM. Similarly, Table 3 shows the execution times for the colour images at various stages of processing. An additional column labelled T0 is the time to decompose the colour image into its CMY image components. The run time to obtain the corresponding three halftone images is labelled T1. The execution time to generate six shares is T2. The Total is the sum of these time components. Figure 1 depicts the execution times for generating the halftone images for grayscale and colour images. As evident from the figure, the execution time for producing the halftone colour images increases drastically with the image size. The processing time to generate shares for grayscale and colour images is shown in Fig. 2. Again, the time for creating shares for colour images becomes prohibitive with the increase in image size. In summary, the total execution time for the VC of colour images is computationally expensive, as evident from Fig. 3. The total time includes the halftone and share generating times common to both grayscale and colour images. These two components consume most of the execution time: for the grayscale images, they take 70% and 30% of the overall execution time, and for colour images 61% and 25%, respectively. These percentages remain constant independent of image size. The time to decompose the colour image into its CMY components is an additional time component only for the colour images; moreover, this component occupies only 14% of the total execution time, irrespective of the image size. This paper optimizes the process of creating halftone and share images using a many-core system.
Fig. 1 Execution times for generating halftone images for grayscale and colour images in the sequential algorithm
Fig. 2 Execution times for generating shares for grayscale and colour images in the sequential algorithm
5 The Proposed Random Grid-Based (2, 2) Algorithm for Grayscale and Colour Images on a Many-Core System

Figure 4 shows the block diagram of the proposed many-core random grid model. The input to the model is a colour or grayscale image. The halftoning block transforms it into the corresponding halftone image. The encryption block generates an encoded grid using a master grid, based on the pixel values of the halftone image. As the master grid is preset to contain 50% randomness of 0s and 1s, the encoded grid produced from it is also random. Decryption requires stacking the master and encoded grids to reveal the secret image. The halftoning and encryption blocks utilize a many-core system to exploit the data-parallel tasks, resulting in an improved speed.
Fig. 3 Total execution times for grayscale and colour images in the sequential algorithm
Fig. 4 Block diagram of the proposed system
The generated encoded grid resulting from encryption is a share, and hence the share generating process is also referred to as encryption. Algorithm 1, on a host, launches Algorithms 2 and 3 to perform halftoning and share generation on a many-core system. A host module executes on the multicore system; it acts as a staging area where the necessary allocations, data transfers between multicore and many-core memories, thread creation and kernel invocations are done. The CUDA API is used in the host module to perform such tasks. A device module, written in C,
is intended to execute on the many-core system; it is also called a kernel. It uses CUDA primitives to obtain globally unique thread indices, through which the global device memory is accessed. Since the number of pixels to be processed is high, the kernel is launched with multiple blocks and threads. Algorithm 1 is the main host module. It relaunches the same kernels, shown in Algorithms 2 and 3, for the grayscale and colour images; these invocations are asynchronous and independent. The CUDA constant memory is initialized in step 2 of Algorithm 1 with the constants that are reused by all threads; this memory facilitates efficient access through appropriate caching and broadcasting. The grayscale image is processed from steps 3 to 16. Step 9 launches the kernel for generating the halftone image, creating a number of threads equal to the height of the image, the rationale being to process each row of the image with one thread. Step 13 launches the share generating kernel with a number of threads equal to the size of the image. The timer records the execution times of these two stages separately. The colour image is handled from step 18 to step 33. The colour image is decomposed into its C, M and Y components in step 18. Each component is converted to a halftone image by reusing the same kernel in steps 22, 23 and 24, and steps 28, 29 and 30 re-invoke the kernel used for generating shares. The allocated device memory is released in step 34. Algorithm 2 generates a halftone image: the loop in step 2 lets a thread iterate through each row of the image, using an error diffusion technique. The process of generating shares is given in Algorithm 3, where each thread processes a pixel through device memory reads and writes. Step 1 in Algorithms 2 and 3 generates the global thread indices.
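For readers without a CUDA C toolchain, the same grid-stride pattern of Algorithm 3 can be sketched in Python with Numba's CUDA backend. Numba is not used by the authors, and the launch configuration below is illustrative.

import numpy as np
from numba import cuda

@cuda.jit
def shares_kernel(halftone, master, encoded, size):
    tid = cuda.grid(1)            # global thread index (step 1 of Algorithm 3)
    stride = cuda.gridsize(1)
    while tid < size:             # grid-stride loop over all pixels
        if halftone[tid] == 0:
            encoded[tid] = master[tid]        # copy the master pixel
        else:
            encoded[tid] = 1 - master[tid]    # complement the master pixel
        tid += stride

n = 512 * 512
h = (np.random.rand(n) < 0.5).astype(np.uint8)   # flattened halftone image
g1 = (np.random.rand(n) < 0.5).astype(np.uint8)  # master grid, 50% ones
g2 = np.zeros_like(g1)
shares_kernel[256, 256](h, g1, g2, n)            # [blocks, threads per block]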
6 Experimental Results

The proposed scheme, implemented in CUDA with OpenCV, was executed on a PARAM Shavak supercomputer with multicore Intel(R) Xeon(R)-E5-2670 CPUs. This supercomputer contains a many-core Nvidia Tesla K40c GPGPU; Table 4 shows the hardware configuration. Many grayscale and colour images of various resolutions were taken as test samples. Figure 5a shows an input grayscale image and Fig. 5b the corresponding halftone image. Figure 5c, d are the random master grid and the
Table 4 PARAM Shavak supercomputer environment

Attribute             Multicore Intel(R) Xeon(R)-E5-2670   Many-core Tesla K40c
Processing elements   2 CPUs, each having 12 cores         2880 cores
Memory                8 TB                                 12 GB
Cache                 30,720 KB                            L1-64 KB, L2-1.5 MB
Clock speed           2.30 GHz                             745 MHz
Algorithm 1: Allocate memory, perform data transfer, create threads, and launch kernel to generate a halftone image (Host module)
Input: The secret grayscale image I1 and colour image I2 of size (h × w).
Output: Reconstructed grayscale image R1 and colour image R2 of size (h × w).
1. Declare device constant memory using __constant__ float r[4].
2. Copy 7/16, 5/16, 3/16, 1/16 to the constant memory r[4] using cudaMemcpyToSymbol().
3. Use cudaMalloc() to allocate device memory for I1 and I2.
4. Use cudaMalloc() to allocate device memory for the halftone images HG, HC, HM, HY.
5. Use cudaMalloc() to allocate device memory for the share images SG1, SG2, SC1, SC2, SM1, SM2, SY1, SY2.
6. Initialize SG1, SC1, SM1, and SY1 with 50% probability of binary values.
7. Set timer = 0.
8. Use cudaMemcpy() to transfer I1 to device memory d_I1.
9. Launch Halftone_kernel(d_I1, HG, width, height) to generate the halftone image with the number of threads equal to the height.
10. Use cudaMemcpy() to transfer HG to CPU memory.
11. Store timer value.
12. Set timer = 0.
13. Launch Shares_kernel(HG, SG1, SG2, size) with the number of threads equal to the size.
14. Use cudaMemcpy() to transfer SG1 and SG2 to CPU memory.
15. Store timer value.
16. Stack SG1, SG2 to output the reconstructed grayscale image R1.
17. Set timer = 0.
18. Decompose I2 into I2c, I2m, and I2y.
19. Store timer value.
20. Set timer = 0.
21. Use cudaMemcpy() to transfer I2c, I2m, and I2y to device memory d_I2c, d_I2m, and d_I2y, respectively.
22. Launch Halftone_kernel(d_I2c, HC, width, height) to generate the halftone image with the number of threads equal to the height.
23. Launch Halftone_kernel(d_I2m, HM, width, height) to generate the halftone image with the number of threads equal to the height.
24. Launch Halftone_kernel(d_I2y, HY, width, height) to generate the halftone image with the number of threads equal to the height.
25. Use cudaMemcpy() to transfer HC, HM, and HY to CPU memory.
26. Store timer value.
27. Set timer = 0.
28. Launch Shares_kernel(HC, SC1, SC2, size) with the number of threads equal to the size.
29. Launch Shares_kernel(HM, SM1, SM2, size) with the number of threads equal to the size.
30. Launch Shares_kernel(HY, SY1, SY2, size) with the number of threads equal to the size.
31. Use cudaMemcpy() to transfer SC1, SC2, SM1, SM2, SY1, and SY2 to CPU memory.
32. Store timer value.
33. Stack SC1, SC2, SM1, SM2, SY1, SY2 to output the reconstructed image R2.
34. Use cudaFree() to free all allocated device memory.
35. End
Algorithm 2: Halftone_kernel(I, H, width, height): Generate halftone image in device memory (Device module)
Input: The secret grayscale image I, device-allocated halftone image H, error buffer err, height, width.
Output: Computed halftone image H.
1. tid = blockIdx.x * blockDim.x + threadIdx.x
2. For j ∈ [0 …, width − 1] do:
3.   if (I[tid × width + j] + err[tid × width + j] < 128)
4.     H[tid × width + j] = 0
5.   else
6.     H[tid × width + j] = 255
7.   diff = I[tid × width + j] + err[tid × width + j] − H[tid × width + j]
8.   if (j + 1 < width)
9.     err[tid × width + j + 1] = err[tid × width + j + 1] + diff × r[0]
10.  if (tid < height − 1)
11.    err[(tid + 1) × width + j] = err[(tid + 1) × width + j] + diff × r[1]
12.  if (tid < height − 1 and j − 1 ≥ 0)
13.    err[(tid + 1) × width + j − 1] = err[(tid + 1) × width + j − 1] + diff × r[2]
14.  if (tid < height − 1 and j + 1 < width)
15.    err[(tid + 1) × width + j + 1] = err[(tid + 1) × width + j + 1] + diff × r[3]
16. end for
17. Return H
Algorithm 3: Shares_kernel(H, G1, G2, size): Generating shares for the halftone images (Device module)
Input: Halftone image H, master grid G1, allocated encoded grid G2, and size.
Output: Encoded grid G2.
1. tid = blockIdx.x * blockDim.x + threadIdx.x
2. while (tid < size)
3.   if (H[tid] = 0)
4.     G2[tid] = G1[tid]
5.   else
6.     G2[tid] = 1 − G1[tid]   (the complement of the master grid pixel)
7.   tid = tid + blockDim.x × gridDim.x
8. Return G2 in device global memory.
encoded random grid, respectively. They do not reveal the secrecy. Upon stacking these two grids, the resultant secret image is shown in Fig. 5e. Similarly, Fig. 6a shows a secret colour image. The derived C, M and Y decomposed images are displayed in Fig. 6b, c and d, respectively. Figure 6e–g are their corresponding halftone images. Figure 7a, b, d, e, g, h, respectively, show the two generated shares corresponding to each halftone image. Figure 7c, f, i are the images upon stacking their respective shares. The overlaying of all shares reveals the secret image shown in Fig. 7j. The speedups achieved with the proposed scheme are listed in Tables 5, 6, and 7. In all these tables, the execution time labels begin with the letter 'T' and are in seconds. In Table 5, T1 is the execution time to generate the halftone image in the conventional scheme, whereas T2 is that of the proposed scheme, for the grayscale and
Fig. 5 a Sample grayscale test image Lenna (256 × 256). b Halftone image (256 × 256). c Master share (256 × 256). d Encoded share (256 × 256). e Reconstructed image (256 × 256)
Fig. 6 a Sample colour test image Lenna (256 × 256). b C component (256 × 256). c M component (256 × 256). d Y component (256 × 256). e Halftone of C component. f Halftone of M component. g Halftone of Y component
Fig. 7 a Master grid for C halftone. b Encoded grid for C halftone. c Stacking of C grids. d Master grid for M halftone. e Encoded grid for M halftone. f Stacking of M grids. g Master grid for Y halftone. h Encoded grid for Y halftone. i Stacking of Y grids. j Recovered image. Note Size of all images is (256 × 256)

Table 5 Speedup of the proposed system during halftoning

               Grayscale image                    Colour image
Image size     T1       T2         Speedup        T1        T2         Speedup
256 × 256      0.893    0.002984   299            2.5897    0.003753   690
512 × 512      3.26     0.003992   817            9.454     0.005621   1682
720 × 576      4.812    0.005021   958            13.9548   0.006953   2007
1024 × 1024    11.3     0.005798   1949           32.77     0.008975   3651
the colour images. The speedup is the ratio of these two execution times. Figure 8 depicts the increase in these speedups as the image size grows. Table 6 lists the execution times for generating shares for the grayscale and the colour images in a sequential-RG and the proposed RG methods. The speedups recorded accordingly. Figure 9 shows the speedup in generating shares for the grayscale and colour images. The total speedup considering both the execution times for generating halftone and colour images in sequential and the proposed techniques shown in Table 7. The
Table 6 Speedup of the proposed system in generating share

               Grayscale image                  Colour image
Image size     T1      T2        Speedup        T1       T2         Speedup
256 × 256      0.327   0.00381   86             0.936    0.004932   190
512 × 512      1.436   0.00497   289            3.825    0.006031   634
720 × 576      2.386   0.00575   415            6.014    0.007641   787
1024 × 1024    5.163   0.00763   677            15.363   0.008932   1720
Note T1—execution time of the sequential scheme (s), T2—execution time of the proposed scheme (s), and Speedup is unitless

Table 7 Total speedup of the proposed system

               Grayscale image                   Colour image
Image size     T1       T2         Speedup       T1        T2         Speedup
256 × 256      1.22     0.006794   180           3.5257    0.008685   406
512 × 512      4.696    0.008962   524           13.279    0.011652   1140
720 × 576      7.198    0.010771   668           19.9688   0.014594   1368
1024 × 1024    16.463   0.013428   1226          48.133    0.017907   2688

Note T1—total execution time of the sequential scheme (s), T2—total execution time of the proposed scheme (s), and Speedup is unitless
Fig. 8 Speedup in generating halftone images for grayscale and colour images
total speedup is displayed in Fig. 10. The execution time for generating the halftone images is considerably greater than that of generating the shares in a multicore random grid method. The number of memory accesses and computations is higher in obtaining the halftone images than the shares. Accordingly, the speedup achieved is also more in generating the halftone images than in producing the shares. The
Fig. 9 Speedup in generating share images for grayscale and colour images
Fig. 10 Total speedup in generating halftone and share images for grayscale and colour images
experimental results reveal that the speedups achieved increase considerably as the image size grows. The performance of the proposed many-core random grid method is 406–2688 times better than the random grid in a multicore system.
7 Conclusion

This paper brings efficiency considerations to conventional VC to qualify it for real-time applications. Little computation is characteristic of any VC; however, due to the massive data to be processed, sequential VCs cannot be used in real-time applications. This work is significant in that it optimizes two stages of the conventional random grid-based (2, 2) VC, namely generating the halftone images and the encrypted images, using a many-core system. The use of constant memory in the kernel offers efficiency through its caching and broadcasting benefits. The reformulation of a solution to fit a many-core system is a novel approach in the era of evolving many-core systems. The experimental results prove the efficiency of the proposed scheme over the traditional (2, 2) random grid method, and the improved speedup grows with increased image resolution. The proposed method achieves 3651× and 1720× speedups in generating halftone and share images, respectively, for a colour image of size 1024 × 1024; for this image, the total performance gain is 2688× over the conventional method. Future work focuses on applying maturing CUDA features to the proposed scheme and analysing the performance implications therein. It is also possible to extend this work to existing traditional VCs by considering other efficiency possibilities.
References

1. C.C. Chen, W.J. Wu, J.L. Chen, Highly efficient and secure multi-secret image sharing scheme. Multimed. Tools Appl. 75(12), 7113–7128 (2016)
2. T.H. Chen, K.H. Tsao, Threshold visual secret sharing by random grids. J. Syst. Softw. 84(7), 1197–1208 (2011)
3. T.H. Chen, C.S. Wu, Efficient multi-secret image sharing based on Boolean operations. Signal Process. 91(1), 90–97 (2011)
4. P. D'Arco, R. De Prisco, Visual cryptography, in International Conference for Information Technology and Communications (Springer, 2016), pp. 20–39
5. K.M. Faraoun, Design of a new efficient and secure multi-secret images sharing scheme. Multimed. Tools Appl. 76(5), 6247–6261 (2017)
6. T. Guo, F. Liu, C. Wu, Threshold visual secret sharing by random grids with improved contrast. J. Syst. Softw. 86(8), 2094–2109 (2013)
7. R. Holla, N.C. Mhala, A.R. Pais, GPGPU-based randomized visual secret sharing (GRVSS) for grayscale and colour images. Int. J. Comput. Appl. 1–9 (2020)
8. S. Kabirirad, Z. Eslami, Improvement of (n, n)-multi-secret image sharing schemes based on Boolean operations. J. Inform. Sec. Appl. 47, 16–27 (2019)
9. O. Kafri, E. Keren, Encryption of pictures and shapes by random grids. Opt. Lett. 12(6), 377–379 (1987)
10. S. Kukreja, G. Kasana, A secure reversible data hiding scheme for digital images using random grid visual secret sharing, in 2019 Amity International Conference on Artificial Intelligence (AICAI) (IEEE, 2019), pp. 864–869
11. K.S. Lin, C.H. Lin, T.H. Chen, Distortionless visual multi-secret sharing based on random grid. Inf. Sci. 288, 330–346 (2014)
12. F. Liu, C. Wu, L. Qian et al., Improving the visual quality of size invariant visual cryptography scheme. J. Visual Commun. Image Rep. 23(2), 331–342 (2012)
13. D.C. Lou, H.H. Chen, H.C. Wu, C.S. Tsai, A novel authenticatable color visual secret sharing scheme using non-expanded meaningful shares. Displays 32(3), 118–134 (2011)
14. S. Mittal, J.S. Vetter, A survey of CPU-GPU heterogeneous computing techniques. ACM Comput. Surv. (CSUR) 47(4), 69 (2015)
15. M. Mohanty, W.T. Ooi, P.K. Atrey, Secret sharing approach for securing cloud-based preclassification volume ray-casting. Multimed. Tools Appl. 75(11), 6207–6235 (2016)
16. M. Naor, A. Shamir, Visual cryptography, in Workshop on the Theory and Application of Cryptographic Techniques (Springer, 1994), pp. 1–12
17. S.J. Shyu, Image encryption by multiple random grids. Pattern Recogn. 42(7), 1582–1596 (2009)
18. D. Suma et al., Pipelined parallel rotational visual cryptography (PPRVC), in 2019 International Conference on Communication and Signal Processing (ICCSP) (IEEE, 2019), pp. 0109–0113
19. H. Wang, H. Peng, Y. Chang, D. Liang, A survey of GPU-based acceleration techniques in MRI reconstructions. Quant. Imaging Med. Surg. 8(2), 196 (2018)
20. J. Wang, X. Chen, Y. Shi, Unconstraint optimal selection of side information for histogram shifting based reversible data hiding. IEEE Access (2019)
21. X. Wu, Z.R. Lai, Random grid based color visual cryptography scheme for black and white secret images with general access structures. Signal Process. Image Commun. 75, 100–110 (2019)
22. X. Wu, W. Sun, Random grid-based visual secret sharing with abilities of OR and XOR decryptions. J. Visual Commun. Image Represent. 24(1), 48–62 (2013)
23. B. Yan, Y. Xiang, G. Hua, Improving the visual quality of size-invariant visual cryptography for grayscale images: an analysis-by-synthesis (AbS) approach. IEEE Trans. Image Process. 28(2), 896–911 (2019)
24. X. Yan, Y. Lu, Generalized general access structure in secret image sharing. J. Visual Commun. Image Represent. 58, 89–101 (2019)
25. Y. Yang, P. Xiang, M. Mantor, H. Zhou, CPU-assisted GPGPU on fused CPU-GPU architectures, in IEEE International Symposium on High-Performance Computer Architecture (IEEE, 2012), pp. 1–12
A Generic Framework for Change Detection on Surface Water Bodies Using Landsat Time Series Data
T. V. Bijeesh and K. N. Narasimhamurthy
Abstract Water is one of the important natural resources that requires immediate attention from a sustainability perspective. Depletion of surface water bodies due to various reasons has remained a major concern for all growing cities across the globe. Change detection on water bodies over the years can help the concerned authorities implement strategies and solutions that conserve our water bodies for future generations. This paper presents an image processing-based water body change detection method using Landsat multispectral images over the past 25 years. A hybrid level set-based segmentation algorithm is used for delineating the water bodies from the multispectral images. The surface area is then computed for the delineated water bodies, and a machine learning model is used for forecasting the future change from the past data. This paper also explores the possibility of building a dataset for training a deep learning-based image-to-image regression network that can forecast the shape and surface area of the water bodies.
Keywords Water body change detection · Level set-based segmentation · Image-to-image regression · Deep learning
T. V. Bijeesh (B) · K. N. Narasimhamurthy
School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_26

1 Introduction

Water has always been the backbone of every civilization since time immemorial. Rapid urbanization has resulted in abrupt depletion of water bodies across all major cities in the world. Surface water bodies are the worst victims of urban development, and failure to give the issue immediate attention can have long-lasting effects on the future of our planet. Surface water bodies not only act as a direct source of water for the civilization, but also recharge the groundwater; disappearing and depleting surface water bodies hence lead to depletion in groundwater. Another important role water bodies play is flood water mitigation, and as our lakes and rivers are encroached in
the name of development, heavy floods are witnessed even with moderate rain. Bangalore, a fast-growing city in India, has witnessed this depletion of water bodies at an alarming rate. According to a recent study by Down to Earth, a publication of the Centre for Science and Environment, the number of water bodies in Bangalore has reduced by 79% in the last four decades. The main reasons for this alarming reduction are unplanned urbanization and population growth. South Africa's Cape Town has run out of drinking water, and according to a BBC report, Bangalore is the next city in line. These facts prove that water body change detection studies are extremely important for analyzing surface water depletion and subsequently devising strategies to slow it down, thereby conserving the water bodies for future generations. Researchers have been working extensively on change detection studies using remote sensing images for the last few decades. Water body detection and delineation from satellite images is the first step for change detection studies. Water body detection has been performed using techniques like spectral water indices, machine learning, spectral unmixing and active contours; a comprehensive review of various methods for detecting and delineating water bodies from satellite images can be found in [1]. Once a water body delineation method is developed successfully, it is applied on multitemporal data to study the temporal changes to the water body over the years. Duan et al. presented a quadratic exponential smoothing-based lake area analysis using time series Landsat images; they forecasted future lake area based on past area values without considering any external parameters [2]. The most frequently used data for change detection studies is the Landsat time series, and researchers have already established the usefulness of Landsat time series data for performing change detection studies on water bodies [3]. A water mapping product, Water Observations from Space (WofS), was developed by Muller et al.; it provides a continentally consistent model for studying and analyzing surface water bodies across the Australian subcontinent both spatially and temporally, with Landsat data for the past 25 years taken into account. Decision trees and logistic regression were used for mapping water bodies using Landsat multispectral data [4]. An unsupervised change detection method was proposed by Acharya et al. based on the spectral indices NDVI, NDWI and MNDWI to detect the change of lakes in Pokhara city of Nepal using Landsat data over the 25 years from 1988 to 2013. After preprocessing, a model was created in ArcGIS for calculating the positive and negative change in the surface water of the lakes; the method was not as effective for smaller lakes, but proved very effective for larger ones [5]. This paper proposes a generic change detection framework using level set theory-based water body delineation and deep learning-based image regression for forecasting the shape and surface area of water bodies from Landsat time series data for the past 25 years.
Table 1 Landsat 5 and Landsat 8 bands and details

        Landsat 5                              Landsat 8
Band    Wavelength (μm)       Resolution (m)   Wavelength (μm)            Resolution (m)
1       Blue (0.45–0.52)      30               Ultra Blue (0.435–0.451)   30
2       Green (0.52–0.60)     30               Blue (0.452–0.512)         30
3       Red (0.63–0.69)       30               Green (0.533–0.590)        30
4       NIR (0.77–0.90)       30               Red (0.636–0.673)          30
5       SWIR1 (1.55–1.75)     30               NIR (0.851–0.879)          30
6       TIR (10.40–12.50)     30               SWIR1 (1.566–1.651)        30
7       SWIR2 (2.09–2.35)     30               SWIR2 (2.107–2.294)        30
8       Not applicable        –                PAN (0.503–0.676)          15
9       Not applicable        –                Cirrus (1.363–1.384)       30
10      Not applicable        –                TIRS1 (10.60–11.19)        100 * (30)
11      Not applicable        –                TIRS2 (11.50–12.51)        100 * (30)
2 Materials and Methods

2.1 Dataset and Study Area

The proposed framework utilizes the freely available Landsat time series images for the past 25 years. The Landsat program is a series of Earth observation satellite missions operated collaboratively by NASA and the United States Geological Survey. Researchers have been relying on Landsat images since 1972 to remotely study various natural resources and environmental phenomena. In this work, time series data from Landsat 5 and Landsat 8 have been used to perform change detection on water bodies. The details of the various EM bands at which images are acquired by the sensors aboard Landsat 5 and Landsat 8 are presented in Table 1 [6]. Bands 1 and 9 are new additions in Landsat 8 and are, respectively, useful in coastal/aerosol studies and cirrus cloud detection. Bands 10 and 11 are thermal bands and are used for obtaining surface temperature information with greater precision. Landsat 8 images are ideal for water-related studies because they include NIR and SWIR bands along with the visible bands that are utilized by most of the water
Fig. 1 Sample Landsat image with important channels (false colour composite; red, green and blue channels; NIR and SWIR images)
detection algorithms. Landsat 8 also has a cirrus band for effectively detecting and removing cloud pixels from the image. The Landsat 5 and Landsat 8 images used for the study are downloaded from the Earth Resources Observation and Science (EROS) Center-USGS Web site. Landsat 7 was initially considered but was not included because of the scan line error induced by the Scan Line Corrector (SLC) failure aboard the satellite. EROS runs the Landsat program together with NASA and maintains an immense archive of Earth's land surface satellite imagery. All algorithms in the study are implemented on a Linux computer with a dual-core Intel processor using MATLAB software. The study areas chosen are Bellandur Lake, Bangalore, India (12.9354° N, 77.6679° E) and Varthur Lake, Bangalore (12.9389° N, 77.7412° E). Landsat multispectral images from the year 1987 to the year 2019 are used to develop the change detection algorithm. The images used are of the same month of every year so that seasonal changes in the water body do not affect the analysis. Data for a few years are missing from the study as cloud-free images of the area under study were unavailable in the USGS portal. Figure 1 presents sample Landsat images in false colour and in the red, green, blue, NIR and SWIR channels. A false colour image is created using channels other than red, green and blue, and it looks different from the corresponding RGB image; false colour images are normally used to enhance or suppress some features in the image scene.
2.2 Preprocessing

It was established by researchers that the near infrared (NIR) and shortwave infrared (SWIR) bands are the most suitable for water detection applications. This is due to the fact that water absorbs electromagnetic (EM) signals beyond the NIR range of the EM spectrum [7]. The pixel values have to be normalized before applying the level set algorithm because the satellite images are captured under varying lighting conditions, and the pixel intensity of the same ground point thus varies with these conditions. The proposed framework uses histogram equalization and the modified normalized difference water index (MNDWI) as preprocessing steps to normalize the pixel values so that the inconsistencies due to varying lighting conditions can be nullified. MNDWI is a spectral index that utilizes the green and SWIR bands of the Landsat image. The formula to compute MNDWI is given in Eq. (1).

MNDWI = (Green − SWIR)/(Green + SWIR)    (1)
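A minimal sketch of this step, assuming the green and SWIR bands are already loaded as floating-point arrays:

import numpy as np

def mndwi(green, swir, eps=1e-12):
    """Modified normalized difference water index, Eq. (1)."""
    return (green - swir) / (green + swir + eps)  # eps guards division by zero

# Water pixels tend toward positive MNDWI, so a simple threshold already
# yields a coarse water mask before the level set refinement:
# rough_mask = mndwi(green, swir) > 0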
2.3 Level Set-Based Water Body Delineation

A level set is an implicit representation of a curve. In level set segmentation, a level set is first assigned to the image arbitrarily and then gradually evolved according to an external force. Typically, the force that controls the evolution of the level set is the curvature of the level set. The process starts by arbitrarily setting the contour C0 (called the initial contour) and evolving it according to a force function. An image property, normally the gradient of the image, is used as the force function that drives the evolution of the level set [9, 10]. If φ(x, y, t) is the level set function, where t is a time parameter introduced to model the temporal behavior of the level set function, the evolution of the level set is governed by Eq. (2).

∂φ/∂t = F |∇φ|    (2)
The driving force F, which is the curvature of the level set curve, acts in a direction normal to the level set and is derived from the gradient of the level set function. To perform image segmentation using the level set evolution method, a new term called the edge stopping function is added to the level set evolution PDE. The new PDE is given as Eq. (3).

∂φ/∂t = g(x, y) F |∇φ|    (3)

Here the function g(x, y) = 1/(1 + |∇f|²), where f is the image, is called the edge stopping function. The value of this function becomes nearly equal to zero at
the object boundaries because the image gradient is relatively high at the boundaries. Therefore, g(x, y) becomes nearly zero at the boundaries and the level set evolution stops. Based on active contour techniques, two models are available for boundary detection: the edge-based model and the region-based model. Edge-based methods usually use a measure of the changes across an edge, such as the gradient or other partial derivatives. Such methods utilize the image gradient to construct an edge stopping function (ESF) that stops the contour evolution on the object boundaries. However, the edge-based method is not considered the best, as the edges may not be sharp due to degradation, which can prevent the gradient value at the edges from being high. Another disadvantage of these models is that they are very sensitive to the location of the initial contour. Edge-based models are said to have the local segmentation property, as they can segment objects only if the initial contour is placed surrounding the object. The region-based model does not work on discontinuities in the image; rather, it partitions the image into "objects" and "background" based on pixel intensity similarity. Region-based models utilize the statistical information inside and outside the contour to control the evolution; they are less sensitive to noise and perform better on images with weak edges or without edges. This method is not very sensitive to the location of the initial contour and can detect interior and exterior boundaries at the same time. Therefore, region-based models are said to possess the global segmentation property [11, 12]. The change detection framework proposed in this work employs a hybrid level set, which is a combination of both edge-based and region-based techniques and hence has the combined advantages of both. The final formulation of the PDE is given in Eq. 4. The terms μ and λ are weight terms that can be fine-tuned based on the smoothness and texture properties of the image.

∂φ/∂t = μ δ(φ) div(∇φ/|∇φ|) + λ(u0 − c1)² + λ(u0 − c2)² + 1/(1 + |∇φ|^p); p > 1 (4)
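To make the evolution concrete, the sketch below implements a simplified explicit update of Eqs. 2 and 3 in NumPy, with the curvature as the driving force F and the edge stopping function g; the step size, iteration count, and initialization are illustrative assumptions rather than the authors' exact settings.

```python
import numpy as np

def edge_stopping(image):
    """g(x, y) = 1 / (1 + |grad f|^2): near zero on strong edges (Eq. 3)."""
    gy, gx = np.gradient(image.astype(np.float64))
    return 1.0 / (1.0 + gx ** 2 + gy ** 2)

def evolve_level_set(phi, g, steps=200, dt=0.1):
    """Explicit evolution of d(phi)/dt = g * F * |grad phi| with F = curvature."""
    eps = 1e-8
    for _ in range(steps):
        py, px = np.gradient(phi)
        norm = np.sqrt(px ** 2 + py ** 2) + eps
        # Curvature F = div(grad phi / |grad phi|)
        curvature = np.gradient(px / norm, axis=1) + np.gradient(py / norm, axis=0)
        phi = phi + dt * g * curvature * norm
    return phi
```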
2.4 Multitemporal Analysis for Change Detection The water body delineation algorithm discussed in the previous section was then applied to multitemporal Landsat data from 1985 to 2020 to obtain the changes on the lake. The surface area of the lake was computed from the delineated image for all these years to provide a quantitative measure of the temporal changes the lake has undergone. Simple machine learning-based time series forecasting can be used to provide a forward-looking estimate of the expected change in surface area. This paper also proposes to investigate the usefulness of image-to-image regression networks to forecast not only the change in surface area but also the shape of the lake in the
Fig. 2 Proposed generic change detection framework for surface water bodies
near future. Image-to-image regression has been widely used for image processing applications such as denoising, age prediction, and image generation [13–15], but it is yet to be explored for change detection using multitemporal multispectral images. The proposed framework for water body change detection is presented in Fig. 2.
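Since Landsat multispectral pixels cover 30 m × 30 m on the ground, the surface area described above can be computed by counting water pixels in each delineated binary image, as in this small sketch (the delineated masks are assumed inputs):

```python
import numpy as np

PIXEL_AREA_KM2 = (30 * 30) / 1e6  # one 30 m x 30 m Landsat pixel in km^2

def lake_surface_area(water_mask):
    """Surface area (km^2) of a delineated binary image (nonzero = water)."""
    return float(np.count_nonzero(water_mask)) * PIXEL_AREA_KM2

# Quantify the temporal change across the delineated multitemporal stack:
# areas = {year: lake_surface_area(mask) for year, mask in delineated_masks.items()}
```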
Fig. 3 Segmentation result for Bangalore area a contours juxtaposed on the SWIR image, b only the contour lines
3 Results and Discussions A level set-based water body detection method is proposed and applied to multitemporal Landsat data for change detection studies on lakes. Figure 3 shows the delineation output after applying the proposed delineation method on lake Bellandur, Bangalore. After some post-processing of the obtained output and removal of the unwanted contours, the binary output of the delineated water body is obtained, as presented in Fig. 4. The water body delineation algorithm is then applied to Landsat multispectral images of the last 25 years to monitor the changes in the surface area of the water bodies. The delineation output using the proposed algorithm for lake Bellandur and lake Varthur in Bangalore from the year 1987 to 2019 is presented in Figs. 5 and 6, respectively. Images for certain years may be missing in the output, as cloud-free images of the study area for those years were not available in the USGS EROS portal. Bangalore is an Indian city that is growing rapidly, and it is evident from the results that the surface area of the lakes under study has declined steeply from 1987 to 2019 and will continue to deplete if appropriate measures are not taken by the authorities concerned. This paper also proposes to apply the algorithm to multitemporal data for multiple lakes in Bangalore to prepare a dataset that can be used to train a deep learning image-to-image regression model that can forecast the future shape and surface area of a given water body. The proposed image-to-image regression model takes an image cube as input, where each channel corresponds to the delineated water body image for a given year, and the output is a single channel image which is the forecasted
Fig. 4 Segmentation result for lake Bellandur, Bangalore a contour over the extracted water body. b Binarized output image where white pixels correspond to water pixels and black pixels correspond to non-water pixels
Fig. 5 Delineated output for lake Bellandur from the year 1989 to the year 2019
Fig. 6 Delineated output for lake Varthur from the year 1989 to the year 2019
Fig. 7 Proposed image-to-image regression network that can be trained using past images to forecast future image of the water body
output. An abstract representation of the proposed image-to-image regression model is presented in Fig. 7. Even though the proposed framework is generic in nature and can be applied to any water body, the major focus will be on the lakes in Bangalore in order to keep parameters like rainfall and snowfall constant. This research is still at an initial stage; once the dataset is prepared and the forecast model is built, it can help convince policymakers of the importance of taking timely steps to conserve our fast-depleting water bodies.
4 Conclusion and Future Scope This paper presents a generic level set theory-based framework for surface water change detection studies using Landsat time series data from the past 25 years. A level set-based segmentation was applied to multitemporal Landsat multispectral images of water bodies in Bangalore to visualize the changes the water bodies have undergone in terms of surface area. The results clearly show how fast the lakes under study have depleted in surface area, emphasizing the need for measures to slow this down. This work has also proposed the possibility of building a deep learning image-to-image regression network that can be trained using the dataset prepared with the level set-based water delineation algorithm. The model can also be made more robust by taking into consideration parameters like rainfall and temperature, if such data can be collected for the last 25 years. The main challenge in building the model is the lack of an available dataset, and this work therefore also proposes to prepare the dataset and then build the deep learning model. Water is an essential natural resource that needs to be used wisely and conserved for future generations. Many urban cities in the world are facing an acute shortage of surface water and groundwater. This research is the initial stage of developing a generic framework for fully automated monitoring and forecasting of the changes happening to surface water bodies.
References
1. T.V. Bijeesh, K.N. Narasimhamurthy, Surface water detection and delineation using remote sensing images: a review of methods and algorithms. Sustain. Water Res. Manage. 6(4), 1–23 (2020)
2. G. Duan, R. Niu, Lake area analysis using exponential smoothing model and long time-series landsat images in Wuhan, China. Sustainability 10(1), 149 (2018)
3. W. Pervez et al., Landsat-8 operational land imager change detection analysis. Int. Arch. Photogram. Remote Sens. Spatial Inf. Sci. 42, 607 (2017)
4. N. Mueller et al., Water observations from space: mapping surface water from 25 years of Landsat imagery across Australia. Remote Sens. Environ. 174, 341–352 (2016)
5. T.D. Acharya et al., Change detection of lakes in Pokhara, Nepal using landsat data. Multidiscipl. Digital Publ. Inst. Proc. 1(2) (2016)
6. J.A. Barsi et al., The spectral response of the landsat-8 operational land imager. Remote Sens. 6, 10232–10251 (2014)
7. T. Lillesand, R.W. Kiefer, J. Chipman, Remote Sensing and Image Interpretation (Wiley, 2015)
8. H. Xu, Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 27(14), 3025–3033 (2006)
9. M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models. Int. J. Comput. Vision 1, 321–331 (1988)
10. S. Osher, J.A. Sethian, Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988)
11. K. Zhang et al., Active contours with selective local or global segmentation: a new formulation and level set method. Image Vision Comput. 28(4), 668–676 (2010)
12. T.F. Chan, L.A. Vese, Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001)
13. S. Pathan, Y. Hong, Predictive image regression for longitudinal studies with missing data (2018). arXiv:1808.07553
14. D. Eigen, Predicting Images Using Convolutional Networks: Visual Scene Understanding with Pixel Maps (New York University, Diss., 2015)
15. V. Santhanam, V.I. Morariu, L.S. Davis, Generalized deep image to image regression, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
A Machine Learning Approach to Detect Image Blurring Himani Kohli, Parth Sagar, Atul Kumar Srivastava, Anuj Rani, and Manoj Kumar
Abstract The advent of smart mobile phones and cameras has unprecedentedly increased the number of photographs captured by people. Despite this, image blur remains one of the distortion and quality degradation factors, and it can arise from various causes during the image capture and processing phases. In this paper, a machine learning-based detection technique is proposed to detect image blur regions and classify them into two categories. Image pre-processing is applied to improve the textural information. Further, a pre-trained convolutional neural network model is used to classify the image blur. A fully automatic approach for blur detection as well as classification is suggested in this paper. Three different image datasets are used to check the performance of the proposed method. Various parameters are estimated to demonstrate the results. Keywords Blur detection · Convolution neural network (CNN) · Classification · Image blur · Laplacian enhancement
1 Introduction Image blur arises in natural photos due to camera shake, motion blur, defocus blur, or artificial blur applied to highlight important features as per the requirements. Blur is often undesirable because it sometimes affects the important regions H. Kohli (B) · P. Sagar Department of Computer Science, Amity University, Noida, India A. K. Srivastava (B) Amity University, Tashkent, Uzbekistan e-mail: [email protected] A. Rani Department of Computer Science, G L Bajaj Institute of Technology and Management, Greater Noida, India M. Kumar School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_27
and makes them less sharp, while in other cases, blurring is desirable. People blur photographs purposely to make them better by popping out the main content of the image and blurring the unimportant regions. Several estimation procedures exist and have been tested, but their limited effectiveness means their algorithms need enhancement. Understanding blur desirability is not trivial; it is a challenging task to distinguish a blurred image from a non-blurred one. We can consider the image quality and assign a scale of good, bad, and ok blur type. The desirability depends on whether the image needs to be blurred or needs blur removed. Further, to understand an image in terms of high-level and low-level blur estimation, we propose a novel approach trained on a large-scale blur dataset consisting of more than 9500 images, categorized into natural, artificial, and undistorted images. The dataset comprises a collection of different datasets which is trained and evaluated, with results reported as net accuracy. Firstly, given an image, we pre-process it using the Laplacian operator and CLAHE. After pre-processing, our dataset is trained and evaluated using a sequential CNN model with max pooling and Swish, which is an end-to-end trainable network. The output reports the accuracy of each training epoch, and the final result is given as a net percentage accuracy. Finally, we compared the performance of the proposed technique with other techniques. To the best of our knowledge, this approach is designed to detect blur and to categorize the blur amount using blur estimation. The paper is organized into five sections. Section 2 discusses the relevant literature review, followed by the methodology in Sect. 3. Section 4 presents the demonstrated results, and finally, the conclusion of the paper is drawn in Sect. 5.
2 Related Work In recent years, image blur detection using neural networks has proved its superiority over conventional approaches. Most previous work has focused on finding the blur in an image, assuming many users have knowledge of the blur category (good, ok, bad) or desirability criteria [1]. Different features have been used to estimate the blur amount, such as local filters [2], representations [3], and similarity among different neighbouring elements [4]. The SmartBlur dataset [5], consisting of more than 10,000 natural photos, was proposed along with an approach, ABC-FuseNet, that fuses low-level blur estimation with semantic features extracted by ResNet-50. The evaluation was compared with baseline methods, giving 0.822, 0.616, 0.607, and 0.785. Thus, the problem is composed in two steps, generating blur responses and distilling high-level semantics, but accuracy fails on low-level semantics. Its limitation is that the method requires more storage memory. In earlier related research, classification is done between two classes (blur and not blur), defining an α-feature based on pixel properties. The datasets used were DIAQ [6] and SmartDoc [7], which have the drawback of image-level
blur desirability [8]. Other authors proposed a blur quality measure for six collections, that is, Gblur, JPEG 2000 LIVE, TID2008, TID2013, IVC, and CSIQ, by using histogram modelling [9]. Another study used an SVM classifier and implemented SVM-RFE theory [10], including statistical, texture [11], spectrum, and low power spectrum features for blur classification, using the Berkeley segmentation and Pascal VOC datasets. This research handles complex problems but has the limitation of failing to contribute results for big sample data [12]. A further approach includes problem formulation and single-scale [13] and multi-scale deep blur detection, using the benchmark dataset proposed by [2, 14]. This research has the drawback of not considering other features and deals only with multi-scale blur detection. Other authors utilized deep learning algorithms, notably convolutional neural networks, to classify images either as blurred or clear. Classification is done by extracting features of clear images in order to detect blur. This research has the limitation that it failed on complex problems such as big sample datasets. Another study compared three different image capture strategies for formulating the problem of motion de-blurring [15]. Experimental results were simulated using a high speed video camera. This research is sufficient for investigating the performance of a complex system but fails to investigate a single problem. Another research work proposed a no-reference image blur assessment model that performs partial blur detection in the frequency domain [16]. A support vector machine (SVM) classifier is applied to the above features, serving as the image blur quality evaluator. It uses a medium size image dataset consisting of more than 2400 digital photographs; the fact that the approach was not tested on a larger dataset is a limitation. In recent approaches from 2019, blur detection is treated as an important feature of a digital image. A deep learning-based approach for detecting blur in an image uses 250,000 identity images. The authors compared their approach to a statistical feature extractor, i.e. BRISQUE, which was trained on an SVM. Its limitation is the running time of the method on the dataset [17]. In research from 2020, the author presented experimental results demonstrating the effectiveness of the proposed scheme, compared to deterministic methods using the confusion matrix. It states an alternative method to the Laplacian, with the limitation that the CNN classification results are not good [18, 19].
3 Methodology In this section, the proposed methodology is discussed. Firstly, the pre-processing techniques used to enhance the image before classification are presented: contrast limited adaptive histogram equalization (CLAHE) [20] and the Laplacian filter [21]. Then a CNN model is designed, and the dataset is trained and evaluated. Figure 1 shows the proposed approach framework. It also shows the steps to detect the blurred images in the dataset and formulates the steps to be followed in order to reach the
Fig. 1 Proposed framework to detect image blur
end result. In this figure, the dataset is processed with CLAHE and the Laplacian operator. Further, a model is designed to perform the detection of blur in images.
3.1 Image Pre-processing Image pre-processing is the technique used to extract useful information from an image in order to process results further. It consists of the following steps:

a. Import the image, which is the input to the system.
b. Analyse and manipulate the input image.
c. Output the enhanced image after the pre-processing techniques.
3.1.1 Proposed Contrast Limited Adaptive Histogram Equalization (CLAHE)
In this study, CLAHE has been used in order to overcome the noise amplification problem. In CLAHE, the contrast limiting procedure has to be applied to the neighbouring pixels from which the transformation function is derived. Under CLAHE, the input image is divided into sub-images, tiles, or blocks. CLAHE [20] has two parameters that are used to enhance image quality: block size (BS) and clip limit (CL). If CL is increased, the image gets brighter, making the histogram flatter. The dynamic range becomes larger when the BS is bigger. The CLAHE method follows these rules to enhance the original image, which then goes to the other pre-processing techniques:

a. The original intensity image is divided into non-overlapping contextual regions. The total number of sub-images is preferred to be 8 × 8, hereby taking P × Q regions.
b. According to the image grey levels, the histogram is calculated for each region.
c. Using the CL, the contrast-limited histogram is calculated for each contextual region, taking the CL value as

Qavg = (QrX × QrY) / Qgray (1)

where Qavg is the average number of pixels, Qgray is the number of grey levels, and QrX and QrY are the numbers of pixels in the X dimension and Y dimension of the contextual region.

d. The actual CL is expressed as

QCL = Qclip × Qavg (2)

where QCL is the actual CL and Qclip is the normalized CL in the range [0, 1]. If the number of corresponding pixels is greater than QCL, the pixels will be clipped.

e. The rules for the histogram equalizer are the following:

If Hregion(i) > QCL, then
Hregion_clip(i) = QCL (3)
else if (Hregion(i) + Qavggray) > QCL, then (4)
Hregion_clip(i) = QCL (5)
else
Hregion_clip(i) = Hregion(i) + QCL (6)

where Hregion(i) and Hregion_clip(i) are the original histogram and the clipped histogram of each region.

f. Redistribute the remaining pixels, with redistribution step

Qgray / Qremain (7)

where Qremain is the remaining number of clipped pixels.

g. Enhance the intensity values in each region by the Rayleigh transform.
h. Apply a linear contrast stretch, given by

Xnew = ((Xinput − Xmin) / (Xmax − Xmin)) × 255 (8)

When a bi- or tri-modal histogram distribution of an image is present, this stretches a certain range of the histogram for increased enhancement in selected areas. Here, Xmin and Xmax denote the minimum and maximum values of the transfer function, and Xinput is the input value of the transfer function.
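A minimal sketch of this pre-processing step is shown below using OpenCV's built-in CLAHE, which follows the same divide, clip, and redistribute idea (though it interpolates between tiles rather than applying the Rayleigh transform described above); the clip limit value and file names are illustrative assumptions.

```python
import cv2

def apply_clahe(gray, clip_limit=2.0, grid_size=(8, 8)):
    """Contrast limited adaptive histogram equalization with 8 x 8 tiles."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid_size)
    return clahe.apply(gray)

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
enhanced = apply_clahe(gray)
cv2.imwrite("clahe_output.jpg", enhanced)
```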
3.1.2 Laplacian Operator
In order to detect edges in the image, we used the Laplacian operator on the Python platform. It is more effective than Sobel and Kirsch, as those are first derivative operators while the Laplacian is a second derivative operator. We worked with the Laplacian operator because it gives a sharper effect on images; therefore, we chose the second derivative over the first derivative. The Laplacian is a scalar differential operator for a scalar function f(x, y), where x and y are spatial coordinates in a plane. It is used in digital image pre-processing to detect edges by highlighting areas of rapid intensity change:

∆f = ∇²f = ∇ · ∇f

The Laplacian extracts edges in the following classification:

a. Inward edges
b. Outward edges

The Laplacian has two types of operator:

1. Positive Laplacian operator: a mask where the corner elements are zero and the centre element is negative.
2. Negative Laplacian operator: a mask where the corner elements are zero, the centre element is positive, and the remaining elements are −1.
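The two masks described above can be applied directly with OpenCV, as in this sketch; the 3 × 3 kernel values follow the common convention, and the file name is an illustrative assumption.

```python
import cv2
import numpy as np

# Positive Laplacian mask: zero corners, negative centre element
positive_laplacian = np.array([[0,  1, 0],
                               [1, -4, 1],
                               [0,  1, 0]], dtype=np.float32)

# Negative Laplacian mask: zero corners, positive centre, remaining -1
negative_laplacian = np.array([[ 0, -1,  0],
                               [-1,  4, -1],
                               [ 0, -1,  0]], dtype=np.float32)

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # file name is illustrative
edges_positive = cv2.filter2D(gray, cv2.CV_64F, positive_laplacian)
edges = cv2.Laplacian(gray, cv2.CV_64F)  # built-in second-derivative operator
```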
3.2 Proposed Convolutional Neural Network (CNN) In this research, we introduce the problem of automatically understanding image blur in terms of analysing image quality. The proposed CNN model is shown in Fig. 2. The framework consists of the steps involved in the convolutional neural network: the input image, followed by the corresponding layers with max pooling, and softmax at the end of the network [22]. Further, Swish is used as the activation function in the corresponding layers of the network. The network is implemented using Python 3.7 with imports from the Keras and scikit-learn libraries. The model used is a sequential model, where a batch of images is fed as input to the model for every training iteration, and it is trained for 15 epochs. Our model uses the adaptive learning rate optimizer Adam, and the layers of the network use an activation function named Swish,
Fig. 2 Architecture of CNN model
an extension of the sigmoid function. Swish is a better and more efficient function than ReLU: by replacing ReLU with Swish, the model accuracy improved by 10%.
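A minimal Keras sketch of such a sequential CNN is given below; the exact layer counts, filter sizes, and input shape are assumptions, since the paper does not list them, but the Swish activations, max pooling, softmax output, Adam optimizer, and 15 training epochs follow the description above (Swish is available as a built-in activation in recent TensorFlow releases).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_blur_classifier(input_shape=(128, 128, 1), num_classes=2):
    """Sequential CNN with Swish activations, max pooling, and softmax output."""
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="swish", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="swish"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="swish"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_blur_classifier()
# model.fit(train_images, train_labels, epochs=15, batch_size=32)
```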
4 Experimental Results Three datasets have been used in our study for better results. The first one is a blurred image dataset with access granted by Portland State University [23], the second is CUHK from The Chinese University of Hong Kong [24], and the third is the CERTH image blur dataset [25]. Our combined dataset consists of large-scale data with images having blur desirability but no annotations. It consists of 9500 images, including testing and training sets, classified under three different categories: natural, artificial, and undistorted blurred images. The challenging blurred images in the dataset are used for cross-dataset evaluation and are further trained and tested in a cross-dataset test. The code runs effectively and reaches the desired goal, giving the accuracy of blur detection based on image-level blur detection with good image quality. Then, we measured the performance of the pre-processing to detect the blurred images with threshold values

V = 450 for the training set (9)
V = 420 for the testing set (10)
If the computed variance is less than the threshold, the image is appended to the blur list; otherwise, it is appended to the not-blur list. Each image carries a blur label: if the label equals 1, the image is blurred; otherwise, it is not. The following results are evidence of the effectiveness of the proposed program, which was run on the bucket of images. Figure 3 shows the pre-processing results: an input image (Fig. 3a) is processed by the two pre-processing techniques, CLAHE (Fig. 3b) and the Laplacian method (Fig. 3c).
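The variance-based labelling described above can be sketched as follows; the helper function and image list are hypothetical names, while the thresholds follow Eqs. 9 and 10.

```python
import cv2

def laplacian_variance(image_path):
    """Focus measure: variance of the Laplacian of the grayscale image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

V_TRAIN = 450.0  # threshold of Eq. 9 (use 420 for the test set, Eq. 10)
blur_list, not_blur_list = [], []
for path in image_paths:  # image_paths: hypothetical list of dataset files
    if laplacian_variance(path) < V_TRAIN:
        blur_list.append(path)      # blur label = 1
    else:
        not_blur_list.append(path)  # blur label = 0
```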
Fig. 3 Enlarged illustration of two challenging scenarios with pre-processing techniques a inputs. b CLAHE. c Laplacian operator
Fig. 4 Accuracy, F1, and average precision-recall score for the model
Table 1 Image blur detection comparisons

Technique name (network + pre-processing + dataset)      Accuracy (%)
CNN + LAPLACIAN + CERTH                                  44.86
CNN + LAPLACIAN + CUHK                                   58.31
CNN + LAPLACIAN + CLAHE + CERTH + CUHK                   65.88
Proposed CNN + LAPLACIAN + CLAHE + CERTH + CUHK          79.54
The accuracy is given as an end result for the whole dataset; it is the fraction of samples predicted correctly. For the dataset, we also computed the F1 score, which is the harmonic mean of precision and recall, obtained from scikit-learn [26]. Figure 4 shows the estimated accuracy, F1 score, and average precision-recall for the proposed blur detection technique on the used datasets. These are the techniques we implemented as our results grew from 44.86 to 79.54%. In the first approach, the network used is CNN, the pre-processing is Laplacian, and the dataset is CERTH. In the second approach, the network is CNN, the pre-processing is Laplacian, and the dataset is CUHK, which has high resolution images. In the third approach, the network is CNN, the pre-processing is Laplacian, and the dataset is created by combining the images contained in CUHK and CERTH with CLAHE applied. In the last, proposed sequential CNN model, we enhanced the model by adding layers and max pooling, which enabled an accuracy of 79.54%. The blur detection accuracy compared to other works is shown in Table 1, which depicts that the proposed method's accuracy is higher for classifying image blur. Some of the detected image blur outputs can be seen in Table 2, which consists of two columns: input image and blurred image output.
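These metrics can be obtained from scikit-learn as in the following sketch, where y_true, y_pred, and y_scores stand in for the ground-truth labels, predicted labels, and predicted scores:

```python
from sklearn.metrics import accuracy_score, average_precision_score, f1_score

accuracy = accuracy_score(y_true, y_pred)       # fraction predicted correctly
f1 = f1_score(y_true, y_pred)                   # harmonic mean of precision and recall
ap = average_precision_score(y_true, y_scores)  # summary of the precision-recall curve
print(f"Accuracy: {accuracy:.4f}, F1: {f1:.4f}, Average precision: {ap:.4f}")
```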
5 Conclusion This work has been successfully implemented to detect blurred images. A machine learning-based approach is suggested using a CNN. The image blur region is detected and classified into two categories. After processing the image, the pre-trained neural network model is used to classify the images. Results are shown using accuracy parameters for each epoch on the given datasets containing 8000 photos with elaborate human annotations. The results are tested on various datasets and show that the technique's accuracy is satisfactory in terms of blurred image detection. As
Table 2 Image blur detection outputs (input images and corresponding outputs)
evaluation parameters, the accuracy, F1, and average precision-recall score for the model are estimated. In the future, we can explore more techniques and methodologies for blur removal, where blurred images will be sharpened and smoothed to make them more comprehensible for human use.
References
1. S. Golestaneh, L. Karam, Spatially-varying blur detection based on multiscale fused and sorted transform coefficients of gradient magnitudes, in IEEE Conference on Computer Vision and Pattern Recognition (2017)
2. J. Shi, L. Xu, J. Jia, Discriminative blur detection features, in IEEE Conference on Computer Vision and Pattern Recognition (2014)
3. J. Shi, L. Xu, J. Jia, Just noticeable defocus blur detection and estimation, in IEEE Conference on Computer Vision and Pattern Recognition (2015)
4. C. Tang, C. Hou, Z. Song, Defocus map estimation from a single image via spectrum contrast. Opt. Lett. 38(10) (2013)
5. S. Zhang, X. Shen, Z. Lin, R. Mech, J.P. Costeira, J.M. Moura, Learning to understand image blur, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018)
6. J. Kumar, P. Ye, D. Doermann, A dataset for quality assessment of camera captured document images, in The 2013 Camera-Based Document Analysis and Recognition, CBDAR'13 (Washington DC, USA, 2013)
7. J. Burie, J. Chazalon, M. Coustaty, S. Eskenazi, M. Luqman, M. Mehri, N. Nayef, J. Ogier, S. Prum, M. Rusinol, ICDAR 2015 competition on smartphone document capture and OCR (SmartDoc), in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on (2015)
8. V.C. Kieu, F. Cloppet, N. Vincent, Adaptive fuzzy model for blur estimation on document images, in Pattern Recognition Letters (2017)
9. F. Kerouh, D. Ziou, A. Serir, Histogram modelling-based no reference blur quality measure, in Signal Processing: Image Communication (2018)
10. M. Atas, Y. Yardimci, A. Temizel, A new approach to aflatoxin detection in chilli pepper by machine vision. Comput. Electron. Agric. (2012)
11. Y. Maret, F. Dufaux, T. Ebrahimi, Adaptive image replica detection based on support vector classifier, in Signal Processing: Image Communication (2006)
12. R. Wang, W. Li, R. Li, L. Zhang, Automatic blur type classification via ensemble SVM, in Signal Processing: Image Communication (2018)
13. A. Krizhevsky, I. Sutskever, G. Hinton, Imagenet classification with deep convolutional neural networks, in Conference on Neural Information Processing Systems (2012)
14. R. Huang, W. Feng, M. Fan, L. Wan, J. Sun, Multiscale blur detection by learning discriminative deep features, in Neurocomputing (2018)
15. J. Ma, X. Fan, S.X. Yang, X. Zhang, X. Zhu, Contrast limited adaptive histogram equalization based fusion for underwater image enhancement, pp. 1–27 (2017)
16. A. Agrawal, R. Raskar, Optimal single image capture for motion deblurring, in Computer Vision and Pattern Recognition, 2009. CVPR 2009 (2009)
17. M. Kumar, M. Alshehri, R. AlGhamdi, P. Sharma, V. Deep, A DE-ANN inspired skin cancer detection approach using fuzzy c-means clustering. Mob. Netw. Appl. 25, 1319–1329 (2020). https://doi.org/10.1007/s11036-020-01550-2
18. Uniqtech, Understand the Softmax Function in Minutes, 31 Jan 2018. Available: https://medium.com/data-science-bootcamp/understand-the-softmax-function-in-minutes-f3a59641e86d
19. A. Long, Understanding data science classification metrics in Scikit-learn in Python, 6 Aug 2018. Available: https://towardsdatascience.com/understanding-data-science-classification-metrics-in-scikit-learn-in-python-3bc336865019
20. E. Mavridaki, V. Mezaris, No-reference blur assessment in natural images using Fourier transform and spatial pyramids, in 2014 IEEE International (2014)
21. K. Khajuria, K. Mehrotra, M.K. Gupta, Blur Detection in Identity Images Using Convolutional Neural Network (IEEE, Shimla, India, 2019)
22. T. Szandała, Convolutional neural network for blur images detection as an alternative for Laplacian method. ResearchGate (2020)
23. A. Aggarwal, M. Kumar, Image surface texture analysis and classification using deep learning. Multimed. Tools Appl. (2020). https://doi.org/10.1007/s11042-020-09520-2
24. C. Chen, Q. Yan, M. Li, J. Tong, Classification of blurred flowers using convolutional, in Association for Computing Machinery (China, 2019)
25. Techopedia, Convolutional Neural Network (CNN), 5 September 2018. Available: https://www.techopedia.com/definition/32731/convolutional-neural-network-cnn
26. E. Mavridaki, V. Mezaris, No-reference blur assessment in natural images using Fourier, in Image Processing (ICIP), 2014 IEEE International (2014)
Object Detection for Autonomous Vehicles Using Deep Learning Algorithm E. J. Sai Pavan, P. Ramya, B. Valarmathi, T. Chellatamilan, and K. Santhi
Abstract Self-driving cars have recently been gaining increasing interest from people across the globe. Over 33,000 Americans are killed in car accidents every year, and many of those accidents could be avoided by implementing autonomous vehicle detection. Different methods have been developed to manage and detect road obstacles with the help of techniques like machine learning and artificial intelligence. To resolve the issues associated with existing vehicle detection, such as vehicle type recognition, low detection accuracy, and slow speed, algorithms like the fast and faster region-based convolutional neural networks (RCNNs) have been implemented, but they are not suitable in real time because of their computation speed and two-step architecture; even the faster RCNN, the enhanced version of the RCNNs, runs at a speed of only 7 frames per second. Since the RCNN family uses two steps (object detection and classification), the response time in real time suffers despite good accuracy and high image resolution. So, the vehicle detection models YOLOv2 and YOLOv3 are considered in this paper, as they are very useful in real-time detection with a comparatively higher frame rate; the YOLO family mostly uses single step detection and classification. YOLO has an FPS rate of 45, which is quite good for real-time scenarios. We obtained an average of 90.4 using the chosen algorithm for each image in this paper, even with lower resolution images. Keywords Vehicle detection · RCNN · YOLOv2 · YOLOv3 · Bounding boxes and anchor boxes · Image processing E. J. Sai Pavan · P. Ramya · B. Valarmathi (B) Department of Software and Systems Engineering, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India T. Chellatamilan Department of Computer Applications, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India K. Santhi Department of Analytics, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_28
1 Introduction Object detection is considered one of the most important computer vision problems; it deals with detecting and classifying objects of certain classes in an image. Before the introduction of deep learning, object detection was a level-by-level process of edge detection and feature extraction using existing implementations like HOG, which provides the edge directions as an image outline for detection. These features are then compared against preset templates, usually at multiple scales, to find and localize the items present in the image. Then deep learning with CNNs came into existence. This involves two-step object detection algorithms that first identify regions which might potentially contain an object and then independently classify each bounding box using CNNs; a region of interest (ROI) pooling layer was added in fast RCNNs for selecting the object location in the image, followed by detection and the use of anchor boxes in faster RCNNs. For real-time object detection and classification, many one-step object detection architectures, like the YOLO family of algorithms and SSD, have been proposed; SSD uses default boxes to calculate the scores. All of these help to detect and classify objects in a single pass. This research work implements the YOLOv2 and YOLOv3 algorithms and presents the results.
1.1 Background You only look once, or YOLO, is one of the fastest object detection algorithms available. While it is not the most accurate object detection algorithm, it is a very good option when real-time detection is needed without losing too much time and accuracy. YOLOv2 added features to the YOLO algorithm for improved performance and accuracy. Furthermore, YOLOv3 is the newest version, with more accuracy than the other two algorithms.
2 Literature Survey One research work proposed using 3D LiDAR sensors instead of cameras, as they are more accurate and dependable for boundary detection, which is most important for autonomous vehicles. The whole process was proposed in four steps and was carried out by analysing different kinds of roads and conditions with different kinds of obstacles under a real-time environment, and the accuracy of the object detection outcomes, with or without obstacles, proved to be very high [1].
Another work made three main contributions: obtaining the local position of an object from the fusion of multiple cameras, mapping the positions of multiple objects from the camera fusion, and a proposed method for curves and steep roads that uses the steering wheel angle information from the autonomous vehicle to find the blind spots in those areas. This helps in finding the exact vehicle-following coordinates for autonomous vehicles even on steep slopes [2]. In another study, a camera was fixed inside the car and video was recorded; the obstacles present in each frame were taken as input for object detection. The steps for object detection included segmentation, where the area of interest was obtained from the frame and objects were highlighted. Obstacle detection on different kinds of roads was then performed; the results show that the segmentation can also identify almost all object shadows, unless the shadows are difficult to distinguish on dark roads. This algorithm was also applied to X-rays, skin colour detection, etc. [3]. Another paper proposed object detection using consecutive frames of a video from cameras fixed in the vehicle. There were three steps in this method: first, the edges of the objects were detected; then an association technique computed the edge points of consecutive images, so that the distance from the autonomously driven vehicle could be estimated; and finally, symmetry was applied to find the regions of interest and to detect the positions and coordinates of the objects. A Gabor filter can also be used to differentiate between vehicles and pedestrians, which was not possible with the symmetry technique. The results obtained were quite good [4]. When cameras alone were used for detection, it was only 2D, so LiDAR was introduced to provide scope for 3D detection, but the outcome of LiDAR alone was poor. Another paper therefore proposed a method for the fusion of both camera and LiDAR. The paper shows that the accuracy of the model increased compared with the CNN; objects that were not evident (or clear) with the normal detection methods were detected accurately by multimodal fusion. Though there are some disadvantages to this model, they do not outweigh the advantages [5]. A further article proposed on-road detection using SSD, which has the disadvantage of missing small objects. The authors adjusted some of the properties responsible for those drawbacks and compared the performance with other models. The problem can be overcome by data compensation and augmentation, which help identify objects of relatively small size. The fine-tuned SSD showed better accuracy and performance than the basic SSD and YOLO [6]. In another study using sensor fusion, both cameras and LiDAR were arranged within the vehicle for input data. The three-dimensional projections from LiDAR were made onto four screens, giving different views of the data for better understanding. An HMVPN, a convolutional network, was proposed as the architecture for the data: two convolutional neural networks, one for RGB and the other for LiDAR projections, were finally integrated. The IOU was higher than that of most faster RCNNs [7].
This article threw light on the various hardware components, cameras, sensors, and other filters that are most effective and accurate for autonomous vehicles. The generic algorithm involved in their operation was also given in different steps. Some of the hardware and software components were sensors and computers, and OpenCV, CUDA, and Android, respectively. The datasets for training and testing the vehicles were also mentioned, which is quite helpful for the process [8]. In another paper, the considered detection methods were the Sobel method and the NGP method; results were obtained for both and compared to find the better one. They used the CALTECH dataset, and the whole test detection was done in two phases: hypothesis generation (HG) and hypothesis verification (HV). Generally, a visible camera was used for detection of the objects, but during the night, infrared cameras captured the obstacles on the way. Additional algorithms like SVM and HOG were also used in both methods for edge detection [9]. For better image matching speed, the maximally stable extremal regions (MSER) method was proposed in another article. To improve speed, some of the unwanted processes involved in region matching, like feature point detection, were removed. MSER was combined with vision-IMU-based obstacle detection, and it decreased the space and time complexity of road environment perception. The results showed that the method has faster processing speed in comparison with the others mentioned in the article [10]. In another approach, object detection is carried out in two stages: first, the image is divided into many columns, each column is taken as a "Stixel", and a net is constructed from them using a neural network; in the next step, the results are adjusted using the neighbouring Stixels. Only a monocular camera was used, but the result of this method was better than with stereo cameras or sensors. This method is also cost effective [11]. The CAN Bus collects data from the core control systems of the vehicles. One paper proposed a neural network with a triplet loss for CAN Bus anomaly detection. The detection rate was compared with two other algorithms, SVM and softmax, in the same network architecture. In the triplet loss function, three random samples are taken from the dataset. The performance was better with the triplet method than with the others [12]. Another study used both normal and thermal cameras for better results. A dataset was created by recording road drives in real time, and tests for pedestrian detection were conducted on it. A tri-focal tensor, with cameras from three different angles, helped combine features from multiple sources and validate the performance. HOG- and CCF-based methods were used for detection, and the performance was better with CCF than with HOG [13]. The detection of on-road obstacles was done by implementing the faster RCNN algorithm, a deep learning technique, in a GPU framework. The article proposed detecting only the objects on the road; the rest, on the footpaths and surrounding the roads, were masked using filters. This reduced the detection area and thus improved the speed to some extent. For the detected objects,
only if the IOU was greater than or equal to 0.5 was the object recognized inside the bounded rectangular box; otherwise, it was rejected. The dataset used for training was not of Indian roads, so some of the vehicles (obstacles) mostly found in our country, like autos, could not be recognized. It was proposed in future studies to improve that feature and make the system fully adaptable to detecting Indian roads as well [14]. In another paper, three kinds of obstacles can be recognized: pedestrians, cars, and motorcycles. Detection is carried out in two stages for more accuracy. The camera for the car was fixed on the front bumper. Experiments were conducted under two different conditions, daylight and night time, and the average performance was calculated from both. The steps for obstacle detection were segmentation of the foreground, object detection, and estimation of distance from the image. The detection rate during the day was obviously higher than at night, but both detection rates were higher than the traditional HOG-based method [15]. For object detection using cameras, two common approaches are monocular vision-based and multi-camera system-based methods, both of which have disadvantages, so a stereo-vision-based method, the better of all, was proposed in one paper. First, the intersection of the road was extracted to get all possible positions of obstacles in the region from the disparity image, and those obstacles were then tracked using a Kalman filter, which helped estimate their next positions. The outcome of the results was good [16]. Another group proposed a hybrid net from the SVM and CNN algorithms and tested it on the Caltech-101 and Caltech-pedestrian datasets. They also tested different hybrid neural net algorithms on these datasets. Experimental results on both datasets gave a higher recognition rate for obstacles and a lower miss rate for pedestrians. The LM-CNN-SVM method improved object detection in unsuitable conditions, for example by eliminating noise like shadows and darkness during the night [17]. Along with the detection of vehicles and pedestrians, autonomous vehicles have to know the location of, and distance from, the stop line, which is essential for following traffic rules on the roads. But it becomes almost impossible to sight the stop signs once the vehicle reaches near the board, as the camera cannot capture them. So a detection method was proposed: a computer vision algorithm to calculate the stop distance. The accuracy was very high and the FP rate was low. But the proposed method can be improved by taking the input images in RGB instead of converting them to grey scale, as most stop signs are red; this property can be used to detect signs that are unclear or blurred most effectively [18].
The main aim of this article was to get to know the uncertainties of the 3D LiDAR autonomous vehicles, and if it cannot detect the environment accurately, then it should alert the user about it and request them to drive by themselves in order to reduce any collisions. There were two uncertainties addressed one was epistemic, that occurs whenever there was any obstacle in the real environment which was not a part of training set. It should actively learn about the obstacle for the future references. The other was aleatoric uncertainty, the earlier can be due to lack of accuracy of detection but this one was influenced by distance of detection and occlusion. By feed-forwarding the data continuously, these uncertainties can be reduced, and the performance can also be improved [20]. The author proposed an improved procedure for handling the voluminous information. Spatiotemporal and video prediction were built using transfer learning techniques for forecasting metropolitan traffic was investigated in this paper [21]. As all the existing algorithms have their pros and cons, but the YOLO family has this one particular advantage on top of all, that is, the FPS. The average FPS for YOLO as mentioned till today is way greater that the other algorithms as it is a single stage detection algorithm. So, the YOLO algorithms can be used in the real-time detections and its accuracy is also not poor to use.
3 Dataset Description and Sample Data The COCO dataset is used in the YOLO implementation by installing the YOLO weights file. The YOLOv2 weights file is downloaded directly from Google and placed in Darkflow, which is provided by the YOLO developers. COCO stands for common objects in context. As the name suggests, the images in the COCO dataset are taken from everyday activities, with descriptions attached to the objects present in the scenes. Say we have to detect some object in an image: by looking at the photograph, we can tell whether the object is really present or not. However, this is challenging and depends on the environment where the photograph was shot, without noise or image corruption. COCO is an initiative to collect everyday images, the images we see in our daily lives, and to provide the related information. These images can contain multiple objects inside the same image, with each object labelled and separated. The COCO dataset gives us the context and separation of the objects in the images, and it can be used by anybody to build an effective object detection model. The coco.names file contains the list of object names the algorithm can detect, and this list can be extended in further algorithms. It is shown in Fig. 1.
Fig. 1 Coco dataset file
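As a sketch of how the class list and pre-trained weights are typically loaded with OpenCV's DNN module, consider the following; the file names are the standard ones shipped with YOLO releases and are assumptions here.

```python
import cv2

# Read the class labels shipped in coco.names (one label per line)
with open("coco.names") as f:
    class_names = [line.strip() for line in f if line.strip()]

# Load the pre-trained Darknet model from its config and weights files
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
print(f"{len(class_names)} detectable classes, e.g. {class_names[:5]}")
```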
4 Proposed Algorithm with Flow Chart You only look once, or YOLO, is among the best object detection algorithms. Though it is not the most accurate, it is a very good algorithm to use in real-time detection, as it takes very little time to detect without losing too much accuracy. The flowchart of the YOLO mechanism describes the flow and the step-by-step actions carried out in the YOLO algorithms. The centre point of the flow is the threshold value applied to the bounding box for an object in the image, as shown in Fig. 2. YOLOv2, one of the best algorithms in the business, is more accurate and faster than the older versions. This is because YOLOv2 introduced concepts like batch normalization and anchor boxes. Batch normalization (BN) normalizes the output of hidden layers, which makes learning faster. Anchor boxes are assumed bounding boxes for image detection: 416 × 416 images are considered for training the detector, and a 13 × 13 feature map is the output. The sizes of the anchor boxes are pre-defined; YOLOv2 internally uses the k-means clustering algorithm for this, as it provides good IOU scores. YOLOv2's architecture still lags in a few of the features that are primary in many algorithms: no residual blocks, no skip connections, and, mainly, no upsampling. YOLOv3 provides all of these. If we feed an image of 416 × 416, the output feature map would again be of size 13 × 13. One detection is done here by
Fig. 2 Proposed algorithm flowchart
the help of the 1 × 1 detection kernel, providing a detection feature map of 13 × 13 × 255 (Fig. 3). We can look at the architectural diagram for the YOLOv3 algorithm, with 106 convolutional layers and outputs at various sizes. The slowness of YOLOv3 compared to YOLOv2 is because of this large number of internal convolutional layers, but it brings a marked increase in accuracy; the experimental output is shown in Fig. 4. The SSD, fast RCNN, faster RCNN, and YOLOv3 algorithms give 22, 5, 17, and 75 FPS, respectively, so YOLOv3 is superior when compared to the other existing algorithms like SSD, fast RCNN, and faster RCNN, as shown in Table 1. This is how the architecture of YOLOv3 now looks: the 13 × 13 layer is used for finding larger
Fig. 3 YOLOv3 network architecture
Fig. 4 Experimental output
Table 1 Average FPS for each algorithm

Algorithm        Average FPS
YOLOv3           75 FPS
Faster RCNN      17 FPS
Fast RCNN        5 FPS
SSD              22 FPS
objects, the 52 × 52 layer is responsible for detecting the smaller objects, and finally the 26 × 26 layer helps detect medium objects. For an image of the same size, YOLOv3 predicts more bounding boxes than its older version, YOLOv2.
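The depth of 255 in the 13 × 13 × 255 feature map mentioned above follows from YOLOv3 predicting three anchor boxes per grid cell, each carrying four box coordinates, one objectness score, and 80 COCO class scores; the short check below works this out.

```python
anchors_per_scale = 3  # anchor boxes predicted per grid cell at each scale
num_classes = 80       # COCO classes
box_attributes = 5     # tx, ty, tw, th, objectness

depth = anchors_per_scale * (box_attributes + num_classes)
assert depth == 255    # matches the 13 x 13 x 255 detection feature map
```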
5 Experimental Results As you can see the output of an image where the elephant is absolutely detected with a proper bounding box and no overlap and the experimental output is shown in Fig. 4. The FPS of the image depends on the GPU we use as the GPU component plays a major role in the frame rates image detection and the truck with 0.82 accuracy. It is shown in Fig. 5. The accuracy on the car is 0.92 is seen at an average FPS 0.4. It is shown in Fig. 6. In Fig. 7, this image, the accuracy on the person is 0.96. The above screenshots are some of the objects of the video which gave as input to the algorithm for the object detection and classification with bounding boxes around the objects along with the Fig. 5 Experimental result of the frame rates image detection and the truck with 0.82 accuracy
Fig. 6 Experimental result of the frame rate image detection, with the car detected at 0.92 accuracy
Fig. 7 Experimental result of the frame rate image detection, with the person detected at 0.96 accuracy
object classification names and accuracies. The accuracy is high and good for real time, with an average of 0.95. In most cases, the detection is done on video files, so every video file is cut into frames that are sent to the algorithm as images on which YOLO detects the objects. The time package is imported into the model, the start time of the video is noted, and the video runs until the end of the file. OpenCV is used for the image processing: cap.read() and blobFromImage are used to capture frames and resize them to the required network resolution. The FPS depends mostly on the hardware; since the experiment is done on an AMD 560X graphics card, the frame rate for rendering and detecting at the same time is very low compared to current graphics hardware like the Nvidia 1000 series, AMD Ryzen 5th series, etc.
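A minimal sketch of this frame-by-frame pipeline is given below; the video path is an assumption, and net is the network loaded earlier with readNetFromDarknet.

```python
import time
import cv2

cap = cv2.VideoCapture("input_video.mp4")  # video path is an assumption
start, frames = time.time(), 0

while True:
    ok, frame = cap.read()  # grab the next frame; ok is False at end of file
    if not ok:
        break
    # Rescale and resize the frame into the 416 x 416 network input blob
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    frames += 1

cap.release()
print(f"Average FPS: {frames / (time.time() - start):.3f}")
```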
6 Result and Discussion The project implementation is based on object detection using two algorithms, YOLOv2 and YOLOv3. Both algorithms have pre-trained models that are imported as weights during implementation, and a threshold value is set for both for the classification of objects. The accuracy and threshold are always between zero and one, and the threshold is loaded into the model at runtime in this case. When a video is fed to YOLOv2, the runtime is comparatively less than that of YOLOv3, because YOLOv3 internally runs far more convolutions than YOLOv2. But the accuracy of YOLOv3 is much higher than that of YOLOv2. Two video files and two images were taken into consideration for validation purposes. As seen in the experimental results, the average accuracy on the objects in the image with YOLOv3 was 0.95, depending on the threshold value, while with YOLOv2 the average accuracy was 0.85; the time depends on the FPS rate, and the average FPS of the YOLO algorithms is 40. The TensorFlow GPU version only suits Nvidia graphics cards and allows rendering the video at 30 fps (frames per second). But since the experimental results were obtained on an AMD Radeon 560X, the frame rate is the only drawback, at less than minimal (0.333 fps on average).
7 Conclusion and Future Work After the implementation of both algorithms, it can clearly be concluded that the quality of the videos used affects the detection rate and accuracy on objects. Also, the CPU version gives fewer frames per second (FPS) compared to the GPU implementation of TensorFlow. So not only the algorithm but also other factors like hardware and video quality can impact the accuracy of the output. In our implementation, YOLOv3 is the most accurate but slower than YOLOv2. Future work will be to improve the object detection accuracy, for example by using hybrid nets, so the system performs at its best, as there is a risk to life or property in these conditions.
References
1. P. Sun, X. Zhao, Z. Xu, R. Wang, H. Min, A 3D LiDAR data-based dedicated road boundary detection algorithm for autonomous vehicles. IEEE Access 7, 29623–29638 (2019)
2. J. Kim, Y. Koo, S. Kim, MOD: multi-camera based local position estimation for moving objects detection, in IEEE International Conference on Big Data and Smart Computing (BigComp) (2018), pp. 642–643
3. L.A. Morales Rosales, I. Algredo, C.A. Hernandez, H.R. Rangel, M. Lobato, On-road obstacle detection video system for traffic accident prevention. J. Intell. Fuzzy Syst. 35(1), 533–547 (2018)
4. K. Zebbara, M. El Ansari, A. Mazoul, H. Oudani, A fast road obstacle detection using association and symmetry recognition, in 2019 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS) (2019), pp. 1–5
5. M. Person, M. Jensen, A.O. Smith, H. Gutierrez, Multimodal fusion object detection system for autonomous vehicles. J. Dyn. Syst. Meas. Control 141(7) (2019)
6. H. Kim, Y. Lee, B. Yim, E. Park, H. Kim, On-road object detection using deep neural network, in 2016 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia) (2016), pp. 1–4
7. J. Zhao, X.N. Zhang, H. Gao, J. Yin, M. Zhou, C. Tan, Object detection based on hierarchical multi-view proposal network for autonomous driving, in 2018 International Joint Conference on Neural Networks (IJCNN) (2018), pp. 1–6
8. S. Kato, E. Takeuchi, Y. Ishiguro, Y. Ninomiya, K. Takeda, T. Hamada, An open approach to autonomous vehicles. IEEE Micro 35(6), 60–68 (2015)
9. J. Kim, S. Hong, J. Baek, E. Kim, H. Lee, Autonomous vehicle detection system using visible and infrared camera, in 2012 12th International Conference on Control, Automation and Systems (2012), pp. 630–634
10. X. Yi, G. Song, T. Derong, G. Dong, S. Liang, W. Yuqiong, Fast road obstacle detection method based on maximally stable extremal regions. Int. J. Adv. Robot. Syst. 15(1) (2018)
11. D. Levi, N. Garnett, E. Fetaya, I. Herzlyia, StixelNet: a deep convolutional network for obstacle detection and road segmentation. BMVC 1(2) (2015)
12. A. Zhou, Z. Li, Y. Shen, Anomaly detection of CAN bus messages using a deep neural network for autonomous vehicles. Appl. Sci. 9(15), 3174 (2019)
13. Z. Chen, X. Huang, Pedestrian detection for autonomous vehicle using multi-spectral cameras. IEEE Trans. Intel. Veh. 4(2), 211–219 (2019)
14. G. Prabhakar, B. Kailath, S. Natarajan, R. Kumar, Obstacle detection and classification using deep learning for tracking in high-speed autonomous driving, in 2017 IEEE Region 10 Symposium TENSYMP (2017), pp. 1–6
15. Y.W. Hsu, K.Q. Zhong, J.W. Perng, T.K. Yin, C.Y. Chen, Developing an on-road obstacle detection system using monovision, in 2018 International Conference on Image and Vision Computing New Zealand (IVCNZ) (2018), pp. 1–9
16. Z. Khalid, M. Abdenbi, Stereo vision-based road obstacles detection, in 2013 8th International Conference on Intelligent Systems: Theories and Applications (SITA) (2013), pp. 1–6
17. M. Masmoudi, H. Ghazzai, M. Frikha, Y. Massoud, Object detection learning techniques for autonomous vehicle applications, in 2019 IEEE International Conference of Vehicular Electronics and Safety (ICVES) (2019), pp. 1–5
18. A. Arunmozhi, S. Gotadki, J. Park, U. Gosavi, Stop sign and stop line detection and distance calculation for autonomous vehicle control, in 2018 IEEE International Conference on Electro/Information Technology (EIT) (2018), pp. 0356–0361
19. Z. Yang, J. Li, H. Li, Real-time pedestrian and vehicle detection for autonomous driving, in 2018 IEEE Intelligent Vehicles Symposium (IV) (2018), pp. 179–184
20. D. Feng, L. Rosenbaum, K. Dietmayer, Towards safe autonomous driving: capture uncertainty in the deep neural network for lidar 3d vehicle detection, in 2018 21st International Conference on Intelligent Transportation Systems (ITSC) (2018), pp. 3266–3273
21. T. Senthil Kumar, Video based traffic forecasting using convolution neural network model and transfer learning techniques. J. Innov. Image Process. 128–134 (2020)
CNN Approach for Dementia Detection Using Convolutional SLBT Feature Extraction Method A. V. Ambili, A. V. Senthil Kumar, and Ibrahiem M. M. El Emary
Abstract Early detection of dementia remains the best way to prevent the progression of related illnesses and helps in providing appropriate treatment to the affected individuals. Early detection with high accuracy is therefore essential for treatment. MRI is the common biomarker for diagnosing dementia. In this analysis, an MRI image is given as the input. The input image is pre-processed using morphological operations. This study introduces a convolutional shape local binary texture (CSLBT) feature extraction method to extract the key features from the magnetic resonance image. Linear discriminant analysis (LDA) is used for feature reduction, and a convolutional neural network is employed in the classification process. Keywords Convolutional neural network · Dementia · Deep learning · Magnetic resonance image · Morphological operation · Shape local binary texture
1 Introduction In recent years, the prevalence of infectious diseases and chronic conditions has risen due to demographic shifts in developed nations. According to the World Health Organization (WHO), approximately 47 million people are living with dementia worldwide, and this number is expected to increase to 82 million by 2030 and 150 million by 2050 [1]. Magnetic resonance imaging (MRI) is indispensable in the medical field due to its high spatial resolution, soft tissue contrast, and non-invasive properties. MR imaging provides valuable insights for prognosis and for planning brain tumor treatment [2]. To generate accurate brain images, MRI and computer tomography (CT) take advantage of computer technology. For analyzing
the structure of brain activity, MRI or CT scans are used. Compared to the CT scan, MRI scanning is more effective and involves no radiation [3]. In comparison with other imaging methods [4], MRI is well suited to dementia diagnosis and identification because of its high soft-tissue contrast and high spatial resolution, and because it produces no harmful radiation and is non-invasive [5–7]. Dementia is a brain disorder in which memory, language, problem-solving, and other cognitive skills deteriorate, impairing a person's ability to perform everyday tasks. Different ways are available to detect dementia, including brief examinations, evaluations, challenges to the patient's cognitive capacity, and so on [8]. The result of such an analysis depends on the evaluator's knowledge and expertise, and a drawing test may not be possible, depending on the person's characteristics and skills. Automated screening can detect dementia quickly, without these usual constraints [9]. Deep learning (DL) is a machine learning approach that has attracted tremendous interest from investigators, surpassing benchmarks in fields like dementia tracking. The capacity to efficiently represent raw data through coherent nonlinear transformations gives DL its sophistication and complexity and distinguishes it from traditional machine learning methods [10]. The early diagnosis of Alzheimer's disease (AD) can therefore be naturally modeled as a multiclass classification problem. This research work suggests a CNN classifier for dementia classification. The MRI input image is first pre-processed to reduce the noise present in the image. The pre-processed image is subjected to a morphological operation, which helps extract the brain image boundary areas. Features are extracted using GLCM, statistical, and convolutional SLBT methods. The convolutional SLBT is a combination of convolutional LBP and SLBT [11]. The extracted features are fed to a CNN for the classification of dementia. The paper is organized as follows: Sect. 2 gives a literature survey of existing works, Sect. 3 explains the architecture of the proposed work, results and discussions are covered in Sect. 4, and the work is concluded in Sect. 5.
2 Literature Survey This section discusses research works on numerous current dementia diagnosis strategies. The papers are reviewed according to recent research on dementia detection techniques. Texture and shape analysis has been extensively studied in recent years, and studies have produced advanced methodologies for the classification of dementia. Kawanishi et al. [9] discussed an anomaly detection method for dementia detection; this system uses a tablet-based drawing test to detect dementia. MRI is an essential tool for brain disease detection [2, 3, 5]. Zhang et al. [12] developed a landmark-based feature extraction method using longitudinal structural MRI, which improves classification efficiency and eliminates time-consuming steps. Pachauri et al. discuss a topology-based kernel algorithm for cortical thickness
measures [13]. It performs well on many statistical inference tasks without selecting features, reducing dimensionality, or tuning parameters. In [14], a global grading biomarker is used to predict the conversion of mild cognitive impairment (MCI) to Alzheimer's disease based on registration accuracy, age correction, feature selection, and training data selection; this provides a more accurate forecast of the conversion from MCI to AD. Zhou et al. [15] show that when combined with MRI measures, MMSE enhances prediction performance; MMSE is the most significant statistical variable, and an SVM classifier is used for classification based on twofold cross-validation. The work in [16] proposes a multimodal behavioral variant FTD (bvFTD) classification model to differentiate pre-symptomatic FTD mutation carriers from healthy controls; classification performance is calculated using receiver operating characteristic curves, and the model achieves high classification performance. Shape or texture feature extraction alone is not sufficient to extract sharp image detail, so a combined feature extraction method is used to enhance the classification task.
2.1 Comparative Analysis

| Paper | Methodology | Limitations |
|---|---|---|
| Kawanishi et al. [9] | Unsupervised anomaly detection method | Limited test contents |
| Ehab et al. [5] | Proposes a second-opinion decision aid for surgeons and radiologists | Limited database |
| Vieira et al. [10] | Neuroimaging correlates of psychiatric and neurological disorders | Statistical inferences are drawn from multiple independent comparisons based on the assumption that different brain regions act independently |
| Studholme et al. [20] | Template-based approach | Larger sets of data are not considered for predicting the disease conditions |
| Ahmed et al. [1] | Neuroimaging and machine learning for dementia diagnosis | Limited prodromal stage of dementia with Lewy bodies |
3 Proposed Methodology This study’s sole objective is to design and develop a dementia detection framework based on MRI images.
Dataset The datasets are collected from the UCI repository at https://www.nitrc.org/frs/?group_id=48 and https://www.smir.ch/#featured. The images are brain images showing the complete structure of the cerebrum and medulla oblongata, which serve as the input. The overall procedure of the proposed approach involves pre-processing, feature extraction, feature reduction, and classification. Initially, the input image is subjected to pre-processing to remove noise. Feature extraction is then performed on the pre-processed image using the proposed convolutional shape local binary texture feature, gray level co-occurrence matrix (GLCM) features, and statistical qualities like mean, variance, standard deviation, kurtosis, and entropy. This paper proposes a new convolutional SLBT for feature extraction, designed by combining SLBT (shape local binary texture) [11] and convolutional LBP, in which the LBP is modified based on convolution. Finally, for the diagnosis of dementia, classification is carried out with a CNN. The proposed approach is implemented in MATLAB, and the dataset employed is from the UCI repository. The efficiency of the proposed method is assessed using measures like sensitivity, specificity, and accuracy. Figure 1 shows the flow diagram of the proposed system, and Fig. 2 shows the CNN architecture. Fig. 1 Flow diagram of proposed dementia detection using CNN
[Fig. 1 pipeline: input MR image → pre-processing → feature extraction (GLCM, statistical, and convolutional SLBT features) → feature reduction → classification]
[Fig. 2 Convolutional neural network architecture: input, hidden, and output layers arranged by width, height, and depth]
3.1 Pre-processing Pre-processing plays a vital role in the segmentation of MRI brain images. Its main goals are to improve image quality and reduce noise: the pre-processing step eliminates unwanted noise and aversive parts of the image.
3.2 Morphological Operation Imperfections are quite common in images and can be eradicated through morphological operations, which modify pixels according to the values of adjacent pixels. Dilation, erosion, opening, and closing are the four elementary morphological operations; only dilation and erosion are carried out in the work mentioned. In erosion, every object pixel that touches a background pixel is transformed into a background pixel, shrinking objects and possibly dividing a single object into multiple objects. Dilation is the reverse of erosion: object regions grow, and multiple objects may merge into a single object [18].
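To make the erosion/dilation behaviour concrete, the following is a minimal sketch using OpenCV; the file path, the Otsu thresholding step, and the 3 × 3 kernel size are illustrative assumptions rather than the exact settings of this work.

```python
import cv2
import numpy as np

# Load a pre-processed MRI slice as grayscale (the path is illustrative).
img = cv2.imread("mri_slice.png", cv2.IMREAD_GRAYSCALE)

# Binarize so that morphology acts on object vs. background pixels.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# 3 x 3 structuring element; the size is an assumption, not from the paper.
kernel = np.ones((3, 3), np.uint8)

# Erosion: object pixels touching the background become background,
# shrinking (and possibly splitting) objects. Dilation is the reverse:
# objects grow and may merge.
eroded = cv2.erode(binary, kernel, iterations=1)
dilated = cv2.dilate(binary, kernel, iterations=1)

# The morphological gradient (dilation minus erosion) highlights object
# boundaries, which is useful for extracting brain image boundary areas.
boundary = cv2.subtract(dilated, eroded)
```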
3.3 Feature Extraction The proposed work uses an MRI image for identifying dementia. The main objective of feature extraction is to extract the pertinent and significant features from the image, rendering the input data into a lower-dimensional space.
3.4 GLCM GLCM is the most popular statistical textural feature extraction method; the features are derived from the gray-level co-occurrence matrix. Textural characteristics capture relationships between pixel values [19]. It is a second-order statistical method. The GLCM has the same number of rows and columns as there are gray levels and describes how the co-occurring values are distributed for a given image distance and orientation: the GLCM counts how often a grayscale pixel value x appears at a given offset (horizontal, vertical, or diagonal) from pixel intensity y. It comprises different varieties of gray values and, by statistical analysis, expresses the disparity in the picture compared to the original image.

\text{Contrast} = \sum_{x,y} (x - y)^2 \, p_{c,\phi}(x, y)   (1)

where p_{c,\phi} is the co-occurrence matrix and x and y are pixel intensities. Correlation is the degree of similarity of a pixel with its neighborhood over the entire image:

\text{Correlation} = \sum_{x,y} \frac{(x - \mu_i)(y - \mu_j) \, p_{c,\phi}(x, y)}{\sigma_i \sigma_j}   (2)

where \mu_i and \mu_j are the means of the two correlated images and \sigma_i and \sigma_j are the standard deviations.

\text{Energy} = \sum_{x,y} \left( p_{c,\phi}(x, y) \right)^2   (3)

3.5 Statistical Features Statistical features are commonly used for the statistical analysis of the image. Mean: the mean is the average value of the pixels.
M = \frac{1}{AB} \sum_{x,y} p(x, y)   (4)

where p(x, y) is the grayscale value at the point (x, y) of a picture of dimension A × B. Variance: variance measures the pixel value distribution, giving the difference between each pixel and the mean; it is the average of the squared difference between the mean and each individual pixel.

\text{Variance} = \frac{1}{AB} \sum_{x,y} (p(x, y) - M)^2   (5)

Standard deviation (σ) defines the spread of the gray levels around the mean and is the square root of the variance.

\sigma = \sqrt{\frac{1}{AB} \sum_{x,y} (p(x, y) - M)^2}   (6)

Kurtosis is the measure of the tailedness of an image:

K_{tos} = \frac{1}{AB} \sum_{x,y} \frac{(p(x, y) - M)^4}{\sigma^4}   (7)

Entropy: entropy is the measure of dissimilarity in the MRI image; it is the statistical measure of randomness that characterizes the texture of an image.

\text{Entropy} = -\sum_{x,y} p_{d,\phi}(x, y) \log_2 [\, p_{d,\phi}(x, y)\,]   (8)
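As an illustration of how the GLCM properties (Eqs. 1–3) and the statistical features (Eqs. 4–8) can be computed, the following is a minimal sketch using scikit-image, NumPy, and SciPy; the distance, angle, and histogram settings are assumptions, not the configuration used in this work.

```python
import numpy as np
from scipy import stats
from skimage.feature import graycomatrix, graycoprops  # 'greycomatrix' in older scikit-image

def texture_and_stats(img):
    """GLCM (Eqs. 1-3) and statistical (Eqs. 4-8) features for a uint8 image."""
    # Co-occurrence matrix at distance 1 and angle 0; parameters are assumptions.
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    feats = {prop: graycoprops(glcm, prop)[0, 0]
             for prop in ("contrast", "correlation", "energy")}

    pix = img.astype(np.float64).ravel()
    feats["mean"] = pix.mean()                            # Eq. (4)
    feats["variance"] = pix.var()                         # Eq. (5)
    feats["std"] = pix.std()                              # Eq. (6)
    feats["kurtosis"] = stats.kurtosis(pix, fisher=False) # Eq. (7), m4 / sigma^4

    # Entropy from the normalized gray-level histogram, Eq. (8).
    hist, _ = np.histogram(img, bins=256, range=(0, 255), density=True)
    hist = hist[hist > 0]
    feats["entropy"] = -np.sum(hist * np.log2(hist))
    return feats
```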
3.6 Convolutional SLBT (Convolutional Shape Local Binary Texture) The proposed feature extraction method combines the shape local binary texture method and convolutional LBP, in which the convolution operation is applied to LBP. The local binary pattern (LBP) is an elementary descriptor embraced in several applications. LBP considers each center pixel of an image and its localized neighborhood pixel values (X) confined within the window size ascertained by Y. LBP can be evaluated as \text{LBP}_{X,Y} = \sum_{i=0}^{X-1} T(P_c - P_n) \, 2^i, where P_n and P_c are the intensity values of neighborhood pixels and center pixels, and the thresholding process is represented by T(\cdot). X = 8 and Y = 1 are the values applied when high-resolution images are taken into consideration. As an instance, consider the development of a binary pattern for a 3 × 3 window using the weighted sum of the eight convolutional filters by convolutional LBP. The LBP using convolution is then expressed as
C = \sum_{i=1}^{X} \sigma(b_i * V) \cdot w_i, \quad i \in \{1, 2, \ldots, X\},   (9)
where * represents convolution and b_i is a sparse convolutional filter with two nonzero values \{+1, -1\}, which convolves with the vectorized input image V. \sigma denotes the Heaviside step function, and w = [2^{X-1}, 2^{X-2}, \ldots, 2^0]^T constitutes the predetermined weight values. X is the pixel count in the neighborhood (e.g., X = 8 with a 3 × 3 window).
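A minimal NumPy/SciPy sketch of the convolutional LBP of Eq. (9) is given below; the bit ordering of the neighbours, the boundary handling, and the sign convention of the sparse filters are implementation choices, not prescribed by the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def convolutional_lbp(img):
    """Sketch of Eq. (9): LBP computed with X = 8 sparse 3 x 3 difference filters."""
    X = 8
    # Neighbor offsets around the centre pixel of a 3 x 3 window.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    weights = [2 ** (X - 1 - i) for i in range(X)]  # w = [2^(X-1), ..., 2^0]^T
    code = np.zeros(img.shape, dtype=np.float64)
    for (dy, dx), w in zip(offsets, weights):
        b = np.zeros((3, 3))
        b[1, 1] = +1.0             # centre pixel: sparse filter with {+1, -1}
        b[1 + dy, 1 + dx] = -1.0   # one neighbour position, giving P_c - P_n
        response = convolve(img.astype(np.float64), b, mode="nearest")
        # Heaviside step, then the predetermined weight for this bit.
        code += (response >= 0).astype(np.float64) * w
    return code
```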
3.7 SLBT SLBT is proficient in the extraction of both shape-based and texture-based attributes. A combination of the shape and texture aspects is achieved by reckoning the weights corresponding to each shape feature [11]. The texture and the local and global shapes are obtained by Eq. (10):

S = V_i^T (b_{ST} - \bar{b}_{ST})   (10)

where S is the shape-texture parameter, V_i^T contains the eigenvectors, and \bar{b}_{ST} is the mean vector; b_{ST} is calculated using [11]. Combining these two parameter vectors, we get the convolutional SLBT parameter vector, which is formulated as

\text{CSLBT} = C + S   (11)
3.8 Linear Discriminant Analysis (LDA) LDA is used for pattern recognition and is a traditional method of dimensionality reduction. It helps depict the disparity between groups of data. This transform-based approach maximizes the ratio of between-class variance to within-class variance throughout every dataset [17].
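The following is a brief sketch of LDA-based feature reduction with scikit-learn; the feature matrix and labels are synthetic placeholders for the extracted CSLBT, GLCM, and statistical features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Placeholder feature matrix (CSLBT + GLCM + statistical features) and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))    # 150 subjects, 20 features (illustrative)
y = rng.integers(0, 2, size=150)  # 0 = normal, 1 = demented

# LDA projects onto at most (n_classes - 1) discriminant directions,
# maximizing between-class variance relative to within-class variance.
lda = LinearDiscriminantAnalysis(n_components=1)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)            # (150, 1)
```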
3.9 Classification Approach Using Convolutional Neural Network Convolutional layers derive local characteristics from the MR image data, and the extracted features are passed through a convolutional neural network. CNNs are the most common neural network classification method, built from neurons that have
learned weights and biases. Each neuron receives some input, performs a dot product, and follows it with a nonlinearity. The whole network still expresses a single differentiable score function from the input image pixels on one end to class scores on the other, and the fully connected layer has a loss function on it. To discern local structures, the convolution layers constrain the neurons: weights are shared between nodes to activate local traits across the entire input channel. The set of shared weights is called a kernel and is used predominantly in the convolutional layer to produce the feature maps. A CNN comprises alternating pairs of convolution and pooling layers, followed by fully connected layers and a softmax layer as the terminal layer that generates the outcome. The backpropagation (BP) algorithm is employed in the CNN training phase; it finds optimal weight values using the gradient descent approach and thereby reduces the computational intricacy. Normalized values are passed to the neural network input layer irrespective of whether a pre-processing phase exists. Every neuron's aggregate potential in the hidden layer is computed from these input values and randomly initialized weights, and the output is determined by applying the activation function. These steps are repeated for the successive layers. Gradients are used to update the weights and biases during training by propagating the error signal backward. Learning rate and momentum parameters are used to increase the learning speed and to circumvent local minima during convergence. The foregoing feedforward and feedback computations are repeated until the terminating criterion is reached, which generally refers to a target root mean square (RMS) error value.
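A minimal Keras sketch of such a CNN classifier, trained by backpropagation with gradient descent and momentum, is shown below; the input size, layer widths, and optimizer settings are assumptions, since the paper does not specify its exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Convolution/pooling pairs, fully connected layers, and a softmax output.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),  # normal vs. demented
])

# SGD with momentum mirrors the learning-rate/momentum discussion above.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```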
4 Result and Discussion MRI images are considered for the detection of dementia, with data taken from the UCI repository. The images are brain images with the complete structure of the cerebrum and medulla oblongata as the input. The repository images are formulated differently across many fields; in our scenario, the images have a frontal view so that different correlation aspects are highlighted. These are key factors for measuring image intensity in different scopes. 150 subjects were chosen for classification and divided into training and testing sets: 100 subjects were used for training and 50 for testing. The images are processed through the algorithm so that a patient who requires corrective action receives the required remedy with accuracy and efficiency. The proposed work uses MATLAB for performance evaluation. The input MRI is shown in Fig. 3a. The given input is pre-processed, and Fig. 3b shows the morphological image used to separate the demented and normal brain regions. After feature extraction and reduction, classification is performed on
Fig. 3 a Input image; b Morphological image; c Abnormal MRI image
Fig. 4 Graphical representation of accuracy, sensitivity, and specificity for the SVM, bvFTD, and CNN models
the enhanced image for accurate dementia detection. Figure 3c shows the abnormal brain image. A judgment on the observed outcome may be either true or false, so each adjudication falls into one of four possible groups. Noise observed in the dataset includes homogeneous noise, texture noise, magnitude noise, and noise from non-uniformity of construction.
4.1 Performance Analysis Performance metrics are essential in the discrimination of deep learning models. In this paper, three metrics, namely accuracy, sensitivity, and specificity, are used for performance evaluation. These measures are estimated based on true positives (T_p), true negatives (T_n), false positives (F_p), and false negatives (F_n). The evaluated methods are compared to existing methods [12, 16] (Table 1).

\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}   (12)
Table 1 Classification evaluation for various techniques

| Techniques | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| SVM [12] | 88.3 | 79.61 | 59.90 |
| BvFTD [16] | 73.5 | 66.4 | 64 |
| CNN | 90.1 | 89 | 64 |
\text{Sensitivity} = \frac{T_p}{T_p + F_n}   (13)

\text{Specificity} = \frac{T_n}{T_n + F_p}   (14)
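These measures can be computed directly from the four counts; the following small Python function mirrors Eqs. (12)–(14), with the example counts being purely hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity as in Eqs. (12)-(14)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts for a 50-image test set (illustrative only).
print(classification_metrics(tp=28, tn=17, fp=3, fn=2))
```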
5 Conclusion In this work, we established a novel approach for feature extraction, namely convolutional shape local binary texture (convolutional SLBT), used along with GLCM and statistical analysis, which improves diagnostic accuracy. This method decreases redundancy and provides key features to clinical scores. Moreover, compared to the existing methods, our approach attains good accuracy in dementia detection. Convolutional SLBT provides a smooth, sharpened image with enhanced, intensified shape and texture. The proposed method achieved 90.1% accuracy and 89% sensitivity.
References 1. M.R. Ahmed, Y. Zhang, Z. Feng, B. Lo, O. Inan, H. Liao, Neuroimaging and machine learning for dementia diagnosis: recent advancements and future prospects. IEEE Rev. Biomed. Eng. 1–1. https://doi.org/10.1109/rbme.2018.2886237 2. S. Bauer, R. Wiest, L.-P. Nolte, M. Reyes, A survey of MRI-based medical image analysis for brain tumor studies. Phys. Med. Biol. 58(13), R97–R129 (2013) 3. T.S. Armstrong, Z. Cohen, J. Weinberg, M.R. Gilbert, Imag. Tech. Neuro-Oncol. 20(4), 231– 239 (2004) 4. S. Manoharan, Performance analysis of clustering based image segmentation techniques. J. Innov. Image Process. (JIIP) 2(01), 14–24 (2020) 5. E.F. Badran, E.G. Mahmoud, N. Hamdy, An algorithm for detecting brain tumors in MRI images, in Proceedings of International Conference on Computer Engineering and Systems (ICCES) (2010), pp. 368–373 6. V. Anitha, S. Murugavalli, Brain tumour classification using two-tier classifier with adaptive segmentation technique. IET Comput. Vis. 10(1), 9–17 (2016) 7. J. Naik, S. Patel, Tumor detection and classification using decision tree in brain MRI. Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 14(6), 87 (2014) 8. T. Vijaykumar, Classification of brain cancer type using machine learning. J. Artif. Intell. 1(2), 105–113 (2019)
9. K. Kawanishi, H. Kawanaka, H. Takase, S. Tsuruoka, A study on dementia detection method with stroke data using anomaly detection, in Proceedings of 6th International Conference on Informatics, Electronics and Vision and 7th International Symposium in Computational Medical and Health Technology (ICIEV-ISCMHT) (2017), pp. 1–4 10. S. Vieira, W.H.L. Pinaya, A. Mechelli, Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications. Neurosci. Biobehav. Rev. 74, 58–75 (2017) 11. N.S. Lakshmiprabha, S. Majumder, Face recognition system invariant to plastic surgery, in Proceedings of 12th International Conference on Intelligent Systems Design and Applications (ISDA) (IEEE, 2012), pp. 258–263 12. J. Zhang, M. Liu, L. An, Y. Gao, D. Shen, Alzheimer's disease diagnosis using landmark-based features from longitudinal structural MR images. IEEE J. Biomed. Health Inform. 21(6), 1607–1616 (2017) 13. D. Pachauri, C. Hinrichs, M.K. Chung, S.C. Johnson, V. Singh, Topology-based kernels with application to inference problems in Alzheimer's disease. IEEE Trans. Med. Imag. 30(10) (2011) 14. T. Tong, Q. Gao, R. Guerrero, C. Ledig, L. Chen, A novel grading biomarker for the prediction of conversion from mild cognitive impairment to Alzheimer's disease. IEEE Trans. Biomed. Eng. 64(1), 155–165 (2016) 15. Q. Zhou, M. Goryawala, M. Cabrerizo, J. Wang, W. Barker, D. Loewenstein, R. Duara, M. Adjouadi, An optimal decisional space for the classification of Alzheimer's disease and mild cognitive impairment. IEEE Trans. Biomed. Eng. 61(8), 2245–2253 (2014) 16. R.A. Feis, M.J.R.J. Bouts, J.L. Panman, L.C. Jiskoot, E.G.P. Dopper, T.M. Schouten, F. Vos, J. Grond, J.C. van Swieten, S.A.R.B. Rombouts, Single-subject classification of pre-symptomatic frontotemporal dementia mutation carriers using multimodal MRI. NeuroImage Clin. (2019) 17. J.-S. Wang, W.-C. Chiang, Y.-L. Hsu, T.C. Yang, ECG arrhythmia classification using a probabilistic neural network with a feature reduction method. Neurocomputing 116, 38–45 (2013) 18. S. Goswami, L.K.P. Bhaiya, A hybrid neuro-fuzzy approach for brain abnormality detection using GLCM based feature extraction, in 2013 International Conference on Emerging Trends in Communication, Control, Signal Processing and Computing Applications (C2SPCA) (Bangalore, 2013), pp. 1–7. https://doi.org/10.1109/C2SPCA.2013.6749454 19. P. John, Brain tumor classification using wavelet and texture-based neural network. Int. J. Sci. Eng. Res. 3(10), 1–7 (2012) 20. C. Studholme, V. Cardenas, E. Song, F. Ezekiel, A. Maudsley, M. Weiner, Accurate template-based correction of brain MRI intensity distortion with application to dementia and aging. IEEE Trans. Med. Imag. 23(1) (2004)
Classification of Ultrasound Thyroid Nodule Images by Computer-Aided Diagnosis: A Technical Review Siddhant Baldota and C. Malathy
Abstract The thyroid is an indispensable gland of the human endocrine system that secretes hormones which have a significant effect on the metabolic rate and protein synthesis. The thyroid is susceptible to a variety of disorders. One of them is the formation of a thyroid nodule, an extraneous mass formed at the thyroid gland that requires medical attention and diagnosis. About 5–10 nodules out of 100 are malignant (cancerous). When the formation of nodules is discernible, doctors call for a diagnostic blood test, which is often perfunctory and does not differentiate malignant from benign tumours. This is where ultrasonography comes across as a better option. Automating ultrasonography diagnosis decreases reporting time and provides a provisional diagnosis ahead of the doctors' expert opinion. Deep learning methods were therefore suggested and produced better results. Initially, region of interest (ROI) and feature extraction were done before applying machine learning models like support vector machines and multilayer perceptrons. Nevertheless, the feature selection required in machine learning methods was a long-drawn process, often involving elements of trial and error. Deep convolutional neural networks, along with histogram of gradients (HOG)-aided feature extraction used with classifiers, have yielded high specificity and sensitivity values along with high accuracy. In this paper, we study and compare the efficacy of various deep convolutional networks applied to the diagnosis of malignancy in thyroid nodules. Keywords Computer-aided diagnosis (CAD) · Deep learning · Machine learning · Malignancy · Thyroid nodules · Ultrasound (US) · Ultrasonography (USG)
S. Baldota · C. Malathy (B) Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur 603203, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_30
1 Introduction The application of artificial intelligence (AI) in healthcare has been a topic of discussion, experimentation, and application for quite some time now. The rise of AI in healthcare has coincided with population growth, leading to a larger number of people affected by various kinds of diseases. Over the last ten years, more people in India, especially the youth, have reportedly been affected by thyroid cancer than in any other decade. One possible reason is that more people have become aware and get themselves tested by a radiologist. However, due to the increased amount of testing, the time required for reporting the results for each individual has increased. To counter this latency between testing and results, the automation of provisional reports via AI has come to the fore. In particular, machine learning models have been used over the years to predict whether a thyroid tumour is malignant or not. However, machine learning models require optimal feature selection, large datasets to train on, time and resources, expert interpretation, and expensive feature engineering. All of this adds to the latency of diagnosis, which defeats its purpose. To overcome this problem, supervised deep learning has been used. Supervised deep learning is a branch of machine learning dealing with deep artificial neural networks, weights, biases, loss functions, gradient optimizers, activation units, and more; with it, the process of feature selection and engineering is automated. Along with the testing, the system for reporting has also transformed. Previously, a binary system was used to classify between malignant and benign thyroid nodules. This system had disadvantages: the misclassification rate was high, with a number of false positives and false negatives per class, as only two classes were present (benign and malignant). Change came when Horvath et al. [1] presented a reporting system called the Thyroid Image Reporting and Data System, which consists of a spectrum of malignancy illustrated in Fig. 2. This spectrum sets the paradigm for the classification of thyroid nodules and their diagnosis. As shown in Fig. 1, the ultrasound report of the thyroid nodule is analysed by the medical expert as well as the computational algorithm. After analysis of the report, each entity gives its opinion. If these opinions match, a final decision is made and the nodule is deemed either strictly malignant or strictly benign. In the case of a difference in opinion between the computer and the medical expert, a team of medical experts analyses the nodule, providing a collective perspective. With TIRADS, the final decision can be one of six different categories, as shown in Fig. 2. This paper reviews the computer-aided diagnosis (CAD) done on the binary classification of thyroid nodules into malignant and benign and explains the need for using TIRADS.
Fig. 1 Computer-aided diagnosis system currently in use
Fig. 2 Computer-aided diagnosis system using TIRADS
2 Literature Survey We began our survey with the existing research and review work on the diagnostic approach used by doctors for thyroid nodules. Hena et al. addressed the problem of thyroid malfunction and its diagnosis by discussing various evaluation methods [2], particularly in India. They noted that the American Association of Clinical Endocrinologists (AACE) had set up a systematic step-wise approach for diagnosis; the AACE suggested ultrasonography (USG) when nodules showed signs of being malignant. Moreover, USG diagnosis proves to be a strong supporting factor for aspiration biopsy, which has a false positive rate upwards of 7% and a false negative rate ranging between 1 and 7%. Similarly, Geanina et al. evaluated the different diagnostic studies including serum markers, thyroid USG, elastography,
fine needle aspiration (FNA) biopsy, cytologic diagnosis, and indeterminate cytology [3]. Of these, FNA along with USG was one of the linchpins of thyroid diagnosis, justified by lower false negatives when FNA is supported by USG in comparison to the methods involving palpation. To provide a second opinion to medical professionals, automation of the USG reports is pursued. Automation of thyroid nodule diagnosis has been driven largely by the application of machine learning and deep learning. Machine learning techniques like the support vector machine (SVM) [4], introduced by Vapnik, were used to detect cancerous diseases [5]; real-world examples and problems were taken to train the support vector machine. Particle swarm optimization [6] by Kennedy et al. was used to classify using iterations; this method is analogous to the incipient movement of birds flying in a flock, looking for food. Principles of quantum physics for particles were applied to a modification of this method called quantum-behaved particle swarm optimization [7]. Apart from these complex training procedures, the process of feature selection [8] in SVM was involved. Mei-Ling et al. [9] worked on expanding the function of SVM from giving good results in binary classification to multiple classes, using a recursive feature elimination (RFE) approach. Using a zoo dataset consisting of 16 features and a class label over 100 test samples, and a dermatology dataset having 33 features, one class label, and over 360 test samples, they performed reverse-order sorting on the features based on descriptive power. After sorting, new sets of features were selected by RFE, and hyperparameters were tuned. Finally, the feature sets were fed to the SVM to obtain over 95% classification accuracy. The process of SVM-RFE is depicted in Fig. 3, and a sketch of the idea follows the figure caption below. This long-drawn process of feature selection, however accurate, is not optimal, as the time latency is high along with the sorting time complexity. Therefore, SVMs are not an ideal method to automate the process of thyroid diagnosis. Zhi Fei et al. [10] proposed the categorization of medical images of two standard datasets, ISIC2017 and HIS2828, for computer-aided diagnosis (CAD) using high-level feature extraction with a deep convolutional neural network (DCNN), fusing the high-level features extracted from convolutional filters and rectified linear units (ReLU) [11] with traditional features, like histograms of colours, moments of colour, shapes, and structural and textural features, and subsequently using a multilayer perceptron (MLP) classifier as shown in Fig. 4.
Fig. 3 Support vector machine—recursive feature elimination
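The following is a minimal scikit-learn sketch of the SVM-RFE idea described above; the iris dataset and the parameter values are stand-ins, since the cited work used zoo and dermatology data.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in data; the cited work [9] used zoo and dermatology datasets.
X, y = load_iris(return_X_y=True)

# RFE repeatedly fits a linear SVM, ranks features by weight magnitude,
# and eliminates the weakest until the desired number remains.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=2, step=1)
X_selected = selector.fit_transform(X, y)

# Evaluate the multiclass SVM on the selected feature subset.
scores = cross_val_score(SVC(kernel="linear"), X_selected, y, cv=5)
print(selector.ranking_, scores.mean())
```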
Fig. 4 Methodology used by Zhi Fei et al. which uses a combination of high- and low-level features
The SVM using traditional features was outperformed by their model: they obtained an accuracy of 90.1% on ISIC2017 and 90.2% on HIS2828. However, this process relied on some traditional feature extraction, which dampens the purpose of automation. Before turning to deep learning methods for thyroid nodule diagnosis, a few problems needed to be addressed. A common problem is class imbalance in medical datasets: healthy or benign cases far outnumber malignant cases, since a disease generally affects only a small part of the global population. Class imbalance in medical data leads to a highly biased model, which makes accuracy a redundant metric. Sara et al. were cognizant of the class imbalance problem and developed a cost-sensitive method [12] inspired by the least mean square (LMS) technique [13]. The LMS method reduces the average quadratic error to its minimum by determining the solution to a set of linear equations. Analogously, they solve the class imbalance problem by penalizing the errors of different samples with distinct weights, based on a cost-sensitive extension of LMS. This method was an improvement over the traditional upsampling or downsampling of the deficient or superabundant class, which can change the organization of datasets [14]. An exhaustive review of class sampling is found in [15]. One more issue medical datasets face is the number of images. The quantity of data in the form of images for thyroid nodules available through public platforms is significantly low in volume in comparison to, say, plant data or face
recognition datasets. Since deep learning is a data-hungry process [16], the performance of deep learning models, even those used in medical scenarios, improves with more data. Data augmentation, promulgated by Martin et al. in [17], provides a solution to this problem. A wide range of augmentation tricks exists, including random cropping, adding and reducing noise, jittering, rotation, zooming, shearing, hue saturation value (HSV) filtering used by Hyuon-Koo et al. in [18], Gaussian filtering, horizontal and vertical flipping, resizing, rescaling, and the introduction of lighting and contrast changes in images. Zeshan et al. worked in [19] to determine which dataset augmentation strategies benefit medical image classifiers fed into deep learning models the most by making them more discriminative. Horizontal and vertical flips, the introduction of Gaussian noise, jittering, scaling, raising to a power, Gaussian blurring, rotations, and shearing were applied to each image of a balanced dataset consisting of 1650 abnormal and 1651 normal cases with an 80–20 train-validation split from the Digital Database of Screening Mammography [20]. After this, every image was cropped to 500 × 500 to hold its resolution before being fed to VGG-19 [21] at a size of 224 × 224. Effective plans of action were found to be flips, which yielded a validation accuracy of 84%, and Gaussian filters, which yielded 88% on the validation set. Conversely, the addition of Gaussian noise worsened the results, rendering an accuracy of 66% on the validation data. Nima et al. [22] worked on another pertinent question related to deep learning for medical images: whether to train deep learning models from scratch with random initialization of weights or to perform transfer learning with models pretrained on ImageNet [23] weights. They took up four datasets from three distinct medical applications related to computed tomography, cardiology, and gastroenterology. After training and inference, the conclusion was that layer-wise transfer learning works better in three of the four datasets; moreover, the worst performance of transfer learning was equal to training from scratch. Dhaya et al. [24] worked on the detection of Covid-19 from chest radiographs by applying transfer learning to deep learning architectures such as modifications of Inception (GoogleNet), namely InceptionResNetV2 and InceptionV3, and a residual neural network, ResNet50. Data was taken from various sources on GitHub; the dataset included 50 chest X-ray images affected by Covid-19 as well as 50 normal chest X-ray images. ResNet50 gave the best results with an accuracy of nearly 98%. The work done in [24] highlights the importance of transfer learning on pretrained architectures. Fatemeh et al. performed image augmentation for automated thyroid nodule detection from USG images to be fed into their proposed Mask R-CNN model in [25]. It was observed that shearing, warping, and the addition of noise did not fit the application, whereas naive strategies like flipping, rotations, and scaling worked well. Even though the use case of [25] was different, they too worked on ultrasound (US) thyroid nodule images belonging to the same distribution. From [19, 25], we infer that simple augmentation techniques work well with thyroid nodule US data; a sketch of such a pipeline follows.
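Based on these findings, a simple augmentation pipeline for ultrasound-like images might look as follows; this NumPy/SciPy sketch uses flips, Gaussian filtering, and small rotations, with all probabilities and parameter ranges being illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def augment(img, rng=None):
    """Flips, Gaussian filtering, and small rotations, reported helpful in [19, 25].
    The sigma and angle ranges are illustrative assumptions."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        img = np.fliplr(img)                 # horizontal flip
    if rng.random() < 0.5:
        img = np.flipud(img)                 # vertical flip
    if rng.random() < 0.5:
        # Mild Gaussian filtering; note [19] found Gaussian *noise* harmful,
        # whereas Gaussian *filters* improved validation accuracy.
        img = gaussian_filter(img, sigma=rng.uniform(0.5, 1.0))
    angle = rng.uniform(-10, 10)             # small rotation in degrees
    return rotate(img, angle, reshape=False, mode="nearest")
```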
Sai et al. used deep learning methods for CAD of the thyroid US [26]. They used open-source TIRADS data consisting of 298 images and a local dataset labelled by a medical professional. For the TIRADS data, they grouped the images into malignant and benign by labelling levels 2 and 3 as benign and levels 4 and above as malignant. They compared three methods: (a) training a convolutional neural network consisting of three convolutional layers with 3 × 3 filters (as shown in Fig. 5) from scratch; (b) transfer learning on VGG-16 [21] and Inception-v3 [27] by adding a bottleneck and an SVM to classify, as shown in Figs. 6 and 7, where the bottleneck features were obtained from the previously trained CNN; and (c) fine-tuning with Inception-v3 and VGG-16 as baseline models, using softmax to classify, as shown in Figs. 8 and 9. VGG-16 with SVM gave the best results, with an accuracy of 99%, a sensitivity of 1, and a specificity of 0.82. A sketch of the bottleneck-plus-SVM idea is given after the figure captions below.
Fig. 5 Architecture of convolutional neural network which was trained from scratch using random weights
Fig. 6 InceptionV3 or VGG-16 bottleneck features passed through convolutional neural network
Fig. 7 InceptionV3 or VGG-16 bottleneck features passed through SVM classifier
Fig. 8 Transfer learning on InceptionV3
Fig. 9 Transfer learning on VGG-16
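A minimal sketch of the bottleneck-plus-SVM idea (cf. Fig. 7) is given below using Keras and scikit-learn; the input size, pooling choice, SVM kernel, and the train_images/train_labels names are assumptions, not the exact configuration of [26].

```python
import tensorflow as tf
from sklearn.svm import SVC

# Frozen ImageNet-pretrained VGG-16 extracts bottleneck features,
# which an SVM then classifies.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   pooling="avg", input_shape=(224, 224, 3))
base.trainable = False

def bottleneck_features(images):
    # images: float array of shape (n, 224, 224, 3).
    x = tf.keras.applications.vgg16.preprocess_input(images)
    return base.predict(x, verbose=0)

# train_images / train_labels are placeholders for the ultrasound data.
features = bottleneck_features(train_images)
clf = SVC(kernel="rbf").fit(features, train_labels)
```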
Tianjiao et al. [28] classified malignant and benign thyroid nodule US images using a combination of deep learning feature maps and histogram of gradients (HOG) feature extraction. They performed transfer learning on the pretrained VGG-19 [21] network and obtained feature maps from the third pooling layer and the first two fully connected layers. These high-level feature maps were fused with scale-invariant feature transform (SIFT) features, local binary patterns (LBP), and HOG by directly concatenating the features from VGG-19, LBP, and SIFT, after which feature selection was performed based on differential sorting of the nature of samples. Another fusion technique was a voting strategy over the individual feature extraction methods, taking the mode of the classification results. The feature maps of the third pooling layer of VGG-19, when fused using the feature selection strategy with the extracted features (HOG, LBP, and SIFT), yielded the highest accuracy, sensitivity, specificity, and area under the receiver-operating curve (AUC), of magnitude 93.1%, 0.908, 0.945, and 0.977, respectively. Olfa et al. [29] used residual convolutional neural networks (ResNet) [30] for CAD of 3771 US images, having a benign-to-malignant ratio of 1316 images to 2455 images.
They performed transfer learning on the ResNet50 architecture and claimed to obtain better accuracy than VGG-19. Meng et al. [31] improved upon this by using region of interest (ROI) extraction, automating the segmentation of thyroid ultrasound images using U-Nets [32]. Data augmentation was done on the ROI using variational autoencoders [33]. The augmented data, along with the original ROI, was fed to a pretrained ResNet50 model. An accuracy of 87.4%, a sensitivity of 0.92, and a specificity of 0.868 were obtained by ResNet50 when ROI extraction and autoencoder augmentation were used; without the augmentation process, transfer learning on ResNet50 yielded 85.1% accuracy, 0.923 sensitivity, and 0.824 specificity. Avola et al. used a knowledge-driven approach in [34], wherein the opinions of experts on ultrasound images are concatenated and passed through dense layers. The weights from the dense layers are combined with the weights obtained from the feature maps extracted by performing transfer learning on a densely connected convolutional neural network (DenseNet) [35] pretrained on ImageNet [23] weights. After the fusion of weights, they are passed through three more dense layers and then through a softmax activation unit, which gives the resultant output. Ensemble learning was done in which DenseNet was replaced by distinctive DCNN architectures; moreover, while performing transfer learning, a particular percentage of layers was unfrozen from the baseline model. DenseNet with its layers frozen up to 25% resulted in the best accuracy over the cross-validation set; a sketch of such partial freezing follows. Young Jun et al. reviewed the application of deep learning to US images of thyroid nodules in [36]. They identified the problems faced while using deep learning on US thyroid nodule images. First, image selection should be done by highly skilled clinicians. Next, the US images should be of high quality with a standard resolution throughout the distribution. Moreover, the number of images should be adequate, as deep learning requires a lot of data [16]. Also, the labelling should be done accurately to identify the diseases properly: a poorly labelled classification dataset is one of the worst things that could happen to a model, however robust. The most pressing issue, though, is the classification of images not just into malignant or benign but into the indeterminate TIRADS categories proposed in [1]. The difference between images of the TIRADS categories [1] is best illustrated by Aishwarya et al. [37]. Some of the parameters used in TIRADS scoring are based on morphological and physical features like the dimensions of the lesion, the structure of the lesion, the response of the nodule to ultrasound waves, the internal component of the nodule, the formation of calcium on the nodule tissue, the perinodular halo, and vascularity on colour Doppler. Several criteria were set, each adding a one-point increment to the final score, and the final score decided the level of suspicion of malignancy. The authors of [37] concluded that TIRADS is reliable, accessible, and convenient. Table 1 summarizes the survey of literature, highlighting the current methods used on ultrasound data and their shortcomings.
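A brief Keras sketch of partially freezing a pretrained backbone, in the spirit of [34], is shown below; the 25% cut-off follows the result quoted above, while the classification head, optimizer, and three-class output are illustrative assumptions.

```python
import tensorflow as tf

# ImageNet-pretrained DenseNet121 backbone with a small classification head.
base = tf.keras.applications.DenseNet121(weights="imagenet", include_top=False,
                                         pooling="avg", input_shape=(224, 224, 3))

# Freeze the first 25% of layers; fine-tune the rest.
cutoff = int(0.25 * len(base.layers))
for layer in base.layers[:cutoff]:
    layer.trainable = False
for layer in base.layers[cutoff:]:
    layer.trainable = True

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g., three TIRADS groups
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```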
Table 1 Summary of literature survey highlighting the shortcomings of the existing work

| Title | Dataset used | Problem statement | Methodologies used | Conclusion | Limitation |
|---|---|---|---|---|---|
| "SVM-RFE-based feature selection and Taguchi parameters optimization for multiclass SVM classifier" [9] | Zoology and dermatology images | Expanding the function of SVM from binary classification to multiple classes for better results | SVM with recursive feature elimination | Over 95% classification accuracy using SVM-RFE | Feature selection is tedious and not optimal, as it leads to high sorting time complexity and latency in results |
| "Medical image classification based on deep features extracted by deep model and static feature fusion with multilayer perceptron" [10] | ISIC2017 and HIS2828 | Categorization of medical images | High-level features extracted from convolutional filters combined with traditional features, classified using a multilayer perceptron (MLP) classifier | Accuracy of 90.1% on ISIC2017 and 90.2% on HIS2828 | The process relied on some traditional feature extraction, which dampens the purpose of automation |
| "Exploring image classification of thyroid ultrasound images using deep learning" [26] | Open-source TIRADS data | Grouped the images into malignant (levels 4 and above) and benign (levels 2 and 3); classification of thyroid ultrasound images | CNN trained from scratch; VGG-16 and InceptionV3 bottleneck features with the CNN and an SVM; fine-tuning of VGG-16 and InceptionV3 | VGG-16 with SVM gave the best results: accuracy 99%, sensitivity 1, specificity 0.82 | Even though work was done on TIRADS data, binary classification into strictly malignant or strictly benign was performed |
| "Classification of thyroid nodules in ultrasound images using deep model-based transfer learning and hybrid features" [28] | Thyroid ultrasound data | Classified malignant and benign thyroid nodule ultrasound images | Combination of deep learning feature maps obtained by transfer learning on the pretrained VGG-19 model and HOG feature extraction | Accuracy, sensitivity, specificity, and AUROC of 93.1%, 0.908, 0.945, and 0.977, respectively | Binary classification into strictly malignant or strictly benign was performed |
| "Thyroid nodules classification and diagnosis in ultrasound images using fine-tuning deep convolutional neural network" [29] | 3771 ultrasound images, benign:malignant ratio 1316:2455 | Classified malignant and benign thyroid nodule ultrasound images | Transfer learning with residual convolutional neural networks (ResNet), using the ResNet-50 architecture | Better accuracy than VGG-19 | Binary classification into strictly malignant or strictly benign was performed |
| "Knowledge-driven learning via experts consult for thyroid nodule classification" [34] | Open-source TIRADS data | Classified three categories of TIRADS | Ensemble learning in which DenseNet was replaced by distinctive DCNN architectures; a particular percentage of layers unfrozen from the baseline model during transfer learning | DenseNet with layers frozen up to 25% gave the best accuracy over the cross-validation set | Classification done for only three categories of TIRADS |
3 Discussion The survey of existing literature helped us get acquainted with and identify the current practices, algorithms, and methodologies used in the computer-aided diagnosis of ultrasound images. With regard to medical datasets, we addressed the issues of data and class imbalance, the low quantity of images, and the decision between transfer learning and training deep learning models from scratch. From the work analysed, we conclude that the class imbalance problem in medical data is best solved by using a cost-sensitive approach. Simple augmentation methodologies, like flipping and the application of Gaussian filters, tend to work better on ultrasound thyroid images than complex augmentation strategies. Transfer learning on pretrained ImageNet weights was found to work better than training from scratch, more often than not. Machine learning algorithms like support vector machines and multilayer perceptrons yield high accuracies using iterative and recursive training methods but face considerable latency in feature selection and hyperparameter tuning. Hybrid methodologies tend to work much better, but they still involve feature selection. The process of feature selection can be automated by using deep learning methods. ResNet50, DenseNet, VGG-16, VGG-19, and Inception-v3, all pretrained on ImageNet weights, are efficient models for transfer learning. Transfer learning with VGG-16 gives brilliant results when paired with a bottleneck, whose features are selected from a feature map obtained by training a naive convolutional neural network from scratch, and an SVM classifier. Using baseline models like ResNet50 fed with ROI data extracted by U-Nets and augmented data from variational autoencoders works much better than without augmentation. Nevertheless, all of this was computed on the binary classification of thyroid nodule diseases, which assigns a nodule a value of strictly malignant or strictly benign. Since malignancy follows more of a spectrum than a binary discrimination, the need for the Thyroid Image Reporting and Data System (TIRADS) has become more significant than ever. Some work has been carried out on automating diagnosis with this system, but it has not been definitive.
4 Dataset The dataset intended for future work is taken from the Thyroid Digital Image Database, an open-source database of ultrasound images of thyroid nodules. Presently, this database includes a collection of beta-tested ultrasound images with TIRADS annotations made by expert radiologists. The data is collected from 389 patients and consists of more than 420 annotated ultrasound thyroid images. Figure 10 shows a small batch of images from the dataset. We intend to parse the annotations and the TIRADS categories to perform ROI-based image classification.
Fig. 10 Batch of thyroid ultrasound images consisting of TIRADS
5 Conclusion The review of existing literature highlights the importance of deep learning in CAD for ultrasound thyroid images. Deep learning models like VGG-16, VGG-19, ResNet50, DenseNet, and Inception-v3 have automated the process of feature selection, reducing reporting time and providing a quicker and more accurate second opinion to medical professionals. From the literature survey, it is observed that transfer learning may work better than the other training models discussed. Moreover, we recognize the need for implementing TIRADS diagnosis using deep learning. The current work largely deals with the binary classification of thyroid nodules into malignant and benign. However, one needs to understand that computer-aided diagnosis is not meant to replace medical professionals completely but to provide supporting evidence for their diagnosis. The problem with binary classification is that it deterministically gives the result for the presence or absence of cancer in the thyroid nodule. Such deterministic behaviour results in a greater number of false positive and false negative reports. Even a single false positive or false negative report, especially in the binary classification of the malignancy or benignity of a thyroid nodule, can prove fatal. This discrepancy has motivated the use of an indeterminate spectrum like TIRADS.
Our future work will be to perform transfer learning on existing architectures with TIRADS data, following the general rules of thumb of balancing classes and using simple augmentation techniques like flipping, rotations, scaling, and Gaussian filtering. This future work serves our goal, which is to design an accurate and robust diagnostic system for TIRADS using transfer learning on state-of-the-art architectures.
References 1. E. Horvath, S. Majlis, R. Rossi, C. Franco, J.P. Niedmann, A. Castro, M. Dominguez, An ultrasonogram reporting system for thyroid nodules stratifying cancer risk for clinical management. J. Clin. Endocrinol. Metabol. 94(5), 1748–1751 (2009). https://doi.org/10.1210/jc.2008-1724 2. H.A. Ansari, S.M. Vasentwala, N. Saeed, K. Akhtar, S. Rehman, Diagnostic approach to thyroid nodules. Annals Int. Med. Dental Res. (2016). https://doi.org/10.21276/aimdr.2016.2.6.PT1 3. G. Popoveniuc, J. Jonklaas, Thyroid nodules. Med. Clin. North Am. 96(2), 329–349 (2012). https://doi.org/10.1016/j.mcna.2012.02.002.PMID:22443979;PMCID:PMC3575959 4. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer, New York, 1995). https:// doi.org/10.1007/978-1-4757-3264-1 5. N.H. Sweilam, A.A. Tharwat, N.K. Abdel, Support vector machine for diagnosis cancer disease: a comparative study. Egypt. Inform. J. 11(2), 81–92 (2010). https://doi.org/10.1016/j.eij.2010. 10.005 6. J. Kennedy, R. Eberhart, Particle swarm optimization, in Proceedings of ICNN ‘95—International Conference on Neural Networks, vol. 4 (Perth, WA, Australia, 1995), pp. 1942–194848. https://doi.org/10.1109/ICNN.1995.488968 7. X. Fu, W. Liu, B. Zhang, H. Deng, Quantum behaved particle swarm optimization with neighborhood search for numerical optimization. Math. Probl. Eng. Article ID 469723, 10 pp. (2013). https://doi.org/10.1155/2013/469723 8. M.H. Nguyen, F.D.L. Torre, Optimal feature selection for support vector machines. Pattern Recogn. 43(3) (2010). https://doi.org/10.1016/j.patcog.2009.09.003 9. M.-L. Huang, Y.-H. Hung, W.M. Lee, B.-R. Jiang, SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier. Sci. World J. Article ID 795624. https://doi.org/10.1155/2014/795624 10. Z. Lai, H.F. Deng, Medical image classification based on deep features extracted by deep model and static feature fusion with multilayer perceptron. Comput. Intell. Neurosci. Article ID 2061516. https://doi.org/10.1155/2018/2061516 11. V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (Haifa, 2010), pp. 807–814 12. S. Belarouci, M. Chikh, Medical imbalanced data classification. Adv. Sci. Technol. Eng. Syst. J. 2, 116–124. https://doi.org/10.25046/aj020316 13. D.-Z. Feng, Z. Bao, L.-C. Jiao, Total least mean squares algorithm. IEEE Trans. Signal Process. 46(8), 2122–2130 (1998). https://doi.org/10.1109/78.705421 14. X.-Y. Liu, J.X. Wu, Z.-H. Zhou, Exploratory under-sampling for class-imbalance learning, in Proceedings of the 6th International Conference on Data Mining (ICDM ‘06) (IEEE, Hong Kong, 2006), pp. 965–969 15. H. He, E.A. Garcia, Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009) 16. C. Sun, A. Shrivastava, S. Singh, H. Mulam, Revisiting unreasonable effectiveness of data in deep learning era, in IEEE International Conference on Computer Vision (ICCV) (2017), pp. 843–852. https://doi.org/10.1109/ICCV.2017.97
17. M.A. Tanner, H.W. Wing, The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82(398), 528–540 (1987). https://doi.org/10.2307/2289457 18. H. Kim, J. Park, H.-Y. Jung, An efficient color space for deep-learning based traffic light recognition. J. Adv. Transp. (2018), pp. 1–12. https://doi.org/10.1155/2018/2365414 19. Z. Hussain, F. Gimenez, D. Yi, D. Rubin, Differential data augmentation techniques for medical imaging classification tasks, in AMIA Annual Symposium Proceedings (2017), pp. 979–984 20. R. Lee, F. Gimenez, A. Hoogi, K. Miyake, M. Gorovoy, D. Rubin, A curated mammography data set for use in computer-aided detection and diagnosis research. Sci. Data 4, 170177 (2017). https://doi.org/10.1038/sdata.2017.177 21. K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556v6 22. N. Tajbakhsh, J. Shin, S. Gurudu, R.T. Hurst, M.B. Gotway, J. Liang, Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans. Med. Imaging 35(5), 1299–1312 (2016). https://doi.org/10.1109/TMI.2016.2535302 23. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, ImageNet large scale visual recognition challenge. Int. J. Comput. Vision (2014). https://doi.org/10.1007/s11263-015-0816-y 24. R. Dhaya, Deep net model for detection of Covid-19 using radiographs based on ROC analysis. J. Innov. Image Process. (JIIP) 2(03), 135–140 (2020) 25. F. Abdolali, J. Kapur, J.L. Jaremko, M. Noga, A.R. Hareendranathan, K. Punithakumar, Automated thyroid nodule detection from ultrasound imaging using deep convolutional neural networks. Comput. Biol. Med. 122, 103871 (2020). https://doi.org/10.1016/j.compbiomed.2020.103871 26. K.V.S. Sundar, K. Rajamani, S. Sai, Exploring image classification of thyroid ultrasound images using deep learning, in Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB) (2019). https://doi.org/10.1007/978-3-030-00665-5_151 27. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Las Vegas, NV, 2016), pp. 2818–2826. https://doi.org/10.1109/CVPR.2016.308 28. T. Liu, S. Xie, J. Yu, L. Nia, W. Sun, Classification of thyroid nodules in ultrasound images using deep model based transfer learning and hybrid features, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (New Orleans, LA, 2017), pp. 919–923. https://doi.org/10.1109/ICASSP.2017.7952290 29. O. Moussa, H. Khachnaoui, R. Guetari, N. Khlifa, Thyroid nodules classification and diagnosis in ultrasound images using fine-tuning deep convolutional neural network. Int. J. Imag. Syst. Technol. 30(1) (2019). https://doi.org/10.1002/ima.22363 30. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Las Vegas, NV, 2016), pp. 770–778. https://doi.org/10.1109/CVPR.2016.90 31. M. Zhou, R. Wang, P. Fu, Y. Bai, L. Cui, Automatic malignant thyroid nodule recognition in ultrasound images based on deep learning, in E3S Web Conference (2020). https://doi.org/10.1051/e3sconf/202018503021 32. O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in N. Navab, J.
Hornegger, W. Wells, A. Frangi (eds.), Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351 (Springer, Cham, 2015). https://doi.org/10.1007/978-3-31924574-4_28 33. D.P. Kingma, M. Welling, Auto-Encoding Variational Bayes. CoRR, abs/1312.6114 34. D. Avola, L. Cinque, A. Fagioli, S. Filetti, G. Grani, E. Rodol, Knowledge-driven learning via experts consult for thyroid nodule classification, 28 May 2020. arXiv:2005.14117v1 [eess. IV] 35. G. Huang, Z. Liu, L. Maaten, K. Weinberger, Densely Connected Convolutional Networks (Cornell University paper, 2018). arXiv:1608.06993v5
368
S. Baldota and C. Malathy
36. Y. Chai, J. Song, M. Shear, Artificial Intelligence for thyroid nodule ultrasound image analysis. Annals Thyroid 5 (2020). https://aot.amegroups.com/article/view/5429 37. K.C. Aishwarya, S. Gannamaneni, G. Gowda, A. Abhishiek, TIRADS classification of thyroid nodules: a pictorial review. Int. J. Med. Res. 4(2), 35–40 (2019). www.medicinesjournal.com
A Transfer Learning Approach Using Densely Connected Convolutional Network for Maize Leaf Diseases Classification Siddhant Baldota, Rubal Sharma, Nimisha Khaitan, and E. Poovammal
Abstract Cereal crops are among the most important sources of proteins and carbohydrates for human beings, and cereal grains are consumed daily in one form or another. Over the last decade, the production and demand of maize have risen manifold. As maize remains a crucial food crop, this research work examines one of the principal reasons maize crops are destroyed: plant disease, which frequently leads to crop failure. This research has been carried out on the PlantVillage dataset using densely connected convolutional neural networks. The PlantVillage dataset is a benchmark dataset that consists of three disease classes and one healthy class. Compared to earlier proven methods on the same dataset, transfer learning on DenseNet121 yielded better results, achieving an accuracy of 98.45% on the test set with less storage space and lower training time. The robustness of the model was demonstrated when it yielded a validation set accuracy of about 94.79% and a test set accuracy of 91.49% despite being retrained with a jitter of 0.03 and a lighting change of 0.15, which added noise to the original dataset. This ability of DenseNet121 to perform well on noisy data increases the applicability of the proposed method in countries where operational budgets are limited. Keywords DenseNet · Image classification · Maize plant disease · Transfer learning
1 Introduction Presently, the world has become more vulnerable to food insecurity. A few factors influencing food security worldwide are the rise of environmental problems, natural calamities, crop failures, and so on. In the present scenario, crop failure is a problem that requires significant research attention. Failure or spoiling of crops occurs due to pests, plant diseases, excessive use of chemicals, etc.
S. Baldota · R. Sharma · N. Khaitan · E. Poovammal (B) Department of Computer Science and Engineering, SRM Institute of Science and Technology, 603203 Kattankulathur, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_31
If plant diseases are identified and detected in the early stages, they can be curbed, and the crops can be saved. Growing technology promises better and more efficient agriculture by incorporating emerging technologies like artificial intelligence and machine learning. Nowadays, neural networks are also finding application in the detection of plant diseases. Deep learning models are applied to classify images of infected plants. Most of these models use pretrained weights that were trained for large-scale competitions such as ImageNet and COCO. They are fine-tuned to be fed with an input image, which is categorized within the defined classes as either healthy or one of the diseases. Cereals are specifically cultivated for their edible component, the grain. Cereals are a very rich source of minerals, carbohydrates, vitamins, and proteins. In developing countries, rice, maize, or millet remains a part of the daily meal. This research is narrowed down to maize crops. Maize is an important crop, as both its silage and crop residue have heavy usage. Maize can be used to extract both oil and starch. It is used to make porridges and beer. Moreover, people eat it after roasting. Therefore, it is important to look at maize diseases and a way to detect them. This work intends to use densely connected convolutional networks (DenseNet) for image classification on maize leaves to identify the nature and type of the disease. The dataset contains both healthy and diseased images, so the model does both jobs at once: detecting the disease and classifying it. The aim is to test the models to identify the optimal model with the best classification parameters. ResNets have a heavy architecture (they can go up to 1000 layers), taking up more space and time to execute. On the other hand, DenseNets have a significantly smaller architecture; DenseNet121 trained on ImageNet takes only about 30 MB of space. DenseNet shows promising aspects that are tested on the dataset, split into training, validation, and hold-out test sets. To check the robustness of the model, some noise is introduced into the images up to a certain level to check whether the model can cope with low-resolution images. Since the crop plays a major role as a daily food item in developing countries, there is a pressing need to check the resilience of the model. Jitter is introduced in images to check the accuracy in cases where low-quality and low-resolution images are expected from the farmers of developing nations. Throughout, this research strives to find the best model which fits the requirements for detection and classification of maize plant diseases.
2 Literature Review The study was initiated with existing research work setting the course for disease detection in plants. In general, various image processing, machine learning, and deep learning architectures have been used to detect plant diseases, as given in Table 1. Leaf images were taken, as they show the symptoms of the disease clearly. In-depth knowledge about plant diseases was gathered by Santhosh et al. in [1]. They have researched various diseases like rust, which is common in maize crops.
Table 1 Comparison of related works

| Title | Objective | Methods | Conclusion |
|---|---|---|---|
| Analysis of artificial intelligence-based image classification techniques [6] | Fruit image classification | ML algorithms like SVM, RF, DA, and KNN | KNN gave the highest accuracy. Tuning K in KNN for the test set showed a dependency on the training procedure |
| Visual tea leaf disease recognition using a convolutional neural network model [7] | Tea leaf disease classification | BOVW with ML classifiers vs DCNN from scratch | DCNN was better than ML models |
| Using deep learning for image-based plant disease detection [8] | Plant disease classification on the PlantVillage dataset | Transfer learning on DCNN models like AlexNet and GoogleNet | Accuracy using transfer learning was much higher than training from scratch |
| Maize leaf disease identification based on feature enhancement and DMS-robust AlexNet [10] | Maize disease classification | DMS-robust AlexNet | Higher accuracy, but a lot of hyperparameter tuning needed to be done |
| Mellowness detection of dragon fruit using deep learning strategy [12] | Mellowness identification in dragon fruit images | VGGNet vs ResNet152 | The performance of ResNet was better than VGGNet. ResNet152 is a large model |
| A large chest radiograph dataset with uncertainty labels and expert comparison [14] | Chest radiograph classification | ResNet152, SE-ResNeXt101, Inception-v4, and DenseNet121 | DenseNet121 gave higher accuracy |
This work elucidated the plant diseases which can be seen infecting the leaves, such as kole roga, yellow leaf disease, leaf curl, and so on. Such diseases can be detected by analyzing images of the leaves. Before approaching the stated problem, which deals with maize, an attempt was made to learn about its historical perspective and the challenges which one might encounter in the future. Sai et al. conducted research specifically on maize plants to study maize crops in depth [2]. Their work stated the current facts about the production of maize throughout India and its growth during recent decades. Moreover, their research explained the challenges which the crop faces and the use of new techniques to enhance production. Machine learning algorithms like Naïve Bayes, decision tree, and random forest were used by Kshyanaprava et al. [3] to classify maize leaf images. These algorithms are easy to implement. They used a supervised learning approach. The dataset consisted of 3823 images with four classes, including a healthy class. They subjected the images to preprocessing, where images were converted to grayscale. Image segmentation and feature extraction were performed to extract the necessary features. All the algorithms gave above 75% accuracy, but random forest
yielded the best accuracy of 79.23%. Jitesh et al. carried out a detailed study of various varieties of rice infected by diseases [4]. After acquiring the dataset from a village in Gujarat, the images were subjected to preprocessing where RGB was converted into HSV. K-means was used for clustering the images. They used color features for feature extraction. Support vector machines use a supervised learning approach and classify the data based on the class labels. Priyanka et al. classified cereal crops like maize, rice, and wheat [5]. After examining the images of the crops, image preprocessing and segmentation were applied. They applied six classifiers to get the desired accuracy, but when all crops were combined, Naïve Bayes stood out and gave an accuracy of 90.97%. Shakya in [6] applied various machine learning (ML) models, namely k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and discriminant analysis (DA), on a fruit dataset consisting of 300 images of fruits taken from Kaggle. The dataset had an 80:20 train–test split. Feature extraction on the data was performed by applying various preprocessing methods such as resizing the images into a*b picture elements, converting these resized images to grayscale, and edge detection of these grayscale images using Gaussian filter convolutions and the Canny edge function. The features extracted from these methods were combined with the color, textural, and spatial features. Ensemble learning was carried out on the machine learning models, which acted as classifiers for the data. KNN achieved the best metrics, which included the highest accuracy, specificity, and sensitivity of 93.103, 80, and 94.339%, respectively, among the classifiers. However, the use of machine learning models resulted in the need for manual feature selection. Moreover, the estimation of the K value in KNN was done on the basis of the feature extraction carried out during the training process. This indicated a high dependency of the proposed model on the training procedure. When used in real time with different data distributions of fruit images, this model may result in low accuracy. This justifies the need for using deep learning for image classification to automate the process of feature selection and reduce the number of moving parts in the architecture. Jing et al. [7] proved this by proposing LeafNet, a deep convolutional network model to automate the process of feature extraction, on a tea leaf disease dataset consisting of 3810 high-resolution images in six categories, which were augmented using different transformation processes to yield a total of 7905 images distributed in a train:cross-validation:test split of 80:10:10. Classification was compared with a bag of visual words (BOVW) model fed with DSIFT features, using SVM and multilayer perceptron (MLP) classifiers. Results turned hugely in favor of LeafNet, which obtained an accuracy of 90.16% in comparison with SVM's 60.62% and MLP's 70.77%. Granted, the results obtained by LeafNet were high in comparison with machine learning algorithms, and the arduous process of feature extraction was eliminated. However, transfer learning on deep learning architectures pretrained on ImageNet seemed to give better results than training from scratch. Sharada et al. applied deep learning models and algorithms on the PlantVillage dataset [8]. They applied AlexNet and GoogleNet with transfer learning.
They gradually kept decreasing the learning rate. Their model achieved the best accuracy of
98.21%. Similarly, Sumita et al. applied a deep convolutional neural network (DCNN) on a dataset which consisted of various categories of the corn plant. They used a pretrained DCNN model [9]. They developed the model using convolutional layers, max-pooling layers, activation, and dropout layers. The model's optimized learning rate was 0.0004. The designed model gave an accuracy of 98.40%, and using mobile phone images, 88.6% was achieved. Mingjie et al. deployed DMS-robust AlexNet for the classification of seven categories of maize leaves, which include the healthy category too [10]. Data augmentation processes like rotation, flipping, etc., were applied. Their method eliminates the need for selecting specific features. Their model was able to achieve more than 98% accuracy. However, a lot of hyperparameter tuning was needed for the DMS-robust version of AlexNet. Choosing these hyperparameters was a long-drawn process, and it introduced unnecessary computation overheads. Krishnaswamy et al. [11] analyzed data on ten different diseases from four distinct crops, namely eggplant, beans, okra, and lime, from a field in Tamil Nadu, India. They performed transfer learning on VGG16, VGG19, GoogleNet, ResNet101, and DenseNet201. The dataset was augmented by adding transformations like rotation, flipping, and translation. For the validation data, GoogleNet gave the best accuracy of 97.30%. In real-time testing, VGG16 yielded the highest accuracy of 90.00%. One of the reasons for this is the variance in the distribution, since there were four different crops. ResNets and DenseNets tend to perform better on a single crop having different types of diseases. This is justified in [12], where Vijaykumar et al. proposed a residual convolutional neural network, ResNet152, for the identification of mellowness in a dragon fruit image dataset labeled by experts. The dataset had an 80:20 train–test split. The data was trained on the ResNet152 model for a total of 500 epochs. It was observed that the train and test losses did not increase, resulting in higher accuracies in comparison with VGG16 and VGG19. It was claimed that ResNet152 gave very good results, yielding an area under the ROC curve of 1.0. The drawback of using ResNet152 was that it was a large model requiring large storage capacity. Moreover, the depth of the model was 152 layers, which meant that the number of parameters of the model was large, resulting in higher computation costs. Gao et al. [13] proposed a convolutional network which has direct connections between any two layers having the same feature-map size. With growing parameters, its accuracy improves. They named it the dense convolutional network (DenseNet). Tao T et al. introduced a novel approach by using transfer learning on top of DenseNet [2]. They called it sequential fine-tuning. This model achieves better accuracy than the traditional one. After applying this model, they got high accuracy on their dataset classification. Irvin et al. [14] proved the efficacy of using DenseNet121 over conventional CNNs for transfer learning. The dataset used consisted of 2,24,316 chest radiology images classified for the presence, absence, and uncertainty of 14 types of diseases. Deep learning models such as ResNet152, SE-ResNeXt101, Inception-v4, and DenseNet121 were used as experimental transfer learning baseline architectures.
DenseNet121 yielded the highest accuracy in these tests and therefore was trained with a learning rate of 0.0001, a momentum of 0.9, a decay of 0.99, a batch size of
16, for three epochs using an Adam optimizer. The details of training and the results of CheXpert are outside the scope of this paper, since the aim here has been simplified to showing that DenseNet121 is optimal for performing transfer learning on image classification, not to actually perform the chest radiography classification. The survey of the existing literature leads us to conclude that DenseNet121 should be used as the baseline model for transfer learning on maize leaf images.
3 Dataset The dataset used in this paper is a part of a dataset obtained from Kaggle, which consisted of more than 50,000 expert-curated images. This subset consists of a total of 3858 images, comprising 3090 training images and 384 validation and test set images each. Class balancing is applied using a cost-based approach to account for the disparity in the number of leaf images per class. There are four categories of images, including the healthy category. The first disease is known as gray leaf spot, a fungal disease caused by warm and humid weather. Initially, tan-colored spots appear on the leaves, which eventually turn gray when exposed to rain. The second disease is also a fungal disease, called common rust, that usually occurs during June. Chlorotic flecks are one of the early visible symptoms signifying common rust. Like gray leaf spot, northern leaf blight also occurs in damp and humid environments. The suitable temperature for northern leaf blight is 18–27 °C, and it is one of the easiest diseases to identify because it forms distinct cigar-shaped lesions on the leaves.
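One common way to realize the cost-based balancing mentioned above is to weight each class in the loss inversely to its frequency. The following is a minimal sketch in PyTorch; the per-class counts are placeholders, since the paper does not report the per-class distribution.

```python
import torch
import torch.nn as nn

# Hypothetical per-class image counts (healthy, gray leaf spot, common rust,
# northern leaf blight); the actual counts are not reported per class.
class_counts = torch.tensor([1000., 600., 900., 590.])

# Weight each class inversely to its frequency so rarer classes
# contribute more to the loss (a simple cost-based balancing scheme).
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```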
4 Methods
a. Preprocessing
The dataset used consists of 3090 training images, 384 validation set images, and 384 test set images for four categories of maize leaves, including a healthy class. Train and validation loaders are set up, each having a batch size of 50. Various transformation methods are applied to the training images to test the robustness of the proposed model and augment the training data. The first method uses normal transformations, which include horizontal and vertical flipping of images, zooming of the image up to a maximum factor of 1.1, and a maximum distortion factor of 0.2. A random rotation up to a maximum of 10 degrees is performed on the images, both clockwise and counter-clockwise. Each image was resized to 256 × 256. These transformations form the standard transformation path used to test the general validation and test accuracy, error rate, and validation loss of the model on these images. To verify robustness, a noisy transformation path consisting of vertical flipping, a maximum rotation range of 15 degrees on either side, and a lighting
effect of magnitude 0.15 on a scale of 1 is included. While the distortion factor remains 0.2, jitter of magnitude 0.03 is introduced into the images. Jitter changes the pixels of an image by randomly replacing them with pixels in the vicinity; the degree of the vicinity used for replacement is defined by the magnitude of the jitter. Introducing jitter to the transformation process is a noticeable change from the previous transformation path. Jitter corrupts the image signal, which makes the images noisy, thereby replicating the distribution of data likely to be available in places where image resolution for training is low, as shown in Fig. 1.
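The two transformation paths can be approximated as below. This is a sketch in torchvision, not the exact pipeline used by the authors; in particular, torchvision has no pixel-replacement jitter, so ColorJitter stands in for the 0.15 lighting effect, and the zoom and distortion settings are approximations.

```python
import torchvision.transforms as T

# Standard transformation path: flips, up to 10-degree rotation, mild zoom,
# resize to 256 x 256 (the 0.2 distortion/warp is omitted for simplicity).
standard_tfms = T.Compose([
    T.Resize((256, 256)),
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=10),
    T.RandomAffine(degrees=0, scale=(1.0, 1.1)),  # stand-in for max zoom 1.1
    T.ToTensor(),
])

# Noisy transformation path: vertical flip, up to 15-degree rotation, and a
# lighting change; ColorJitter approximates the 0.15 lighting effect.
noisy_tfms = T.Compose([
    T.Resize((256, 256)),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.15, contrast=0.15),
    T.ToTensor(),
])
```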
b. Model
Very deep convolutional neural network architectures suffer from the issue of vanishing gradients: the gradients have to traverse such a long path that they tend to disappear before reaching the layers near the input layer, making the training process an almost impossible task after a point. Residual networks, or ResNets, were introduced to solve this problem by creating an identity path, thus adding an identity term to the weights. However, as it turned out, ResNet models tended to be very deep too, going past 1000 hidden layers at times. To solve this problem, DenseNets were introduced. They provide a simplified solution by connecting each layer directly to every other layer. This improves the gradient flow to such an extent that maximum information is passed, reducing the need for very deep or very wide models. The feature maps extracted by other models are rendered superfluous because the model requires fewer parameters than even ResNet.
Fig. 1 A batch of maize leaf images from the noisy transformation path
Rather, DenseNet relies more on the reusability of weights and features. Another significant difference between ResNets and DenseNets is that DenseNet layers are significantly narrower, and each layer can directly access the gradients from the loss function. DenseNet applies a composite function of operations and, instead of summing up the feature maps, concatenates them. DenseNets are partitioned into dense blocks; the sizes of the feature maps remain constant within a block, while the number of filters applied changes between the blocks. The DenseNet121 model is applied for our test case, where 121 indicates the depth of the DenseNet model. Transfer learning is performed on DenseNet121 pretrained with ImageNet weights. A striking advantage of using DenseNet121 is its compact size of 30 MB, as compared to DenseNet161, whose size exceeds 100 MB. Torchvision models offered by PyTorch are used for transfer learning on DenseNet121.
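A minimal sketch of this setup with torchvision follows. Freezing the pretrained features is one common choice and an assumption here; the paper does not specify its unfreezing schedule, only that transfer learning was performed with a four-class output.

```python
import torch.nn as nn
from torchvision import models

# Load DenseNet121 with ImageNet weights and swap the classifier head
# for the four maize classes (healthy + three diseases).
model = models.densenet121(pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze the densely connected feature extractor
model.classifier = nn.Linear(model.classifier.in_features, 4)  # trainable 4-class head
```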
c. Training
The training was carried out on Google Colaboratory, which uses an Nvidia K80 GPU and offers up to 12 GB of GPU memory with a memory clock of 0.82 GHz, a performance of 4.1 TFLOPS, an available RAM of 12 GB, and disk space of 358 GB. At first, transfer learning is carried out on DenseNet121 using the data from the standard transformation path (without jitter). One of the first steps of training is to find an optimum learning rate that minimizes the validation loss. To do this, the model is trained with a changing learning rate, and the loss versus learning rate curve is computed, where the learning rate is the independent parameter and the loss is the dependent parameter. The learning rate was set at the minimum numerical gradient (2.75e-04, or 0.000275), as shown in Fig. 2. The training was carried out for a total of 25 epochs, in two parts of 5 epochs and 20 epochs, in that order. The average training time per epoch was 38 s. Once the error rate and the validation loss had become stagnant, training was stopped. Note that the epoch count is low because the loss had stopped decreasing and the accuracy had stopped increasing even after training for 5 epochs; moreover, the number of steps per epoch was high, so the epoch count was concluded to be adequate. The training procedure for DenseNet121 on the data subjected to the second transformation path was a bit different in terms of the number of epochs and hyperparameter optimization. The purpose of the second transformation was to check whether the DenseNet121 model works as well when fed high-noise, lower-quality data. From Fig. 3, it is noticed that the optimal learning rate computed from the minimum numerical gradient had changed slightly, from 2.75e-04 to 2.29e-04. The model was then trained for 5 epochs. Apart from the first epoch, which took 94 s, the remaining four were done in half the time each. The model was trained for 25 more epochs to reduce the validation loss obtained after the five-epoch training process. However, this was not a standard 25-epoch training process: the model was trained for 5 epochs at a time, five times over, and the optimal learning rate was recalculated every five epochs. Table 2 shows the different learning rates, obtained by computing the minimum numerical gradient for every fifth epoch. The average training time per epoch was maintained at 47 s.
Fig. 2 Finding the near-perfect learning rate for DenseNet121 trained on the standard transformation path

Fig. 3 Finding the near-perfect learning rate for DenseNet121 trained on the noisy transformation path

Table 2 Optimum learning rate for epochs

| Range of epochs | 0–4 | 5–9 | 10–14 | 15–19 | 20–24 | 25–29 |
|---|---|---|---|---|---|---|
| Learning rate (×10⁻⁴) | 2.9 | 0.015 | 0.029 | 0.0063 | 0.0063 | 0.0075 |
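The learning-rate search described above can be sketched as follows: sweep the learning rate over one pass, record the loss at each step, and pick the rate at which the loss curve falls fastest (the minimum numerical gradient). The sweep values and loss curve below are synthetic stand-ins, not the paper's data.

```python
import numpy as np

def suggest_lr(lrs, losses):
    """Return the learning rate at the steepest descent of the loss curve."""
    grads = np.gradient(np.asarray(losses))  # numerical gradient of the loss curve
    return lrs[int(np.argmin(grads))]        # rate where the loss falls fastest

lrs = np.logspace(-6, -1, 100)  # swept learning rates
# Toy loss curve: a smooth drop around 1e-4 followed by divergence at high rates.
losses = 2 - 1.5 / (1 + np.exp(-3 * (np.log10(lrs) + 4))) + (30 * lrs) ** 2
print("suggested learning rate:", suggest_lr(lrs, losses))  # ~1e-4 here
```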
5 Results Over the 25 epochs of training, there were very few exploding gradients and almost no vanishing gradients. Figure 4 plots the training and validation loss of the DenseNet121 model over 25 epochs of training on the standard transformation data. The validation accuracy remains within 90–95% for the first three epochs; after epoch 10, it does not drop below 96%. The validation accuracy versus epoch curve is plotted in Fig. 5. On training the model on the standard transformation data (non-jitter data), a test time augmentation (TTA) accuracy on the validation set of 97.66% was obtained, as given in Table 3. A recall of 0.9765, a precision of 0.9772, and an F1 score of 0.9767 were obtained on the validation set images of non-jitter data. The accuracy on the test set, which also had 384 images (non-jitter), was 98.4536%; in other words, only 6 examples out of 384 were misclassified. The precision, recall, and F1 score for this model were 0.984811, 0.98453, and 0.98460, respectively, as given in Table 3. The confusion matrix for the test set is shown in Fig. 6. For the noisy data, DenseNet121 is retrained with the same transfer learning approach. The model is trained for 30 epochs, tuning the learning rate every five epochs as given in Table 2. A test time augmentation (TTA) accuracy on the validation set of 94.79% is given in Table 3. A recall of 0.94791, a precision of 0.94868, and an F1 score of 0.948260 were obtained on the validation set images for DenseNet121 trained on jitter data.
Fig. 4 DenseNet121: training on standard transformation data
Fig. 5 Validation accuracy versus epoch curve for DenseNet on standard transformation data
Table 3 Evaluation of DenseNet

| Evaluation metrics | Standard transformation | Noisy transformation |
|---|---|---|
| TTA accuracy on the validation set | 97.66% | 94.79% |
| F1 score on the validation set | 0.9767 | 0.9482 |
| Accuracy on the test set | 98.45% | 91.49% |
| F1 score on the test set | 0.9846 | 0.9141 |
The metrics plotted in Fig. 7 show that the validation accuracy oscillates between 92 and 93% for the larger part of the 30 epochs. The accuracy on the test set, which also had 384 images, was 91.49%; in other words, 33 samples out of 384 were misclassified. The precision, recall, and F1 score for this model were 0.913470, 0.914948, and 0.9141031, respectively, as given in Table 3. The confusion matrix for the test set is shown in Fig. 8.
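For reference, the reported precision, recall, F1 score and confusion matrix can be computed from the 384 test-set predictions along the following lines. The labels below are illustrative placeholders, and weighted averaging is one plausible choice; the paper does not state which averaging it used.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Illustrative stand-ins for the 384 true test labels and model predictions
# (0=healthy, 1=gray leaf spot, 2=common rust, 3=northern leaf blight).
y_true = [0, 1, 2, 3, 1, 2, 0, 3]
y_pred = [0, 1, 2, 3, 1, 0, 0, 3]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(confusion_matrix(y_true, y_pred))  # rows: true class, cols: predicted class
```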
6 Conclusion DenseNet121 was implemented successfully and yielded good results. DenseNet121 was chosen as the baseline model because of its low storage requirements and densely connected architecture, which reuses all feature maps instead of generating new ones. Initially, preprocessing was applied to the dataset, and later it was fed to the model for training.
Fig. 6 Confusion matrix plot for DenseNet121 on standard transformation data
Fig. 7 Validation accuracy versus epoch curve for DenseNet121 on noisy data
DenseNet121 gave nearly perfect accuracy for all four classes. The learning rate was optimized for various ranges of epochs. An accuracy of 98.45% was obtained on the test set. To check the robustness of the model, jitter (noise) was added to some extent to the dataset images; the proposed model still yielded an accuracy of more than 91%. This result proved that even with noise and low-resolution images, the model classified quite well. It was concluded that the proposed model was able to achieve near human-level performance. In the future, research may be furthered by using a much larger dataset with more categories and by testing more models on the datasets to find the optimal solution to the problem that exists in developing nations.
Fig. 8 Confusion matrix plot for DenseNet121 on noisy data
References
1. S. Kumar, B. Raghavendra, Disease detection of various plant leaf using image processing techniques, in International Conference on Advanced Computing and Communication Systems (2019). https://doi.org/10.1109/ICCUBEA.2015.153
2. T. Tan, Z. Li, H. Liu, F. Zanjani, Q. Ouyang, Y. Tang, Z. Hu, Q. Li, Optimize transfer learning for lung diseases in bronchoscopy using new concept: sequential fine-tuning. IEEE J. Transl. Eng. Health Med. (2018). https://doi.org/10.1109/JTEHM.2018.2865787
3. K. Panigrahi, H. Das, A. Sahoo, S. Moharana, Maize leaf detection and classification using machine learning algorithms, in Advances in Intelligent Systems and Computing, vol. 1119 (2020). https://doi.org/10.1007/978-981-15-2414-1_66
4. J. Shah, H. Prajapati, V. Dabhi, A survey on detection and classification of rice plant diseases, in IEEE International Conference (2016). https://doi.org/10.1109/ICCTAC.2016.7567333
5. P. Thakur, P. Aggarwal, M. Juneja, Contagious disease detection in cereals crops and classification as solid or undesirable: an application of pattern recognition, image processing and machine learning algorithms. Int. J. Eng. Technol. (2018). https://doi.org/10.14419/ijet.v7i1.2.9043
6. S. Shakya, Analysis of artificial intelligence based image classification techniques. J. Innov. Image Process. 2, 44–54 (2020). https://doi.org/10.36548/jiip.2020.1.005
7. J. Chen, L. Qi, I. Gao, Visual tea leaf disease recognition using a convolutional neural network model. Symmetry 11, 343 (2019). https://doi.org/10.3390/sym11030343
8. S. Mohanty, D. Hughes, M. Salathe, Using deep learning for image-based plant disease detection. Front. Plant Sci. (2016). https://doi.org/10.3389/flps.2016.01419
9. S. Mishra, R. Sachan, D. Rajpal, Deep convolutional neural network based detection system for real-time corn plant disease recognition, in International Conference on Computational Intelligence and Data Science (2019). https://doi.org/10.1016/j.procs.2020.03.236
10. M. Lv, G. Zhou, M. He, A. Chen, W. Zhang, Y. Hu, Maize leaf disease identification based on feature enhancement and DMS-robust AlexNet. IEEE Access (2020). https://doi.org/10.1109/ACCESS.2020.2982443
11. R. Krishnaswamy, R.P. Aravind, Automated disease classification in (selected) agricultural crops using transfer learning. Automatika 61, 260–272 (2020). https://doi.org/10.1080/00051144.2020.1728911
12. T. Vijaykumar, R. Vinothkanna, Mellowness detection of dragon fruit using deep learning strategy. J. Innov. Image Process. 2, 35–43 (2020). https://doi.org/10.36548/jiip.2020.1.004
13. G. Huang, Z. Liu, L. Maaten, K. Weinberger, Densely connected convolutional networks (2018). arXiv:1608.06993v5
14. J. Irvin et al., CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc. AAAI Conf. Artif. Intell. 33, 590–597 (2019). https://doi.org/10.1609/aaai.v33i01.3301590
Predicting Embryo Viability to Improve the Success Rate of Implantation in IVF Procedure: An AI-Based Prospective Cohort Study Dhruvilsinh Jhala, Sumantra Ghosh, Aaditya Pathak, and Deepti Barhate
Abstract In general, infertility affects one in seven couples across the globe. An innovative and beneficial procedure, in vitro fertilization (IVF), is therefore used to fertilize an egg outside the human body. IVF is considered the most common procedure, as it accounts for 99% of infertility procedures. Despite being the most widely used procedure, its success rate for women under 35 is 39.6%, and above 40 it is 11.5%, depending on factors like age, previous pregnancy, previous miscarriages, BMI and lifestyle. However, human embryos are complex by nature, and some aspects of their development still remain a mystery to biologists. Embryologists subjectively evaluate an embryo and its efficiency by making observations manually during the embryo division process. Since these embryos divide rapidly, manual evaluations are prone to error. This paper gives a brief explanation and insights into the topic of evaluation and the prediction of the success rate using artificial intelligence techniques. Keywords In vitro fertilization (IVF) · Embryo · Blastocyst · Artificial intelligence · Machine learning · Deep learning · Infertility
1 Introduction Infertility is a reproductive system disease that has been described by the World Health Organization (WHO) as the failure to achieve a clinical pregnancy after 12 months or more of regular unprotected sexual intercourse. Infertility remains the fifth most severe disability in the world for women under 60. The Department of Economic and Social Affairs of the United Nations reports that the global average fertility rate today is just under 2.5 children per woman. The average fertility rate has halved in the past 50 years, and over the course of social modernization, the number of children per woman has decreased dramatically. In the pre-modern period, fertility rates were typically between 4.5 and 7 children per woman [1].
D. Jhala (B) · S. Ghosh · A. Pathak · D. Barhate Narsee Monjee Institute of Management Studies, Mumbai, Maharashtra, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_32
Fig. 1 (1950–2015) infertility stats by WHO [2]
This is why in vitro fertilization is widely used today to overcome this challenge. In vitro fertilization (IVF) is a complex set of procedures used to help with pregnancy or avoid genetic defects and to assist with a child's conception (Fig. 1). In vitro fertilization techniques have improved significantly since the first baby was born in 1978, but research related to test-tube babies started much earlier. In the 1800s, Dr Sims at the Women's Hospital, New York, conducted a new intrauterine insemination as soon as the idea of reproduction became apparent. The first donor insemination was then done in 1884 by Dr William Pancoast in Philadelphia. This was the beginning of one of the first techniques related to IVF. In the 1900s, the first clinic was opened in Massachusetts, and after that, in a short span of time, IVF progressed more than any other medical area [3]. In the early period of IVF, transferring multiple embryos per cycle was the principal strategy for increasing IVF success. Over the years, the transfer of many embryos increased the success rates of IVF at the risk of increased multiple pregnancies. The increased numbers and complications of multiple births led society to apply policies limiting the number of embryos being transferred. The key realistic strategy for maximizing the effects of IVF was therefore to pick the embryos with the highest transfer potential [4]. To identify the best embryo for the process, some methods are used to automate this selection, saving both time and money. The methods used for this analysis include the use of machine learning algorithms on the day 5 image of embryos after implantation of sperm [5]; another is to extract features from the image to find the timing of pronuclear breakdown (PNB) [6]. In recent years, neural networks have been used to improve the chances of pregnancy by separating the good embryos, which have a higher possibility of causing pregnancy, from the bad ones [7]. Another emerging trend
is to upload the embryo image directly into an online interface, where the online system grades the embryo image as good, ok or bad. This is an automated system which takes into consideration 24 major embryo characteristics while grading. The grading is done by passing the image into three different artificial neural networks (ANNs) [8]. The grading done by the software and by professionals was found to agree only 22% of the time, which is very low. The online grading software can be made more efficient [9] by scaling the image, as the size of an image taken from a smartphone is smaller than one analysed manually; using this method, the agreement between system and professional jumps to 85.7%, while the online system is able to successfully segment 77.8% of the overall embryo images. During the entire time from egg retrieval to transferring selected embryos to the woman's uterus for various examinations and monitoring procedures, the specimen is transferred to or from the IVF incubator. The embryo is in the incubator for about 5 days before it is transferred to the womb. The incubation stage is therefore crucial to the success of IVF [10]. Chung et al. [11] show the use of a digital microfluidic (DMF) device. In this study, the DMF method was demonstrated to be biologically consistent and is based on the scattered droplet type for the use of in vitro mouse gametes and embryo cultures. The IVF fertilization rate in DMF was found to be 34.8%, with around 25% of inseminated embryos grown on DMF chips growing up into an eight-cell process. Tzeng et al. [12] help us understand incubation devices, oocyte zona removal, viscosity and overall transfer channel structure.
2 Stages in IVF Procedure See Fig. 2. Step 1: Controlled Ovarian Hyperstimulation (COH) In this step, gonadotropin hormone generation is controlled to prevent premature ovulation. Once fully suppressed, ovulation is achieved under a controlled environment by monitoring the gonadotropin injections. Ultrasound and hormone evaluation are done until the eggs reach the optimum size.
Fig. 2 Stages in IVF procedure
Step 2: Egg Retrieval Eggs are retrieved from patients using surgery. Ovarian follicles are aspirated using a transvaginal ultrasound-guided instrument. The embryologist scans the follicular fluids to find all viable eggs. In recent years, assistant robots have been used to retrieve the eggs more safely from patients; an automatic egg detection algorithm is also used in this setting to locate the egg more accurately [13].
Step 3: Fertilization and Embryo Culture The parameters of the sperm are checked in this stage. If the parameters are normal, then 50,000–100,000 motile sperm are transferred. This process is called standard insemination. If the sperm parameters are abnormal, then a technique called intracytoplasmic sperm injection (ICSI) is used, in which the transfer of sperm is done by a specialist under a high-powered microscope. The spermatozoa are directly inserted into the egg cytoplasm. Nowadays, research is being conducted to automate the ICSI process using logistic regression and neural networks; this research takes patient characteristics into consideration while performing regression or creating neural networks. The ICSI outcome prediction accuracy is 75% for logistic regression and 79.5% for the neural network [14].
Step 4: Embryo Quality There are many criteria for classifying the quality of embryos. The embryos are examined by the embryologist early in the morning on the day of the transfer. The embryos are usually transferred either on day 3 (cleavage stage) or day 5 (blastocyst stage). 5-Day Blastocyst Process
• Blastocyst initial stage – Normally fertilized embryos show two pronuclei (PNs) in their centre.
• Blastocyst stage – Most of the embryos will now have divided and will ideally be 2–4 cells. Around 98% of embryos which have shown normal fertilization on day 1 will divide and continue to grow.
• Complete blastocyst – Embryos have made another division from the previous day and ideally have 6–8 cells.
• Prolonged blastocyst – The embryos have more than 8–10 cells and have started to compact; this is the morula stage (cells merging together).
• Blastocyst hatched – On day 5, a proportion of the embryos become blastocysts (Fig. 3).
Step 5: Embryo Transfer Embryo transfer takes place either on day 3 or day 5, depending on the previous step. This is an easy step that does not require the patient to use anaesthesia. In this stage, embryos are inserted into a soft catheter and are placed under ultrasonic supervision in the uterus of the patient via the cervix.
Fig. 3 Blastocyst transition [15]
Step 6: Blood Test A blood test is performed to assess the level of the hormone human chorionic gonadotropin (hCG) approximately 10–15 days after the embryo transfer. hCG in the bloodstream usually indicates a positive pregnancy test.
3 Embryo Morphology and Grading System It has been more than 40 years since IVF started, and since then all fields of assisted reproductive technology have seen very dynamic developments. There have been rapid improvements in embryology methods, especially for pre-implantation embryo assessment. After the eggs have been injected with the sperm, they are in the process of becoming embryos. During this time, they are closely watched. The selection of the best embryos starts when the eggs and the sperm have matured for about 2 or 3 days. The embryologist inspects the embryos carefully and selects the healthiest ones. The embryo selection procedure is therefore enhanced using a variety of different approaches. Precise assessment of embryos in the days following IVF makes it possible to pick the most effective embryos for transfer. This increases the IVF procedure's success rates. Transferring the best embryo also decreases the frequency of multifetal pregnancies. The evaluation depends on the following features:
• Early cleavage
• Polar body structure and placement
• Blastomere dimensions, orientation and fragmentation
• Appearance of zona pellucida and cytoplasm
• Pronuclear morphology
• Number of blastomeres on particular days of culture
• Compaction and expansion of blastomeres
• Number of nuclei in each blastomere
3.1 Cleaved Embryos Scoring This embryo evaluation is made approximately 1 day after insemination, when the presence of the first cleavage, blastomere consistency and the degree of fragmentation are investigated. Healthy embryos consist of two symmetrical blastomeres with little or insignificant fragmentation and should have at least four cells on day 2 and at least eight cells on day 3 of culture (Fig. 4). A good-quality embryo has many blastomeres and little fragmentation; the good blastomere number is 4–6 on day 2 and 8–12 on day 3. For example, 4A1 on day 1 and 8A1 on day 3 are the best-performing embryos [16].
Fig. 4 Cleaving embryo scoring [16]
3.2 Blastocyst Based Scoring Before being moved back into the uterus, the blastocyst represents the final stage of clinical embryo culture and is thus the final step of morphological evaluation. The blastocyst is characterized by cavitation and blastocoel formation that correlate with the cell differentiation between the internal cell mass (ICM) and the trophectoderm (TE). Any static morphological assessment is weak because the blastocyst is highly dynamic, and the assessment can alter dramatically even over a short interval of time [17]. Despite the complexity of these and other grading systems, no convention exists for assigning numeric scores or for what represents a high or a low grade [18].
4 Classification of Embryos Using AI See Table 1.
4.1 Single-Day Embryo Structure-Based Classification There are several ways of using artificial intelligence (AI)-based models to predict human embryo viability. One important method is to use static images captured from an optical light microscope, relying on the cell differentiation between the internal cell mass (ICM) and the trophectoderm (TE). VerMilyea et al. [19] performed image analysis on single static images of day 5 blastocysts using the Life Whisperer model based on deep learning and got an accuracy of 64.3%. The endpoint used is pregnancy as measured by foetal heartbeat; this does not establish the likelihood of a live birth and is limited to day 5 itself. A more accurate solution was given by [4, 33]. They used deep convolutional neural networks for segmentation of the blastocyst and implantation prediction. The architectures used were compact contextualize calibrate (C3) and GoogleNet Inception v3, with mean accuracies of 70.9% and 89.01%, respectively, from only a single blastocyst image. In their research, Moradi et al. proposed a slow-fusion strategy to learn cross-modality features, and Bormann et al. replaced the final classification layer by retraining it with a dataset of 1282 embryo images captured at 115 h post-insemination. Chen et al. used approximately 1.7 lakh images from day 5 and day 6 to create a CNN model with the ResNet50 architecture. The output was classification of inner cell mass, blastocyst and trophectoderm quality using a three-point grading system. For all three grading groups, the results showed an average predictive accuracy of 75.36%, i.e. 96.24% for blastocyst growth, 84.42% for trophectoderm and 91.07% for ICM [42]. A similar type of research was done by [32]; it was based on a set of Levenberg–Marquardt neural networks and uses binary patterns present in the images.
Table 1 Image classification embryo datasets

| Reference | Data type | Data samples | AI method used | Accuracy |
|---|---|---|---|---|
| VerMilyea et al. [19] | Day 5 | 8886 images | Life whisperer | 64.3% |
| Arsal et al. [20] | Time lapse | 38 embryos | Conditional random field (CRF), bag of visual words (BoVW) | 96.79% |
| Sammali [21] | Ultrasound strain imaging | 16 patients | SVM, KNN and GMM | 93.8% |
| Sujata et al. [22] | Time lapse | 535 images | Convolutional neural network | 87.5% |
| Moradi et al. [4] | Day 5 | 578 blastocyst images | Deep convolutional neural networks (DCNNs), dense progressive sub-pixel upsampling (DPSU) | 70.9% |
| Patil et al. [23] | Day 3 | – | Hessian-based ridge detector and Hough circle transform | – |
| Kheradmand [24] | Day 5 | 8460 images augmented from 235 images | Fully convolutional networks (FCNs) | 95.6% |
| Amarjot Singh [25] | Day 1 and 2 | 40 embryo images | Hoffman modulation contrast (HMC), Hessian-based edge detector | 80% |
| Zhang et al. [26] | Patient details | 11,190 patients | Clustering and SVM in each cluster | 70% |
| Khayrul Bashar et al. [5] | Time lapse | 10 embryos | Supervised classifier using normalized cross-correlation | – |
| Mölder et al. [6] | Time lapse | 20 embryos | Hough detection | 95% |
| Habibie et al. [27] | Time lapse | 68 images | Particle swarm optimization (PSO)-based Hough transform | 6.24% error |
| Uyar et al. [28] | Day 5 | 7745 embryo images | 3 Bayesian networks | 63.5% TPR & 33.8% FNR |
| Kragh et al. [29] | Time lapse | 6957 images | CNN and RNN | 65.5% and 72.7% |
| Khosravi et al. [30] | Time lapse | 50,000 images | Google's Inception and DNN | 98% |
| Uyar et al. [31] | Day 2–3 | 3898 images | Naive Bayes, KNN, decision tree, SVM, MLP and RBF | 80.4% |
| Manna et al. [32] | Day 2–3 | 269 images | Neural network and binary patterns | AUC = 0.8 |
| Bormann et al. [33] | Day 5 | 182 images | CNN | 89.01% |
| Chen et al. (2019) | Day 5–6 | 1.7 lakh images | CNN | 75.36% |
| Wang et al. [34] | Time lapse | 389 videos | Three-level classification | 87.92% |
| Chung et al. [11] | Day 3 | – | Digital microfluidic (DMF) system | 25% |
| Storr et al. [7] | Time lapse and day 5 | 380 images | General estimated regression models, multivariable regression model | 8 parameters, AUC 0.748 |
| Kotze et al. [35] | Day 3 image, patient details | 770 images | GES-score plus sHLA-G expression | ~60.0% |
| Chiang et al. [36] | Day 3, patient details | 47 patients | Ultrasound, basal uterine perfusion, colour | Increase in uterine perfusion (best factor) |
| Durairaj et al. [50] | Patient details | 250 patients | ANN | 73% |
| Uyar et al. [37] | Patient details | 2453 records | KNN, SVM, decision tree (DT), naive Bayes, MLP and RBF | NB (0.739), RBF (0.712) |
| Azmoudeh et al. [38] | Patient details | 160 patients | Anti-Mullerian hormone (AMH) | – |
| Qiu et al. [39] | Patient details | 7188 records | SVM, logistic regression, XGBoost and random forest | 0.70 ± 0.003 |
| Fábio et al. [9] | Mobile-captured image evaluation | 18 embryo images, mobile-captured images | Blast3Q, image segmentation | 87.5% |
| Bertogna et al. [8] | Mobile-captured image evaluation | 18 images, mobile-captured images | Blast3Q, image segmentation | 22% common result |
| Gowramma et al. [40] | Patient details | 1,21,744 records | Data analysis | – |
| Durairaj et al. [41] | Patient details | 27 test factors | Clustering, ANN | 90% |
The area under the curve (AUC) for this study was found to be approximately 0.8, which is good even with a small dataset of 269 embryos. The selection of a model for evaluation of data is an important part of the process, but the pre-processing and preparation of data for the selected model is just as important. Uchida et al. [43] discuss the selection and optimization of the CNN and image filters so as to improve the accuracy. They perform experiments using four different settings: no filter, all filters, random filters and joint optimization. The CNN model used consists of LGG-Net. The results of the experiment convey that redundant image pre-processing can lead to adverse results, but selection of appropriate filters can improve the output. The accuracy obtained is 78% through joint optimization, which is 9% more than using no filter. Similarly, instead of using different filters, [37] used different models to compare embryos to get the highest chance of pregnancy. The six different algorithms used were naive Bayes, k-nearest neighbour, decision tree, support vector machine, multi-layer perceptron and radial basis function network. The best result was given by naive Bayes with an accuracy of 80.4%. Uyar et al. [44] used only SVM for classification of 546 embryos and got an accuracy of 82.7%. Kotze et al. followed a different approach in their study by using the GES-score. They compared selecting embryos by examining day 3 images versus GES-score plus sHLA-G expression, and the effect of both methods on implantation, miscarriage and pregnancy. The research found that GES-score plus sHLA-G expression gave less chance of miscarriage, but implantation rates did not differ much [35].
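The multi-classifier comparisons above can be reproduced in outline as follows. This is a sketch using scikit-learn on synthetic stand-in features (the cited studies used morphological features extracted from embryo images); scikit-learn has no built-in radial basis function network, so that classifier is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for extracted embryo-image features and viability labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

classifiers = {
    "naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}
for name, clf in classifiers.items():
    score = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
    print(f"{name}: {score:.3f}")
```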
4.2 Time Series Embryo Structure Based Classification Another approach to evaluating embryo quality is to observe the embryo as it develops from day 1 to day 5 and then use this observation to predict whether embryo implantation will result in a successful pregnancy. This approach gives more favourable results, as single day 5 images have constraints in image capturing, such as timing and light intensity; a time-lapse sequence is less sensitive to these. Patil et al. had similar thoughts when they proposed a CNN model that took 350 images scaled to 256 × 256 pixels [22]; an accuracy of 87.5% was observed in the initial stages. Similarly, RNN and CNN models were independently trained using 6957 time-lapse sequences with a single ICM and TE annotation each, with precision levels of 65.5% and 72.7%, respectively [29]. A slightly varied approach was by Liu et al., who propose a multi-task deep learning with dynamic programming model for automatic embryo classification in time-lapse videos [26]. They discuss CNN (ResNet) configurations in one-to-one, one-to-many and many-to-one settings. The results recommend one-to-many, as many-to-many is computationally heavy, with an accuracy of 86%. [45] discusses time-lapse imaging (TLI) and its use in embryo assessment, along with a predictive algorithm for TLI, reporting a specificity of 84.7%, a sensitivity of 38.0%, a PPV of 54.7% and an NPV of 73.7%. Another research direction, discussed in [30], is a model named STORK, created based on Google's Inception model, which is composed of deep neural networks. Approximately 50,000 time-lapse images of embryos were passed through this DNN to find high-quality embryos. STORK can predict the embryo development cycle with an accuracy of more than 98%. The system was further improved by including a decision tree in STORK, which helps the system predict the likelihood of pregnancy based on age and embryo quality. In [34], a three-level classification is applied to time-lapse videos of embryo development to classify the embryo. Tracking of individual embryo cells is avoided in this research, and the classifier can easily adapt to image modalities and the problem of mitosis detection. The accuracy obtained by using multi-level classifiers was 87.92% on 389 embryo development videos. Similarly, Uyar et al. correctly predicted the development cycle of embryos using three different Bayesian networks [31]. The first network had day 5 embryo image scores derived from day 3 data of the embryo, the second contained the characteristics of the patient along with the day 5 score, and the third combined all the data collected from days 1, 2 and 3 into the day 5 score. Tenfold cross-validation was used to remove bias from 7745 embryo images along with patient records including embryo morphological characteristics. The resulting experimental data produced a 63.5% true positive rate (TPR) and a 33.8% false negative rate (FNR) while predicting blastocyst development. Storr et al. used 380 images of day 5 blastocysts to evaluate whether time-lapse parameters contributed any major difference while predicting the quality of the embryo. Embryo morphology evaluation was done on days 2 and 3 for symmetry, fragmentation percentage and blastomere numbers. Embryos were graded on day 5 on the
basis of blastocoel cavity expansion and the number and cohesiveness of both the trophectoderm and inner cell mass (ICM) cells. Eight important factors were identified using univariate and multivariable regression models, namely the time durations for division into 5-cell and then into 8-cell, division into a 6-cell embryo and cleavage into an 8-cell embryo, full compaction, initial blastulation signs and the complete blastocyst [7]. Nogueira et al. [46] used 394 time-lapse images of the human embryo taken at 110 hpi, rated for inner cell mass (ICM), trophectoderm and expansion, with overall accuracies of 78.8, 78.3 and 81.5%, respectively.
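A common realization of the CNN-plus-RNN idea described in this section is to embed each frame with a pretrained CNN and aggregate the sequence with an LSTM. The following is a generic sketch, not the architecture of any cited study; the ResNet18 backbone, hidden size and three-grade output are illustrative choices.

```python
import torch
import torch.nn as nn
from torchvision import models

class TimeLapseGrader(nn.Module):
    """Per-frame CNN features aggregated over time by an LSTM, then graded."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        backbone = models.resnet18(pretrained=True)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.rnn = nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1)  # (b*t, 512)
        out, _ = self.rnn(feats.view(b, t, -1))                # (b, t, 128)
        return self.head(out[:, -1])                           # grade from last step

logits = TimeLapseGrader()(torch.randn(2, 8, 3, 224, 224))  # toy 8-frame clips
print(logits.shape)  # torch.Size([2, 3])
```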
4.3 Classification Based on Cell Count (Ellipse Detection) As discussed earlier, owing to the constantly dividing nature of embryos, the observation process is vulnerable to error and inaccuracy resulting from human misinterpretation. The cell number is the most significant predictor of future growth, as it directly demonstrates the ability of an embryo to progress through cell cycles. Studies show that the developmental ability of embryos with lower or higher cell numbers is substantially reduced. Day 3 human embryos with good potential for growth can typically mature into the 7–8 cell stage. Thus, by using circle or ellipse detection techniques, the number of cells can be counted (Fig. 5). Arsa et al. used 25 patients and 38 embryo images to give an effective method to solve the prediction problem with the help of SVM, bag of visual words (BoVW) and conditional random field (CRF). They achieved an average accuracy of 80% [20]. Another study reports strategies for enhancing embryo identification: the Hough circle transform is used to detect the presence and orientation of embryos within the acquired boundaries [23]. This method is highly accurate in obtaining the number of cells in the blastomere stage. A different approach was proposed, based on modifying the Hough transform using particle swarm optimization (PSO), to approximate the embryo as a circle. The PSO particles are points in the parameter space
Fig. 5 Ellipse detection and verification [23]
and are used primarily to minimize the Hough transform's computational complexity [27]. Embryo image analysis is a critical step in IVF, but because of the sensitivity of the embryo, the image is taken in an unstable environment, leading to the problem of high noise in the image. To address this issue, feature extraction is used by various researchers to obtain the essential components. One method is to extract only two features, i.e. nuclear number and nuclear volume, in a sequential manner [5]; automated methods were used to obtain the nuclear number, while segmentation was used to obtain the nuclear volume. A constant increase of these two features during the embryo development cycle was taken as an indication of a healthy embryo. The extraction of images without high noise is a difficult process, so a different approach is to analyse the embryo images under high levels of noise. This was done by extracting the embryo region in circular form using the Hough transformation [6]; although this process was done under noisy conditions, the accuracy of locating the embryo position was 92.9%. The Hough transformation was also used in this method to detect the timing of pronuclear breakdown (PNB) with 83.0% accuracy. The shape of cell formation in the embryo is directly related to its quality and vitality. Jonaitis et al. [47] present an idea for the detection of embryos as well as the number of cells together by using time-lapse microscopy. Singh et al. [25] mainly propose an algorithm for segmentation of up to four blastomeres in a single human HMC embryo image. The algorithm performs under challenging conditions such as fragmentation and uneven, side-lit illumination of cells. Experimental findings show that the proposed algorithm is able to classify and model blastomeres in HMC images with an average accuracy of 80%. Sammali et al. provide an overview of uterine motion analysis and its characteristics with the help of ultrasonic imaging. They used KNN, SVM and GMM, where KNN had the best accuracy of 93.8% in providing the optimal frequency-related characteristic feature set for embryo implantation [21]. They used 4-min TVUS scans taken at a 5.6-MHz central frequency and 30 frames/s. Similarly, Fernandez et al. discussed several algorithms for the improvement of assisted reproductive technology (ART), such as SVM, ANN, Bayesian networks and convolutional neural networks (CNN). They found the CNN architecture more suitable than an MLP for classifying images, but a CNN has a higher computational cost than an MLP. The results showed that naïve Bayes gave the best result, while the SVM was fourth-best in terms of area under the curve (AUC) [48]. Inner cell mass (ICM) is an important factor in determining the viability of an embryo. Kheradmand et al. discussed the use of a fully convolutional deep neural network (DCNN) trained on 8460 images augmented from 235 images. They proposed a two-stage pipeline comprising a pre-processing step and a data augmentation technique; the proposed method gave a high accuracy of 95.6% [24].
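As a concrete illustration of the circle-detection idea underlying several of these studies, the sketch below applies OpenCV's Hough circle transform to estimate a cell count. It is a minimal example, not the pipeline of any surveyed paper; the file name and all parameter values are hypothetical placeholders.

```python
import cv2

# Minimal Hough-circle cell-counting sketch; the image path and the
# tuning parameters are illustrative, not values from the surveyed work.
img = cv2.imread("embryo_day3.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 5)  # suppress noise before edge detection

circles = cv2.HoughCircles(
    img,
    cv2.HOUGH_GRADIENT,  # gradient-based accumulator method
    dp=1.2,              # inverse accumulator resolution
    minDist=20,          # minimum distance between detected centres
    param1=100,          # Canny high threshold
    param2=30,           # accumulator vote threshold
    minRadius=10,
    maxRadius=80,
)

# HoughCircles returns None or an array of shape (1, N, 3): (x, y, radius).
n_cells = 0 if circles is None else circles.shape[1]
print("approximate blastomere count:", n_cells)
```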
4.4 Classification Based on Patient Characteristics There is no reliable process to predict the success rate of an IVF procedure, and the success rate varies from patient to patient. The effectiveness of a specific treatment is influenced by a variety of factors, such as male and female factors and different IVF test results. In assessing the effectiveness of treatment, even the couple's psychological factors play an important role, while multiple treatment cycles increase the cost, affect the health of the patients and raise the level of stress. Hence, data such as endometriosis, tubal factors and ovarian follicle counts, together with physiological factors such as stress levels, can be used for reference. A study by Gowramma et al. discusses many intrinsic and extrinsic factors that affect the success rate of IVF. The intrinsic factors analysed were the age of the patient, BMI, visibility of the embryo, quality of sperm, genetic predisposition, hormonal balance and endometriosis; the extrinsic factors were current medical technology, the treatment method used, the experience of the professionals, stress and process time, analysed over data collected from 121,744 women [40]. Before starting to model over the key features, pre-processing is essential. Hence, an approach that clusters the data and uses Johnson's reducer to find influential factors, which are then fed as input to artificial neural networks, is effective [41]; this process gave 90% similarity when comparing experimental results to actual results over a dataset containing 27 different test factors of 250 patients. Among these various factors, Hassan et al. worked to narrow down the key ones. The number of highly influential attributes was reduced by the feature selection algorithm to 19 for MLP, 16 for RF, 17 for SVM, 12 for C4.5 and 8 for CART [49]. Overall, age, fertility factor indices, antral follicle counts, NbreM2, sperm collection process, chamotte, in vitro fertilization rate, follicles on day 14 and embryo transfer day were the most influential identified characteristics. One of the earlier studies was done by Durairaj et al. [50] by implementing an ANN model over a dataset with 27 attributes, consisting of features like BMI, previous pregnancy, miscarriage, sperm vitality, etc. This work shows 73% accuracy in the results. Here, the amount of data plays an important role, as the model is implemented on a small dataset with 250 samples, and the accuracy is also low, as the embryo itself is not analysed and the prediction is based purely on patient data. Later on, [51] conducted related research among 11,190 patients with additional features such as AFC, AMH, FSH and five pathogenic factors. The technique used was first to divide patients into different groups and then create an SVM model for each group to achieve the best overall efficiency. Similar research was performed by Qiu et al. [39], where the most accurate model was found to be XGBoost, which achieved an area of 0.73 under the ROC curve and an accuracy score of 0.70 ± 0.003 when cross-validated with a dataset of 7188 patients. This research helped in the reduction of cost and time to pregnancy. A study by Chiang et al. found that an increase in uterine perfusion also increased the chances of pregnancy for women aged 40 or more [36]. The results were calculated by taking into account various factors such as day 3 FSH, patient age, number of antral follicles, basal uterine artery PI, day 3
estradiol (E2), endometrial thickness on the day that hCG was given, level 2 of the artery and the total number of gonadal therapy ampoules. Later on, Azmoudeh et al. found a significant relationship between IVF success and anti-mullerian hormone (AMH) [38]; logistic regression analysis showed that only AMH > 0.6 was an independent predictor of IVF success. One other way is to use patient characteristics as well as image classification of the embryo to obtain a more advanced prediction. Uyar et al. [31] implemented six different classifiers on a dataset consisting of 2453 images, of which 89% are positive and 11% negative. The dataset had 18 features that characterize the embryo, such as the woman's age, infertility category, infertility factor, sperm quality, early cleavage morphology, number of cells, etc. The study found that naive Bayes (0.739 ± 0.036) and radial basis function (0.712 ± 0.036) gave the best ROC performance. The only shortcoming of the research was the data imbalance, which had an effect on the final results.
5 Conclusion The proposed research work has discussed four different approaches (single day embryo evaluation, time-lapse embryo evaluation, ellipse detection and patient characteristics) to predict IVF embryo viability. Single day embryo evaluation and time-lapse embryo evaluation are both based on image processing, but a more precise selection process can be carried out using time-lapse imaging, as morphological evaluation may vary over time. For the ellipse detection technique, the Hough transform is commonly used, and the models are able to achieve high accuracies, as the number of cells is directly related to embryo viability. The last technique discussed used patient characteristics, which can be used to predict pregnancy outcomes; this method does not take into account mistakes or faults in the IVF procedure itself. According to the analysis, CNN and SVM were found to give the most accurate solutions. Hence, by using these AI algorithms, the accuracy can be improved to as high as 89.01% for single day evaluation [33], 96.79% for time-lapse imaging [20] and 73% with patient characteristics [31]. To the best of our knowledge, this is the first study discussing different AI techniques to predict embryo viability. The predictions can help the embryologist make optimal decisions based on the results obtained from these methods. This will reduce the patient's cost and time to pregnancy and improve her quality of life.
References 1. IVF Success Rates Increase Using PGD, https://www.fertility-docs.com/programs-and-services/pgd-screening/pgd-increases-ivf-success-rates.php 2. Fertility Rate, https://ourworldindata.org/fertility-rate 3. J.J. Wade, V. MacLachlan, G. Kovacs, The success rate of IVF has significantly improved over the last decade. ANZJOG 55(5), 473–476 (2015)
4. R.M. Rad, P. Saeedi, J. Au, J. Havelock, Predicting human embryos' implantation outcome from a single blastocyst image, May 19 ©2019, (IEEE, 2019) 5. M.K. Bashar, H. Yoshida, K. Yamagata, Embryo quality analysis from four dimensional microscopy images: a preliminary study. in 2014 IEEE Conference on Biomedical Engineering and Sciences, 8–10 December 2014, (Miri, Sarawak, Malaysia, 2014) 6. A. Mölder, S. Czanner, N. Costen, G. Hartshorne, Automatic detection of embryo location in medical imaging using trigonometric rotation for noise reduction. in 2014 22nd International Conference on Pattern Recognition (2014) 7. A. Storr, C.A. Venetis, S. Cooke, D. Susetio, S. Kilani, W. Ledger, Morphokinetic parameters using time-lapse technology and day 5 embryo quality: a prospective cohort study. Accepted: 1 July 2015 / Published online: 15 July 2015, (Springer, New York, 2015) 8. V.B. Guilherme, M. Pronunciate, P. Helena dos Santos, D. de Souza Ciniciato, M.B. Takahashi, J.C. Rocha, M.F.G. Nogueira, Distinct sources of a bovine blastocyst digital image do not produce the same classification by a previously trained software using artificial neural network. in International Conference on Computer-Human Interaction Research and Applications CHIRA 2017: Computer-Human Interaction Research and Applications (2017), pp. 139–153 9. M.F.G. Nogueira, V.B. Guilherme, M. Pronunciate, P.H. dos Santos, D.L. Bezerra da Silva, J.C. Rocha, Artificial intelligence-based grading quality of bovine blastocyst digital images: direct capture with juxtaposed lenses of smartphone camera and stereo microscope ocular lens. Sensors (2018) 10. K.A.S. Pramuditha, H.P. Hapuarachchi, N.N. Nanayakkara, P.R. Senanayaka, A.C. De Silva, Drawbacks of current IVF incubators and novel minimal embryo stress incubator design. in 2015 IEEE 10th International Conference on Industrial and Information Systems, ICIIS 2015, Dec. 18–20, (Sri Lanka, 2015) 11. L.-Y. Chung, H.-H. Shen, Y.-H. Chung, C.-C. Chen, C.-H. Hsu, H.-Y. Huang, D.-J. Yao, In vitro dynamic fertilization by using EWOD device. MEMS 2015, 18–22 January (Estoril, Portugal, 2015) 12. Y.-C. Tzeng, Y.-J. Chen, C. Chuan, L.-C. Pan, F.-G. Tseng, Microfluidic devices for aiding in-vitro fertilization technique. in Proceedings of the 12th IEEE International Conference on Nano/Micro Engineered and Molecular Systems, April 9–12, 2017 (Los Angeles, USA, 2017) 13. J. Lu, Y. Hu, A potential assistant robot for IVF egg retrieval. in IEEE SoutheastCon 2004, Proceedings, (Greensboro, North Carolina, USA, 2004), pp. 32–37 14. Z. Abbas, C. Fakih, A. Saad, M. Ayache, Vaginal power doppler parameters as new predictors of intra-cytoplasmic sperm injection outcome. in 2018 International Arab Conference on Information Technology (ACIT), (Werdanye, Lebanon, 2018), pp. 1–7 15. IVF Multi-step process, https://fertility.wustl.edu/treatments-services/in-vitro-fertilization/ 16. N. Nasiri, P. Eftekhari-Yazdi, An overview of the available methods for morphological scoring of pre-implantation embryos in in vitro fertilization. Cell J. 16(4), 392–405 (2015) 17. J. Kort, B. Behr, Traditional Embryo Morphology Evaluation: From the Zygote to the Blastocyst Stage (In Vitro Fertilization, Springer, Cham, 2019) 18. C. Racowsky, M. Vernon, J. Mayer, G.D. Ball, B. Behr, K.O. Pomeroy, D. Wininger, W. Gibbons, J. Conaghan, J.E. Stern, Standardization of grading embryo morphology. J. Assist. Reprod. Genet. 27(8), 437–439 (2010) 19. M. VerMilyea, J.M.M. Hall, S.M. Diakiw, A. Johnston, T. Nguyen, D. Perugini, A. Miller, A. Picou, A.P. Murphy, M. Perugini, Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF. Submitted on October 13, 2019; resubmitted on December 23, 2019; Editorial decision on January 16 (2020) 20. D.M.S. Arsa, Aprinaldi, I. Kusuma, A. Bowolaksono, P. Mursanto, B. Wiweko, W. Jatmiko, Prediction the number of blastomere in time-lapse embryo using conditional random field (CRF) method based on bag of visual words (BoVW), ICACSIS (2016) 21. F. Sammali, C. Blank, T.H.G.F. Bakkes, Y. Huang, C. Rabotti, B.C. Schoot, M. Mischi, Prediction of embryo implantation by machine learning based on ultrasound strain imaging. in 2019 IEEE International Ultrasonics Symposium (IUS), Glasgow, Scotland, October 6–9 (2019)
22. S.N. Patil, U.V. Wali, M.K. Swamy, Selection of single potential embryo to improve the success rate of implantation in IVF procedure using machine learning techniques. in International Conference on Communication and Signal Processing, April 4–6 (2019) 23. S.N. Patil, U.V. Wali, M.K. Swamy, Application of vessel enhancement filtering for automated classification of human in-vitro fertilized (IVF) images. in 2016 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT) (2016) 24. S. Kheradmand, A. Singh, P. Saeedi, J. Au, J. Havelock, Inner cell mass segmentation in human HMC embryo images using fully convolutional network, Aug 2017, IEEE ICIP (2017) 25. A. Singh, J. Buonassisi, P. Saeedi, J. Havelock, Automatic blastomere detection in day 1 to day 2 human embryo images using partitioned graphs and ellipsoids. in ICIP (2014) 26. Z. Liu, B. Huang, Y. Cui, Y. Xu, B. Zhang, L. Zhu, Y. Wang, L. Jin, D. Wu, Multi-task deep learning with dynamic programming for embryo early development stage classification from time-lapse videos. Received August 3, 2019, Accepted August 22, 2019, Date of Publication August 27 (2019) 27. I. Habibie, A. Bowolaksono, R. Rahmatullah, M.N. Kurniawan, M.I. Tawakal, I.P. Satwika, P. Mursanto, W. Jatmiko, A. Nurhadiyatna, B. Wiweko, A. Wibowo, Automatic detection of embryo using Particle Swarm Optimization based Hough Transform. in 2013 International Symposium on Micro-NanoMechatronics and Human Science, MHS (2013) 28. A. Uyar, A. Bener, H.N. Ciray, M. Bahceci, Bayesian networks for predicting IVF blastocyst development. in 2010 International Conference on Pattern Recognition (2010) 29. M.F. Kragh, J. Rimestad, J. Berntsen, H. Karstoft, Automatic grading of human blastocysts from time-lapse imaging. Comput. Biol. Med. 115, 103494 (2019) 30. P. Khosravi, E. Kazemi, Q. Zhan, J.E. Malmsten, M. Toschi, P. Zisimopoulos, A. Sigaras, S. Lavery, L.A.D. Cooper, C. Hickman, M. Meseguer, Z. Rosenwaks, O. Elemento, N. Zaninovic, I. Hajirasouliha, Deep learning enables robust assessment and selection of human blastocysts after in vitro fertilization. npj Digit. Med., Received: 2 November 2018, Accepted: 1 March (2019) 31. A. Uyar, A. Bener, H.N. Ciray, M. Bahceci, ROC based evaluation and comparison of classifiers for IVF implantation prediction. eHealth 2009, LNICST 27, (2010), pp. 108–111 32. C. Manna, L. Nanni, A. Lumini, S. Pappalardo, Artificial intelligence techniques for embryo and oocyte classification. © 2012 Reproductive Healthcare Ltd., Published by Elsevier Ltd (2012) 33. C.L. Bormann, Thirumalaraju, Kanakasabapathy, R. Gupta, R. Pooniwala, I. Souter, J.Y. Hsu, S.T. Rice, P. Bhowmick, H. Shafiee, Artificial intelligence enabled system for embryo classification and selection based on image analysis. Fertility Sterility 111(4), e21 (2019) Supplement 34. Y. Wang, F. Moussavi, P. Lorenzen, Automated embryo stage classification in time-lapse microscopy video of early human embryo development. in Medical Image Computing and Computer-Assisted Intervention, MICCAI 2013, Lecture Notes in Computer Science, vol. 8150 (Springer, Berlin, Heidelberg, 2013) 35. D.J. Kotze, P. Hansen, L. Keskintepe, E. Snowden, G. Sher, T. Kruger, Embryo selection criteria based on morphology versus the expression of a biochemical marker (sHLA-G) and a graduated embryo score: prediction of pregnancy outcome. Accepted: 23 February 2010 / Published online: 1 April 2010 (Springer Science+Business Media, LLC, 2010) 36. C.-H. Chiang, T.-T. Hsieh, M.-Y. Chang, C.-S. Shiau, H.-C. Hou, J.-J. Hsu, Y.-K. Soong, Prediction of pregnancy rate of in vitro fertilization and embryo transfer in women aged 40 and over with basal uterine artery pulsatility index. J. Assist. Reprod. Genet. 17(8) (2000) 37. A. Uyar, A. Bener, H.N. Ciray, Predictive modeling of implantation outcome in an in vitro fertilization setting: an application of machine learning methods. Med. Decis. Making, Published Online 19 May (2014) 38. A. Azmoudeh, I.D. Zahra Shahraki, F.-S. Hoseini, F. Akbari-Asbagh, D.-T. Fatemeh, F. Mortazavi, In vitro fertilization success and associated factors: a prospective cohort study. Int. J. Women's Health Reprod. Sci. 6(3), 350–355 (2018)
39. J. Qiu, P. Li, M. Dong, X. Xin, J. Tan, Personalized prediction of live birth prior to the first in vitro fertilization treatment: a machine learning method. J. Transl. Med. (2019) 40. G.S. Gowramma, S. Nayak, N. Cholli, Intrinsic and extrinsic factors predicting the cumulative outcome of IVF/ICSI treatment. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 9(2S) (2019) 41. M. Durairaj, R. Nandhakumar, An integrated methodology of artificial neural network and rough set theory for analyzing IVF data. in 2014 International Conference on Intelligent Computing Applications, (Coimbatore, 2014), pp. 126–129. https://doi.org/10.1109/ICICA.2014.35 42. T.-J. Chen, W.-L. Zheng, C.-H. Liu, I. Huang, H.-H. Lai, M. Liu, Using deep learning with large dataset of microscope images to develop an automated embryo grading system. © 2019 Asia Pacific Initiative on Reproduction (ASPIRE) and World Scientific Publishing Co., Received 31 January 2019; Accepted 25 February 2019; Published 29 March (2019) 43. K. Uchida, S. Saito, P.D. Pamungkasari, Y. Kawai, I.F. Hanoum, F.H. Juwono, S. Shirakawa, Joint optimization of convolutional neural network and image preprocessing selection for embryo grade prediction in in vitro fertilization. in ISVC 2019, (2019), pp. 14–24 44. A. Uyar, H.N. Ciray, A. Bener, M. Bahceci, 3P: personalized pregnancy prediction in IVF treatment process. eHealth 2008, LNICST 1, (2009), pp. 58–65 45. A. Ahlström, A. Campbell, H.J. Ingerslev, K. Kirkegaard, Prediction of embryo viability by morphokinetic evaluation to facilitate single transfer. Springer International Publishing Switzerland (2015) 46. M.F.G. Nogueira, N. Zaninovic, M. Meseguer, C. Hickman, S. Lavery, J.C. Rocha, Using artificial intelligence (AI) and time-lapse to improve human blastocyst morphology evaluation. in ESHRE 2018, October 2018, (Barcelona, Spain, 2018) 47. D. Jonaitis, V. Raudonis, A. Lipnickas, Application of computer vision methods in automatic analysis of embryo development. in IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 24–26 September 2015, (Warsaw, Poland, 2015) 48. E.I. Fernandez, A.S. Ferreira, M.H.M. Cecílio, D.S. Chéles, R.C. Milanezi de Souza, M.F.G. Nogueira, J.C. Rocha, Artificial intelligence in the IVF laboratory: overview through the application of different types of algorithms for the classification of reproductive data. Received: 8 March 2020 / Accepted: 3 July 2020, Springer Science+Business Media, LLC, part of Springer Nature (2020) 49. M.R. Hassan, S. Al-Insaif, M.I. Hossain, J. Kamruzzaman, A machine learning approach for prediction of pregnancy outcome following IVF treatment. Received: 3 July 2017 / Accepted: 24 August 2018, The Natural Computing Applications Forum (2018) 50. M. Durairaj, P. Thamilselvan, Applications of artificial neural network for IVF data analysis and prediction. J. Eng. Comput. Appl. Sci. (JEC&AS) 2(9) (2013) 51. B. Zhang, Y. Cui, M. Wang, J. Li, L. Jin, D. Wu, In vitro fertilization (IVF) cumulative pregnancy rate prediction from basic patient characteristics, (IEEE, 2019) 52. C.-T. Lee, H.-Y. Tseng, Y.-T. Jiang, C.-Y. Haung, M.-S. Lee, W. Hsu, Detection of multiple embryo growth factors by bead-based digital microfluidic chip in embryo culture medium. in Proceedings of the 13th Annual IEEE International Conference on Nano/Micro Engineered and Molecular Systems, April 22–26, (Singapore, 2018) 53. J. Hernández-González, I. Inza, L. Crisol-Ortíz, M.A. Guembe, M.J. Iñarra, J.A. Lozano, Fitting the data from embryo implantation prediction: learning from label proportions. Stat. Methods Med. Res. (2016) 54. T. Bączkowski, R. Kurzawa, W. Głąbowski, Methods of embryo scoring in in vitro fertilization. Reprod. Biol. 4(1), 5–22 (2004) 55. The Birth and History of IVF, https://rmanetwork.com/blog/birth-history-ivf/ 56. Understanding Embryo Grading, https://www.utahfertility.com/understanding-embryo-grading/
Breast Cancer Detection and Classification Using Improved FLICM Segmentation and Modified SCA Based LLWNN Model Satyasis Mishra, T. Gopi Krishna, Harish Kalla, V. Ellappan, Dereje Tekilu Aseffa, and Tadesse Hailu Ayane Abstract Breast cancer death rates are higher due to the low accessibility of early detection technologies. From the medical point of view, improvements in mammography diagnostic technology are essential for the detection process. This research work proposes segmentation of each image using an improved Fuzzy Local Information C-Means (FLICM) algorithm and classification using a novel local linear wavelet neural network (LLWNN-SCA) model. Further, the weights of the LLWNN model are optimized using the modified Sine Cosine Algorithm (SCA) to improve the performance of the LLWNN algorithm. The images segmented by the improved FLICM algorithm undergo the process of feature extraction: statistical features are extracted from the segmented images and fed as input to the SCA-based LLWNN model. The improved FLICM segmentation achieves an accuracy of about 99.25%. Classifiers such as the Pattern Recognition Neural Network (PRNN), Feed Forward Neural Network (FFWNN), and Generalized Regression Neural Network (GRNN) are also utilized for classification, and comparison results are presented with the proposed SCA-LLWNN model. Keywords Breast cancer · Mammography · Feed-forward neural network · Sine cosine algorithm · Generalized regression neural network
1 Introduction Breast cancer is a leading cause of death across the globe, including in emerging countries. In 2019, more than 150,000 breast cancer survivors were living with metastatic disease, three-fourths of whom were originally diagnosed with S. Mishra (B) · H. Kalla · V. Ellappan · D. T. Aseffa · T. H. Ayane Signal and Image Processing SIG, Department of ECE, SoEEC, Adama Science and Technology University, Adama, Ethiopia T. Gopi Krishna Department of CSE, SoEEC, Adama Science and Technology University, Adama, Ethiopia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_33
stage I-III, according to a recent study by American Cancer Society epidemiologists [1]. Mammography has a false-negative rate of 10–30% and a false-positive rate of 10%. Over 90% of breast cancers can be detected using mammography. It assists interpretation by applying a classifier directly to the Region of Interest (ROI) image data. The feed-forward back-propagation neural network has been used with a gradient descent learning rule with momentum; such classification methods are trained to classify the masses into benign and malignant classes [2]. The extracted features include edge sharpness measures, shape factors, and statistical texture features. Identification of breast tissues using simple image statistics such as kurtosis, correlation, and entropy plays an important role. An SVM [3] was then used to segment mammogram images, identify the true masses from the normal parenchyma and classify the masses as benign or malignant in mammograms. The fast development of deep learning improves the performance of segmentation and classification for medical imaging problems. The paper [4] developed a novel segmentation and classification model to detect breast cancer on mammograms using an "end-to-end" training method that efficiently leverages training datasets with the cancer status of the entire image. Shen et al. [4] used the Digital Database for Screening Mammography (CBIS-DDSM) and a deep learning model and achieved a sensitivity of 86.1% and a specificity of 80.1%. There are different feature selection methods available, such as "Principal Component Analysis (PCA)" [5], "Linear Discriminant Analysis (LDA)" [6], and "filtering techniques such as the chi-square test" [7–9], which are used to avoid overfitting. Ribli et al. [10] utilized the "Faster R-CNN" model for the detection of mammographic lesions and classified them into benign and malignant. Singh et al. [11] proposed a "conditional generative adversarial network (cGAN)" for segmentation from an ROI. Agarwal et al. [12] proposed a deep CNN trained on the "DDSM dataset", which showed noteworthy results in identifying semantic characteristics for the clinical decision. Gao et al. [13] proposed a "shallow-deep CNN (SD-CNN)" with "contrast-enhanced DMs (CEDM)" for detection and classification. A "multiview CNN" [14–16] was proposed to detect breast masses. Jung et al. [17] proposed a "single-stage mass detection model" using the "RetinaNet" model for mammogram images. An adaptive local linear optimized radial basis functional neural network model has been proposed [18] for financial time series prediction. Mishra et al. [19] proposed an "LLRBFNN (local linear radial basis function neural network)" with SCA-PSO for brain tumor detection and classification. The mentioned classifiers involve complex network models and convolutional calculations and take more computational time. Motivated by this, to reduce the complexity of the models, a simple LLWNN-SCA model is proposed for the classification of breast cancers from the images. Several studies have been conducted on FCM-based segmentation, such as EnFCM [19–21], FCM_S1 [20], FCM_S2 [19] and FLICM [22], in the literature to rule out malignancy from breast images, but unfortunately they could not achieve the desired results. To obtain better detection results, an improved FLICM segmentation algorithm is proposed. Further, the features are extracted from the FLICM-segmented images. The extracted features are fed to the SCA-LLWNN model to obtain better classification results.
The remaining part of the article is organized as follows: Sect. 2 presents the materials and methods, which include the workflow diagram and the improved FLICM algorithm. Section 3 presents the generalized regression neural network, FNN, PRNN, SCA, and LLWNN models. Section 4 presents the results of segmentation and classification. Section 5 presents the discussion of the research, and Sect. 6 draws the conclusion of the research work.
2 Materials and Methods 2.1 Research Flow Block Diagram of the Proposed Method Different methods have been proposed to detect and classify breast cancers. In addition to the existing methods, this article proposes a novel local linear wavelet neural network model with SCA optimization of the weights to improve the performance of the LLWNN model. The accuracy results are compared with those of previous models. The proposed research flow is shown in Fig. 1: (i) in the first step, the breast cancer image dataset undergoes preprocessing to enhance the images; (ii) in the second step, the enhanced images are segmented using the proposed improved FLICM algorithm to detect the tumor in the mammogram breast cancer images;
Fig. 1 Workflow diagram of the proposed method
(iii) in the third step, the segmented images are utilized for statistical feature extraction; and (iv) in the fourth step, the statistical features are fed as input to the proposed SCA-based LLWNN model for breast cancer classification, and comparison results are presented with the conventional RBFNN [23] and LLWNN models. The RBFNN (radial basis function neural network) has been considered for comparison, as it has also shown good results for breast cancer classification, but the computational time taken by the algorithm is higher than that of the proposed SCA-based LLWNN model. At the same time, the LLWNN model also takes more time to converge, as shown in Table 3.
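For concreteness, a minimal sketch of the per-image statistical feature extraction (the seven features listed later in Sect. 3.2: mean, standard deviation, variance, energy, entropy, minimum and maximum) follows; the exact energy and entropy definitions are not spelled out in the paper, so the sum-of-squares energy and histogram entropy below are assumptions.

```python
import numpy as np

def statistical_features(segmented):
    """Seven per-image features fed to the classifier.

    The energy (sum of squared intensities) and the 256-bin histogram
    entropy are assumed definitions; the paper does not specify them.
    """
    v = segmented.astype(np.float64).ravel()
    hist, _ = np.histogram(v, bins=256)
    p = hist[hist > 0] / v.size          # bin probabilities
    entropy = -np.sum(p * np.log2(p))
    return np.array([v.mean(), v.std(), v.var(),
                     np.sum(v ** 2), entropy, v.min(), v.max()])

# 500 segmented images would yield a 500 x 7 feature matrix, matching
# the 3500 values mentioned in Sect. 3.2.
```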
2.2 Improved FLICM Algorithm In order to reduce the computation in FCM algorithms [19, 20], the neighborhood term of FCM_S with a variant is used. The objective function can be written as follows:

$$J_m = \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\,\|x_k - v_i\|^2 + \alpha \sum_{i=1}^{c}\sum_{k=1}^{N} u_{ik}^{m}\,\|\bar{x}_k - v_i\|^2 \qquad (1)$$

where $\bar{x}_k$ is the mean of the neighbouring pixels lying within a window around $x_k$. The EnFCM [21] algorithm was proposed to speed up the segmentation process, in which a linearly-weighted sum image $\xi$ of the original image and its local neighbour average image is computed as

$$\xi_k = \frac{1}{1+\alpha}\left(x_k + \frac{\alpha}{N_R}\sum_{j \in N_k} x_j\right) \qquad (2)$$

where $\xi_k$ denotes the gray value of the $k$th pixel of the image $\xi$. FLICM [24] introduces a fuzzy factor to achieve segmentation performance better than EnFCM, but it fails to preserve robustness to degraded image quality. To improve the performance further, the fuzzy factor of FLICM has been modified as

$$G_{kv} = \sum_{r \in N_v,\; v \neq r} \frac{1}{\exp(d_{vr}+1)}\,(1-u_{kr})^{m}\,\|x_r - v_k\|^2 \qquad (3)$$

With this modification of the fuzzy factor, applying the exponential function to the spatial Euclidean distance $d_{vr}$, the updated cost function is given by

$$J_s = \sum_{t=1}^{c}\sum_{l=1}^{q} \gamma_l\, u_{tl}^{m}\,(\xi_l - v_t)^2 + \exp(G_{kv}) \qquad (4)$$

With the new cost function, the accuracy has been calculated and compared to the other FCM-based segmentation algorithms.
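To make the modified fuzzy factor of Eq. (3) concrete, here is a minimal per-pixel sketch in Python/NumPy. The window shapes and the use of the window centre as pixel v are assumptions for illustration, and the full iterative membership/centre update loop of the clustering algorithm is omitted.

```python
import numpy as np

def modified_fuzzy_factor(window, u_window, v_k, m=2.0):
    """Modified FLICM fuzzy factor G_kv of Eq. (3) for one pixel.

    window   : gray values x_r of the neighbours around the centre pixel v
    u_window : memberships u_kr of those neighbours for cluster k
    v_k      : current prototype (centre) of cluster k
    The damping 1/exp(d_vr + 1) replaces the 1/(d_vr + 1) of plain FLICM.
    """
    h, w = window.shape
    cy, cx = h // 2, w // 2
    g = 0.0
    for r in range(h):
        for c in range(w):
            if (r, c) == (cy, cx):
                continue                      # skip the centre pixel (v != r)
            d_vr = np.hypot(r - cy, c - cx)   # spatial Euclidean distance
            g += (1.0 / np.exp(d_vr + 1.0)) \
                 * (1.0 - u_window[r, c]) ** m \
                 * (window[r, c] - v_k) ** 2
    return g

# Example: a 3x3 neighbourhood with uniform memberships of 0.5.
print(modified_fuzzy_factor(np.full((3, 3), 120.0), np.full((3, 3), 0.5), 100.0))
```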
3 LLWNN Model with SCA Weight Optimization In Fig. 2, the proposed SCA-LLWNN model, in which the weights of the LLWNN [25–27] model are optimized with the SCA algorithm, is presented, and the pseudo-code for the optimization is also shown below. The data inputs $x_1, x_2, \ldots, x_n$ are treated as features, and $Z_1, Z_2, \ldots, Z_N$ are the wavelet activation functions in the hidden units, each demarcated by a wavelet kernel as

$$Z_i(x) = |p_i|^{-1/2}\,\psi\!\left(\frac{x - q_i}{p_i}\right) \qquad (5)$$

where the parameters $p_i$ and $q_i$ are the scaling and translation parameters, respectively
Fig. 2 SCA weight optimization based LLWNN model
$$y_n = \sum_{i=1}^{N} (w_{i0} + w_{i1}x_1 + \cdots + w_{iN}x_N)\, Z_i(x) \qquad (6)$$

The objective function is to minimize the error, and the MSE is given by

$$\mathrm{MSE}(e) = \frac{1}{N}\sum_{n=1}^{N} (d_n - y_n)^2 \qquad (7)$$
where “d” is the desired vector.
3.1 Modified Sine Cosine Algorithm According to the sine cosine algorithm [19], the position is updated as

$$X_i^{n+1} = \begin{cases} X_i^{n} + \alpha_1 \times \sin(\alpha_2) \times \left|\alpha_3\, p_{gbest} - X_i^{n}\right|, & \alpha_4 < 0.5 \\ X_i^{n} + \alpha_1 \times \cos(\alpha_2) \times \left|\alpha_3\, p_{gbest} - X_i^{n}\right|, & \alpha_4 \geq 0.5 \end{cases} \qquad (8)$$

where $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ are random variables and $\alpha_1$ is given by

$$\alpha_1 = a\left(1 - \frac{n}{K}\right) \qquad (9)$$

where $n$ is the current iteration and $K$ is the maximum number of iterations. The current position $X_i^{n}$ and the updated position $X_i^{n+1}$ appear in Eq. (8). The parameter $\alpha_1$ governs the region of the next position, $\alpha_2$ determines the direction of movement from $X_i(n)$, $\alpha_3$ controls the magnitude of the current movement, and $\alpha_4$ switches equally between the sine and cosine functions. To obtain a faster speed of convergence, the parameter $\alpha_1$ is improved as

$$\alpha_m = \exp\!\left(\frac{1}{1 + \alpha_1}\right) \qquad (10)$$

and the corresponding updated position equation is given as

$$X_{ij}^{n+1} = \begin{cases} X_{ij}^{n} + \alpha_m \times \sin(\alpha_2) \times \left|\alpha_3\, p_{gbest} - X_i^{n}\right|, & \alpha_4 < 0.5 \\ X_{ij}^{n} + \alpha_m \times \cos(\alpha_2) \times \left|\alpha_3\, p_{gbest} - X_i^{n}\right|, & \alpha_4 \geq 0.5 \end{cases} \qquad (11)$$

Further, the weights are denoted $W = [w_{i0} + w_{i1}x_1 + \cdots + w_{iN}x_N]$, and the weights are mapped and updated using

$$W_{ij}^{n+1} = \begin{cases} W_{ij}^{n} + \alpha_m \times \sin(\alpha_2) \times \left|\alpha_3\, p_{gbest} - W_i^{n}\right|, & \alpha_4 < 0.5 \\ W_{ij}^{n} + \alpha_m \times \cos(\alpha_2) \times \left|\alpha_3\, p_{gbest} - W_i^{n}\right|, & \alpha_4 \geq 0.5 \end{cases} \qquad (12)$$
3.2 Data Collection The breast cancer databases are collected from the "Cohort of Screen-Aged Women (CSAW)", which is a population-based dataset of all women "40 to 74 years of age" in the "Stockholm region, Sweden". "CSAW" [28] included 499,807 women invited for screening examinations. A total of 500 images are considered for training and testing. The features mean, standard deviation, variance, energy, entropy, minimum, and maximum are obtained from the segmented images, so 500 × 7 = 3500 data values are utilized for the purpose of classification. The simulations are accomplished with MATLAB 2019a on an 8 GB RAM, 2.35 GHz system.

Pseudo-code: SCA for weight optimization of the LLWNN model
1. Initialize the SCA parameters α1, α2, α3, αm, K
2. Initialize the position equation and map it to the weights
3. Initialize the weights as W_ij^n
4. For l = 1 to K: update W_ij^n+1 according to Eq. (12)
5. End the loop once the minimum weight values are obtained
6. Stopping criteria: continue until the optimization reaches minimum error values
7. If not converged, repeat until a near-zero error is satisfied
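A minimal Python rendering of this pseudo-code's inner update is sketched below. The fitness evaluation (the MSE of Eq. (7)) and the tracking of the global best position p_gbest are omitted and would surround this step in a full optimizer; the constant a = 2 is a conventional SCA value assumed here.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def modified_sca_step(W, p_gbest, n, K, a=2.0):
    """One modified-SCA update of the LLWNN weight matrix W (Eqs. 9-12)."""
    alpha1 = a * (1.0 - n / K)               # Eq. (9): linearly decaying control
    alpha_m = np.exp(1.0 / (1.0 + alpha1))   # Eq. (10): modified parameter
    alpha2 = rng.uniform(0.0, 2.0 * np.pi, W.shape)
    alpha3 = rng.uniform(0.0, 2.0, W.shape)
    alpha4 = rng.uniform(0.0, 1.0, W.shape)
    step = np.abs(alpha3 * p_gbest - W)      # distance to the best position
    sine_move = W + alpha_m * np.sin(alpha2) * step
    cosine_move = W + alpha_m * np.cos(alpha2) * step
    return np.where(alpha4 < 0.5, sine_move, cosine_move)   # Eqs. (11)-(12)

# One step for a 4x4 weight matrix at iteration n = 10 of K = 100.
W = rng.standard_normal((4, 4))
print(modified_sca_step(W, p_gbest=np.zeros((4, 4)), n=10, K=100))
```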
4 Results 4.1 Segmentation Results 4.2 Classification Results The X-axis of the graph in Fig. 6 represents the simulation iterations (epochs), and the Y-axis the Mean Squared Error (MSE). The dataset observations were randomly split into a test set (error shown in red), a validation set (shown in blue), and a training set (shown in cyan). The graph shows that the test error is quite high at the beginning and decreases approximately linearly as the epochs increase.
5 Discussions Figures 3, 4 and 5 show the segmentation results. The segmentation accuracy is measured under Rician noise [19]. The improved FLICM also reduces the noise from the breast cancer images and provides better performance in segmentation. The segmentation accuracy under Rician noise is presented in Table 1. The classifier LLWNN-SCA is assessed with the help of the training and validation methods. For comparison, the "Feed-Forward Neural Network (FFWNN)", Fit Function Neural Network (FFNN), and Pattern Recognition Neural Network (PRNN) are considered.

Fig. 3 Segmentation of the breast cancer images

Fig. 4 Segmentation of the breast cancer images

Fig. 5 Segmentation of the breast cancer images
Table 1 Segmentation accuracy

Algorithm | Rician noise (σn = 10) | Rician noise (σn = 20)
FCM S1    | 90.21                  | 89.83
FCM S2    | 95.43                  | 92.37
EnFCM     | 98.21                  | 96.52
FLICM     | 98.81                  | 97.75
IFLICM    | 99.25                  | 98.83

The bold indicates higher values of accuracy in comparison to the other mentioned algorithms
The output simulation results of the same data are used for the Generalized Regression Neural Network (GRNN), PNN, and RBFNN. The output simulation result of the comparison of performance by hold-validation and cross-validation for LLWNN-SCA is shown in Fig. 6. Among the training methods, LLWNN-SCA achieves the best accuracy, as observed in Table 2. As the LLWNN-SCA model is a novel method and ready-made network functions are not available, the MSE is obtained through mathematical calculation. In this research work, the performance of different parameters has been considered. All parameters in the detection schemes are used to measure the enhancement of the mammogram image by removing noise, minimizing the variation of neighbouring pixels and increasing the image contrast. MSE is the key parameter for measuring the classification error. The network error (net error) is another parameter, measuring the error of each mentioned neural network. Bias is an important factor that has a great impact on the radial basis function; another property of the radial basis function is that its output configuration depends on the distance between the input and the weights. The statistics of the simulation in terms of the different parameters for all schemes are provided in Table 2. The test errors and accuracies are calculated using the machine learning models PRNN, GRNN, RBFNN, PNN, and FFWNN, which are simulated with the dataset newly created for this research work. The sensitivity and specificity performance
Fig. 6 Comparison of performance for hold-validation of LLWNN-SCA (best validation performance is 0.087111 at epoch 2; the plot shows the train, validation, test, best and goal mean squared error curves over 8 epochs)
evaluations are presented in Table 3. The sensitivity is defined as True Positive (TP) divided by True Positive (TP) + False Negative (FN), i.e. Sensitivity = TP/(TP + FN). Similarly, the specificity is defined as True Negative (TN) divided by True Negative (TN) + False Positive (FP), i.e. Specificity = TN/(TN + FP). The sensitivity and specificity are calculated by considering the classification measure. The proposed SCA + LLWNN model took 18.4521 s of computational time, which is lower than that of the RBFNN and LLWNN methods.
6 Conclusion This research work proposes a novel improved FLICM segmentation and hybrid SCA-LLWNN classification machine learning model. The breast cancer images are segmented by the improved FLICM method, and the segmented images are utilized for feature extraction. The statistical features are extracted and fed as input to the proposed SCA-LLWNN model. The FFNN, FFWNN and GRNN have also been taken into consideration, and the classification results are presented in Table 3. The SCA-LLWNN has shown better accuracy compared to all other mentioned methods. PRNN also showed good performance among the trained neural networks due to the reduction of the false negative and false positive predicted classes as well as the improvement of the true positive and true negative predicted classes of the breast cancer dataset. As observed from Table 2, when the hidden layer size (N) decreases, the error also decreases and the accuracy increases. The key parameters such as MSE, test error and accuracy in mammogram image classification have been utilized to present the performance of the models.
Table 2 Performance validation for classification

Neural network scheme | Parameters (%) | Hidden layer N = 20 | Hidden layer N = 40
GRNN      | MSE        | 0.427  | 0.427
GRNN      | Net error  | 0.683  | 0.683
GRNN      | Accuracy   | 96.667 | 98.333
PNN       | MSE        | 0.428  | 0.428
PNN       | Net error  | 15.409 | 15.409
PNN       | Accuracy   | 98.333 | 96.667
RBFNN     | MSE        | 0.030  | 0.030
RBFNN     | Net error  | 0.0876 | 0.876
RBFNN     | Accuracy   | 95     | 98.333
FFWNN     | MSE        | 3.70   | 1.86
FFWNN     | Test error | 0.0381 | 0.032
FFWNN     | Accuracy   | 97.145 | 98.095
PRNN      | MSE        | 1.83   | 1.9
PRNN      | Test error | 0.046  | 0.0485
PRNN      | Accuracy   | 97.145 | 98.968
LLWNN     | MSE        | 0.99   | 0.99
LLWNN     | Test error | 0.029  | 0.051
LLWNN     | Accuracy   | 98.095 | 99.048
LLWNN-SCA | MSE        | 0.130  | 0.130
LLWNN-SCA | Test error | 0.023  | 0.048
LLWNN-SCA | Accuracy   | 99.020 | 99.307

The bold indicates higher values of accuracy in comparison to the other mentioned algorithms
Table 3 Performance evaluation

Model       | No. of data | Computational time in sec | Sensitivity in % | Specificity in % | Accuracy in %
RBFNN       | 3500        | 39.5234                   | 97.25            | 88.59            | 98.333
LLWNN       | 3500        | 26.6754                   | 98.43            | 95.67            | 99.048
LLWNN + SCA | 3500        | 18.4521                   | 98.89            | 99.15            | 99.307

The bold indicates higher values of accuracy in comparison to the other mentioned algorithms
With these experimental results, it can be concluded that the performance of detection and classification achieved by the improved FLICM segmentation and SCA-LLWNN model for mammogram images is superior to that of the other FCM-based segmentation methods and classification models.
References 1. https://www.cancer.org/content/dam/cancer-org/research/cancer-facts-and-statistics/breast-cancer-facts-and-figures/breast-cancer-facts-and-figures-2019–2020.pdf, American Cancer Society (2020) 2. S. Mojarad, S. Dlay, W. Woo, G. Sherbet, Breast cancer prediction and cross validation using multilayer perceptron neural networks. in Proceedings 7th Communication Systems Networks and Digital Signal Processing, Newcastle, 21st–23rd July (IEEE, 2010), pp. 760–674 3. Y. Ireaneus Anna Rejani, S. Thamarai Selvi Noorul, Early detection of breast cancer using SVM classifier technique. Int. J. Comput. Sci. Eng. 1(3), 127–130 (2009) 4. L. Shen, L.R. Margolies, J.H. Rothstein, Deep learning to improve breast cancer detection on screening mammography. Sci. Rep. 9, 12495 (2019) 5. H.P. Chan, D. Wei, M.A. Helvie, B. Sahiner, D.D. Adler, M.M. Goodsitt, Computer-aided classification of mammographic masses and normal tissue: linear discriminant analysis in texture feature space. Phys. Med. Biol. 40(5), 857–876 (1995) 6. X. Jin, A. Xu, R. Bie, P. Guo, Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. in Proceedings of the 2006 International Conference on Data Mining for Biomedical Applications, BioDM'06, April 9, (Singapore, 2006), pp. 106–115 7. G.I. Salama, M.B. Abdelhalim, M.A. Zeid, Breast cancer diagnosis on three different datasets using multi-classifiers. Int. J. Comput. Sci. Inf. Technol. 1(1), 36–43 (2012) 8. E. Saghapour, S. Kermani, M. Sehhati, A novel feature ranking method for prediction of cancer stages using proteomics data. PLoS One 12(9), e0184203 (2017) 9. M.M. Eltoukhy, S.J. Gardezi, I. Faye, A method to reduce curvelet coefficients for mammogram classification. in Proceedings of the Region 10 Symposium, IEEE'14, April 14–16, 2014, (Kuala Lumpur, Malaysia, 2014), pp. 663–666 10. D. Ribli, A. Horváth, Z. Unger, P. Pollner, I. Csabai, Detecting and classifying lesions in mammograms with deep learning. Sci. Rep. 8(1), 4165 (2018) 11. V.K. Singh, S. Romani, H.A. Rashwan, F. Akram, N. Pandey, M. Sarke, Conditional generative adversarial and convolutional networks for x-ray breast mass segmentation and shape classification. in Proceedings of the Medical Image Computing and Computer Assisted Intervention, MICCAI'18, September 16–20, (Granada, Spain, 2018), pp. 833–840 12. V. Agarwal, C. Carson, Stanford University. Using deep convolutional neural networks to predict semantic features of lesions in mammograms (2015). https://cs231n.stanford.edu/reports/2015/pdfs/vibhua_final_report.pdf 13. F. Gao, T. Wu, J. Li, B. Zheng, L. Ruan, D. Shang, SD-CNN: a shallow-deep CNN for improved breast cancer diagnosis. Comput. Med. Imaging Graph 70, 53–62 (2018) 14. Y.B. Hagos, A.G. Mérida, J. Teuwen, Improving breast cancer detection using symmetry information with deep learning. in Proceedings of the Image Analysis for Moving Organ, Breast, and Thoracic Images, RAMBO'18, September 16, (Granada, Spain, 2018), pp. 90–97 15. J. Teuwen, S. van de Leemput, A. Gubern-Mérida, A. Rodriguez-Ruiz, R. Mann, B. Bejnordi, Soft tissue lesion detection in mammography using deep neural networks for object detection. in Proceedings of the 1st Conference on Medical Imaging with Deep Learning, MIDL'18, July 4–6, (Amsterdam, The Netherlands, 2018), pp. 1–9 16. R. Dhaya, Deep net model for detection of Covid-19 using radiographs based on ROC analysis. J. Innov. Image Process. (JIIP) 2(03), 135–140 (2020) 17. H. Jung, B. Kim, I. Lee, M. Yo, J. Lee, S. Ham, Detection of masses in mammograms using a one-stage object detector based on a deep convolutional neural network. PLoS ONE 13(9), e0203355 (2018) 18. S.N. Mishra, A. Patra, S. Das, M.R. Senapati, An adaptive local linear optimized radial basis functional neural network model for financial time series prediction. Neural Comput. Appl. 28(1), 101–110 (2017)
19. S. Mishra, P. Sahu, M.R. Senapati, MASCA–PSO based LLRBFNN model and improved fast and robust FCM algorithm for detection and classification of brain tumor from MR image. Evol. Intel. 12, 647–663 (2019) 20. S. Chen, D. Zhang, Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans. Syst. Man Cybern B Cybern 34(4), 1907–1916 (2004) 21. L. Szilagyi, Z. Benyo, S.M. Szilagyii, H.S. Adam, MR brain image segmentation using an enhanced fuzzy c-means algorithm. in Proceedings of the 25th Annual International Conference of the IEEE EMBS, (2003), pp. 17–21 22. W. Cai, S. Chen, D. Zhang, Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Pattern Recognit. 40(3), 825–838 (2007) 23. K.T. Vijay, Classification of brain cancer type using machine learning. J. Artif. Intell. 1(2), 105–113 (2019) 24. S. Krinidis, V. Chatzis, A robust fuzzy local information c-means clustering algorithm. IEEE Trans. Image Process 19(5), 1328–1337 (2010) 25. M.R. Senapati, P.K. Dash, Intelligent systems based on local linear wavelet neural network and recursive least square approach for breast cancer classification. Artif. Intell. Rev. 39(2), 151–163 (2013), Springer, ISSN 0269–2821 26. W.S. Tamil Selvi, J. Dheeba, N. Albert Singh, Computer aided detection of breast cancer on mammograms: a swarm intelligence optimized wavelet neural network approach. J. Biomed. Inform. 49, 45–52 (2014), Elsevier Inc 27. V. Chakkarwar, M.S. Salve, Classification of mammographic images using gabor wavelet and discrete wavelet transform. Int. J. Adv. Res. ECE 573–578 (2013) 28. K. Dembrower, P. Lindholm, F. Strand, A multi-million mammography image dataset and population-based screening cohort for the training and evaluation of deep neural networks—the cohort of screen-aged women (CSAW). J Digit Imaging 33, 408–413 (2020)
Detection of Diabetic Retinopathy Using Deep Convolutional Neural Networks R. Raja Kumar, R. Pandian, T. Prem Jacob, A. Pravin, and P. Indumathi
Abstract Diabetic retinopathy (DR) is the most prevalent disease among diabetic patients. DR affects retinal blood vessels and causes total loss of vision if it is not treated early. Around 50% of the diabetic population is affected by DR, which necessitates DR detection. The manual processes available consume more time, and hence a convolutional neural network is employed to detect DR at an earlier stage. The proposed method has three convolutional layers and a fully connected layer. This method gives higher accuracy (94.44%) than conventional approaches, with reduced hardware requirements, to detect and classify DR into five stages, namely no DR, mild, moderate, severe, and proliferative DR. Keywords Diabetic retinopathy · Deep learning · Convolutional neural network · Convolutional layer · Fully connected layer · Radial basis function
1 Introduction Diabetes is a disease that arises when adequate insulin is not secreted by the pancreas or the cells do not respond to the insulin produced. The increase in blood sugar level causes damage to the retinal blood vessels, called diabetic retinopathy. This disease at a severe stage leads to a total loss of vision. There are two important stages R. Raja Kumar (B) Mathematics Department, Sathyabama Institute of Science and Technology, Sholinganallur, 600119 Chennai, India R. Pandian Department of Electronics and Instrumentation Engineering, Sathyabama Institute of Science and Technology, Sholinganallur, 600119 Chennai, India T. Prem Jacob · A. Pravin Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Sholinganallur, 600119 Chennai, India P. Indumathi Department of Electronics Engineering, MIT Campus, Anna University, 600044 Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_34
Fig. 1 a An eye affected by diabetic retinopathy (Medical News Today, August 2017)
of diabetic retinopathy, namely non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR). In NPDR, the growth of new blood vessels that affect the already present blood vessels is limited, whereas in PDR the new abnormal blood vessels grow over the retina, causing severe loss of vision (Fig. 1). As per the World Health Organization (WHO), about 70% of persons with diabetic retinopathy live in developing countries. Sometimes the symptoms of diabetic retinopathy do not show up at an earlier stage, so the ophthalmologist cannot treat the disease within the stipulated time, as the screening process takes more time. Diabetic retinopathy is categorized into five stages. Stage 1 denotes No DR: the person is normal and does not possess any symptom of DR. Stage 2 denotes Mild: a feature called a microaneurysm is formed; microaneurysms are small balloon-like swellings in the retinal blood vessels that can leak fluid into the retinal area and prevent light from properly entering the retina. Stage 3 denotes Moderate: the blood circulation in the retinal blood vessels gets damaged due to the formation of exudates. Stage 4 denotes Severe: new abnormal blood vessels emerge in the retina, affecting necessary blood vessels. Stage 5 denotes Proliferative DR: the growth of new blood vessels increases drastically, causing the detachment of the retina from the basal blood vessels; this leads to permanent loss of vision.
Deep learning is a subgroup of machine learning with enhanced performance. It does not require separate feature extraction, unlike classical machine learning; it goes deep into the network for analysis based on the features extracted in each layer. In deep learning, the convolutional neural network (CNN) is an approach that is mostly applied to visual images. It has various convolutional layers to detect features in an image and fully connected layers to aggregate the results for class separation. Our proposed model consists of three convolutional layers and a fully connected layer. The convolutional layers give feature maps of extracted features like shape, color, texture, and size, and there is a layer called softmax after the fully connected layer to normalize the output into probabilities. This convolutional neural network requires only nominal preprocessing, which is achieved with the help of the data augmentation process. Comparative results with and without the data augmentation process are discussed in the following sections.
2 Related Work Shanthi et al. [1] used an algorithm modifying the basic AlexNet architecture to classify DR; this approach contains only four classes. Frazao et al. [2] used holistic texture and local retinal features to diagnose DR; this method produces low accuracy. Jebaseeli et al. [3] proposed a new algorithm named the tandem pulse coupled neural network (TPCNN) for feature extraction and a support vector machine (SVM) for classification; this method provides low sensitivity and also needs a separate algorithm for feature extraction. Devaraj et al. [4] have surveyed methods to extract features like microaneurysms and exudates. Wan et al. [5] have done comparative work with different classifiers for analyzing classifier performance. Uma et al. [6] proposed a morphology-based approach for the extraction of features like blood vessels, the optic disk and, finally, the exudates; this method is applicable to a small set of images, and it requires an additional feature extraction step. Omar et al. [7] used a morphological-operations-based approach to classify DR; the accuracy obtained with this method is not too good. Pratt et al. [8] used 10 convolutional layers and 3 fully connected layers to detect and classify DR; the accuracy and specificity obtained with this method are too low. Omar et al. [9] used a multiscale local binary pattern (LBP) texture model for feature extraction and a radial basis function (RBF) for classification in their proposed work; this approach fails to classify DR into five stages, as it concentrates only on exudate extraction. To detect shapes, higher-order spectra (HOS) were used in [10] for their nonlinear features, with an SVM classifier used to classify DR into five classes; though very involved, the achievable performance measures were only a sensitivity of 82% and a specificity of 88%. Different stages of DR are identified in [11]. Detection in the neo-normal pandemic is discussed in [12], where the authors have made use of deep neural networks for the detection of Covid-19. Interestingly, in [13], retrieval of complex images was done using
cognitive classification. The risk factors of DR are investigated in [14] after applying statistical techniques. The proposed work involves the classification of DR and various methods for investigation. Convolutional neural networks are employed to solve the problem; the reason for selecting a CNN is that it performs feature extraction from the image itself, after which the data are split for training and testing purposes.
3 Proposed Framework The fundus camera is used to obtain the fundus images of the eye (i.e., photographs of the rear of the eye), and databases like the Kaggle database were used to store those fundus images. The concept of a convolutional neural network is used to illustrate the framework for the five different classes of DR. The proposed framework, shown in Fig. 2b, begins with data collection. The input retinal fundus images from the Kaggle database are used in this work. An input retinal image has a size of around 1 MB and takes more time to process. As an initial stage, the input image is preprocessed, wherein resizing and cropping are done to reduce the image size and also to remove unwanted portions from the image. The convolutional neural network plays a major role in extracting features from an image and classifying it based on the obtained features. The data are split for training as well as testing. The convolutional layer does the feature extraction by convolving the input image with the filter or kernel used; at each convolutional layer output, a feature map is obtained. The feature points grabbed from the
Fig. 2 b Proposed framework
convolutional layer do not all contain useful information, and therefore max-pooling is done to reduce the dimension. Finally, the fully connected layer connects all the neurons, such that the obtained features are converted into classes. The training and testing data are compared to obtain the performance metrics like accuracy, sensitivity, and specificity.
3.1 Preprocessing The images acquired from the Kaggle database are of various sizes, which makes the task of the convolutional neural network complex. Therefore, the images from the dataset are resized and cropped to a size of 227*227*3 to remove unwanted portions and to reduce memory consumption [5]. Some authors extract the green channel from the image, as it gives higher sensitivity and more moderate saturation than the red and blue channels. The proposed work does not necessitate green channel extraction, as the data augmentation process collects fine details from the image by scaling, translation, and rotation.
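A minimal sketch of the resize-and-crop step is shown below; the centre-crop strategy and the use of OpenCV are assumptions, as the paper only states the target size of 227×227×3.

```python
import cv2

def preprocess(path):
    """Centre-crop and resize a fundus image to the 227x227x3 network input."""
    img = cv2.imread(path)              # BGR fundus image
    h, w = img.shape[:2]
    side = min(h, w)                    # crop to the largest centred square
    y0, x0 = (h - side) // 2, (w - side) // 2
    img = img[y0:y0 + side, x0:x0 + side]
    return cv2.resize(img, (227, 227))  # final network input size
```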
3.2 Convolutional Neural Network A convolutional neural network (CNN) is mainly applied to visual images, as it requires less preprocessing time. A simple convolutional neural network architecture is given in Fig. 3. It consists of an input layer, an output layer, and hidden layers; the convolutional layers, pooling layers, and fully connected layers form the hidden layers. Each convolutional layer has some kernels/filters which help in the extraction of features from the image. The filter operates on the original image to produce a feature map.
Fig. 3 Simple convolutional neural network architecture
Fig. 4 Data augmented image
The obtained feature map contains feature points. Not all the feature points in the feature map are necessary; hence, max-pooling, a sub-sampling process, is done to reduce the dimensions of the image, and the processing time is thus reduced. The neurons of the fully connected layer are fully connected to the previous layer's neurons for class separation.
3.3 Data Augmentation

Data augmentation is the process wherein the data are flipped horizontally and vertically, tilted, scaled, rotated, and translated to collect fine details of an image. In simple convolutional neural networks, the training performance is much higher than the testing performance; hence, methods such as data augmentation, normalization, and regularization are adopted. The data augmentation process is portrayed in Fig. 4.
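A possible realization of the described augmentation (flips, tilt/rotation, scaling, translation) with the Keras ImageDataGenerator is sketched below; the parameter ranges are illustrative assumptions, not values from the paper.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    horizontal_flip=True,     # flip horizontally
    vertical_flip=True,       # flip vertically
    rotation_range=20,        # tilt/rotate (degrees)
    zoom_range=0.1,           # scale
    width_shift_range=0.1,    # translate horizontally
    height_shift_range=0.1,   # translate vertically
)
# Augmented batches are then drawn on the fly during training, e.g.:
# model.fit(augmenter.flow(x_train, y_train, batch_size=32), epochs=30)
```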
3.4 Proposed CNN

The proposed CNN architecture comprises three convolutional layers, each followed by a max-pooling layer, then a fully connected layer and a softmax layer. Briefly, the layers are as follows: conv_1 consists of eight 3*3 filters, maxpool_1 of size 2*2, conv_2
Fig. 5 Proposed CNN architecture
consists of sixteen 3*3 filters, maxpool_2 of size 2*2, conv_3 consists of thirty-two filters of size 3*3, followed by the fully connected layer fc4, and the network finally ends with a softmax layer. The softmax layer gives the class scores by normalizing the output of the fully connected layer into probabilities. The proposed architecture is given in Fig. 5. The activation function used in this work is the rectified linear unit (ReLU). It decides which neurons are to be activated such that the desired classification result is achieved. This function eliminates drawbacks of earlier activation functions, such as exploding or vanishing gradients and non-zero-centered outputs. The ReLU activation function for a feature point 'y' is given by

f(y) = max(0, y)    (1)
The softmax function σ(y) is given by

σ(y)_j = e^{y_j} / Σ_{k=1}^{K} e^{y_k},  for j = 1, …, K and y = (y_1, …, y_K) ∈ R^K    (2)

where y_j is the j-th element of the input vector y.
Fig. 6 Max-pooling operation
The max-pooling layer performs max-pooling, a process of down-sampling the obtained feature map by taking the maximum values; the dimensionality along the width and height is thereby greatly reduced, and high-contrast features are obtained. The operation of max-pooling is given in Fig. 6. Each layer has a dimension that can be calculated by the following formula; the calculation of layer size plays a crucial role in the design of a convolutional neural network. The output layer size (O) is given by

O = (I − K + 2P)/S + 1    (3)

where 'I' represents the input layer width or height, 'K' the filter/kernel size, 'P' the padding length, and 'S' the stride length, which gives the number of positions the filter shifts over the input image.
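A direct transcription of Eq. (3) makes the layer-size calculation concrete; the padding and stride values in the example calls are assumptions for illustration.

```python
def out_size(i, k, p, s):
    """Output width/height per Eq. (3): O = (I - K + 2P)/S + 1."""
    return (i - k + 2 * p) // s + 1

print(out_size(227, 3, 1, 1))   # 3x3 conv, padding 1, stride 1 -> 227
print(out_size(227, 2, 0, 2))   # 2x2 max-pooling, stride 2 -> 113
```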
4 Experimental Results

4.1 Software Used

The proposed framework uses MATLAB R2018b as the simulation platform. An interactive mode available in MATLAB R2018b enables the user to create an individual neural network using the deep neural network designer application. MATLAB performs matrix operations; i.e., it converts the input image into a matrix and then into a vector for processing, which makes it widely used for image processing applications.
Fig. 7 Sample classification dataset
4.2 Dataset

The dataset used in the framework is obtained from the Kaggle database [1]. The training and testing images in the dataset cover five stages, namely no DR, mild, moderate, severe, and proliferative DR. The dataset contains both inverted and non-inverted images: if a square or circular notch is present in the image, the image is not inverted, and vice versa. A zip application is recommended to extract the zip files from the Kaggle database. 300 images from the Kaggle database are used for training as well as testing, which requires a high-end graphics processing unit (GPU). The sample classification dataset for the DR stages is given in Fig. 7.
4.3 Simulation Results

The proposed CNN architecture comprises three convolutional layers and a fully connected layer [10–15]. Out of 300 images, 70% (210) are used for training and 30% (90) for testing. The Kaggle dataset is used in the proposed work. Each image is resized to 227*227*3 to reduce memory consumption and
Fig. 8 Proposed CNN network analysis
processing time. The architecture of the proposed work and the layer description are given in the network analysis figure shown in Fig. 8. The training images used for the proposed framework are shown in Fig. 9. The model accuracy and loss without data augmentation are shown below. As the images from the dataset are not flipped, rotated, scaled, or translated, less detail from the image is captured; with fewer features obtained from the image, the resulting accuracy was only 66.67%. This also causes a problem called overfitting, where performance on the training dataset is high while performance on the testing dataset is low. This can be clearly seen in the model accuracy and loss graph shown in Fig. 10. The data augmentation process overcomes the problem of overfitting, and hence the performance metrics of the proposed framework are considerably enhanced. Backpropagation is the algorithm used in the convolutional neural network for loss calculation and minimization. The accuracy obtained with this process is 94.44%, as clearly shown in Fig. 11. The classified result is shown in Fig. 12. The softmax layer produces class scores, and based on these scores, the input image is classified into one of the following stages: no DR, mild, moderate, severe, or proliferative DR.
Fig. 9 Training images
The performance metrics like sensitivity, specificity, accuracy, f-score, and precision are calculated with the help of the obtained confusion matrix, shown in Fig. 13. The performance metrics are given by the following formulas, and the results are shown in Table 1.

Accuracy = (TP + TN)/(TP + TN + FP + FN)    (4)

Precision = TP/(TP + FP)    (5)

Sensitivity = TP/(TP + FN)    (6)

Specificity = TN/(TN + FP)    (7)
Fig. 10 Model accuracy and loss without data augmentation
f-score = 2·TP/(2·TP + FP + FN)    (8)
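Equations (4)–(8) translate directly into code; the confusion-matrix counts in the example call are placeholders, not the paper's values.

```python
def metrics(tp, tn, fp, fn):
    return {
        'accuracy':    (tp + tn) / (tp + tn + fp + fn),   # Eq. (4)
        'precision':   tp / (tp + fp),                    # Eq. (5)
        'sensitivity': tp / (tp + fn),                    # Eq. (6)
        'specificity': tn / (tn + fp),                    # Eq. (7)
        'f_score':     2 * tp / (2 * tp + fp + fn),       # Eq. (8)
    }

print(metrics(tp=20, tn=66, fp=2, fn=2))  # placeholder counts for one class
```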
The parameters accuracy, sensitivity, specificity, precision, and f-score achieved as a whole are 94.44, 93.96, 98.54, 93.96, and 93.96%, respectively. If the accuracy were 100%, there would be no indication whatsoever of the various stages of DR; for example, the 100% accuracy entry concerns the absence of DR being accurately diagnosed, and once that happens, there is no chance of any other stage of DR affecting the patient.
5 Conclusion

In this proposed work, a deep convolutional neural network is designed to detect and classify diabetic retinopathy, a deadly disease among diabetic patients, into five stages. The results obtained from the proposed framework are quite high,
Fig. 11 Model accuracy and loss with data augmentation
as reflected in the performance metrics obtained. Therefore, convolutional neural network-based diabetic retinopathy classification is more reliable and accurate. It also consumes less time for classification; hence, the disease can be treated earlier, reducing the risk of vision loss. In the future, images with higher resolution can be used to gain higher accuracy and improve the other performance metrics.
Fig. 12 Classified images
Table 1 Performance metrics of proposed CNN

Performance metric per class | Mild (%) | Moderate (%) | No DR (%) | Proliferative DR (%) | Severe DR (%)
Precision    | 95.23 | 93.75 | 96.15 | 90.9  | 93.75
Sensitivity  | 95.23 | 93.75 | 96.15 | 90.9  | 93.75
Specificity  | 98.48 | 98.59 | 98.36 | 98.68 | 98.59
f-score      | 95.23 | 93.75 | 96.15 | 90.9  | 93.75
Accuracy*    | –     | –     | 100   | –     | –

* If the accuracy were 100%, there would be no indication whatsoever about the various stages of DR
Fig. 13 Confusion matrix
References

1. T. Shanthi, R.S. Sabeenian, Modified AlexNet architecture for classification of diabetic retinopathy images. Int. J. Comput. Electri. Eng. 76, 56–64 (2019)
2. L.B. Frazao, N. Theera-Umpon, S. Auephanwiriyakul, Diagnosis of diabetic retinopathy based on holistic texture and local retinal features. Int. J. Info. Sci. 475, 44–66 (2019)
3. T. Jemima Jebaseeli, C. Anand Deva Durai, J. Dinesh Peter, Segmentation of retinal blood vessels from ophthalmologic diabetic retinopathy images. Int. J. Comput. Electri. Eng. 73, 245–258 (2019)
4. D. Devaraj, R. Suma, S.C. Prasanna Kumar, A survey on segmentation of exudates and microaneurysms for early detection of diabetic retinopathy. ILAFM2016 5(4), 10845–10850 (2018)
5. S. Wan, Y. Liang, Y. Zhang, Deep convolutional neural networks for diabetic retinopathy detection by image classification. Int. J. Comput. Electri. Eng. 72, 274–282 (2018)
6. P. Uma, P. Indumathi, Remote examination of exudates-impact of macular edema. Healthcare Technol. Lett. IET Publishers 5(4), 118–123 (2018)
7. Z.A. Omar, M. Hanafi, S. Mashohor, N.F.M. Mahfudz, Automatic diabetic retinopathy detection and classification system, in 7th IEEE Conference on System Engineering and Technology (2017), pp. 162–166
8. H. Pratt, F. Coenen, D.M. Broadbent, S.P. Harding, Y. Zheng, Convolutional neural networks for diabetic retinopathy, in International Conference on Medical Imaging Understanding and Analysis, vol. 90 (2016), pp. 200–205
9. M. Omar, F. Khelifi, M.A. Tahir, Detection and classification of retinal fundus images exudates using region based multiscale LBP texture approach, in International Conference on Control, Decision and Information Technologies (2016), pp. 227–232
10. U. Rajendra Acharya, C.K. Chua, E.Y. Ng, W. Yu, C. Chee, Application of higher order spectra for the identification of diabetic retinopathy stages. J. Med. Syst. 32(6), 481–488 (2008)
11. J. Nayak, P.S. Bhat, R. Acharya, C. Lim, M. Kagathi, Automated identification of diabetic retinopathy stages using digital fundus images. J. Med. Syst. 32(2), 107–115 (2008)
12. R. Dhaya, Deep net model for detection of Covid-19 using radiographs based on ROC analysis. J. Innov. Image Process. (JIIP) 2(03), 135–140 (2020)
13. T. Vijayakumar, R. Vinothkanna, Retrieval of complex images using visual saliency guided cognitive classification. J. Innov. Image Process. (JIIP) 2(02), 102–109 (2020)
14. V.C. Lima, G.C. Cavalieri, M.C. Lima, N.O. Nazario, G.C. Lima, Risk factors for diabetic retinopathy: a case–control study. Int. J. Retina Vitreous 2, 21 (2016)
15. M. Akter, M.H. Khan, Morphology-based exudates detection from color fundus images in diabetic retinopathy, in International Conference on Electrical Engineering and Information and Communication Technology (2014), pp. 1–4
Hybrid Level Fusion Schemes for Multimodal Biometric Authentication System Based on Matcher Performance S. Amritha Varshini and J. Aravinth
Abstract The performance of the multimodal system was improved by integrating both the physiological and behavioral characteristics of an individual. Usually, fusion is carried out either at the score or the feature level. The multimodal biometric system is contrasted with the unimodal system below; the multimodal system was adopted in order to overcome several demerits found in unimodal systems, and implementing it improved the overall recognition rate of the biometric system. ECG, face, and fingerprint were integrated in a new level of fusion named the hybrid fusion scheme, in which the scores from the feature level fusion were fused with those of the best unimodal system (ECG) by using score level fusion techniques. Feature vectors were obtained by processing the signals and images obtained from the databases FVC2002/2004, Face94, and PhysioNet (MIT-BIH Arrhythmia) after feature extraction. Matching scores and individual accuracy were computed separately for each biometric trait. Since the matchers on these three biometric traits gave different values, a matcher performance-based fusion technique was proposed for the specified traits. The two-level fusion schemes (score and feature) were carried out separately, to analyze their performances against the hybrid scheme. Normalization of the scores was done using the overlap extrema-based min–max (OVEBAMM) technique. Further, the proposed technique was compared with Z-score, tanh, and min–max normalization on the same traits. The performance of these traits with both unimodal and multimodal systems was analyzed and plotted using receiver operating characteristic (ROC) curves. The proposed hybrid fusion scheme achieved the best TPR, FPR, and EER rates of 0.99, 0.1, and 0.5, respectively, by using the normalization techniques together with weighted techniques like the confidence-based weighting (CBW) method and the mean extrema-based confidence weighting (MEBCW) method.

Keywords Multimodal biometric system · Matcher performance · Hybrid level fusion · Weighted methods

S. Amritha Varshini (B) · J. Aravinth Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_35
1 Introduction

Biometric recognition systems are technologies that involve both the physiological and behavioral traits of a person. These traits are given as input, and the system confirms the person's identity. Such systems are used in almost all fields; hence, they are essential for securing data. The three major stages of biometrics are authentication or identification, verification, and authorization. Identification is performed through matching: the system confirms the identity of a person by comparing the scanned image with the images in the entire database. If the compared data matches the scanned image to about 85%, the verification is successful, and access to the data is authorized. Biometrics is highly secure compared to conventional methods. The basic components of a biometric system are the input data, the processing unit (identification, verification, and authentication), and the output data. Input data are acquired from sensors and can be a fingerprint or other physiological or behavioral parameters (face, speech, iris, etc.). The processing unit includes operations like image enhancement, normalization, feature extraction, and comparison of the acquired data with the database. In general, the system takes a biometric sample from an individual, extracts its features, and compares them with the data already stored in the database; if the input sample matches the stored data, access to the resources is allowed. Multimodal biometric systems are designed using more than one trait, both physiological and behavioral; the inputs are acquired from multiple sensors and then processed. The ECG biometric is advantageous compared with the other commonly used biometric characteristics. In this paper, face and fingerprint are interfaced along with ECG. Though these two are commonly used in recognition systems, they have several demerits and are prone to many vulnerabilities. The major issues found in face recognition systems are:

1. It is sensitive to artificial disguise.
2. It has trouble with image quality and size.
3. It is subject to many false identifications.
4. The influence of the camera angle is large.
5. Two-dimensional face recognition, which is commonly practiced in all fields, is considered insecure and can be easily spoofed.
6. People can be identified from a distance, which mostly leads to privacy issues.

In the case of fingerprints, they can be recreated using latex. To overcome these disadvantages, fusion of the ECG biometric with face and fingerprint is suggested. When ECG is used as the biometric, many illegal activities can easily be avoided, because the ECG is completely unique to an individual. Measuring the ECG of an individual is also not a tedious process; it can be captured with a single-electrode system. Just like acquiring other verification data from an individual, the ECG system captures the data by placing an electrode on the surface of the skin. The ECG signal is considered heterogeneous, because there are slight differences in amplitude and time intervals for every individual. The morphology of heartbeats also differs from person to person depending on body structure, living environment, and lifestyle. The only threat to the ECG biometric is the presence
of cardiovascular disease in the heart. Even then, it works efficiently if the electrical conduction regions like the sino-atrial (SA) node and atrioventricular (AV) node are not affected. The ECG biometric can be designed by measuring arrhythmia or normal sinus rhythm. The level of fusion initiates the process of extraction from the raw data collected at the sensor level. In score level fusion, the matching scores are obtained from different matchers, and fusing the scores is easy; however, it additionally requires normalization or weighting techniques to enhance the overall recognition rate. Similarly, feature level fusion does not consider the performance of the matching techniques. In order to rectify these issues, a new level of fusion called the hybrid fusion scheme is suggested, in which the final scores from feature level fusion and the scores of the best unimodal system are combined by applying score level fusion techniques. Weighting techniques are incorporated in the hybrid scheme, and hence a better accuracy rate for the proposed multimodal system is achieved. The paper is arranged as follows: Sect. 2 deals with the related literature; Sect. 3 describes the methods used to process the data; Sect. 4 presents the experimental results; and Sect. 5 concludes the paper.
2 Related Works

Aboshosha et al., in their paper [1] "Score Level Fusion for Fingerprint, Iris and Face Biometrics," described score level fusion using min–max normalization. The feature extraction for face and fingerprint was explained in detail: the face and fingerprint images were preprocessed, a minutiae algorithm was used to extract points from the fingerprint, and local binary patterns (LBP) were applied to the face images, which were divided into small cells from which the important features were extracted. Scores between 0 and 1 were obtained using the min–max normalization technique. The technique has low accuracy and takes considerable computational time. Bashar et al. and Abhishek et al., in their papers [2–4] on ECG-based biometric authentication using multiscale descriptors, explained an ECG authentication system in which multiscale features were extracted from ECG signals. In the preprocessing stage, nonlinear filters were used, and the filtered signal was divided into multiple segments. A mean classifier method was adopted by computing a simple minimum distance. After determining the matching and non-matching factors, the accuracy rate of the ECG biometric system was found. The paper [5] by Mingxing He et al. explains performance evaluation in multimodal biometric systems, in which the fusion of various traits like fingerprint, finger vein, and face was executed. The performances were validated for sum rule-based score level fusion and support vector machine (SVM)-based classifiers. A new normalization scheme derived from the min–max normalization technique was
proposed in that paper, exclusively for reducing the high-score effect; the final results were compared between sum rule-based fusion and SVM-based fusion. Kabir et al. described a two-level fusion for multimodal biometric systems in their papers [6–8]. New techniques, namely the matcher performance-based (MPb) fusion scheme and the overlap extrema variation-based anchored min–max (OEVBAMM) normalization scheme, were proposed. The fusion was carried out at both the score and feature levels, and the decision was made based on their performances. The major drawback of that work is that the performances were indicated only with respect to ROC and DET curves: the ROC values are lower than the GAR values and the DET values are greater than the FAR values, and the fusion scheme was not described for a dual- or trilevel system. In [9, 10] the authors gave an overview of multimodal biometrics, including the limitations of unimodal systems, and explained in detail the fusion strategies with the various transformation or normalization techniques that can be incorporated. The papers [11, 12] by Rattani et al. and Aravinth et al. explained the techniques used to implement feature level fusion using different biometric sources, namely face and fingerprint [13], giving vital information about the various techniques involved in making features compatible and in feature reduction. Singh et al. gave different techniques for fusing ECG with unobtrusive biometrics like face and fingerprint in their papers [12, 14], explaining the procedures involved in processing ECG signals acquired from online datasets. The first and foremost method involves ECG delineation: the QRS complex delineator employs the fiducials of the QRS complex, i.e., QRS onset and QRS offset; P wave delineation is calculated from the QRS onset (the beginning of the heartbeat); similarly, the T wave is computed in the window from QRS offset + 80 ms to QRS offset + 470 ms. Vishi et al. explained various evaluation approaches in their paper [15], using score normalization and fusion techniques for the two modalities fingerprint and finger vein. The individual scores from the two sources were combined and normalized using the traditional techniques, namely min–max, Z-score, and tanh; improvement rates of up to 99.98% were achieved. Aravinth et al. explained the implementation of multimodal biometric systems for remote authentication using multi-classifier-based score level fusion in their papers [12, 15].
3 Methodology See Fig. 1.
Fig. 1 Block diagram of hybrid fusion-based multimodal biometric system
3.1 ECG Signal for Individual Authentication

The acquired ECG signal undergoes several processes to assure the quality of the signal and provide the best recognition system.

1. Preprocessing of the signal is done in order to remove artifacts like baseline wander and noise; the noise is removed from the signal using median filters.
2. Segmentation is the technique used for detecting the beats present in the signal; it is applied mainly to detect the R peaks of the ECG signal.
3. Features are extracted from the signal for subsequent analysis of the ECG. The features fall into three categories, namely (i) morphology of the RR peaks, (ii) intervals, and (iii) normalization of the detected RR peaks. The morphology category involves higher-order statistics (HOS) and QRS complex delineation. HOS gives the normal distribution of the signal at both extreme ends with a center point. QRS complex delineation is done using the classical Pan–Tompkins algorithm. This repository consists of two different classes, namely QRS off and QRS on; QRS off is used for extracting features from the signal. The Euclidean distance is computed between two different R peaks, and the remaining four points can be extracted from it by assigning the maximum and minimum intervals. The second category includes the intervals present in the signal, computed with respect to the R peaks: PQ, QR, RS, and ST. The last is the normalized RR, which provides the average value of the detected intervals.
4. Classification of the features is done using the machine learning algorithm KNN. The individual scores and accuracy rate for
the ECG biometric system are computed, and the receiver operating characteristic (ROC) curve of the true positive and false positive values is plotted.
3.2 QRS Delineation

QRS delineation is determined using the Pan–Tompkins algorithm, with the raw signal as input. According to the algorithm, the signal is processed in several stages, namely filtering and thresholding. Filtering is done using low-pass and high-pass filters and is very useful in removing noise and other distortions present in the signal. The filtered signal is then differentiated to find the signal segments; squaring and integration are then done in order to make the signal more distinct. The integrated signal is further used for peak detection. The QRS complex is identified by adjusting the thresholds; the thresholding and filtering make it possible to detect the QRS complex with high sensitivity and relatively few false positives. Two values, namely QRS on and QRS off, are assigned to detect the different waves in the signal.
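A condensed Pan–Tompkins-style R-peak detector is sketched below with SciPy; the 5–15 Hz pass-band, window lengths, and threshold are common choices assumed for illustration, with the 128 Hz sampling rate taken from the dataset description later in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def detect_r_peaks(ecg, fs=128):
    # Band-pass (low-pass + high-pass) to suppress baseline wander and noise
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype='band')
    filtered = filtfilt(b, a, ecg)
    squared = np.diff(filtered) ** 2               # differentiate, then square
    window = int(0.15 * fs)                        # ~150 ms moving-window integration
    integrated = np.convolve(squared, np.ones(window) / window, mode='same')
    peaks, _ = find_peaks(integrated,
                          height=0.5 * integrated.max(),  # simple fixed threshold
                          distance=int(0.25 * fs))        # refractory period
    return peaks
```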
3.3 Fingerprint Recognition

Two databases were used for the fingerprint authentication system, namely FVC2002 and FVC2004. Several image processing techniques were adopted to extract the minutiae points: histogram equalization and binarization were carried out to remove dots and smooth the edges. Fingerprint recognition includes steps like preprocessing, feature extraction, post-processing, and matching. Preprocessing involves two stages, namely histogram equalization and binarization, executed to enhance the images; binarization converts the grayscale images into binary images and is also used to indicate the number of ridges and furrows present in an image. Image segmentation is done in order to locate the region of interest (ROI) for extracting the minutiae points. Thinning and skeletonization were performed on the image to obtain the minutiae points, which were extracted by considering two phenomena, namely bifurcation and termination. After extracting the minutiae points, the false points were removed in the post-processing stage. The final stage is fingerprint matching, which was done using the Hamming distance, with the matching scores generated using the FLANN-based matcher.
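The preprocessing chain (histogram equalization, binarization, thinning) can be sketched with OpenCV and scikit-image as below; the sample file name and the use of Otsu thresholding are assumptions for illustration.

```python
import cv2
from skimage.morphology import skeletonize

img = cv2.imread('fvc2002_sample.tif', cv2.IMREAD_GRAYSCALE)
img = cv2.equalizeHist(img)                          # histogram equalization
_, binary = cv2.threshold(img, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
skeleton = skeletonize(binary > 0)                   # thinning to 1-pixel ridges
# Minutiae can then be located on `skeleton` by the crossing-number rule:
# a ridge pixel with one ridge neighbor is a termination, with three a bifurcation.
```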
3.4 Facial Recognition

The face samples were collected from the Face94 database, covering people with different facial expressions. Among the many images in the database, the top 150 were taken and processed to find the individual recognition accuracy. A set of features like the eyes, nose, and lips was extracted from each image using Haar cascade classifiers. Data transformation was done using principal component analysis (PCA): the eigenfaces were projected using an orthonormal basis, and the eigenvectors obtained from the analysis were compared according to the Euclidean distance. The normalized images were used for training with the PCA algorithm. Face recognition with local binary patterns (LBP) was also performed in order to improve the accuracy rate: the face regions were first divided into smaller parts, histograms were computed, and these were acquired in the form of vectors, which were extracted and concatenated. The regions were labeled based on the pixels of the image, considering the 3*3 neighborhood of each pixel value; the results were computed in both binary and decimal numbers. The matching scores were computed using the Hamming distance and the FLANN-based matcher. The individual scores and accuracy were determined for face recognition, and the ROC was obtained for the true positive and false positive values.
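For illustration, the Haar-cascade feature detection can be sketched with OpenCV's stock cascade files; the image path is a placeholder.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml')

gray = cv2.imread('face94_sample.jpg', cv2.IMREAD_GRAYSCALE)
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi)    # sub-features inside the face
    print('face at', (x, y, w, h), 'eyes found:', len(eyes))
```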
3.5 Score Level Fusion Technique

Score level fusion is one of the common techniques used in multimodal biometric systems. It involves the matching scores, which are sufficient to determine the true positive and false positive values of the biometric system; these scores were acquired after feature extraction and feature matching. Initially, feature matching was executed separately for all the traits, and the scores of the unimodal systems were recorded. Fusion was then carried out by applying algorithms like the sum and product rules, and feature matching was carried out once again for the fused system. Since the matchers gave different values, a new scheme called "matcher performance-based fusion (MPb)" was applied, wherein the three biometric sources produced different sets of values at the matching stage after the successful completion of feature extraction. The feature values from ECG, fingerprint, and face are denoted x(k), y(k), and z(k); these annotations were allotted with respect to the positions (x, y). Feature encoding was carried out, yielding X_xy(k), Y_xy(k), and Z_xy(k), and the Hamming distance was computed for all these features after encoding. The features were matched for the set of feature values x(k), y(k), and z(k) to find the true positive (TP) and false positive (FP) values. The threshold was automatically fixed with respect to the maximum TP value and minimum FP value. The EER, FPR, and TPR values were found for the fused modality system using the formulas below; the TPR values obtained were higher than the FPR and EER values.
Threshold (Th) = (max(TP) − min(FP))/K    (1)

where K represents the empirical parameter used to control the variable threshold values.

TPR = TP/(TP + FN)    (2)

FPR = FP/(FP + TN)    (3)

EER = (TPR + FPR)/2    (4)
3.6 Feature Level Fusion Technique

In feature level fusion, the extraction of features from the different modalities was carried out individually. The features were then made compatible in order to perform feature concatenation, as the features from the three biometric sources differ in their dimensions. By performing feature reduction and concatenation, the dimensions of the entire feature pointsets were reduced. Feature reduction was done separately for each trait, to make the dimensions equal; this process was executed prior to fusing the feature pointsets. Different dimensionality reduction techniques were implemented: "neighborhood minutiae elimination" was used for the fingerprint, "points belonging to specific regions" for the face, and "neighborhood features elimination" for the ECG. The reduced features were recorded for each category, and the fusion was done using the reduced feature pointsets. The fused features of the query images were matched with the features already stored in the database using "point pattern matching" by Euclidean distance. This fusion process is none other than concatenating the features from the various sources, where the concatenated features are said to have better discrimination than the individual values. The TP and FP values were computed for the reduced pointsets at both stages, i.e., before and after fusing, and the TPR, FPR, and EER values were recorded.
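A minimal sketch of the reduce-then-concatenate idea is given below; simple truncation stands in for the paper's modality-specific elimination schemes, and the common dimension d is an assumption.

```python
import numpy as np

def fuse_features(ecg_feat, finger_feat, face_feat, d=64):
    # Reduce each modality's feature vector to a common dimension d, then
    # concatenate into one fused pointset.
    reduced = [np.asarray(f)[:d] for f in (ecg_feat, finger_feat, face_feat)]
    return np.concatenate(reduced)

def match(query, template):
    return np.linalg.norm(query - template)   # Euclidean point-pattern distance
```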
3.7 Hybrid Level Fusion Technique

Considering the several disadvantages of the above-mentioned fusion schemes, hybrid level fusion is suggested for multimodal biometric systems to improve the overall recognition rate. In score level fusion, the values are improved and the TPR is higher, but a mismatch of the FPR and EER values with
the TPR still prevails, because score level fusion requires an additional transformation or weighting technique to give appropriate results. Similarly, in feature level fusion the reduction in features is one of the major factors that lowers the TPR and raises the other two rates. Feature fusion could be considered for the best authentication system if it included the rate of feature matching; since it does not carry that information, a single-level fusion scheme alone cannot be made reliable in a multimodal biometric system. The new level of fusion scheme can be used to overcome these limitations. The hybrid level fusion scheme fuses the final scores from the feature level fusion with the scores of the best unimodal system; these two score sets were combined by means of score level fusion. The results of hybrid level fusion were recorded: the TPR was higher, and the mismatch among the FPR and EER values was corrected. The performance of the hybrid fusion (HBF) scheme was compared with the other two levels of fusion. The aim of weighted hybrid level fusion was achieved by implementing the confidence-based weighting (CBW) method and the mean extrema-based confidence weighting (MEBCW) method; it was observed that the FPR and EER values were comparatively reduced. The overall recognition rate was achieved using weighted hybrid fusion (WHBF). The performances were compared with score level fusion under two cases, namely with and without weights.
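A minimal sketch of the hybrid combination step is given below, assuming a weighted sum rule for the score-level stage; the paper does not fix the exact combination rule here, so both the rule and the weight are illustrative.

```python
def hybrid_score(feature_fusion_score, ecg_score, w=0.5):
    # Score-level combination of the feature-level-fusion score with the best
    # unimodal (ECG) score; w = 0.5 is an assumed weight.
    return w * feature_fusion_score + (1 - w) * ecg_score

def hybrid_decision(feature_fusion_score, ecg_score, threshold, w=0.5):
    return hybrid_score(feature_fusion_score, ecg_score, w) >= threshold
```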
3.8 Weighted Hybrid Level Fusion

The improved recognition rate is degraded in the case of the commonly used fusion techniques due to dissimilar EER or TPR values. In order to provide better performance of the multimodal system, weighted rule-based fusion is used. Two different weighting techniques are implemented, namely the confidence-based weighting (CBW) technique and the mean extrema-based confidence weighting (MEBCW) technique. These techniques are applied to the fused system using the generated final scores. Confidence-based weighting (CBW): in this technique, the highest weights are set for the system that results in the best matching; the non-overlapping or false positive scores are considered and discarded, so that the remaining true positive values give a better recognition rate. Mean extrema-based confidence weighting (MEBCW): this method considers both the overlap and the non-overlap between the true positive and false positive values. The weighting techniques are applied in both score and hybrid level fusion, and their results are compared using the formulas below. To compute the confidence weights between the scores of the feature level fusion (S1) and the scores of the best unimodal system (S2), two parameters are defined, namely 'a' and 'b', represented as follows:

a = [(max(FP) − mean(FP)) + (mean(TP) − min(TP))]    (5)

b = [(max(TP) − mean(TP)) + (mean(FP) − min(FP))]    (6)
3.9 Overlap Extrema-Based Min–Max (OVEBAMM) Normalization Technique

The normalization technique suggested for ECG, fingerprint, and face is the overlap extrema-based min–max (OVEBAMM) normalization technique. With this technique, the normalized scores are determined after the fusion process, using the final fused scores; only the overlapping scores among the TP values are taken into account. As the first step, the overlap extrema variation is calculated. Let the anchor value be 'A':

A = (Max(FPR) − Min(TPR))/(Std(FPR) − Std(TPR))    (7)

Equation (7) takes the difference between the maximum FPR value and the minimum TPR value in the numerator and the difference of their standard deviations in the denominator to generate the anchor value. With these values, the normalized scores for the fusion technique are found under two cases:

(i) When S1 ≤ A:

S = (S1 − Min(FPR, TPR))/(2A − Min(FPR, TPR))    (8)

(ii) When S1 > A:

S = 0.5 + (S1 − A)/(Max(TPR, FPR) − A)    (9)

where S1 represents the score from the fused system and 0.5 in Eq. (9) is the assigned threshold value. Equations (8) and (9) are thus used for generating the normalized scores (S) with the OVEBAMM technique.
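A direct transcription of Eqs. (7)–(9) is sketched below; the bracketing of the fractions follows the reconstruction above and should be treated as an assumption, as should the example score arrays.

```python
import numpy as np

def ovebamm(s1, fpr_scores, tpr_scores):
    a = (np.max(fpr_scores) - np.min(tpr_scores)) / \
        (np.std(fpr_scores) - np.std(tpr_scores))            # Eq. (7)
    m = min(np.min(fpr_scores), np.min(tpr_scores))
    if s1 <= a:
        return (s1 - m) / (2 * a - m)                        # Eq. (8)
    return 0.5 + (s1 - a) / (max(np.max(tpr_scores), np.max(fpr_scores)) - a)  # Eq. (9)
```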
4 Results and Discussions

4.1 Dataset Preparation

The datasets used in the experiments are PhysioNet, Face94, and FVC2002/2004. For face and fingerprint, 80 users were selected; the features were extracted, and the samples were trained and tested. In the sample data, five of the images per user were training data and the remaining three were testing data. These databases are independent of each other, because no common database contains the fingerprint and face biometrics of the same person; hence, the databases were labeled and a custom dictionary was created. ECG recordings were extracted from the MIT-BIH Arrhythmia Database and the European ST-T Database from PhysioNet. The ECG recordings were taken from healthy persons (men aged 20–45, women aged 20–50). The sampling frequency of the signal was about 128 Hz, and Lead I ECG recordings were used in the experiments. Both the raw and the filtered signals were used, represented in mV. The maximum duration of each signal was about 1 min: the first 40 s were used as training data and the remaining 20 s as testing data. Each recording was sampled at 256 samples per second.
4.2 Feature Extraction

Figure 2 shows the QRS delineation executed using the Pan–Tompkins algorithm; the QRS complex is shown, and the P, Q, R, S, and T waves are also detected. Figure 3 shows the R peaks detected after applying the algorithm to the input signal.

Fig. 2 Extraction of QRS complex
Fig. 3 Detection of R peaks
Fig. 4 Original image
Figure 4 shows the original image, and Fig. 5 shows the image with the extracted minutiae points, marked in red; the major points considered are terminations and bifurcations. Figure 6 shows the set of features like eyes, nose, and lips detected using Haar cascade classifiers; these were extracted from the training data and used for finding the matching scores.
4.3 Performance Analysis

Score Level Fusion

See Fig. 7.
Fig. 5 Minutiae points extraction
Fig. 6 Face–feature extraction
The performance of the system with single traits as well as combinations of two traits was analyzed. The traits were fused by implementing the matcher performance-based fusion scheme, and better performance was achieved with this method. It was evaluated using ROC curve analysis. The highest TPR of 0.99 was obtained with the designed system, with FPR and EER of 0.34 and 0.66, respectively. The false positive rates were reduced gradually, and the desired rates were acquired (Table 1).
Fig. 7 Performance analysis of face, ECG, and fingerprint employing matcher performance-based fusion scheme–score level fusion
Table 1 Performance evaluation of score level fusion techniques

Recognition system         | FPR  | TPR  | EER
Fingerprint                | 0.64 | 0.95 | 0.79
Face                       | 0.67 | 0.94 | 0.80
ECG                        | 0.63 | 0.96 | 0.79
ECG + Fingerprint          | 0.42 | 0.98 | 0.7
ECG + Face                 | 0.50 | 0.97 | 0.73
Face + Fingerprint         | 0.57 | 0.95 | 0.76
ECG + Face + Fingerprint   | 0.34 | 0.99 | 0.66
Feature Level Fusion

See Fig. 8.

Due to dimensionality issues, feature reduction techniques were applied to all three modalities individually, and the fusion was carried out with the sets of reduced features. After analyzing feature fusion for the combinations of two traits, the fusion of ECG, face, and fingerprint was performed. The performance of feature level fusion was analyzed using the ROC curve. The FPR and EER values were reduced, but the TPR values were also lower compared with the previous fusion stage, possibly because of the feature elimination process. The values were TPR 0.97, FPR 0.28, and EER 0.62 (Table 2).

Hybrid Level Fusion

See Fig. 9 and Table 3.
Fig. 8 Performance analysis of face, ECG, and fingerprint employing matcher performance-based fusion scheme–feature level fusion
Table 2 Performance evaluation of feature level fusion techniques

Recognition system         | FPR  | TPR  | EER
Fingerprint                | 0.54 | 0.95 | 0.74
Face                       | 0.46 | 0.94 | 0.7
ECG                        | 0.29 | 0.97 | 0.63
Fingerprint + Face         | 0.62 | 0.88 | 0.75
Fingerprint + ECG          | 0.46 | 0.91 | 0.68
ECG + Face                 | 0.41 | 0.93 | 0.67
ECG + Face + Fingerprint   | 0.28 | 0.97 | 0.62
The scores of the feature level fusion and the scores of the best unimodal system were fused using score level techniques. As observed, ECG was the best unimodal system, giving very high rates. After fusing, the scores were TPR 0.99, FPR 0.13, and EER 0.56, with no dissimilarities in the results; both the FPR and EER rates were reduced, and a better TPR rate was obtained.
5 Conclusion

The fusion technique named the hybrid fusion scheme was proposed for the biometric sources ECG, face, and fingerprint. This technique was executed using the final scores of the feature level fusion for the same specified traits, whose scores were FPR 0.28, TPR 0.97, and EER 0.62. These scores were fused with the scores of the best unimodal system by implementing score level fusion techniques. From the completed analysis, ECG
Fig. 9 Performance analysis of matcher performance-based fusion scheme–hybrid level fusion
Table 3 Performance evaluation of hybrid level fusion techniques

Recognition system           | FPR  | TPR  | EER
ECG + Feature Level Fusion   | 0.13 | 0.99 | 0.56
was found to be the best single-modality system among the three, with scores of FPR 0.29, TPR 0.97, and EER 0.63. After combining, the FPR and EER scores were ultimately reduced to FPR 0.13, TPR 0.99, and EER 0.56. Our aim of providing reduced FPR and EER for the authentication system was thus achieved. In order to stabilize the system and establish the accuracy rates of the designed system, weighting techniques like the confidence-based weighting (CBW) technique and the mean extrema-based confidence weighting (MEBCW) technique have been incorporated with the proposed hybrid fusion scheme. It was observed that the system produced very low EER and FPR with the weighted hybrid fusion scheme: the scores were FPR 0.1, TPR 0.99, and EER 0.5. The dissimilarities among the EER and TPR values commonly found with the other fusion schemes were not found in hybrid level fusion with weighting techniques, which is an added advantage of the proposed method. Thus, it was shown that if the TPR is high, the system can be considered the best authentication system.
References

1. A. Aboshosha, K.A. El Dahshan, E.A. Karam, E.A. Ebeid, Score level fusion for fingerprint, iris and face biometrics. Int. J. Comput. Appl. 111(4), 47–55 (2015)
2. M. Be, A.M. Abhishek, T.J, V. K R, L.M. Patnaik, Multimodal biometric authentication using ECG and fingerprint. Int. J. Comput. Appl. 111(13), 33–39 (2015)
3. M. He et al., Performance evaluation of score level fusion in multimodal biometric systems. Pattern Recognit. 43(5), 1789–1800 (2010)
4. W. Kabir, M.O. Ahmad, M.N.S. Swamy, A multi-biometric system based on feature and score level fusions. IEEE Access 7, 59437–59450 (2019)
5. S.A. Israel, W.T. Scruggs, W.J. Worek, J.M. Irvine, Fusing face and ECG for personal identification, in Proceedings of the Applied Imagery Pattern Recognition Workshop, vol. 2003 (2004), pp. 226–231
6. W. Kabir, M.O. Ahmad, M.N.S. Swamy, Weighted hybrid fusion for multimodal biometric recognition system (2018), pp. 3–6
7. A. Rattani, D.R. Kisku, M. Bicego, M. Tistarelli, Feature level fusion of face and fingerprint biometrics (2007), pp. 0–5
8. Y.N. Singh, S.K. Singh, P. Gupta, Fusion of electrocardiogram with unobtrusive biometrics: an efficient individual authentication system. Pattern Recognit. Lett. 33(14), 1932–1941 (2012)
9. S. Udhary, R. Nath, A multimodal biometric recognition system based on fusion of palmprint, fingerprint and face, in 2009 International Conference on Advances in Recent Technologies in Communication and Computing (2009), pp. 596–600
10. K. Vishi, V. Mavroeidis, An evaluation of score level fusion approaches for fingerprint and finger-vein biometrics, no. Nisk (2017)
11. J. Aravinth, S. Valarmathy, Multi classifier-based score level fusion of multi-modal biometric recognition and its application to remote biometrics authentication. Imaging Sci. J. 64(1) (2016)
12. S. Prasad, J. Aravinth, Serial multimodal framework for enhancing user convenience using dimensionality reduction technique, in Proceedings of IEEE International Conference on Circuit, Power and Computing Technologies, ICCPCT (2016), art. no. 7530162. https://doi.org/10.1109/ICCPCT.2016.7530162
13. P. Sharma, K. Singh, Multimodal biometric system fusion using fingerprint and face with fuzzy logic. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 7(5), 482–489 (2017)
14. S. Veni, S. Thushara, Multimodal approach to emotion recognition for enhancing human machine interaction—a survey. Int. J. Adv. Sci. Eng. Inf. Technol. 7(4), 1428–1433 (2017)
15. B. Saiharsha, A. Abel Lesle, B. Diwakar, R. Karthika, M. Ganesan, Evaluating performance of deep learning architectures for image classification, in 2020 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India (2020), pp. 917–922. https://doi.org/10.1109/ICCES48766.2020.9137884
Evolutionary Computation of Facial Composites for Suspect Identification in Forensic Sciences Vijay A. Kanade
Abstract Suspect identification is an integral and significant part of forensic sciences. Generating a face becomes an important issue when the face of a suspect is described by a witness. This article aims to simplify the process of creating facial composites and thereby generate images of facial likeness by utilizing an advanced genetic algorithm. The disclosed procedure is comfortable for a forensic technician and natural for a witness at the same time. The proposed genetic algorithm evolves from the original algorithm by employing a feedback-loop mechanism, wherein the most optimal solution is obtained for the generated facial composite.

Keywords Facial composite · Genetic algorithm (GA) · Suspect identification · Interactive evolution strategy
1 Introduction

Evolutionary algorithms (EAs) are nature-inspired problem-solving methods that help in the optimization of the associated solutions. In the current research, genetic algorithms are used for designing a computer-based automatic face generation platform. As is well known in forensic sciences, suspect identification forms the most significant part of any forensic investigation. Such investigations generally depend on the witness's description of the suspect. Currently, there are various techniques and algorithms for facial composite generation and face recognition that assist in visualizing a suspect's face [1]. Today, 'facial recognition technology' helps law enforcement officials and police departments identify an individual of interest by scanning available photos or video footage (Fig. 1). In the above process, the 'extracting step' is of great importance: facial recognition is performed using genetic algorithms to track the facial data points and develop a face structure that matches the data residing in the database [2].

V. A. Kanade (B) Researcher, Pune, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_36
Fig. 1 Facial recognition technology [7]
Hence, face recognition technology essentially requires stored data to perform the matching step of facial recognition. Further, with today's technological advancements, even criminals have advanced their modus operandi: suspected criminals rely on masks that can trick the face recognition algorithms embedded in smart cameras, or change their walking posture so that it eludes person identification. In such cases, the role of the witness becomes crucial. The proposed technique is inspired by the face generation tool EvoFIT; however, the proposed evolutionary method fine-tunes the results generated by previously known tools [1]. This optimization is achieved by evolving the results in a manner that simplifies the work of the involved technician and makes it easier for the witness to command the developed platform. Besides, our approach is independent of any stored data, as opposed to facial recognition technology.
2 Genetic Algorithm for Facial Composite Generation

Genetic algorithms fall under the umbrella of evolutionary algorithms [3, 4]. A population is defined by a chromosome set, where each chromosome symbolizes a potential solution of the algorithm. Chromosomes have numeric values that are identified as "genes." The genetic algorithm begins with a random generation of the initial population. The chromosomes then undergo an iterative evolutionary process, which involves crossover, mutation, and replacement. On completion of each iteration, the initial population is replaced by the fittest population, and the feedback loop is taken into consideration, which further optimizes the results in the next iteration. This process of feedback-based optimization continues until the most appropriate facial composite structure is generated by the developed tool. The evolutionary process designed in this research consists of the following primary operators, applied on a population of size 'N' with chromosome length 'n':

1. Selection: 'N' chromosome pairs are randomly selected from the parent population.
2. Crossover: For each selected pair, two cross-points are selected randomly in the range [0, n]. Every gene between these points is switched with probability 'pc'.
3. Mutation: Swap mutation is used, in which gene values of each offspring chromosome are swapped and changed with probability 'pm'.
4. Replacement: In the final step, the previous population is replaced by the offspring if the fitness value of the offspring is better than that of the parent population, or vice versa [5, 6].
2.1 Evolving Genetic Algorithm: Implementation

The proposed genetic algorithm (GA) differs from traditional GAs in its fitness evaluation and feedback-loop-based image optimization, wherein the users get involved in the algorithm. This arrangement helps the witness visualize the suspect's face and refine it without burdening the technician involved, as the algorithm operates through the evolutionary process. Here, as the witness tries to recollect the suspect's face, the technician plots facial data points on the developed tool. Upon plotting the data points, the GA runs its course, and an optimal result is generated for the plotted data points. The optimal result is based on the fitness function, which is defined as a measure of the minimal distance between the plotted data points. The fitness represents the similarity of individual faces to the face structure described by the witness. As the results
are generated, the primary face structure of the suspect is displayed to the witness for feedback. Based on the face generated, the witness further describes the facial structure of the suspect, and additional data points are plotted by the technician. This leads to refinement and fine-tuning of the facial composite generation process. Thus, an appropriate face is generated by applying evolving computation to the traditionally known GAs.

Pseudocode:

Random initial population generation
Repeat
    Fitness evaluation of individuals
    Reproduction
        Pair selection
        Crossover: Recombine pairs
        Mutation: Swap mutation
    Replace the initial population
    Feedback loop
Run until an optimal solution is produced.
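A compact Python skeleton matching this pseudocode is sketched below; the fitness function (a distance over plotted facial data points) and the feedback hook are stand-ins for the witness-technician interaction implemented in the C# tool, and all parameter values are illustrative.

```python
import random

def evolve(fitness, n_genes, pop_size=50, pc=0.9, pm=0.05, generations=100):
    # Random initial population of real-valued chromosomes
    pop = [[random.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                     # lower distance = fitter face
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)      # pair selection
            child = a[:]
            if random.random() < pc:              # two-point crossover
                c1, c2 = sorted(random.sample(range(n_genes), 2))
                child[c1:c2] = b[c1:c2]
            if random.random() < pm:              # swap mutation
                i, j = random.sample(range(n_genes), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = children                            # replace the population
        # Feedback loop: witness input would adjust `fitness` here.
    return min(pop, key=fitness)
```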
2.2 Experiments and Preliminary Results

The platform is developed in Visual Studio 2017 using the C# language. In the first step, facial data points are plotted on the developed UI based on the description of the suspect given by the witness. On plotting the data points, the GA is executed to produce an optimal result, wherein the basic facial structure of the suspect is generated from the plotted data points (Fig. 2). In the next step, the generated facial structure is shown to the witness, and feedback is taken to fine-tune the face generation process. The witness may then provide additional input by describing the suspect further, and the technician may plot further data points based on the witness's description. The feedback from the witness further refines and molds the generated facial structure (Fig. 3). The above process continues for 'm' iterations based on the feedback input provided by the witness. Thus, the final generated face may potentially be that of the suspect as described by the witness (Fig. 4). Similarly, results were obtained using the evolved GA for five different facial composites separately, to determine the efficiency of the developed evolutionary computation in comparison to the standard GA. The obtained results for each facial composite are given in the table below. Initially, the facial data points (12 in the first case) are plotted for facial composite generation by the technician on the developed UI based on the description of the suspect provided by the witness. In the next step, the conventional GA is executed
Fig. 2 Snapshot I: Primary facial structure generation based on ‘15 facial data points’
based on the fitness function, thereby yielding an optimal fitness result (820.315 units) for the corresponding plotted facial data points (12). Next, the optimized result produced during the first iteration of the GA undergoes evolution (selection, crossover, and mutation) for 'm' iterations to yield the most optimized fitness result (735.896 units) for the respective facial data points (12). On obtaining the final optimized result, the facial composite is generated and displayed to the witness for further feedback, and the process continues until the final facial composite is accurately generated. Thus, the table shows that the first facial composite, where 12 facial data points were plotted, yielded an optimal fitness result of 735.896 units with the designed evolved GA, whereas the simple GA yielded 820.315 units for the same facial composite with 12 facial data points. The fitness result of the evolved GA thus generates a more accurate facial composite than when the simple GA is used. Furthermore, for the next four facial composites generated via the evolved GA process, 15, 9, 20, and 18 facial data points were plotted, and the optimized fitness
Fig. 3 Snapshot II: Facial optimization based on feedback loop with additional data (17 facial data points)
results obtained were 1053.774 units, 644.082 units, 1480.207 units, and 1144.917 units, respectively. Table 1 therefore validates the efficiency of the developed evolutionary GA compared with the simple GA for the plotted facial data points (corresponding to the generated facial composites), in terms of their fitness evaluation. The evolved GA also helps in generating better and more accurate facial composites than the traditional GA.
3 Conclusion

In this research, a face generation tool is designed to optimize face generation based on a feedback loop. The tool uses a GA for face generation and further incorporates inputs from the witness to evolve the results. Hence, the developed
Fig. 4 Snapshot III: Optimized face after 'm' iterations (20 facial data points)

Table 1 Evolved GA results

Sr. No. | Facial data points for facial composite generation | Fitness evaluation: G.A (1st iteration) (units) | E.A ('m' iterations) [optimal result] (units)
1       | 12                                                 | 820.315                                         | 735.896
2       | 15                                                 | 1099.583                                        | 1053.774
3       | 9                                                  | 717.845                                         | 644.082
4       | 20                                                 | 1537.701                                        | 1480.207
5       | 18                                                 | 1252.584                                        | 1144.917
Hence, the developed platform explores an interactive evolutionary algorithm for optimal face generation of the suspect. The tool is useful in criminal investigations that rely heavily on the descriptions provided by witnesses.
4 Future Work

The preliminary results of the conducted experiments are encouraging; however, in the future, we intend to widen the data set and validate the face generation process. This includes considering facial properties such as size, shape, hairstyle, beard, and aging factor.

Acknowledgements I would like to extend my sincere gratitude to Dr. A. S. Kanade for his relentless support during my research work.

Conflict of Interest: The authors declare that they have no conflict of interest.
References

1. T. Akbal, G.N. Demir, E. Kanlikilicer, M.C. Kus, F.H. Ulu, Interactive nature-inspired heuristics for automatic facial composite generation, in Genetic and Evolutionary Computation Conference Undergraduate Student Workshop (2006)
2. Q. Qing, E.C.C. Tsang, Face recognition using genetic algorithm, 05 December (2014)
3. A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing (Springer, 2003)
4. A.E. Eiben, M. Schoenauer, Evolutionary computing. Inf. Process. Lett. 82(1), 1–6 (2002)
5. B. Zahradnikova, S. Duchovicova, P. Schreiber, Generating facial composites from principal components, in MATEC Web of Conferences (EDP Sciences, 2016)
6. G.N. Demir, Interactive genetic algorithms for facial composite generation, in GECCO'06, July 8–12, 2006, Seattle, WA, USA (ACM, 2006)
7. The Future of Face Recognition (2017)
Automatic Recognition of Helmetless Bike Rider License Plate Using Deep Learning K. V. L. Keerthi, V. Krishna Teja, P. N. R. L Chandra Sekhar, and T. N. Shankar
Abstract Traffic incidents have risen exponentially in recent years, and many people have suffered severe or fatal injuries due to riding motorbikes without a helmet. Even though the government has made riding without a helmet a punishable offense, the rule is violated in many places; especially when the traffic is heavy, the traffic constable cannot apprehend all the violators at once. This research work proposes a model that automatically identifies a bike rider without a helmet and retrieves the bike owner's information by recognizing the license plate. The TensorFlow object detection API is used to identify objects from video frames. The proposed model is trained using Faster R-CNN to recognize bike riders without a helmet. The license plate is then recognized using the Tesseract OCR engine, and the owner's information is extracted. The experimental results show that the model performs well compared with the state of the art.

Keywords Image processing · Helmet detection · Faster R-CNN · Optical character recognition · License plate recognition
1 Introduction

Road injuries are a significant cause of concern nationwide. Lakhs of road accident cases are reported each year. India has about 1% of the world's vehicle population and is responsible for around 6% of global road accidents. Around 70% of injuries involve young people [1, 2]. It is a fact that about four people die every hour in road accidents because they do not wear a helmet when riding motorbikes. Owing to its severity, governments have made it a legal offense not to wear a helmet while riding a motorcycle and have introduced various methods to capture the violators [3].

K. V. L. Keerthi · V. Krishna Teja · P. N. R. L. Chandra Sekhar (B) Gandhi Institute of Technology and Management, Visakhapatnam, Andhra Pradesh, India e-mail: [email protected] T. N. Shankar Koneru Lakshmaiah Educational Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_37
Nevertheless, the current techniques are not successful because they require considerable human intervention: for identifying violators, the captured images are given to the model through video monitoring or through smartphone photographs taken manually at traffic signals [4, 5]. Therefore, automation is important for accurate and efficient identification of violators and of their license plate numbers, which significantly decreases human involvement [6, 7]. However, identifying a bike rider without a helmet and reading the number plate bring many challenges: (i) detecting the number plate accurately requires very high-quality video and proper alignment of the camera angle, (ii) the vehicle must be captured in motion and in any weather condition, (iii) a large amount of information must be processed in a short time, and (iv) the region-of-interest objects occur together with other objects, making classification more complicated. In view of these difficulties, a framework is proposed for identifying bike riders without a helmet and recognizing their license plates to obtain the vehicle owner's information. The rest of the paper is organized as follows: Sect. 2 describes existing work done to address this problem; Sect. 3 describes the proposed methodology; Sect. 4 gives the experimental setup and results; and the last section summarizes the paper.
2 Existing Work

Over the years, several techniques have been proposed to address this problem, each using a different approach to obtain better results. In video surveillance and character recognition, automatic identification of bike riders' number plates falls under the category of anomaly detection. Active surveillance systems usually consist of modeling, identification, monitoring, and classification of movable objects [8]. In [9], Silva et al. proposed a computational-vision methodology for helmet detection on public highways. They used the AGMM algorithm to segment bikes and HOG features to classify the helmet, achieving 91.37% accuracy. However, the technique is computationally expensive due to the Hough transform used to locate the bike rider's head. In [10], Chiverton used the helmet's geometrical structure and the difference in illumination at various portions to detect the helmet, employing circle arc detection and the Hough transform. This technique's drawback is that it is often computationally expensive due to the full-frame search for the helmet's position, and it sometimes misclassifies similarly shaped items. To minimize the computational complexity of the classifier, Rattapoom et al. [11] proposed detecting the helmet with a K-nearest neighbor (KNN) classifier. They extracted circularity, average intensity, and average hue from each helmet quadrant and fed them to the KNN classifier. The system detected helmet wearing at a rate of 74%. To improve the classification accuracy, in [4, 12] a hybrid method is proposed for determining the bike rider by extracting HOG, SIFT, and LBP features and classifying the region around the bike rider's head to determine the helmet. An SVM classifier is used
to determine whether or not the bike rider wears a helmet. With a processing time of 11.58 ms per frame, they achieved 93.80% detection accuracy. Deep learning algorithms such as convolutional neural networks (CNNs) have recently been applied to helmet detection and are considered comparably better than conventional methods. Faster R-CNN, developed by Microsoft, accelerates bike detection and improves the recognition rate compared to other deep learning networks such as CNN, Fast R-CNN, and YOLO. In [13], the authors propose two convolutional neural networks (CNNs) to classify whether the bike rider wears a helmet: one CNN classifies the frame artifacts into motorcyclists and non-motorcyclists, and another CNN classifies for the helmet. In [14–16], the authors classified the helmet, then recognized the license plate for those without a helmet and passed it to optical character recognition (OCR) to identify the registered vehicle number. This approach obtained more than 91% precision in locating the helmet and achieved 97% and 94% accuracy, respectively, for identifying alphabets and numbers. In summary, the methods discussed above have two drawbacks: they are either computationally expensive or passive, which implies that they detect only the helmet but do not identify the number plate. The proposed approach overcomes the drawbacks mentioned above and recognizes the number plate of helmetless bike riders automatically.
3 Proposed Method

This section presents the proposed method of recognizing a helmetless bike rider's license plate, which performs in five stages. In the first stage, the input frame is segmented and bike objects are detected. In the second stage, using the head of the bike rider, the method detects whether he is wearing a helmet or not. In the third stage, the object of the bike rider without a helmet is fed to the model to detect the license plate. The license plate is recognized in the fourth stage, and the bike owner's information is extracted in the final stage. The block diagram in Fig. 1 describes the proposed method, where the input is a still video frame that passes through the object detection model to detect the bike, head, helmet, and number plate.
3.1 Bike Detection

To detect the bike in the input frame, the Faster R-CNN model is adopted to detect objects labeled as bikes among various objects. Faster R-CNN extracts features from the region of interest (RoI)—the bike in this case—and uses a single model for extracting features, classifying, and returning the bounding boxes.
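A minimal inference sketch using the TensorFlow object detection API's exported SavedModel interface is shown below; the model path and the score threshold are assumptions, and the output keys follow the API's standard detection dictionary.

```python
import numpy as np
import tensorflow as tf

# Load an exported Faster R-CNN detector (the path is an assumption).
detect_fn = tf.saved_model.load("exported_model/saved_model")

def detect_objects(frame, score_threshold=0.5):
    # frame: HxWx3 uint8 array holding one still video frame.
    input_tensor = tf.convert_to_tensor(frame[np.newaxis, ...])
    detections = detect_fn(input_tensor)
    boxes = detections["detection_boxes"][0].numpy()
    classes = detections["detection_classes"][0].numpy().astype(int)
    scores = detections["detection_scores"][0].numpy()
    keep = scores >= score_threshold          # drop low-confidence boxes
    return boxes[keep], classes[keep], scores[keep]
```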
Fig. 1 The proposed method
3.2 Helmet Detection

After the bike is detected, the detected object is passed to Faster R-CNN to classify further whether there is a rider. VGG-16 and Faster R-CNN are used as a pre-trained model to distinguish persons, horses, and chairs in order to identify the rider's head. The upper part of the object is considered the region of interest while training the Faster R-CNN. A Gabor filter is applied over the occlusions, confirming stability and orientation in representing facial segments so that the head can be detected even under different environmental conditions. The frame is then passed to the classifier to detect whether the head is wearing a helmet. The input frame without a helmet is passed further to extract the number plate.
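The Gabor filtering step could be realized with OpenCV as sketched below; the kernel parameters and the four-orientation filter bank are illustrative assumptions rather than the paper's exact configuration.

```python
import cv2
import numpy as np

def gabor_responses(gray_head_region):
    # Apply a small bank of Gabor kernels at four orientations so the
    # head/face region is represented robustly across orientations.
    responses = []
    for theta in np.arange(0, np.pi, np.pi / 4):
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray_head_region, cv2.CV_32F, kernel))
    return np.stack(responses, axis=-1)  # extra channels for the classifier
```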
3.3 Detection and Recognition of License Plate

Those frames classified as a head without a helmet are further processed to detect the number plate using the Faster R-CNN model. Upon detecting the license plate, the Tesseract optical character recognition model recognizes the characters in the license plate. The license number is then searched in the database to find the owner's details for further action.
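A sketch of this stage with the pytesseract wrapper is given below; the grayscale/Otsu preprocessing and the single-line page segmentation mode are assumptions about a reasonable setup, not the paper's exact configuration.

```python
import cv2
import pytesseract

def read_plate(plate_crop_bgr):
    gray = cv2.cvtColor(plate_crop_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # --psm 7 treats the crop as a single line of text, typical for plates.
    text = pytesseract.image_to_string(binary, config="--psm 7")
    return "".join(ch for ch in text if ch.isalnum())
```

The returned string would then serve as the key for the owner-details lookup in the database.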
4 Experimental Setup and Results

The proposed method performs multiple tasks to detect helmetless bike riders and recognize their license plates. The images are trained using the Faster R-CNN-Inception-V2 model with the TensorFlow object detection API, and mean average precision (mAP) is used to obtain the highest accuracy. For better performance, the TensorFlow CPU configuration is used. After training the model, the license plate of the bike rider who does not wear a helmet is extracted and recognized using the Tesseract OCR model, which takes an image and returns the string detected from it. The resultant string is the license plate number and is used to extract the registered bike owner's information. The model was implemented on the Windows operating system with a Core i3-7100U processor at 2.40 GHz and used Python 3.6.1, TensorFlow 1.15.0, and Tesseract 5.0.0.
4.1 Helmet and License Plate Detection

For training, a dataset of 500 still images is taken manually from video surveillance, where 250 images are of bike riders with a helmet and the rest are of bike riders without a helmet. Initially, using OpenCV, the videos are converted into frames with video capture, sampling one frame every 0.5 s, and the collected raw images are preprocessed through data formatting, data cleansing, and feature transformation. The resultant optimal data are fed into the model. After data preprocessing, the images are labeled with the help of a labeling tool. These labeled images are used for training the model. The images are divided into training and testing sets in the ratio of 80:20. Figure 2 shows the sample dataset for helmet detection and recognition of the license plate.
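The frame-sampling step could look like the following sketch; the file paths and the fallback frame rate are assumptions.

```python
import cv2

def extract_frames(video_path, out_dir, interval_s=0.5):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if fps is unknown
    step = max(1, int(fps * interval_s))     # one frame every 0.5 s
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```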
Fig. 2 Sample input frames extracted from video surveillance
Four regions of interest are identified, and four labels are created: bike, helmet, head, and license plate. The labeled image data are stored in XML format and converted into a CSV file, which stores all the information about the labeled objects' coordinates. After this, TF records are created to serve as input data for training the object detector. Then, a label map is created as a training configuration file before training the images on the model. The Faster R-CNN-Inception V3 model has been used for training for 2072 steps, and in the last step the inference graph was exported to run the model. After training the model, the output was tested on the test images placed initially in the test directory.

Figure 3 shows the experimental results of the Faster R-CNN model in detecting a bike rider's head with and without a helmet. In (a), the head is recognized with 93% accuracy from the back angle and the bike's license plate with 99%. In (b), the head is recognized with 99% accuracy from the front angle, as facial matching helps the model, and the license plate is recognized with 99% accuracy. Both (a) and (b) are frames without a helmet. In (c), the helmet is recognized with 74% accuracy from the back angle and the license plate with 91% accuracy; the drop in accuracy is due to the long distance between the object and the camera. In (d), the helmet is recognized with 98% accuracy from the front angle and the license plate with 99% accuracy. Accuracy in this case increased compared to case (c) because of the camera's front angle and nearness, by which the helmet can be quickly identified.

Table 1 and Fig. 4 present the results of the experiments for classification of the bike and helmet from the back side and the front side, with and without a helmet, comparing the proposed Faster R-CNN method with the existing CNN method, whose performance is the highest among all other methods presented [13]. Faster R-CNN gives an accuracy of 99%, which is superior to the existing CNN method on the created dataset. The results show that Faster R-CNN gives better accuracy when the front side of the video frames is used. A slight oscillation in performance is observed when the frames are captured from the back side. As the distance from the camera increases, the accuracy rate decreases, whereas the Gabor wavelet filters helped the model gain better accuracy from the front side. In summary, for still images the success rate of the model is 90%, and for real time it is in the range of 70–80%. From the results, it is evident that the model is best suited for light traffic.

The TensorBoard tool is used to evaluate the model. It provides metrics such as precision, recall, and loss. The threshold score is computed based on precision and recall, by which the model removes unnecessary boxes; in this model, the default threshold score is set to 0.5. Figure 5a represents the precision graph, which measures how accurately the model predicts objects and usually lies in the range 0–1. The proposed model obtains a precision value of 0.55, and it is also observed over time that precision increases with the number of steps. Figure 5b shows a visualization graph of recall, which measures how completely the model finds all positives and also lies between 0 and 1; the proposed model obtained a recall of 0.6. Figure 5c shows a visualization of
Fig. 3 Detection of bike, head, helmet, and number plate, both front and rear

Table 1 Accuracy comparison

Method          | Bike from back | Bike from front | Helmet from back | Helmet from front
CNN             | 72             | 98.88           | 72               | 87.11
Proposed method | 69             | 99              | 74               | 99
Fig. 4 Detection accuracy graph
loss, which measures the difference between the model's predicted output and the ground-truth label. The total loss obtained by the proposed model is 0.35. From Table 2, it is evident that the proposed method using Faster R-CNN performs far better than the existing CNN method.
4.2 Detection of License Plate

For character recognition, the Tesseract OCR engine is used. Tesseract takes as input the cropped license plate image from the case of a bike rider without a helmet. Tesseract needs a high-quality image as input and cannot recognize text from a rotated image, so it is necessary to rotate the cropped license plate image before passing it as input; in this model, the images are rotated manually for better accuracy. In Fig. 6a, all the alphabets and numbers are recognized correctly. In Fig. 6b, all the string literals except '3' are recognized correctly; here, '3' is identified as 'S'. In Fig. 6c, all the string literals except 'C' are recognized correctly; here, 'C' is identified as a special character. The Tesseract engine identifies the characters subject to image constraints such as lighting, structure, and clarity. Overall, the model achieves 88% accuracy on the test dataset. The recognized characters are used to extract the bike owner's information so that the concerned authorities can take further action.
Fig. 5 Evaluation metrics of the proposed method: a precision, b recall, c loss
Table 2 Comparison of evaluation metrics

Method          | Precision | Recall | Loss
CNN             | 0.51      | 0.65   | 0.42
Proposed method | 0.55      | 0.6    | 0.35
Fig. 6 Results of Tesseract OCR Engine
5 Conclusion

This paper proposes a model to automatically detect bike riders who do not wear a helmet and recognize their license plates to facilitate further action by traffic authorities. Faster R-CNN is used with the TensorFlow object detection API to detect a bike rider without a helmet and then detect the license plate. Using the Tesseract optical character recognition tool, the characters on the license plate are extracted, and the owner's details are retrieved from the database. The experimental results reveal
that the model performs better than the existing CNN model. Extending this model to night-time surveillance and training it with more real-time data for detection in dynamic scenes is considered future work.
References 1. Y.S. Malik, Road accidents in India 2017, in Transport Research Wing (Ministry of Road Transport & Highways) 2. P. Mishra, P. Mishra, Vital stats-road accident in India, in PRS Legislative Research (Institute for Policy Research Studies, New Delhi) 3. R. Thakur, M. Manoria, RBFNN approach for recognizing Indian license plate. Int. J. Comput. Sci. Network (IJCSN) 1(5) (2012). http://www.ijcsn.org. ISSN 2277-5420 4. K. Dahiya, D. Singh, C.K. Mohan, Automatic detection of bikeriders without helmet using surveillance videos in real-time, in International Joint Conference Neural Networks (IJCNN), Vancouver, Canada, July 24–29, 2016, pp. 3046–3051 5. M.H. Dashtban, Z. Dashtban, H. Bevrani, A novel approach for vehicle license plate localization and recognition. Int. J. Comput. Appl. (0975 - 8887) 26(11) (2011) 6. S. Manoharan, An improved safety algorithm for artificial intelligence enabled processors in self driving cars. J. Artif. Intell. 1(02), 95–104 (2019) 7. M.H.J.D. Koresh, J. Deva, Computer vision based traffic sign sensing for smart transport. J. Innov. Image Process. (JIIP) 1(01), 11–19 (2019) 8. W. Hu, T. Tan, L. Wang, S. Maybank, A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 34(3), 334–352 (2004) 9. R. Silva, K. Aires, T. Santos, K. Abdala, R. Veras, A. Soares, Automatic detection of motorcyclists without helmet, in Proceedings of Latin American Computing Conference (CLEI), Puerto Azul, Venezuela, October 4-6, 2013, pp. 1–7 10. J. Chiverton, Helmet presence classification with motorcycle detection and tracking. Intell. Transp. Syst. (IET) 6(3), 259–269 (2012) 11. R. Waranusast, N. Bundon, V. Timtong, C. Tangnoi, Machine vision techniques for motorcycle safety helmet detection, in 28th International Conference on Image and Vision Computing, New Zealand (IVCNZ, 2013), pp 35–40 12. S.A. Ghonge, J.B. Sanghavi, Smart surveillance system for automatic detection of license plate number of motorcyclists without helmet. Int. J. Comput. Sci. Eng. 2(1) (2018) 13. C. Vishnu, D. Singh, C.K. Mohan, S. Babu, Detection of motorcyclists without helmet in videos using convolutional neural network, in 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK (2017), pp. 3036–3041. https://doi.org/10.1109/IJCNN. 2017.7966233 14. M.J. Prajwal, K.B. Tejas, V. Varshad, M.M. Murgod, R. Shashidhar, Detection of non-helmet riders and extraction of license plate number using Yolo v2 and OCR method. Int. J. Innov. Technol. Exploring Eng. (IJITEE) 9(2) (2019). ISSN: 2278-3075 15. A. Mukhtar, T.B. Tang, Vision based motorcycle detection using HOG features, in IEEE International Conference on Signal and Image Processing Applications (ICSIPA) (IEEE, 2015) 16. L. Allamki, M. Panchakshari, A. Sateesha, K.S. Pratheek, Helmet detection using machine learning and automatic License Plate Recognition, in International Research Journal of Engineering and Technology (IRJET), vol. 06(12), December 2019
Prediction of Heart Disease with Different Attributes Combination by Data Mining Algorithms Ritu Aggrawal and Saurabh Pal
Abstract Heart disease is considered the most dangerous and fatal infection of the human body. This globally fatal disease cannot be identified easily by a general practitioner; it requires an analyst or expert to detect it. In the field of medical science, machine learning plays an important role in disease prediction by identifying heart infection features. In this perspective, this research work proposes a new technique to predict heart disease using various classifier algorithms: random forest, gradient boosting, support vector machine, and K-nearest neighbor. For this purpose, the classification accuracy and the obtained results of each predictor have been compared. In each analysis, the four classifiers are applied, and gradient boosting is finally found to achieve high accuracy with low error values and a high correlation value compared to the other algorithms.

Keywords Feature selection methods: extra tree · Random forest · Gradient boosting · Support vector machine · K-nearest neighbor algorithms · Contingency coefficient · Adjusted contingency coefficient · Correlation coefficient and phi coefficient · Root mean square error
1 Introduction

This research performs heart disease prediction-based analysis using machine learning techniques. In recent times, machine learning has played an important role in various areas such as e-commerce, business, diagnosis of disease, and development of expert systems. The main task of machine learning is to discover and identify hidden patterns in complex, huge datasets. Heart disease is one of the most common causes of death all over the world. This disease is not confined to a particular sex; both males and females suffer from it globally. Every year the number of

R. Aggrawal · S. Pal (B) Department of Computer Applications, VBS Purvanchal University, Jaunpur, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_38
heart-infected patients increases very rapidly, and for this reason, various facts about the disease are being collected. These facts help in identifying a heart infection, and data mining assists the medical healthcare system. The main objective of machine learning here is to support the disease expert in diagnosis, not to replace the expert.
1.1 K-Nearest Neighbors (KNN)

The K-NN algorithm relies on feature similarity and matching of data points in the training set, as in the following steps (a minimal sketch follows the list):

(a) Initially, load the training dataset and implement the algorithm.
(b) Next, select the value of K, an integer giving the number of nearest data points.
(c) Next, evaluate the distance between each row of the training data and the test data.
(d) Next, arrange the distance values in ascending order.
(e) Next, select the top k rows of the sorted array for the test point.
(f) End.
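A minimal NumPy sketch of steps (a)–(f); the data shapes and the use of Euclidean distance are assumptions consistent with the description above.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=5):
    # Step (c): distance from the test row to every training row.
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Steps (d)-(e): sort ascending and keep the top k rows.
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(y_train)[nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]  # majority vote over the k neighbors
```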
1.2 Random Forest

Random forest is used for both classification and regression. The algorithm builds decision trees on data samples for better prediction.

(a) First, select random samples of the dataset.
(b) Next, generate a decision tree for every sample and obtain a prediction from every decision tree.
(c) Next, collect the predicted results through a voting method.
(d) At last, the final prediction result is selected by the voting method.
1.3 Gradient Boosting

Gradient boosting is a technique to solve regression and classification problems by generating a prediction model from an ensemble of weak tree models, as outlined below (a residual-fitting sketch follows the list):

(a) Initially, select the training dataset and fit a decision tree on it.
(b) Next, compute the error residuals between the actual and predicted results.
(c) Next, fit a new model with the error residuals as the target variable.
(d) Next, combine the predicted residuals with the previous predictions.
(e) Next, fit the model again on the residuals that are still left.
(f) Next, control overfitting and observe the accuracy.
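A sketch of the residual-fitting idea behind steps (a)–(f) is shown below, using shallow regression trees; the learning rate and tree depth are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, lr=0.1):
    pred = np.full(len(y), float(np.mean(y)))  # initial prediction
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                    # error left so far (step b)
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # step (c)
        pred += lr * tree.predict(X)           # combine predictions (step d)
        trees.append(tree)
    return trees, pred
```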
Prediction of Heart Disease with Different Attributes …
471
1.4 Support Vector Machine

The support vector machine finds an optimal boundary between the possible outputs (see the sketch after this list):

(a) Initially, select the training dataset.
(b) Next, train on all training samples with the SVM.
(c) Next, search for the best decision for categorizing each feature.
(d) Next, order the features and keep the top percentage.
(e) Next, retrain the SVM on the remaining samples.
(f) Next, select a pair of features for the last sample in training.
(g) End.
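The retrain-and-rank procedure above closely resembles recursive feature elimination; a sketch with scikit-learn's RFE and a linear SVM follows, on synthetic stand-in data (the number of retained features is an assumption).

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic stand-in for the 303 x 13 predictor matrix of the UCI data.
rng = np.random.default_rng(0)
X = rng.normal(size=(303, 13))
y = rng.integers(0, 2, size=303)

selector = RFE(estimator=SVC(kernel="linear"),
               n_features_to_select=7, step=1).fit(X, y)
print(selector.support_)   # boolean mask of retained features
print(selector.ranking_)   # rank 1 = kept; higher = eliminated earlier
```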
2 Related Work

Amin et al. [1] considered cardiovascular disease and predicted accuracy using machine learning algorithms. The authors used naïve Bayes and logistic regression for better prediction and finally used a hybrid technique, organized from naïve Bayes and logistic regression, to obtain high accuracy. Verma et al. [2] discussed skin disease using various classifier algorithms in data mining, namely PAC, LDA, RNC, MNB, NB, and ETC. They generated ensemble models with bagging and boosting algorithms and measured the highest classification accuracy with gradient boosting. Gokulnath and Shantharajah [3] selected heart disease attributes with a feature selection method in machine learning; they generated a model with a support vector machine and obtained high classification accuracy. Wu et al. [4] considered coronary artery disease using various classification algorithms, including deep learning and a filter method. The authors applied deep learning to phonocardiogram data and obtained high accuracy, sensitivity, and specificity. Alaa et al. [5] analyzed risk in cardiovascular disease using various classifier algorithms. They imputed data with a pipeline method, used 473 variables of cardiovascular disease, and obtained a receiver operating characteristic value of 0.77. Haq et al. [6] generated a model using the lasso feature selection algorithm on disease attributes, selected the most important features, and then achieved an 85% receiver operating characteristic with K-NN, ANN, DT, and NB. Vijayashree and Sultana [7] addressed heart disease problems with machine learning algorithms, namely the particle swarm optimization metaheuristic and the support vector machine, and obtained high accuracy with PSO + SVM on the selected features. Vivekanandan and Iyengar [8] identified heart disease problems with various machine learning algorithms; they used a differential evolution algorithm with optimal feature selection and obtained high accuracy with fuzzy AHP and a feedforward neural network.
Khateeb and Usman [9] considered heart disease problems with the K-NN technique. They collected similar factors of heart disease with machine learning algorithms and selected risk factors, obtaining high classification accuracy with the K-nearest neighbor algorithm. Ramotra et al. [10] considered heart disease problems using decision tree, naïve Bayes, and support vector machine classifiers. The authors used the Weka and SPSS tools for heart disease prediction and obtained the highest accuracy with the support vector machine in the SPSS tool. They selected 24 important features from 46 related features and computed high accuracy for five different classes with machine learning algorithms. Narayan and Sathiyamoorthy [11] considered heart disease features and identified heart infection with machine learning algorithms. They developed a hybrid χ²-DNN model that achieved high classification accuracy compared with DNN and ANN. Gonsalves et al. [12] predicted coronary heart infection and identified correlated attributes with machine learning algorithms, namely naïve Bayes, support vector machine, and decision tree; finally, the authors obtained the highest accuracy with naïve Bayes. Manogaran et al. [13] developed a hybrid model using multiple kernel learning with an adaptive neuro-fuzzy inference system for heart infection. They modified the cuckoo search algorithm and obtained high sensitivity and specificity. Jayaraman and Sultana [14] identified the seriousness of heart infection with machine learning. The authors selected correlated features with cuckoo search algorithms and used neural networks for better prediction, obtaining high accuracy with a low error rate and low time using a neural network.
3 Methodology

This section presents a detailed study of the heart disease dataset and applies various machine learning techniques to it.
3.1 Data Description

The dataset shown in Table 1 has been taken from the UCI repository. The various heart disease attributes and their domain values are given in the table [15, 16].
3.2 Histogram

A histogram is a graphical display of data points using bars of different heights. It resembles a bar chart, but a histogram groups the data into ranges (bins), and the height of each bar describes how many values fall into each range [17, 18] (Fig. 1).
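Fig. 1-style histograms can be produced in a few lines; the CSV file name and column layout are assumptions about the UCI export.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("heart.csv")        # 303 rows x 14 attributes assumed
df.hist(bins=20, figsize=(12, 10))   # one binned bar chart per attribute
plt.tight_layout()
plt.show()
```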
Table 1 Representation of heart disease features description

S. No. | Code     | Feature domain values
1      | Age      | Age in years
2      | Sex      | Sex (1 = male; 0 = female)
3      | Cp       | Chest pain type: 1 = typical angina; 2 = atypical angina; 3 = non-angina pain; 4 = asymptomatic
4      | Trestbps | Resting blood pressure (mm Hg)
5      | Chol     | Serum cholesterol (mg/dl)
6      | Fbs      | Fasting blood sugar > 120 mg/dl: 1 = yes; 0 = no
7      | Restecg  | Resting electrocardiographic results: 0 = normal; 1 = ST-T wave abnormal; 2 = left ventricular hypertrophy
8      | Thalach  | Maximum heart rate achieved [71, 202]
9      | Exang    | Exercise induced angina: 1 = yes; 0 = no
10     | Oldpeak  | ST depression induced by exercise relative to rest: between 0 and 6.2
11     | Slope    | The slope of the peak exercise ST segment: 1 = upsloping; 2 = flat; 3 = downsloping
12     | Ca       | Number of major vessels colored by fluoroscopy (values 0–3)
13     | Thal     | Exercise thallium scintigraphy: 3 = normal; 6 = fixed defect; 7 = reversible defect
14     | num      | 0 = no presence; 1 = presence
Fig. 1 Representation of heart disease attributes by bars over various ranges
Fig. 2 Representation of important features of heart disease by extra tree
3.3 Extra Tree Feature Selection

Extremely randomized trees are an ensemble technique. This technique fits each tree on the original training sample and aggregates the results of multiple de-correlated decision trees collected in a forest. It differs from random forest in the way the decision trees in the forest are constructed [19–21] (Fig. 2). The feature importance scores obtained for the 13 predictors are:

[0.07784633 0.06093372 0.10118647 0.06361897 0.07383465 0.02986432 0.03003928 0.07437752 0.1390865 0.08973886 0.06789341 0.08924798 0.102332]
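A sketch of how importance scores such as those listed above can be produced with scikit-learn follows; the file name and the target column "num" (per Table 1) are assumptions, and the hyperparameters are ordinary defaults.

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

df = pd.read_csv("heart.csv")
X, y = df.drop(columns=["num"]), df["num"]
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, score in sorted(zip(X.columns, model.feature_importances_),
                          key=lambda t: t[1], reverse=True):
    print(f"{name}: {score:.4f}")  # largest scores = most important features
```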
3.4 Matrix Evaluation

Different performance measures are evaluated using different matrices, which are discussed in this section.

$$\text{Classification Accuracy (CA)} = \frac{TP + TN}{TP + TN + FP + FN} \quad (1)$$

where TP, TN, FP, and FN are the true positive, true negative, false positive, and false negative counts in the dataset, respectively.

$$\text{Root Mean Square Error (RMSE)} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(\text{Predicted}_j - \text{Actual}_j\right)^2} \quad (2)$$

where m is the number of rows in the table. The contingency coefficient decides the dependency or independency between dataset variables. This coefficient is defined as the Pearson coefficient and is based on the chi-square statistic:

$$\text{Contingency Coefficient (C)} = \sqrt{\frac{\chi^2}{N + \chi^2}} \quad (3)$$

where $\chi^2$ is the chi-square statistic and N is the total number of cases. C is adjusted, by dividing it by $C_{max}$, so that it reaches a maximum of 1 when there is complete association in a table of any number of rows and columns. $C_{max}$ is calculated as:

$$C_{max} = \sqrt{\frac{m - 1}{m}} \quad (4)$$

$$\text{Adjusted Contingency Coefficient } (C^{*}) = \frac{C}{C_{max}} = \sqrt{\frac{m \cdot \chi^2}{(m - 1)(N + \chi^2)}} \quad (5)$$

where m is the number of rows or the number of columns [22, 23].
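Eqs. (1)–(5) can be computed directly from a confusion matrix, as in the sketch below; SciPy supplies the chi-square statistic, and the example cell is taken from Table 2 (GB, 3 attributes).

```python
import numpy as np
from scipy.stats import chi2_contingency

def rmse(actual, predicted):                           # Eq. (2)
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((predicted - actual) ** 2))

def contingency_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    accuracy = np.trace(cm) / n                        # Eq. (1)
    chi2 = chi2_contingency(cm, correction=False)[0]
    c = np.sqrt(chi2 / (n + chi2))                     # Eq. (3)
    m = min(cm.shape)                                  # rows/columns count
    c_max = np.sqrt((m - 1) / m)                       # Eq. (4)
    return accuracy, c, c / c_max                      # Eq. (5): C* = C/Cmax

print(contingency_metrics([[144, 20], [31, 108]]))     # GB with 3 attributes
```

On this cell the accuracy evaluates to 252/303 ≈ 0.8317, matching the 83.16% reported in Table 3.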
3.5 Proposed Model

This research work proposes a new technique, shown in Fig. 3, to search for the best algorithm among different machine learning techniques. Initially, important features are selected by the extra tree; then various machine learning techniques—random forest, gradient boosting, support vector machine, and K-nearest neighbor—are applied with different feature groups of 3, 7, 11, and 14 attributes, and their correlation strength (contingency coefficient, adjusted contingency coefficient, correlation coefficient, and phi coefficient) is checked together with classification accuracy and root mean square error values.
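A compact sketch of this comparison loop is given below; the CSV layout is an assumption, and since the paper's attribute groups of 3, 7, 11, and 14 include the target "num", the corresponding predictor counts here are 2, 6, 10, and 13.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

df = pd.read_csv("heart.csv")
X, y = df.drop(columns=["num"]), df["num"]

# Rank predictors with the extra tree, then grow the feature groups.
ranker = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
order = X.columns[np.argsort(ranker.feature_importances_)[::-1]]

models = {"RF": RandomForestClassifier(random_state=0),
          "SVM": SVC(),
          "K-NN": KNeighborsClassifier(),
          "GB": GradientBoostingClassifier(random_state=0)}
for k in (2, 6, 10, 13):                     # predictors per attribute group
    Xk = X[order[:k]]
    for name, model in models.items():
        pred = cross_val_predict(model, Xk, y, cv=10)  # tenfold CV
        acc = accuracy_score(y, pred)
        err = np.sqrt(mean_squared_error(y, pred))     # RMSE, Eq. (2)
        print(f"{k + 1} attributes | {name}: acc={acc:.4f}, rmse={err:.2f}")
```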
Fig. 3 Representation of proposed model for heart disease prediction
4 Results

The machine learning efficiency is monitored through matrix values such as TP, FP, TN, and FN. In the disease dataset, feature values vary between binary and decimal values, with the target variable indicating the presence (1) or absence (0) of heart infection in a human. The Pearson correlation measures the strength and direction of a linear relationship between two variables in the dataset; the correlation values range between −1 (strong negative relationship), +1 (strong positive relationship), and 0 (weak or no linear relationship) [24] (Fig. 4). The confusion matrices are shown in Table 2. The results are obtained by classification on binary and decimal values with tenfold cross-validation.
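The Fig. 4 correlation matrix can be reproduced as follows; the file path is an assumption.

```python
import pandas as pd

df = pd.read_csv("heart.csv")
corr = df.corr(method="pearson")   # values in [-1, 1] per the text above
print(corr["num"].sort_values())   # each attribute's correlation with target
```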
Fig. 4 Representation of features correlation matrix for heart disease

Table 2 Representation of confusion matrices for heart disease attributes (each cell lists the rows 'a = 0' and 'b = 1' of the confusion matrix, with columns classified as a, b)

Attributes      | RF              | SVM             | K-NN            | GB
Attributes (3)  | 134 30 / 36 103 | 137 27 / 40 99  | 136 28 / 44 95  | 144 20 / 31 108
Attributes (7)  | 129 35 / 30 109 | 127 37 / 37 102 | 133 31 / 50 89  | 138 26 / 28 111
Attributes (11) | 127 37 / 37 102 | 133 31 / 50 89  | 132 32 / 51 88  | 138 26 / 37 102
Attributes (14) | 128 36 / 29 110 | 135 29 / 40 99  | 123 41 / 46 93  | 142 22 / 42 97
Table 3 Representation of computational table for heart disease three attributes

Statistical analysis             | A3 (RF) | A3 (SVM) | A3 (K-NN) | A3 (GB)
Classification accuracy          | 78.21%  | 77.88%   | 76.23%    | 83.16%
RMSE                             | 0.39    | 0.47     | 0.41      | 0.36
Contingency coefficient          | 0.54    | 0.489    | 0.484     | 0.551
Adjusted contingency coefficient | 0.763   | 0.691    | 0.685     | 0.78
Correlation coefficient          | 0.846   | 0.773    | 0.77      | 0.866
Phi coefficient                  | 0.641   | 0.56     | 0.554     | 0.661
Table 4 Representation of computational table for heart disease seven attributes

Statistical analysis             | A7 (RF) | A7 (SVM) | A7 (K-NN) | A7 (GB)
Classification accuracy          | 78.54%  | 75.57%   | 73.26%    | 82.17%
RMSE                             | 0.41    | 0.45     | 0.41      | 0.36
Contingency coefficient          | 0.495   | 0.453    | 0.418     | 0.551
Adjusted contingency coefficient | 0.701   | 0.641    | 0.591     | 0.78
Correlation coefficient          | 0.782   | 0.717    | 0.671     | 0.866
Phi coefficient                  | 0.57    | 0.508    | 0.46      | 0.661
The experimental matrix represents the highest and lowest values of true positive, true negative, false positive, and false negative [25–27]. In Table 3, only three features (age, sex, and num) are used, and their correlation strength (contingency coefficient, adjusted contingency coefficient, correlation coefficient, and phi coefficient) is checked together with classification accuracy and root mean square error values for the machine learning classifier algorithms: random forest, gradient boosting, support vector machine, and K-nearest neighbor. With three attributes, gradient boosting achieves the highest accuracy of about 83.16% with a 0.36 error value and a high correlation value of 0.86, compared to the other algorithms. In Table 4, seven features (age, sex, cp, trestbps, chol, fbs, and num) are used, and their correlation strength, classification accuracy, and root mean square error values are checked for the same classifiers. With seven attributes, gradient boosting achieves the highest accuracy of about 82.17% with a 0.36 error value and a high correlation value of 0.86, compared to the other algorithms. In Table 5, 11 features (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, and num) are used. With 11 attributes, gradient boosting achieves the highest accuracy of about 79.2% with a 0.39 error value and a high correlation value of 0.79, compared to the other algorithms. In Table 6, 13 features (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, and thal) are used along with the target num. With 14 attributes, gradient boosting achieves the highest accuracy of about 78.87% with a 0.38 error value and a high correlation value of 0.79, compared to the other algorithms.
Table 5 Representation of computational table for heart disease 11 attributes

Statistical analysis             | A11 (RF) | A11 (SVM) | A11 (K-NN) | A11 (GB)
Classification accuracy          | 75.57%   | 73.59%    | 72.6%      | 79.2%
RMSE                             | 0.4      | 0.43      | 0.47       | 0.39
Contingency coefficient          | 0.453    | 0.429     | 0.407      | 0.502
Adjusted contingency coefficient | 0.641    | 0.606     | 0.576      | 0.71
Correlation coefficient          | 0.717    | 0.682     | 0.655      | 0.795
Phi coefficient                  | 0.508    | 0.474     | 0.446      | 0.58
Table 6 Representation of computational table for heart disease 14 attributes

Statistical analysis             | A14 (RF) | A14 (SVM) | A14 (K-NN) | A14 (GB)
Classification accuracy          | 78.54%   | 77.22%    | 71.28%     | 78.87%
RMSE                             | 0.39     | 0.42      | 0.43       | 0.38
Contingency coefficient          | 0.495    | 0.475     | 0.388      | 0.499
Adjusted contingency coefficient | 0.7      | 0.672     | 0.548      | 0.705
Correlation coefficient          | 0.781    | 0.755     | 0.616      | 0.798
Phi coefficient                  | 0.57     | 0.54      | 0.42       | 0.575
5 Discussion

From Tables 3, 4, 5, and 6, it is observed that, as the number of attributes grows, the classification accuracy continuously decreases, the root mean square error values increase, and the correlation values decrease. It is clear that the gradient boosting algorithm performs better in each experiment than the other algorithms used: random forest, support vector machine, and K-nearest neighbor. The main objective of the model is to predict heart disease with different feature combinations and to compare how accurate the model output is under different feature correlations. Figures 5 and 6 represent the correlation strength—contingency coefficient, adjusted contingency coefficient, correlation coefficient, and phi coefficient—
Fig. 5 Representation of accuracy and RMSE by different algorithms for the heart disease 14-attribute case
Fig. 6 Representation of correlation strength (contingency coefficient, adjusted contingency coefficient, correlation coefficient, and phi coefficient) by different classifier algorithms
as obtained by the different machine learning classifier algorithms. The correlation values of each algorithm are observed to differ in each experiment.
6 Conclusion

The dataset for various heart disease problems has been taken from the UCI repository. A total of 14 attributes and 303 instances are used in this experiment to test the efficiency of machine learning algorithms on different feature combinations in terms of classification accuracy and root mean square values, and to check their correlation strength through the contingency coefficient, adjusted contingency coefficient, correlation coefficient, and phi coefficient. In each experiment, the machine learning classifier algorithms random forest, gradient boosting, support vector machine, and K-nearest neighbor are compared. As a result, it is observed in each experiment that the gradient boosting algorithm achieves the highest accuracy with low error values and a higher correlation value compared to the other algorithms. In the future, combined neuro-fuzzy, fuzzy-genetic, and neuro-genetic models can be built and tested on various dataset combinations to achieve better prediction.
References 1. M.S. Amin, Y.K. Chiam, K.D. Varathan, Identification of significant features and data mining techniques in predicting heart disease. Telematics Inform. 36, 82–93 (2019) 2. A.K. Verma, S. Pal, S. Kumar, Prediction of skin disease using ensemble data mining techniques and feature selection method—a comparative study. Appl. Biochem. Biotechnol. 190(2), 341– 359 (2020) 3. C.B. Gokulnath, S.P. Shantharajah, An optimized feature selection based on genetic approach and support vector machine for heart disease. Cluster Comput. 22(6), 14777–14787 (2019)
4. J.M.T. Wu, M.H. Tsai, Y.Z. Huang, S.H. Islam, M.M. Hassan, A. Alelaiwi, G. Fortino, Applying an ensemble convolutional neural network with Savitzky-Golay filter to construct a phonocardiogram prediction model. Appl. Soft Comput. 78, 29–40 (2019) 5. A.M. Alaa, T. Bolton, E. Di Angelantonio, J.H. Rudd, M. van Der Schaar, Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK Biobank participants. PLoS ONE 14(5), e0213653 (2019) 6. A.U. Haq, J.P. Li, M.H. Memon, S. Nazir, R. Sun, A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Inf. Syst. 2018, 1–21 (2018) 7. J. Vijayashree, H.P. Sultana, A machine learning framework for feature selection in heart disease classification using improved particle swarm optimization with support vector machine classifier. Program. Comput. Softw. 44(6), 388–397 (2018) 8. T. Vivekanandan, N.C.S.N. Iyengar, Optimal feature selection using a modified differential evolution algorithm and its effectiveness for prediction of heart disease. Comput. Biol. Med. 90, 125–136 (2017) 9. N. Khateeb, M. Usman, Efficient heart disease prediction system using K-nearest neighbor classification technique, in Proceedings of the International Conference on Big Data and Internet of Thing (2017), pp. 21–26 10. A.K. Ramotra, A. Mahajan, R. Kumar, V. Mansotra, Comparative analysis of data mining classification techniques for prediction of heart disease using the Weka and SPSS modeler tools, in Smart Trends in Computing and Communications (Springer, Singapore, 2020), pp. 89–96 11. S. Narayan, E. Sathiyamoorthy, A novel recommender system based on FFT with machine learning for predicting and identifying heart diseases. Neural Comput. Appl. 31(1), 93–102 (2019) 12. A.H. Gonsalves, F. Thabtah, R.M.A. Mohammad, G. Singh, Prediction of coronary heart disease using machine learning: an experimental analysis, in Proceedings of the 2019 3rd International Conference on Deep Learning Technologies (2019), pp. 51–56 13. G. Manogaran, R. Varatharajan, M.K. Priyan, Hybrid recommendation system for heart disease diagnosis based on multiple kernel learning with adaptive neuro-fuzzy inference system. Multimedia Tools Appl. 77(4), 4379–4399 (2018) 14. V. Jayaraman, H.P. Sultana, Artificial gravitational cuckoo search algorithm along with particle bee optimized associative memory neural network for feature selection in heart disease classification. J. Ambient Intell. Humanized Comput. 1–10 (2019) 15. M. Tanveer, A. Sharma, P.N. Suganthan, Least squares KNN-based weighted multiclass twin SVM. Neurocomputing (2020). https://doi.org/10.1016/j.neucom.2020.02.132 16. D.C. Yadav, S. Pal, Prediction of heart disease using feature selection and random forest ensemble method. Int. J. Pharmaceutical Res. 12(4), 56–66 (2020) 17. H. Lu, S.P. Karimireddy, N. Ponomareva, V. Mirrokni, Accelerating gradient boosting machines, in International Conference on Artificial Intelligence and Statistics (2020), pp. 516–526 18. B. Richhariya, M. Tanveer, A reduced universum twin support vector machine for class imbalance learning. Pattern Recogn. 102, 107150 (2020) 19. B.H. Yuan, G.H. Liu, Image retrieval based on gradient-structures histogram. Neural Comput. Appl. 1–11 (2020) 20. M. Alizamir, S. Kim, O. Kisi, M. Zounemat-Kermani, Deep echo state network: a novel machine learning approach to model dew point temperature using meteorological variables. Hydrol. Sci. J. 65(7), 1173–1190 (2020) 21.
D.C. Yadav, S. Pal, Prediction of thyroid disease using decision tree ensemble method. Hum.-Intell. Syst. Integr. 1–7 (2020) 22. M. Baak, R. Koopman, H. Snoek, S. Klous, A new correlation coefficient between categorical, ordinal and interval variables with Pearson characteristics. Comput. Stat. Data Anal. 152, 107043 (2020) 23. D.C. Yadav, S. Pal, To generate an ensemble model for women thyroid prediction using data mining techniques. Asian Pac. J. Cancer Prev. 20(4), 1275 (2019)
24. M.A. Hasan, M.U. Khan, D. Mishra, A computationally efficient method for hybrid EEG-fNIRS BCI based on the pearson correlation. Biomed. Res. Int. 2020, 1–13 (2020) 25. R. Aggrawal, S. Pal, Sequential feature selection and machine learning algorithm-based patient’s death events prediction and diagnosis in heart disease. SN Comput. Sci. 1, 344 (2020) 26. A.K. Verma, S. Pal, S. Kumar, Prediction of different classes of skin disease using machine learning techniques. in Smart Innovations in Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 1168. (Springer, Singapore, 2021) 27. V. Chaurasia, S. Pal, Machine learning algorithms using binary classification and multi model ensemble techniques for skin diseases prediction. Int. J. Biomed. Eng. Technol. 34(1), 57–74 (2020)
A Novel Video Retrieval Method Based on Object Detection Using Deep Learning Anuja Pinge and Manisha Naik Gaonkar
Abstract The recent research in computer vision is focused on videos. Image and video data have increased drastically in the last decade. This has motivated researchers to come up with different methods for image and video understanding and other applications such as action recognition from videos, video retrieval, video understanding, and video summarization. This article proposes a novel and efficient method for video retrieval using the deep learning approach. Instead of considering low-level features, videos are best represented in terms of high-level features for achieving efficient video retrieval. The idea of the proposed research work is novel: objects present in the query video are used as features and are matched against all other videos in the database. Here, object detection is based on YOLOv3, the current state-of-the-art method for object detection in videos. The method is tested on the YouTube action dataset, and it was found that the proposed method obtains results comparable to other state-of-the-art video retrieval methods.

Keywords Video retrieval · Deep learning · YOLOv3 · Object detection
1 Introduction

Advances in technology in the fields of data capturing, storage, and communication techniques have resulted in the evolution of a huge amount of image and video data. The availability and ease of access to cameras and social networks have resulted in a drastic increase in the visual media content shared among people. With this ever-increasing growth of digital videos, there is a need to develop efficient algorithms to get useful information from this video data. As a lot of work has been carried out on video data, this decade has witnessed many significant research outcomes on digital data, e.g., videos and images. Considering video data, the focus is mainly on extracting semantic content from the videos. The various purposes of

A. Pinge (B) · M. N. Gaonkar Computer Science and Engineering Department, Goa College of Engineering, Farmagudi, Ponda, Bhausaheb Bandodkar Education Complex, Ponda, Goa 403401, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_39
video understanding are video summarization, video captioning, video classification, video retrieval, action recognition from videos, etc. Computer vision is a domain that aims to develop algorithms that can make computers understand the content present in digital data such as images and videos. In the last decade, computer vision produced state-of-the-art results with machine learning techniques. Over the last few years, however, deep learning models have outperformed the earlier machine learning methods in several fields, among which computer vision is a significant one. Deep learning has multiple processing layers from which it learns features at different levels of abstraction. The factors that contributed to the huge boost in the use of deep nets were mainly the publicly available large labeled datasets and the power of parallel computing with GPUs. The use of GPUs has reduced the training time drastically, thus accelerating deep learning models. Convolutional neural networks (CNNs) have been found to be effective for understanding content in images. CNNs yield state-of-the-art results on image segmentation, recognition, detection, retrieval, etc. The main reason for the suitability of these networks is their capability to scale up to tens of millions of parameters and to handle very large datasets in the learning process. With these features, CNNs are capable of learning prominent features from images. The results obtained with CNNs have encouraged their use on video data. To obtain these results, CNNs require an extensively long training time, optimizing millions of parameters to obtain the final optimized trained model. These networks can learn invariant representations from videos by back-propagating information via stacked convolution and pooling layers.

Video retrieval is the process of effectively searching through a dataset and retrieving the videos most relevant to the query presented by the user [1]. Videos contain complex and varied patterns, including low-level features within frames and high-level features across several frames, which makes video retrieval challenging. The main focus of this project is on developing an algorithm for fast and efficient video retrieval. After the great success of object detection in images, the idea has now been extended to videos, and this same idea is used in the proposed method: videos can be described by the objects present in them, and similar videos contain the same kinds of objects. This forms the basis for the proposal of a fast and efficient method for video retrieval. The layout of this article is organized as follows. Section 2 describes the related work done in video retrieval. Section 3 presents the proposed method of the research work. Section 4 demonstrates the implementation details, and Sect. 5 depicts the performance analysis of the proposed method. Section 6 concludes the research work, Sect. 7 explains the discussions, and finally, Sect. 8 outlines the future scope of the research work.
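As a concrete illustration of this idea — not the paper's exact pipeline — the sketch below represents each video by a normalized histogram of object labels (as produced, e.g., by a YOLOv3 detector run over its frames) and ranks database videos by cosine similarity to the query; the label vocabulary and detector output format are assumptions.

```python
import numpy as np

def object_histogram(detected_labels, vocab):
    # Count how often each object class was detected across the video.
    hist = np.zeros(len(vocab))
    for label in detected_labels:
        hist[vocab.index(label)] += 1
    norm = np.linalg.norm(hist)
    return hist / norm if norm else hist

def rank_videos(query_labels, db_label_lists, vocab):
    q = object_histogram(query_labels, vocab)
    sims = [float(q @ object_histogram(lbls, vocab))
            for lbls in db_label_lists]
    return np.argsort(sims)[::-1]  # most similar database videos first
```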
2 Related Work

The main aim of video retrieval is to retrieve from the database the videos most relevant to the query video provided by the user. The entire video retrieval process is complex, as it is difficult to understand what exactly the user is looking for. This requires analysis of the given query video and makes the task of video retrieval challenging. The analysis is mainly based on semantic concepts. The user always has high-level features in mind while searching for videos, but there were no good concept detectors for high-level features. The design of concept detectors is a lengthy process involving data pre-processing, extraction of low-level features, and machine learning techniques for classification [2]. Surveillance cameras capture a huge amount of video daily; hence, searching a video sequentially among such huge data is time-consuming and costly. As a result, video content analysis has emerged as a field in the computer vision domain. In 2002, Dimitrova et al. [3] described the entire process of content-based video retrieval, which has four steps: feature extraction, structure analysis, abstraction, and video indexing. Feature extraction is a crucial step in video retrieval; the entire indexing scheme depends upon the attributes chosen for feature representation. It is not easy to map easily extractable features such as color, texture, edges, and shape into semantic concepts such as objects, people, and scenes. Considering audio-domain features, e.g., energy, pitch, or bandwidth can be used for classification, although in video the visual content is the focus; hence, the major features could be text or captions present in the video. The second step is structural analysis, in which an attempt is made to extract structure information from within the frames of the video, representing temporal information. The third step is video abstraction, which involves creating a representation of the video from the structure analyzed in the previous step; this abstraction is similar to extracting keywords from text. The final step is video indexing, which means tagging the videos in the database. Traditional methods focus on extracting low-level features from videos, and these features are then used to find matches with others. The main aim of researchers is to develop algorithms to automatically parse the text, video, and audio. Earlier search solutions were based on text matching, which failed for the huge number of videos that have little relevance to their metadata or no metadata at all, as videos captured on mobile phones, wearable devices, or surveillance cameras do not have any metadata attached. To overcome this problem, content-based video semantic retrieval methods were developed. These methods depend on semantics rather than on textual metadata or low-level features, e.g., color, edge, etc. These semantic features include people, objects, actions, activities, scenes, text, or voice involved in the videos. In 2009, Karpenko [4] extended the tiny-image data mining techniques to tiny videos. A dataset of 50,000 videos was collected from YouTube, and content-based copy detection was experimented on this dataset. Based on the similarity matrix, the result for text-only video retrieval was improved. The tiny-image descriptor was color and affine invariant, and this same descriptor was used for the tiny videos.
This decade witnessed e-lecturing, which increased the lecture data available on the internet. Hence, there was a need for an efficient lecture video retrieval algorithm. In 2014, Yang and Meinel [5] proposed a method to retrieve the lecture videos widely present on the World Wide Web. It provided automated indexing of videos and video search through a large video lecture database. Firstly, automatic video segmentation was done along with key-frame detection for visual content identification. Then, an optical character recognition (OCR) technique was applied to the key-frames to extract metadata, and automatic speech recognition (ASR) was applied to the audio present in the video to convert speech to text. The method focused on two main parts: visual content (text) and audio tracks (speech). For visual content, a slight difference between the frames is detected, and a unique slide frame is extracted and treated as a video segment. Then, video OCR analysis is done to extract the textual data present on those frames. In the speech-to-text task, CMU Sphinx is used for ASR. The proposed method might not work well when videos of different genres are embedded in the slides. To improve this, a support vector machine (SVM) classifier was used, and for comparison, the histogram of oriented gradients (HOG) was applied.

Another approach for video retrieval was proposed by Gitte et al. [6] in 2014. Here, the system for video mining from a multimedia warehouse has two steps: the first is building the multimedia warehouse, and the second is retrieving the video from that warehouse. To retrieve the video from the warehouse, the presented query is first processed, and then similarity matching is performed; the final similar videos are then listed. The two steps are called off-line processing and online processing. In off-line processing, the videos are uploaded, and feature extraction is performed on each video. A key-frame is chosen from the available frames, and the video is indexed; these indexes are stored along with the key-frames and videos. In online processing, the user presents a query video. This video is analyzed, and features are extracted. These features are then matched against the features of the videos stored in the warehouse, with Euclidean distance used as the similarity measure between the query video and each candidate video from the warehouse.

In 2018, Iqbal et al. [7] proposed a content-based video retrieval method for unconstrained videos. The method segments the video and detects the objects of interest. To detect objects, the foreground is separated from the background, and then localization is done on the extracted frames to obtain features. For video retrieval, the user submits a frame or a short video. The image or the extracted frames are then converted to grayscale, and filters are used to remove noise. Each image is then segmented into four co-ordinates, and each of the segmented frames is converted into eight orientations. For each orientation, various feature extraction methods and classification/clustering algorithms are used to retrieve videos.

Object detection is one of the most challenging problems in the computer vision domain. It focuses on object localization and object classification in the given input. Deep neural networks have shown excellent results in object detection compared to other machine learning approaches.
The related work discussed above is summarized in the following table:

Year | Paper title | Summary
2002 | Applications of video content analysis and retrieval [3] | Proposed dividing the entire process of video retrieval into four major steps: feature extraction, structure analysis, abstraction, and indexing of the video
2009 | 50,000 tiny videos: A large dataset for non-parametric content-based retrieval and recognition [4] | Used a large dataset of 50,000 tiny videos collected from YouTube. Here, these tiny videos are used for classification instead of tiny images
2014 | Content-based lecture video retrieval using speech and video text information [5] | Proposed automated video indexing and video search for a large video database. Video segmentation is performed for key-frame detection. OCR is used to extract textual data from these key-frames. Automatic speech recognition is used to extract information from audio tracks
2014 | Content-based video retrieval system [6] | This approach has two major steps: building a multimedia warehouse and retrieving the videos from the warehouse. Video retrieval has sub-steps: video segmentation, key-frame detection, feature extraction
2018 | Content-based video retrieval using convolutional neural network [7] | Object detection is used for video analysis. The foreground is separated from the background, and localization is performed on each extracted key-frame. Eight-oriented frames are used for feature extraction
Objects are considered as features of videos in the process of video retrieval. The best-known algorithm for object detection is You Only Look Once (YOLO) [10]. Earlier methods of object detection used classifiers for the detection task. Instead, this technique treats object detection as a regression problem, mapping the image directly to bounding boxes and their associated class probabilities. A single neural network predicts bounding boxes and class probabilities in one evaluation; it works exactly as the name suggests: you only look once [11] at the image. This approach of predicting bounding boxes and class probabilities at the same time is called the unified approach. The algorithm treats the image globally and is hence capable of extracting contextual information. It is also capable of capturing a general representation of the objects and therefore gives good results when applied to new domains. Bounding boxes are predicted based upon the entire image. The algorithm divides the image into an S × S grid. If a grid cell contains the center of an object, then that grid cell is responsible for detecting that particular object. Each grid cell predicts a fixed number of bounding boxes and confidence scores for those boxes. YOLOv3 [12] can perform multi-scale detections. The entire YOLO method is divided into two parts: feature extraction and detection. For a new image, first, the features are extracted at multiple (three) scales. These features are then passed to three different detectors to get bounding boxes and probabilities. YOLOv3 uses Darknet-53 for feature extraction.
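As a rough illustration of the grid assignment just described, the following sketch computes which grid cell is responsible for an object whose center is known; the image size, the grid size S, and the example coordinates are illustrative assumptions, not details from the paper.

# Sketch of the S x S grid assignment: the grid cell that contains an
# object's center is responsible for detecting that object.
def responsible_cell(box_center, image_size, S=7):
    """Return the (row, col) of the S x S grid cell containing the center."""
    x, y = box_center          # object center in pixels
    w, h = image_size          # image width and height in pixels
    col = min(int(x / w * S), S - 1)
    row = min(int(y / h * S), S - 1)
    return row, col

# Example: a 416 x 416 image divided into a 7 x 7 grid.
print(responsible_cell((208, 104), (416, 416)))  # -> (1, 3)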
3 Method Proposed

Step 1: Train the YOLOv3 model using transfer learning

Transfer learning is a machine learning technique in which a model developed earlier for some task is reused as a starting point for some other specific task [13]. Transfer learning can be applied in two major ways: one is developing a model, and the other is using a pre-trained model. In the develop-a-model approach, first, the source task is selected and a model is developed for that particular task; this model is then reused for other tasks by fine-tuning. The other approach directly uses pre-trained weights instead of developing a model: the best-fitting pre-trained model is selected, used as a starting point, and fine-tuned according to the task. Here, the second approach is used (Fig. 1).

A. Dataset Preparation

The YouTube action dataset is used. It has 11 action categories. To train the model to identify the objects present in these videos, a dataset was created by taking images from the videos of this dataset. For example, a diving video has a diving board, water, and a person present in it, and a golf game has a golf stick, a person, or maybe a golf ball. Images of these objects were taken. Approximately 150 images of each class comprise the new dataset, making a total of 1500 images for training and 150 images for validation. Finding the images from the videos was challenging because these videos had viewpoint changes, cluttered backgrounds in many videos, and illumination variations, as well as variations in object scale.

B. Training the Model
Fig. 1 Overview of the proposed video retrieval method
The YOLOv3 pre-trained model was trained for 100 epochs using Google Colab. This model was able to correctly identify the objects in the action videos of the database. It took 19 h to train the model.

Step 2: Use the trained model to find objects in the videos in the database

For each of the videos in the database, the prominent objects were detected using the above-trained model and saved in a data structure. These are later required for similarity matching between the given query video and a particular video from the database. These objects now represent the features of each video, and hence the actual videos are not visited again; the stored object set thus becomes an abstraction of the videos.

Step 3: Use the trained model on the query video to retrieve similar videos

Once all the videos have their objects detected, the next step is to identify the objects present in the query video. Hence, the query video is passed to the YOLO algorithm, and the objects from the query video are retrieved.

Step 4: Construct a similarity measure

Once the objects from the query video are extracted, the next step is to compute the similarity measure between the query video and each video present in the database. To compute the similarity measure, binary vectors are created depending upon the presence or absence of a particular object. To create the two binary vectors, first the unique objects present in the query video and the objects present in a particular video are taken into consideration. The methods used for the similarity measure are the Jaccard coefficient, the simple matching coefficient, and the Dice coefficient. Given two vectors a and b, the following values are computed depending upon the pair of values from each of the vectors:

M01 = the number of attributes where a was 0 and b was 1
M10 = the number of attributes where a was 1 and b was 0
M00 = the number of attributes where a was 0 and b was 0
M11 = the number of attributes where a was 1 and b was 1

The measures used are:

A. Jaccard Similarity Measure
The Jaccard similarity measure, also known as the Jaccard similarity coefficient, compares the members of two binary sets to see which members are shared and which are distinct. It is a measure of similarity for the two sets of data, with a range from 0 to 100%. The higher the percentage, the more similar the two videos.

J(A, B) = (Number of 1-to-1 matches) / (Number of all matches and mismatches, except both zero)   (1)

J(A, B) = M11 / (M01 + M10 + M11)   (2)
B. Simple Matching Coefficient

It is used for matching and comparing the similarity between two vectors.

SMC = (Number of matches) / (Number of attributes) = (M00 + M11) / (M00 + M11 + M10 + M01)   (3)

C. Dice Similarity Coefficient

It is used to measure the similarity between two given samples.

Dice Coefficient = 2|a ∩ b| / (|a| + |b|)   (4)
An example of the similarity measures with respect to the proposed method:

Video 1 = [Basketboard, Person]
Video 2 = [Diving Board, Water, Person]
Total unique objects = [Basketboard, Person, Diving Board, Water]
Video 1 = [1, 1, 0, 0]
Video 2 = [0, 1, 1, 1]

Jaccard coefficient: J(A, B) = 1/4 = 0.25
Dice coefficient: DSC = 2/8 = 0.25
Simple matching coefficient: SMC = 1/4 = 0.25
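The worked example above can be reproduced with a few lines of Python. This is a sketch of the M-count-based formulas (2) and (3), not the authors' implementation; the Dice coefficient can be computed analogously from the same counts.

# Binary object-presence vectors and the Jaccard and simple matching
# coefficients computed from the M00/M01/M10/M11 counts.
def match_counts(a, b):
    m = {"00": 0, "01": 0, "10": 0, "11": 0}
    for x, y in zip(a, b):
        m[f"{x}{y}"] += 1
    return m

def jaccard(a, b):
    m = match_counts(a, b)
    return m["11"] / (m["01"] + m["10"] + m["11"])

def smc(a, b):
    m = match_counts(a, b)
    return (m["00"] + m["11"]) / sum(m.values())

video1 = [1, 1, 0, 0]   # [Basketboard, Person, Diving Board, Water]
video2 = [0, 1, 1, 1]
print(jaccard(video1, video2))  # 0.25
print(smc(video1, video2))      # 0.25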
In most cases, the length of the unique-object vector obtained from the objects of two videos is limited to 4 or 5 in the proposed method. Hence, any of the above similarity measures will yield almost the same result. When the three measures, the Jaccard coefficient, the Dice similarity coefficient, and the simple matching coefficient, were considered, it was observed that all three work in a similar manner when used for this proposed method. The simple matching coefficient works the same as the Jaccard coefficient because this method never increases the M00 count, as objects are considered only if present in at least one of the two videos. Similarly, the Dice coefficient also works the same as the Jaccard coefficient. Hence, the Jaccard coefficient is used further.

Step 5: Retrieve the most similar videos

Once the similarity measure for each video in the database is calculated, the next step is to retrieve the videos with a similarity measure above the threshold. The threshold is chosen based on the fact that a few objects may be common between videos even if the videos are dissimilar. Here, it is set to 0.75.
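To make Steps 3 to 5 concrete, the following sketch retrieves database videos whose precomputed object sets exceed the 0.75 Jaccard threshold. The object abstractions from Step 2 are assumed to exist already, and all video names and object labels are illustrative, not taken from the paper.

# End-to-end retrieval sketch: compare the query's object set against the
# stored object abstractions and keep videos above the threshold.
def jaccard(a, b):
    """Jaccard on object sets; equals M11 / (M01 + M10 + M11)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query_objects, database, threshold=0.75):
    """Return the database videos whose object sets match the query."""
    return [name for name, objects in database.items()
            if jaccard(query_objects, objects) >= threshold]

# Object abstractions precomputed in Step 2 (illustrative contents).
database = {
    "1.avi": {"basketboard", "person"},
    "3.avi": {"diving board", "water", "person"},
    "7.avi": {"swing", "person"},
}
query_objects = {"diving board", "water", "person"}  # from the query video
print(retrieve(query_objects, database))             # -> ['3.avi']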
4 Implementation Details

4.1 Hardware

Processor requirements: a CPU with an i3 or higher. Storage requirements: a minimum of 500 GB is required to process a video and store the output.
4.2 Software

Languages used: Python.
Software required:
Desktop: Anaconda with Jupyter Notebook for Python.
Cloud: Google Colab for GPU support.
4.3 Dataset

The YouTube action dataset has 11 categories of action [14]: basketball shooting, cycling, horseback riding, soccer juggling, swinging, tennis, trampoline jumping, walking with a dog, volleyball spiking, diving, and golf. This dataset is very challenging due to large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, illumination conditions, etc.
4.4 Output Obtained

See Table 1.
5 Performance Evaluation

The accuracy of this method depends primarily on accurate object detection. Since YOLOv3 is currently among the best algorithms for object detection, it detects the objects of interest reliably once trained on the newly created dataset. The performance of the proposed system is evaluated using the precision and recall metrics. Precision and recall measure the outcome of the prediction: precision is a measure of how much of the retrieved information is relevant, and recall is a measure of how much of the actually relevant data is retrieved. In information retrieval, a precision score of 1.0 means every retrieved video was relevant, and a recall score of 1.0 means every relevant video was retrieved (Fig. 2; Tables 2, 3 and 4).

Table 1 Output obtained by the proposed approach (sample retrieved frames for the Cycling, Diving, Tennis, and Basketboard queries)

Precision = (No. of videos retrieved relevant to the query video) / (Total number of videos retrieved)   (5)

Fig. 2 Precision–recall plot for the YouTube action dataset
Table 2 Precision of the proposed method in detail

Sr. No | Query video | Relevant videos retrieved | Total retrieved | Precision
1 | 1.avi (Basketball) | 100 | 127 | 0.79
2 | 2.avi (Cycling) | 124 | 140 | 0.89
3 | 3.avi (Diving) | 140 | 150 | 0.93
4 | 4.avi (Golf) | 108 | 150 | 0.72
5 | 5.avi (Horse riding) | 132 | 189 | 0.73
6 | 6.avi (Football) | 105 | 140 | 0.75
7 | 7.avi (Swing) | 117 | 150 | 0.78
8 | 8.avi (Tennis) | 98 | 158 | 0.62
9 | 9.avi (Trampoline) | 68 | 112 | 0.60
10 | 10.avi (Volleyball) | 77 | 166 | 0.46
Table 3 Recall of the proposed method in detail

Sr. No | Query video | Relevant videos retrieved | Total relevant videos present in database | Recall
1 | 1.avi (Basketball) | 100 | 141 | 0.71
2 | 2.avi (Cycling) | 124 | 145 | 0.85
3 | 3.avi (Diving) | 140 | 156 | 0.90
4 | 4.avi (Golf) | 108 | 142 | 0.76
5 | 5.avi (Horse riding) | 132 | 198 | 0.67
6 | 6.avi (Football) | 105 | 156 | 0.67
7 | 7.avi (Swing) | 117 | 207 | 0.57
8 | 8.avi (Tennis) | 98 | 167 | 0.59
9 | 9.avi (Trampoline) | 68 | 119 | 0.57
10 | 10.avi (Volleyball) | 77 | 116 | 0.66
Table 4 Precision and recall for the YouTube action dataset

Sr. No | Query video | Precision | Recall
1 | 1.avi (Basketball) | 0.79 | 0.71
2 | 2.avi (Cycling) | 0.89 | 0.85
3 | 3.avi (Diving) | 0.93 | 0.90
4 | 4.avi (Golf) | 0.72 | 0.76
5 | 5.avi (Horse riding) | 0.73 | 0.67
6 | 6.avi (Football) | 0.75 | 0.67
7 | 7.avi (Swing) | 0.78 | 0.57
8 | 8.avi (Tennis) | 0.62 | 0.59
9 | 9.avi (Trampoline) | 0.60 | 0.57
10 | 10.avi (Volleyball) | 0.46 | 0.66
Recall = (No. of retrieved videos relevant to the query video) / (Total number of videos relevant to the query video in the database)   (6)
6 Conclusion

A fast and efficient video retrieval method is proposed in this work. The method depends on object detection for video retrieval: since objects are an inseparable part of videos, the main idea is to treat these objects as the features used to retrieve videos. Hence, a state-of-the-art algorithm for object detection, YOLOv3, is used to extract the objects as features. Transfer learning is performed using pre-trained YOLO weights on the YouTube action dataset. For transfer learning, a dataset of images representing the objects of interest was created, and the YOLOv3 model was trained on this dataset for 100 epochs. The model gave satisfactory results despite the challenges involved in the videos of the database. The similarity measures tried were the Jaccard coefficient, the Dice coefficient, and the simple matching coefficient; all three were observed to work in the same manner for this method, so the Jaccard coefficient was used. The method is efficient because the features (objects) of the videos are extracted only once and stored, unlike in other video retrieval methods. It is also fast because, to retrieve videos, the only task involved is to obtain the features from the given query video and retrieve similar videos based on the similarity measure.
7 Discussion

Objects are the prominent parts of videos, which motivated us to use objects as features; the existing methods use different approaches, and using the objects from videos is a novel approach that is expected to give better results. The entire proposed approach depends upon objects, so object detection should be accurate and efficient. The algorithm used here for object detection is YOLO, a state-of-the-art algorithm for object detection, which yields accurate results after training the model on the objects of interest.
8 Future Work

This algorithm can be extended to detect and identify more complex objects by training the YOLOv3 model on a larger number of clear images. Also, the same concepts of object detection can be used for action recognition: the objects and context information can be utilized to detect the action occurring in the videos.
References

1. A. Podlesnaya, S. Podlesnyy, Deep learning based semantic video indexing and retrieval, in Proceedings of SAI Intelligent Systems Conference (Springer, Cham, 2016), pp. 359–372
2. Y. Gu, C. Ma, J. Yang, Supervised recurrent hashing for large scale video retrieval, in Proceedings of the 24th ACM International Conference on Multimedia (2016), pp. 272–276
3. N. Dimitrova, H.J. Zhang, B. Shahraray, I. Sezan, T. Huang, A. Zakhor, Applications of video-content analysis and retrieval. IEEE Multimedia 9(3), 42–55 (2002)
4. A. Karpenko, 50,000 Tiny Videos: A Large Dataset for Non-parametric Content-based Retrieval and Recognition (Doctoral dissertation) (2009)
5. H. Yang, C. Meinel, Content based lecture video retrieval using speech and video text information. IEEE Trans. Learn. Technol. 7(2), 142–154 (2014)
6. M. Gitte, H. Bawaskar, S. Sethi, A. Shinde, Content based video retrieval system. Int. J. Res. Eng. Technol. 3(06) (2014)
7. S. Iqbal, A.N. Qureshi, A.M. Lodhi, Content based video retrieval using convolutional neural network, in Proceedings of SAI Intelligent Systems Conference (Springer, Cham, 2018), pp. 170–186
8. HOLLYWOOD2: Human Actions and Scenes Dataset. Retrieved from https://www.di.ens.fr/~laptev/actions/hollywood2/
9. M. Jain, J.C. Van Gemert, C.G.M. Snoek, What do 15,000 object categories tell us about classifying and localizing actions?, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
10. J. Redmon et al., You only look once: unified, real-time object detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
11. J. Redmon, A. Farhadi, YOLO9000: better, faster, stronger, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
12. J. Redmon, A. Farhadi, YOLOv3: an incremental improvement (2018). arXiv preprint arXiv:1804.02767
13. G. Kordopatis-Zilos, S. Papadopoulos, I. Patras, I. Kompatsiaris, FIVR: fine-grained incident video retrieval. IEEE Trans. Multimedia 21(10), 2638–2652 (2019)
14. UCF Sports Action Data Set. Retrieved from https://www.crcv.ucf.edu/data/UCF_Sports_Action.php
15. Recognition of human actions. Retrieved from https://www.nada.kth.se/cvap/actions/
Exploring a Filter and Wrapper Feature Selection Techniques in Machine Learning V. Karunakaran, V. Rajasekar, and S. Iwin Thanakumar Joseph
Abstract Nowadays, huge amounts of data are generated by many fields such as health care, astronomy, social media, and sensor networks. When working with such data, there is a need to remove irrelevant, redundant, or unrelated data. Among the various preprocessing techniques, dimensionality reduction is one technique used to clean data. It helps the classifiers by reducing training time and improving classification accuracy. In this work, the most widely used feature selection techniques in machine learning were analyzed for improving classification as well as prediction accuracy.

Keywords Feature selection · Machine learning algorithms · Dimensionality reduction
1 Introduction

Attribute selection (AS) plays a very vital role in preprocessing tasks in machine learning and data mining. It is the process of finding the optimal features from a given dataset, i.e., those features of the original dataset which contribute more information are retained in the training dataset, and the remaining features are eliminated. Attribute selection methods are broadly classified into two:

i. Filter method
ii. Wrapper method
V. Karunakaran (B) · S. I. T. Joseph
Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India
V. Rajasekar
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Vadapalani campus, Chennai, India

In the filter method, the selection criterion is purely based on the filter function, i.e., the features are selected based on the filter function and not based on the classifiers. In the wrapper method, the selection criterion is based on classification accuracy, i.e.,
selecting the best features is purely based on classification accuracy obtained from the classifiers. In the wrapper method, a classifier is selected based on the problem domain. The full article is organized in the following manner. Section 2 explains the process of feature/attribute selection by filter method, and Sect. 3 illustrates the process of feature/attribute selection by wrapper method. Finally, Sect. 4 concludes the research work.
2 Attribute Selection by Filter Method

In this method, the selection criterion is purely based on the filter function. This section describes the most widely used attribute selection techniques based on the filter method. The general functionality of the filter method is given in Fig. 1.

Among feature selection techniques, information gain (IG) is a prominent method, and it is closely related to the Kullback–Leibler (KL) divergence. Given two probability distributions X and Y, the KL divergence is a non-symmetric measure of the divergence between them:

D(X || Y) = Σ_j X(j) log(X(j)/Y(j))   (1)

In other words, given probability models X and Y, the KL divergence is the expected logarithmic difference between X and Y; it is zero if and only if X and Y are equal. Information gain can be defined as mutual information. The information gain IG(K) can be defined as the reduction in entropy achieved by learning a variable K:

IG(K) = M(L) − Σ_i (L_i / L) M(L_i)   (2)

where M(L) stands for the entropy of the dataset L, and M(L_i) stands for the entropy of the ith subset generated by partitioning L based on attribute K.
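As a minimal illustration of Eq. (2), the following Python computes the information gain of a single attribute from its values and the class labels; the toy data at the end is illustrative only.

# Information gain = dataset entropy minus the weighted entropy of the
# subsets obtained by partitioning on the attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    n = len(labels)
    gain = entropy(labels)
    for value in set(attribute_values):
        subset = [l for a, l in zip(attribute_values, labels) if a == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

# An attribute that perfectly separates the classes recovers the full entropy.
attribute = ["x", "x", "y", "y"]
labels = [0, 0, 1, 1]
print(information_gain(attribute, labels))  # 1.0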
Fig. 1 Attribute selection using filter approach
Here, the attributes are ranked based on the values obtained from information gain: attributes with high information gain are ranked higher than others because they have substantial power in classifying the data. The method evaluates each feature one by one and eliminates from the training set those features which contribute little information. This method is simple and cost-effective compared to other filter methods, and it is fast even in cases where the number of attributes is larger than the number of instances [1].

The gain ratio is also one of the well-known methods for feature selection, described by Mitchell in 1997. In the ID3 algorithm, the best subset of features is identified by the information gain method. The C4.5 algorithm is a successor of ID3; in C4.5, the best subset of features is identified using the gain ratio method (Salzberg 1994). In the information gain method, the split is determined based on the feature having the highest information gain, measured in bits. This yields good results but favors splits on features that have many distinct values. This limitation of the information gain method can be overcome by the gain ratio method, which incorporates the split information and determines what proportion of the information gain is valuable for those splits [1, 2].

Mutual information is a well-known method for attribute selection, proposed by Battiti in 1994. In that article, the best subset is identified by using mutual information, and the effectiveness of the mutual information feature selection method is evaluated using a neural network classifier and compared with principal component analysis (PCA) and the random feature selection method (RFSM). In terms of classification accuracy, the mutual information feature selection method performs better than RFSM and PCA [3].

The conditional mutual information concept was proposed by Cheng et al. in 2011 for selecting the best subset of attributes. This method not only identifies the best subset of attributes but also analyzes synergy and redundancy. Mutual information feature selection is used in many applications, but it does not analyze feature synergy and redundancy. The conditional mutual information method considers the redundancy and synergy between the attributes and identifies the favorable attributes; the CMIFS method reduces the probability of choosing a redundant feature as one of the best features. The experimental results show that the conditional mutual information feature selection method provides better classification accuracy than the mutual information feature selection method [4].

Kira and Rendell proposed the RELIEF algorithm, a simple and well-known method for feature weight estimation. If the features are tightly dependent on each other, then it is difficult to identify which feature contributes more information. The RELIEF algorithm identifies the quality of attributes based on how well their values distinguish between each instance and its near points. The pseudocode of the basic RELIEF algorithm is given as follows:
    set all weights W[f] = 0 for every attribute f
    for i = 1 to j do
    begin
        randomly select an instance A;
        find the nearest hit L and the nearest miss O;
        for f = 1 to all_attributes do
            W[f] = W[f] - diff(f, A, L)/j + diff(f, A, O)/j;
    end;

where
A stands for an instance in the dataset
L stands for the nearest hit of the instance
O stands for the nearest miss of the instance
In the RELIEF algorithm, a diff() function is used to find the difference between the values of feature f for two instances. If the diff() function returns 1, the two instances have different values for that attribute; if it returns 0, they have the same value. The RELIEF algorithm identifies relevant and irrelevant attributes using the diff() function [5].
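The pseudocode above can be turned into a small runnable sketch for a two-class problem with numeric features. The Manhattan distance used to find the nearest hit and miss, and the unnormalized diff(), are simplifying assumptions, as is the toy dataset.

# Mini-RELIEF: weights grow for features whose values separate the classes.
import random

def diff(f, a, b):
    return abs(a[f] - b[f])

def relief(X, y, j=20, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(j):
        i = rng.randrange(len(X))
        a = X[i]
        hits = [x for x, c in zip(X, y) if c == y[i] and x is not a]
        misses = [x for x, c in zip(X, y) if c != y[i]]
        dist = lambda u: sum(diff(f, a, u) for f in range(n_features))
        hit, miss = min(hits, key=dist), min(misses, key=dist)
        for f in range(n_features):
            w[f] += -diff(f, a, hit) / j + diff(f, a, miss) / j
    return w

X = [(0.1, 5.0), (0.2, 1.0), (0.9, 5.1), (0.8, 0.9)]  # feature 0 separates the classes
y = [0, 0, 1, 1]
print(relief(X, y))  # the weight of feature 0 dominates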
3 Feature Selection Using Wrapper Method

Feature selection is carried out by two iterative methods, namely the sequential forward selection (SFS) and sequential backward elimination (SBE) methods. In the sequential forward selection method, training starts with an empty set 'A', and features are gradually added to the training set one by one using some evaluation function. In the end, the training set 'A' contains those features which contribute the most relevant information during classification. The working model of sequential backward elimination is the reverse of sequential forward selection: it starts with the entire feature set 'A' and gradually removes attributes one by one using some evaluation function. In the end, the set 'A' again contains those features which contribute the most relevant information during the classification task. Both sequential forward selection and sequential backward elimination yield an effectively reduced dataset and better classification accuracy, but both tend to get trapped in local minima.

These caveats of sequential forward selection and sequential backward elimination were overcome using randomized algorithms, which include randomness in their search procedure to escape from local minima. One of the well-known randomized algorithms is the genetic algorithm. Mohamad (2004) used a genetic algorithm to identify the best subset of attributes from both high-dimensional and low-dimensional datasets. The performance of attribute selection using the genetic algorithm was evaluated using a support vector machine. The experimental results show that the selected subsets of features were good and obtained better classification accuracy on the training data for both small- and large-dimensional datasets [6].
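A minimal sketch of the sequential forward selection loop described at the start of this section, using cross-validated accuracy as the evaluation function; the scikit-learn classifier and dataset are illustrative choices, not those used in the cited works.

# Greedy SFS: at each step, add the feature that most improves CV accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sfs(X, y, n_features):
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features:
        def score(f):
            cols = selected + [f]
            return cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=5).mean()
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_iris(return_X_y=True)
print(sfs(X, y, n_features=2))  # indices of the two most useful features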
Zhang and Sun tried feature selection using Tabu search for high-dimensional datasets. The proposed method was compared with three other methods, namely sequential forward selection (SFS), sequential backward selection, and the genetic algorithm (GA). The experimental results show that the reduced subset of features produces better classification results in a minimum amount of time; the method is capable of escaping local minima and reaching a global optimum [7].

Huang et al. proposed a hybrid genetic algorithm for feature selection that combines the wrapper and filter methods. The work is divided into two phases. The first phase is the outer-stage optimization, in which the best subset of features is identified using a wrapper method; this phase achieves global optimization. The second phase is the inner-stage optimization, in which the best subset of features is identified using a filter method; this phase achieves local optimization. The two methods assist each other to provide better global predictive accuracy and high local search efficiency. Experimental results show that the proposed hybrid genetic algorithm provides excellent classification accuracy compared to the recursive feature elimination method and Battiti's greedy feature selection method (MIFS) [8].

Ahmad tried both feature extraction and feature selection using principal component analysis and particle swarm optimization, respectively. This method was compared against existing methods, namely principal component analysis (PCA) and principal component analysis with a genetic algorithm (PCA-GA). The classification is carried out by a modular neural network classifier. The experimental results show that combining principal component analysis (PCA) and particle swarm optimization (PSO) provides a better detection rate and a lower false alarm rate compared to PCA and PCA + GA [9].

Suganthi and Karunakaran tried a combination of instance selection and feature extraction using the cuttlefish optimization algorithm and principal component analysis, respectively. The experiment was performed using four large datasets under the following four criteria:
i. With the original dataset,
ii. Feature extraction using PCA,
iii. Instance selection using the cuttlefish optimization algorithm,
iv. Combination of instance selection (IS) and feature extraction (FE) using the cuttlefish optimization algorithm (COA) and principal component analysis (PCA).
The experimental results show that the combination of instance selection and feature extraction using the cuttlefish optimization algorithm and principal component analysis takes only a small amount of training time when compared to the other criteria [10].

Karunakaran et al. proposed the cuttlefish optimization algorithm (COA) through Tabu search (TS) for both feature selection and instance selection. The experiment was analyzed on four large datasets obtained from the UCI repository. The cuttlefish optimization algorithm through Tabu search performed better in terms of detection rate (DR), accuracy, false positive rate (FPR)/type one error, and the training time (TT) of the classifiers. The experiment was carried out with SVM and KNN classifiers [11].
Karunakaran et al. tried the bees algorithm to find the optimal subset of features from the entire dataset. The dataset used in that article is a weather dataset obtained from the UCI repository. The experimental results show that the proposed method is better in terms of accuracy, type 1 error/false positive rate, and detection rate compared to using the entire dataset [12]. The methods summarized below tried different metaheuristics for solving feature selection problems and achieved better classification results [13–15] (Table 1).
Table 1 Analyzed various filter and wrapper feature selection methods

1. Kullback and Leibler [16]; Cover and Thomas [17]
Description: Information gain (IG) is used for ranking features. According to the information gain value, the features are arranged in descending order (the first feature has the highest rank and the last feature has the lowest rank)
Inference: 1. Simple and cost-effective 2. It also provides good performance when the number of features is larger than the number of instances 3. It performs poorly if the attributes have a large number of distinct values

2. Mitchell [1]; Salzberg [2]
Description: The gain ratio (GR) is used for identifying the best features among the entire feature set
Inference: 1. Simple and cost-effective 2. It overcomes the limitation of the information gain method; for example, if two features have the same information gain, the gain ratio selects the attribute with the smaller number of categories/distinct values 3. If the attributes are strongly dependent on each other, the performance becomes poor

3. Kira and Rendell [5]
Description: The RELIEF algorithm estimates the quality of features based on how well their values distinguish between the instances and their near points
Inference: 1. If the features are strongly dependent, it produces a good result 2. The basic RELIEF algorithm supports only two-class problems

4. R. Battiti [3]
Description: The sequential forward selection (SFS) algorithm begins with an empty feature set 'S' and gradually adds features to the set by using some evaluation function; finally, the feature set S contains only those features contributing more information during classification. Sequential backward selection (SBS) begins with all features and repeatedly removes the feature which contributes very limited information
Inference: 1. Implementation is simple 2. Both methods provide an effectively reduced dataset and give better classification accuracy 3. Both methods achieve only local optima

5. Mohamad [6]
Description: A genetic algorithm (GA) is used to search out and identify the potentially informative features, and a support vector machine (SVM) is used for classification
Inference: 1. This method overcomes the tendency to become trapped in local optima 2. The experimental results show the selected features are good enough to get better classification accuracy

6. Zhang and Sun [7]
Description: The Tabu search (TS) method is used for solving the high-dimensionality problem by selecting the best features from the others in the dataset
Inference: 1. The method is compared with the genetic algorithm (GA), sequential forward selection (SFS), and the sequential backward selection method (SBS) 2. The Tabu search method provides a quality optimal subset and less computational time compared to existing systems

7. Huang et al. [8]
Description: 1. Both wrapper and filter methods are applied for finding the best subset of features 2. It consists of two stages: i. selecting the best subset of features by using the wrapper method (global search), ii. selecting the best subset of features using the filter method (local search)
Inference: 1. It performs both global search and local search 2. The method is compared with the feature elimination method and the greedy feature selection (GFS) method 3. The hybrid genetic algorithm (HGA) provides better classification accuracy than existing systems

8. Ahmad [9]
Description: 1. Principal component analysis (PCA) is used for feature extraction 2. Particle swarm optimization (PSO) is used for feature selection 3. The performance is evaluated by a modular neural network classifier
Inference: The combination of PCA and particle swarm optimization (PSO) provides a better detection rate, false positive rate, and classification accuracy than PCA and PCA + genetic algorithm

9. Suganthi and Karunakaran [10]
Description: 1. Instance selection is carried out by using the cuttlefish optimization algorithm (COA) 2. Feature extraction using PCA 3. Combination of instance selection and feature extraction
Inference: The combination of the cuttlefish optimization algorithm (COA) and principal component analysis (PCA) requires a smaller amount of training time than the other criteria

10. Karunakaran et al. [11]
Description: Both feature selection (FS) and instance selection (IS) were carried out by using the cuttlefish optimization algorithm (COA) through Tabu search (TS)
Inference: This method performs better in terms of accuracy (ACC), false positive rate (FPR), and detection rate (DR); the computational time (CT) is less

11. Karunakaran et al. [12]
Description: 1. A weather dataset is used 2. The optimal subset of features is identified by using the bees algorithm (BA)
Inference: This method performs better in terms of accuracy, false positive rate, and detection rate
4 Conclusion

This article delivers a survey of attribute selection methods. The basic concepts of attribute selection were analyzed and discussed: the various attribute selection methods were broadly categorized into filter and wrapper methods, and an overview of both was given. Many approaches have been proposed for attribute selection. Most approaches still suffer from the problem of stagnation, i.e., the search process stops at a local optimum instead of reaching the global optimum, while the methods that belong to global optimization techniques do not contribute good accuracy and take a huge amount of time to train the classifiers. Therefore, more efficient search techniques are required for identifying the best subset of attributes.
References

1. T.M. Mitchell, Machine Learning (WCB/McGraw-Hill, Boston, Massachusetts, 1997)
2. S.L. Salzberg, C4.5: Programs for machine learning. Morgan Kaufmann Publishers, Inc. Mach. Learn. 16(3), 235–240 (1994)
3. R. Battiti, Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5(4), 537–550 (1994)
4. G. Cheng, Z. Qin, C. Feng, Y. Wang, F. Li, Conditional mutual information-based feature selection analyzing for synergy and redundancy. ETRI J. 33(2), 210–218 (2011)
5. K. Kira, L.A. Rendell, The feature selection problem: traditional methods and a new algorithm. AAAI 2, 129–134 (1992)
6. M.S. Mohamad, Feature selection method using genetic algorithm for the classification of small and high dimension data, in Proceedings of the International Symposium on Information and Communication Technology (2004), pp. 13–16
7. H. Zhang, G. Sun, Feature selection using tabu search method. Pattern Recogn. 35(3), 701–711 (2002)
8. J. Huang, Y. Cai, X. Xu, A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28(15), 1825–1844 (2007)
9. I. Ahmad, Feature selection using particle swarm optimization in intrusion detection. Int. J. Distrib. Sens. Netw. 11(10), 806954 (2015)
10. M. Suganthi, V. Karunakaran, Instance selection and feature extraction using cuttlefish optimization algorithm and principal component analysis using decision tree. Cluster Comput. 22(1), 89–101 (2019)
11. V. Karunakaran, M. Suganthi, V. Rajasekar, Feature selection and instance selection using cuttlefish optimisation algorithm through tabu search. Int. J. Enterp. Netw. Manag. 11(1), 32–64 (2020)
12. V. Karunakaran, S.I. Joseph, R. Teja, M. Suganthi, V. Rajasekar, A wrapper based feature selection approach using bees algorithm for extreme rainfall prediction via weather pattern recognition through SVM classifier. Int. J. Civil Eng. Technol. (IJCIET) 10(1) (2019)
13. O. Gokalp, E. Tasci, A. Ugur, A novel wrapper feature selection algorithm based on iterated greedy metaheuristic for sentiment classification. Expert Syst. Appl. 146, 113176 (2020)
14. S. Mahendru, S. Agarwal, Feature selection using metaheuristic algorithms on medical datasets, in Harmony Search and Nature Inspired Optimization Algorithms (Springer, Singapore, 2019), pp. 923–937
15. A.A. Lyubchenko, J.A. Pacheco, S. Casado, L. Nuñez, An effective metaheuristic for bi-objective feature selection in two-class classification problem. J. Phys.: Conf. Ser. 1210(1), 012086 (2019). IOP Publishing
16. S. Kullback, R.A. Leibler, On information and sufficiency. Ann. Math. Stat. 22(1), 79–86 (1951)
17. T.M. Cover, J.A. Thomas, Elements of Information Theory, 2nd edn. (Wiley-Interscience, Hoboken, 2006)
Recent Trends in Epileptic Seizure Detection Using EEG Signal: A Review Vinod J. Thomas and D. Anto Sahaya Dhas
Abstract Epilepsy is a neurological brain disorder whose key characteristic is seizures. Seizures occur in a community of neurons in certain parts of the cerebral cortex, during which rapid, uncontrolled bursts of electrical activity occur. More than 2% of the people in the world suffer from this brain disorder. Any abnormality in brain functionality can be easily identified by the use of an electroencephalogram (EEG). As EEG recordings are usually a few hours long, visual analysis of the EEG signal is a time-consuming and laborious job. With the advancement of signal processing and digital methods, several systems can automatically detect seizures from the EEG signal. This article reviews the state-of-the-art methods and concepts, giving an orientation for future research work in the field of seizure identification. Various methods for the detection of epileptic seizures, which differ mainly in their feature extraction techniques, are discussed. Different feature extraction techniques, involving features from the frequency domain, the time domain, combinations of the time and frequency domains, the wavelet transform, and powerful deep learning methods, are discussed and compared. Considering the EEG signal's nonstationary nature and the different artifacts affecting the signal, feature extraction based on deep learning approaches is found to be robust and appropriate for the identification of seizures.

Keywords Seizure detection · Discrete wavelet transform · Epilepsy · Encephalogram · Artificial neural network · Deep learning
V. J. Thomas (B) · D. Anto Sahaya Dhas
Vimal Jyothi Engineering College Chemperi, APJ Abdul Kalam Technological University, Thiruvananthapuram, Kerala, India
e-mail: [email protected]
D. Anto Sahaya Dhas
e-mail: [email protected]
1 Introduction

The World Health Organization (WHO) estimates that about 50 million people are affected by epilepsy. At a given time, the proportion of the general population with active epilepsy is between 4 and 10 per 1000 individuals. An estimated five million individuals are diagnosed with epilepsy each year worldwide. In high-income countries, an estimated 49 per 100,000 people are diagnosed with epilepsy each year; in low- and middle-income countries, this number may be as high as 139 per 100,000 [1]. A person with epilepsy suffers from seizures characterized mainly by recurrent, unpredictable, and uncontrolled electrical surges which affect the normal life of the individual. Some seizures are very hard to observe because they lead only to a minor confusion of the mind, while others cause loss of consciousness, occasionally leading to injury or fatality. Epileptic seizures are generally categorized as generalized seizures and partial seizures, depending on the place at which the seizure begins. Partial seizures are also known as focal epilepsy; some partial seizures may be cured by a surgical operation in which a small portion of the cortex is removed. The EEG is the primary examination for epilepsy diagnosis and for the compilation of information on the type and location of seizures. Figure 1 illustrates the different forms of EEG signals that are used in the detection of epileptic seizures, with the corresponding spectra. They were taken from the Bonn University database. Set-A displays the EEG signals taken from healthy subjects with eyes open, and Set-B shows the same with eyes closed. Set-C and Set-D are interictal (seizure-free) signals from epileptic patients. Set-E indicates the epileptic signals recorded during a seizure.
Fig. 1 Different types of EEG signals and their spectra (Source: Reference [43])
Several feature extraction techniques have been developed for the automatic detection of epileptic seizures. Most techniques use hand-designed features taken from the frequency domain, the time domain, or combinations of the time and frequency domains. Many other works, such as Tzallas et al. [2], Correa et al. [3], Chan et al. [4], Aarabi et al. [5], Srinivasan et al. [6], Meier et al. [7, 8], Minasyan et al. [9], and Abibullaev et al. [10], use features derived from the wavelet domain. Nowadays, more efficient systems use deep learning/machine learning approaches for feature extraction. Some other works use different techniques: for example, the authors in [11–13] use empirical mode decomposition, the authors in [14, 15] use rational functions, and the authors in [16] use a statistical approach. But due to their adaptability and self-learning capabilities, machine learning approaches are found to be robust and appropriate in many situations.

Consider a dataset that includes positive and negative cases for a particular two-class problem. Cases predicted correctly are counted as true positives (TP) and true negatives (TN). The selected algorithm may also predict positive cases as negative and vice versa; the resulting counts are the false negatives (FN) and false positives (FP), respectively. The following are the main performance metrics used to evaluate the usefulness of different approaches:

Sensitivity = TP / (TP + FN) × 100
Specificity = TN / (TN + FP) × 100
Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100
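These three metrics translate directly into code; the counts in the sanity check below are illustrative only.

# Direct implementation of sensitivity, specificity, and accuracy.
def metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn) * 100
    specificity = tn / (tn + fp) * 100
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    return sensitivity, specificity, accuracy

print(metrics(tp=95, tn=90, fp=10, fn=5))  # (95.0, 90.0, 92.5)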
This article is structured as follows. Section 2 gives an overview of the seizure detection system. Section 3 provides an idea of the wavelet transform and the time–frequency domain. Section 4 addresses different seizure detection methods in which characteristics are obtained from the time domain, the frequency domain, and a mixture of both. Section 5 illustrates the latest techniques, which employ deep learning approaches for feature extraction. Each section outlines the central idea of the papers along with the methods adopted for feature extraction and classification; the results obtained for each work and the input data used for the corresponding system are also discussed. Section 6 includes a conclusion and future research directions in this area.
2 Overview of the Detection Systems

A general block diagram representation of all seizure detection methods is shown in Fig. 2. The brain's electrical activity is often polluted by different noise and artifacts, which can impact the precision of detection. Therefore, to remove all sources of noise from the EEG data, proper preprocessing has to be performed. EOG, ECG, and EMG are the key sources of artifacts influencing the EEG signal. There are many techniques in the literature for signal conditioning. Klados et al. [17] use regression techniques for the removal of artifacts. Safieddine et al. [18] use a wavelet-based decomposition approach. Casarotto et al. [19] used principal component analysis (PCA) to eliminate ocular artifacts and showed that PCA is computationally more effective than linear regression methods. Independent component analysis (ICA) is used by many authors to remove artifacts [20, 21]. In addition, different filtering approaches are often used to eliminate artifacts, including adaptive filtering [22], Wiener filtering [23], and sparse decomposition methods [24]. The pre-processed signal is applied to the feature extractor block, and various features are derived. Then the dimensionality of the feature set is reduced by feature selection algorithms. Finally, the reduced feature set is applied to the classification unit, and the output signal is an indication of seizure/non-seizure activity.

Fig. 2 General block diagram of the seizure detection system
3 Wavelet Transform

The Fourier transform cannot provide time–frequency localization; the wavelet representation eliminates this problem. Wavelets are wave-like functions having finite duration, zero average value, and finite energy. A continuous-time signal f(t) can be represented as a wavelet series:

f(t) = Σ_s a_{r0,s} φ_{r0,s}(t) + Σ_{r=r0}^{∞} Σ_s b_{r,s} ψ_{r,s}(t)   (1)

The signal is decomposed into different levels having different resolutions. The first part of the above equation is called the approximation, and the second part contains the detail functions. Equation (1) represents the basic multi-resolution framework for representing any signal using the wavelet transform.
Here, φ_{r0,s}(t) and ψ_{r,s}(t) are the basis functions used to represent the signal and are analogous to the complex exponential signals in the Fourier transform. They represent dilated and translated versions of two basic functions, φ(t) and ψ(t); here, φ(t) is called the scaling function and ψ(t) is called the wavelet function. There are different combinations of scaling and wavelet functions, depending on which type of wavelet is used in a particular study. In Eq. (1), a_{r0,s} and b_{r,s} are called the wavelet transform coefficients, and they are analogous to the Fourier coefficients in the Fourier transform, where

φ_{r0,s}(t) = 2^{r0/2} φ(2^{r0} t − s)   (2)

Equation (2) represents the scaling function translated by s and dilated by 2^{r0}, multiplied by the factor 2^{r0/2}.

ψ_{r,s}(t) = 2^{r/2} ψ(2^{r} t − s)   (3)

Equation (3) represents the wavelet function translated by s and dilated by 2^{r}, multiplied by the factor 2^{r/2}. The coefficients a_{r0,s} and b_{r,s} in (1) are given by

a_{r0,s} = ∫ f(t) φ_{r0,s}(t) dt   (4)

b_{r,s} = ∫ f(t) ψ_{r,s}(t) dt   (5)
Equations (4) and (5) give the wavelet transform coefficients, also known as the approximation and detail coefficients. The advantage of wavelet coefficients is that each coefficient is a function of two variables, r and s, which represent the scaling and shifting of the wavelet, respectively. Therefore, each wavelet coefficient indicates the contribution of a wavelet that has been dilated and translated by particular values. By taking the Fourier transform of the dilated and translated version of the basic wavelet, we can determine the frequency content, and the parameter s gives information about the time instant where this wavelet occurs. Thus, the wavelet transform can provide time–frequency localization. One can choose a wavelet that is suitable for a particular application; this flexibility is very important when a nonstationary signal like the EEG is analyzed. In any application, finding the suitable wavelet function and number of decomposition levels, and then finding the appropriate features from the resulting sub-bands, is a challenging task.
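As a hedged sketch of how such a decomposition is obtained in practice, the following uses the PyWavelets package to perform a five-level DWT with the fourth-order Daubechies wavelet ('db4'), a common choice in the papers reviewed below; the input epoch is synthetic, and only the sampling rate matches the Bonn recordings.

# Five-level DWT of a synthetic EEG-like epoch with PyWavelets.
import numpy as np
import pywt

fs = 173.61                              # Bonn dataset sampling rate (Hz)
t = np.arange(0, 5, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

# wavedec returns [A5, D5, D4, D3, D2, D1]: one approximation sub-band and
# five detail sub-bands, each covering roughly a halving frequency range.
coeffs = pywt.wavedec(x, 'db4', level=5)
for name, c in zip(['A5', 'D5', 'D4', 'D3', 'D2', 'D1'], coeffs):
    print(name, c.size, np.mean(np.abs(c)), np.var(c))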
4 Time Domain/Frequency Domain-Based Methods

Twelve articles are summarized here. The basic aim is to demonstrate various approaches, the interrelationships between them, and the different opportunities that exist in the same area to achieve improvements in seizure detection.

Hernández et al. [25] suggested a method to detect epilepsy by extracting 52 features from the time domain, the frequency domain, and a combination of the two. The temporal features implemented in this work are signal power, average value, first and second differences, normalized versions of the first and second differences, and standard deviation. Some other temporal characteristics introduced in this work, which attempt to account for the EEG signal's nonstationary nature, are the Hjorth features [26], the nonstationary index [27], and higher-order crossings [28]. The most common spectral features are derived from the power spectral density, which can be obtained by computing the signal's Fourier transform; a short-time Fourier transform with a Hamming window is used here. The last set of features derived in this research is from the time–frequency domain, obtained using the discrete wavelet transform. Ten distinct classification models are evaluated on the freely accessible Bonn University dataset. In terms of accuracy, the authors find that their models equal, and in some cases outperform, the state-of-the-art classifiers.
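As an illustration of the temporal features just listed for Hernández et al. [25], the following sketch computes the three Hjorth parameters (activity, mobility, complexity) from a signal and its successive differences; the test tone is synthetic.

# Hjorth parameters: variance-based descriptors of an EEG epoch.
import numpy as np

def hjorth(x):
    dx = np.diff(x)
    ddx = np.diff(dx)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / np.var(x))
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

x = np.sin(2 * np.pi * 10 * np.arange(0, 1, 1 / 256))  # 10 Hz tone at 256 Hz
print(hjorth(x))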
The rational transform is a time–frequency representation based on rational functions. It is a free-parameter approach that uses optimization algorithms such as particle swarm optimization (PSO) [30] to select optimal bases. Samiee et al. [31] carried out seizure detection using a rational discrete short-time Fourier transform: PSO is used to find the optimal pole location of each EEG epoch, which yields a compact time–frequency representation. The efficiency of the proposed framework was evaluated on the Bonn University dataset with various algorithms such as Naïve Bayes, logistic regression, support vector machines, k-nearest neighbors, and multi-layer perceptron (MLP) architectures; after a sequence of experiments, a feedforward MLP was selected as the optimal classifier. They achieved a sensitivity of 99.9%, a precision of 99.6%, and an accuracy of 99.8% for the two-class Bonn University dataset (E-A) classification. Wang et al. [32] suggested identifying epileptic seizures using multi-domain feature extraction and nonlinear analysis. A wavelet threshold denoising procedure is used to eliminate the low-frequency noise associated with the EEG signal. The signal is decomposed into five frequency sub-bands using the fourth-order Daubechies wavelet, and the maximum, minimum, mean, and total variance of the wavelet coefficients are derived from each sub-band. The features derived from the time domain include the mean, variance, variation coefficient, and total variation, while the relative power spectral density computed from the fast Fourier transform coefficients is derived from the frequency domain. Due to the nonstationary nature of the EEG signal, features extracted via linear analysis alone may not be sufficient; therefore, several additional features are extracted using empirical mode decomposition and the intrinsic mode functions. The dimensionality of the feature set is reduced using principal component analysis and analysis of variance. The dataset used in this study is the Bonn University data. The input features are fed to a set of classifiers including linear discriminant analysis, Naïve Bayes, k-nearest neighbors, logistic regression, and support vector machines; the proposed seizure detection method achieved an average accuracy of 99.25%. A seizure detection method based on time–frequency analysis was proposed by Tzallas et al. [33] using the short-time Fourier transform, the power spectral density, and several other time–frequency distributions. The time–frequency distributions used in this analysis are: Margenau-Hill, Wigner-Ville, Rihaczek, pseudo Margenau-Hill, pseudo Wigner-Ville, Born-Jordan, Butterworth, Choi-Williams, generalized rectangular, reduced interference, smoothed pseudo Wigner-Ville, and Zhao-Atlas-Marks. A time–frequency grid is created by dividing time into three equal-sized windows and frequency into five sub-bands, and the features are extracted by integrating the power spectral density over the time–frequency windows. Additionally, the total signal energy is used as a feature. Principal component analysis is used to reduce the size of the feature set, and an artificial neural network is used to classify it. A seizure detection approach based on time, frequency, and time–frequency parameters together with nonlinear analysis was proposed by Gajic et al. [34]. The five EEG sub-bands of clinical interest [delta (0–4 Hz), theta (4–7 Hz), alpha (8–13 Hz), beta (14–30 Hz), and gamma (31–64 Hz)] are considered in this technique. Different characteristics are then extracted from these sub-bands using time, frequency, time–frequency, and nonlinear methods. The dimensionality of the feature space is reduced using scatter matrices, and the reduced two-dimensional feature space is applied to a quadratic classifier to detect epileptic activity. Using the Bonn University dataset, the authors reached an overall detection accuracy of 98.7%. Observation: Frequency domain approaches are well suited to long data records, but time information is lost. Time domain methods are fast, but they give no information about the frequency content of the signal.
A combination of the features from these two domains will give a better result (one such spectral feature is sketched below).
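As an illustration of the frequency-domain features that recur throughout these works (e.g., the relative power spectral density used by Wang et al. [32]), the sketch below computes relative band powers over the standard clinical EEG bands via Welch's method. The band edges and sampling rate are illustrative assumptions.

```python
# Minimal sketch: relative EEG band power from Welch's PSD estimate.
# `fs` and the band edges are common conventions assumed for illustration.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 64)}

def relative_band_powers(eeg, fs=173.61):
    freqs, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 1024))
    total = np.sum(psd)                   # total power as normalizer
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = np.sum(psd[mask]) / total
    return powers
```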
But a transform that can inform us about time and frequency behavior simultaneously is preferable for efficient signal analysis. The wavelet transform can serve this purpose, and the following papers employ different variants of the wavelet transform for feature extraction. Li et al. [35] used wavelet-based envelope analysis and a neural network ensemble to classify normal, interictal, and epileptic signals. The signal is decomposed into five levels (sub-bands) using the discrete wavelet transform with a fourth-order Daubechies wavelet; the sub-bands (d1–d5 and a5) are passed through a Hilbert transformer, and the resulting sub-band envelopes are obtained. The features selected are the average value, signal energy, standard deviation, and peak value of the envelope spectrum in every sub-band. The features are classified using a neural network ensemble (NNE), a combination of neural networks in which each network has no association with the others. Here also the Bonn University dataset is used, and the accuracy obtained is 98.78%. The advantage of this method is that combining the DWT with envelope analysis improves the discriminability of the features (see the sketch after this paragraph group). The authors tested the algorithm with different classifiers and found that, compared with back-propagation networks, k-NN, support vector machines, and linear discriminant analysis, the NNE gives better accuracy. For the identification of normal, pre-seizure, and seizure EEG signals, Harpale and Bairagi [36] used a pattern-adapted wavelet transform. An approximate wavelet was constructed by taking into account the seizure waveform from a single recording source, and continuous wavelet transform coefficients were calculated using the pattern-adapted wavelet transform. Features such as the average power, variation coefficient, RMS value, and power spectral density are derived from these coefficients and fed to a fuzzy classifier to identify seizure, pre-seizure, and regular EEG signals. The data used for this analysis were taken from the CHB-MIT Scalp EEG Dataset of Boston Children's Hospital. The advantage of the method is that the wavelet is adapted to the situation to improve accuracy; moreover, the pre-seizure EEG signal is also identified, so that the patient can be alerted well before the seizure occurs. They achieved an overall classification accuracy of 96.48% and an accuracy of 96.02% for pre-seizure detection, which is greater than the accuracy obtained by other methods using the same fuzzy classifier. For the discrimination of focal and non-focal EEG signals, Sharma et al. [37] used time–frequency localized orthogonal wavelet filter banks. Epileptic signals are referred to as focal EEG signals, whereas signals obtained from the non-epileptic region of the brain are referred to as non-focal EEG signals. The authors built a new class of filter banks that have better time–frequency localization than the equivalent Daubechies filters. The signal is decomposed using these filter banks, wavelet coefficients are calculated, and different entropy-based features are extracted from the coefficients. The dataset used was the Bern-Barcelona database. They achieved a maximum accuracy of 94.25%, which is greater than many other works using the same database to distinguish focal and non-focal EEG signals.
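The sketch below illustrates the flavor of the envelope analysis of Li et al. [35] discussed above: DWT sub-bands are passed through a Hilbert transformer and simple statistics of the resulting envelopes are collected. The wavelet, depth, and feature choices are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch: DWT sub-band envelopes via the Hilbert transform,
# in the spirit of Li et al. [35]. 'db4' and level 5 are assumptions.
import numpy as np
import pywt
from scipy.signal import hilbert

def envelope_features(eeg, wavelet="db4", level=5):
    feats = []
    for band in pywt.wavedec(eeg, wavelet, level=level):  # [a5, d5..d1]
        envelope = np.abs(hilbert(band))   # instantaneous amplitude
        feats.extend([envelope.mean(), envelope.std(),
                      np.sum(envelope ** 2), envelope.max()])
    return np.asarray(feats)
```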
The importance of this method is that some focal epilepsies can be cured by surgically removing the portions of the brain that contribute to the seizures. For the detection of seizures, Kumar and Kolekar [38] used a wavelet-based analysis. The signal is decomposed into five frequency sub-bands, namely delta, theta, alpha, beta, and gamma, using the discrete wavelet transform, but the relevant information regarding the seizure is carried by only three bands: theta, alpha, and beta. From these three sub-bands, characteristics such as energy, variance, zero-crossing rate, and fractal dimension are computed and applied to a support vector machine classifier. Compared with existing approaches, they achieved very high sensitivity and accuracy; features such as the fractal dimension and zero-crossing rate, together with SVM kernel functions such as the polynomial and Gaussian radial basis functions, were responsible for the improved sensitivity. A new method using wavelet decomposition of multichannel EEG data was proposed by Kaleem et al. [39]. EEG signals from all channels are divided into segments of four seconds duration, which are decomposed into four frequency sub-bands using the discrete wavelet transform, as in the previous work [38]. From each of these sub-bands, three characteristics are extracted: feature 1 is the energy of the approximation- and detail-function components of the wavelet transform; feature 2 is the sparsity of the amplitude spectrum; and feature 3 is the sum of the amplitude spectrum derivative. The CHB-MIT dataset is used for validation of the proposed methodology, and the authors obtained accuracy, sensitivity, and specificity values of 99.6%, 99.8%, and 99.6%, respectively, outperforming several existing methods. Two aspects distinguish this method from many others: first, no feature post-processing is required; second, the features are extracted in a computationally efficient manner. A system in which the input signal is decomposed into six levels was proposed by Chen et al. [40]. The important difference from the other approaches is that the down-sampling operation inherent in wavelet decomposition is absent here; the method is called the non-subsampled discrete wavelet transform, so the number of coefficients at any level is the same as in the original input signal. A forward one-dimensional FFT is then applied to the detail coefficients at levels 3, 4, 5, and 6, because these levels contain the significant information associated with seizures. The resulting Fourier spectra are used as feature vectors and are fed to a nearest-neighbor classifier to identify epileptic seizure activity. The Bonn University data are used in this study, and the experimental results showed that the proposed method achieved 100% accuracy for the identification of EEG seizures. Observation: So far, various methods for epilepsy identification that use domain-based (time, frequency, time–frequency) approaches have been discussed. These domain-based methods have some disadvantages. First, they are susceptible to detrimental variations in seizure patterns, because EEG data are nonstationary and their statistical characteristics can differ over time, across patients, and even for the same patient. Second, the presence of various artifacts that affect EEG signals, such as eye blinking and muscle artifacts, is ignored by the data acquisition systems. These artifacts
Fig. 3 EEG signals corrupted by artifacts. Source: Reference [43]
are capable of modifying the original EEG characteristics and can influence the efficiency of the detection system. The effect of artifacts on a pure EEG signal is illustrated in Fig. 3: (a) shows a clean EEG signal, while (b)–(d) show ictal signals corrupted by muscle artifacts, eye blinking, and white noise, respectively. Finally, most of the existing seizure detection systems use small-scale datasets collected from a small number of patients and hence may not be useful in clinical applications. To overcome these disadvantages, automatic seizure detection systems use deep neural networks for feature extraction; features extracted using deep learning models are more discriminative and powerful than those of the methods above. The following articles use machine learning/deep learning approaches for feature extraction.
5 Deep Learning/Machine Learning Approaches Several works in the seizure recognition literature use deep learning techniques for extracting features. The first of these was by Acharya et al. [41]. For seizure recognition, they used a 13-layer deep convolutional network: the first 10 layers performed feature extraction and the last three fully connected layers performed classification. In this research, the Bonn University dataset was
used with a ten-fold cross-validation approach. The advantage was that separate steps for feature extraction and feature selection were not necessary; however, to achieve the best results, the proposed approach needs a large amount of training data. With an accuracy of 88.7%, this method could discriminate normal, preictal, and seizure classes. Lu et al. [42] identified epileptic seizure activity using a convolutional neural network with residual blocks. Residual networks contain skip connections that bypass some layers; these connections help to alleviate the vanishing gradient problem. A fixed-size kernel slides over the input with a fixed stride, and for each kernel this operation yields a feature map. Nonlinear activation functions map input to output to mimic the nonlinear behavior of nerve cells. To avoid overfitting, batch normalization and dropout are used, and fully connected layers with a SoftMax activation function perform the classification. The Bonn dataset is used for a three-class classification problem that categorizes the input signal into healthy, interictal, and seizure types, with an obtained accuracy of 99%. To classify signals into focal and non-focal EEG, they used the Bern-Barcelona dataset, obtaining an accuracy of 91.8%. Hussein et al. [43] developed a method that uses L1-penalized robust regression for feature learning. Rather than considering a large number of features, only a small number of highly relevant features is used and the insignificant features are suppressed, which is especially beneficial when training data are limited. L1 regularization essentially makes the feature vector sparse by driving most of its components to zero; the remaining nonzero components are highly informative, so the most useful features associated with seizures can be identified. The EEG spectrum is considered, and the extracted information is fed to a random forest classifier. One advantage of this method is that it performs well in ideal as well as real-life conditions: since the Bonn dataset does not contain artifacts, artifact models were used in this study to mimic the behavior of muscle, eye-blink, and white-noise artifacts. The authors obtained an accuracy of 100% under ideal conditions and accuracies in the range of 90–100% for EEG data corrupted with noise, which is much better than existing methods. Ullah et al. [44] employed a deep learning approach based on ensembles of one-dimensional pyramidal convolutional neural networks. Even though CNN models excel over the domain-based (time, frequency) methods, their primary issue is that large amounts of training data are needed. The pyramidal refinement technique addresses this problem, involving 61% fewer parameters than traditional CNN approaches. This method used the Bonn University dataset. To overcome the limitation of the small amount of data, the authors propose two augmentation schemes that generate many instances from one dataset using a sliding-window method, which is the main feature of this work. The obtained accuracy was 99.1 ± 0.9%. A seizure detection system based on signal transforms and convolutional neural networks was proposed by San-Segundo et al. [45]. It uses two convolutional layers for feature extraction and a fully connected network of three layers for classification.
The inputs to the deep learning model are generated by various signal transforms: the Fourier transform,
the wavelet transform, and empirical mode decomposition. The inputs are arranged as an M × N matrix whose dimensions depend on the signal transform used. The rectified linear unit (ReLU) activation function is used in all intermediate layers, which reduces the likelihood of the vanishing gradient problem. For two-class classification, the output layer has a single output with a sigmoid activation function; for multi-class problems, the output layer contains as many outputs as there are classes and uses a SoftMax activation function. This work focuses on two key problems: the discrimination of focal and non-focal EEG signals, and the identification of seizures from normal EEG. Accordingly, two datasets were used, namely the Bern-Barcelona dataset and the epileptic seizure recognition dataset. The best accuracy was obtained when the Fourier transform was used to generate the inputs to the deep learning model: 99.5% for the two-class problem. Hussein et al. [46] used a deep neural network with an optimized model architecture for seizure detection, as a refinement of their previous work [43]. Initially, a deep long short-term memory (LSTM) network is used to learn high-level representations of the various EEG waveforms; a fully connected layer then distills the most important seizure-related features, and the extracted features are fed to a SoftMax layer for classification. The basic function of LSTMs is to retain information over long periods of time. This method has several advantages: it is effective in noisy, real-life circumstances; it accounts for the various artifacts that can corrupt an EEG signal; and, since recurrent neural networks (RNN) and LSTMs are used, it can efficiently exploit the time dependencies in EEG signals. The study used the Bonn University dataset and obtained 100% accuracy for two-class, three-class, and five-class problems, although pre-seizure activity was not identified by this approach. A cost-sensitive deep learning approach for seizure detection was proposed by Chen et al. [47]. They created a double deep neural network: the first DNN is used as a classifier, while the second DNN estimates the cost of misclassification; together they define a utility function for selecting the most informative samples from the unlabeled data pool. The authors first used a one-dimensional CNN with several filters for feature learning, followed by a fully connected layer with a SoftMax function in the last layer. They then employed a recurrent neural network (RNN) with LSTM units and finally an RNN with gated recurrent units (GRU); all three methods were implemented in the double-DNN model. Bonn University data were used in this study. The results showed that the best accuracy was obtained with the one-dimensional CNN (97.27%); the accuracies obtained with LSTM and GRU were 96.82% and 96.67%, respectively. For the classification of healthy, interictal, and ictal EEG signals, Zhang et al. [48] used a deep learning approach based on the temporal convolutional neural network (TCNN). This method automatically learns features from the input data without any preprocessing. TCNNs are found to be more accurate, simpler, and clearer than recurrent neural networks such as LSTMs, especially in sequence modeling.
In a temporal convolutional neural network there exists a causal relationship between the network layers, so that no information from the data is lost. It mainly
involves a process called causal convolution in addition to the one-dimensional fully convolutional neural network. Even though an LSTM has a memory gate, it cannot remember all of the data; moreover, the model architecture of the TCNN can be adjusted to any sequence length. The proposed method used the Bonn University dataset for training and testing. The authors addressed fourteen different classification problems, and the best accuracy of 100% was obtained for the two-class problem (epileptic versus non-epileptic). Observation: Feature extraction based on deep learning approaches eliminates the need for hand-crafted features, avoids the pre-processing step associated with feature extraction, and makes the whole process fully automatic. Many deep learning methods in the literature perform better in both feature extraction and classification. One-dimensional convolutional neural networks and recurrent neural networks are found to perform best among the available methods for extracting features from EEG time-series data (Table 1).
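To make the recurring "1D CNN on raw EEG" idea concrete, here is a minimal PyTorch sketch of a one-dimensional convolutional classifier for fixed-length EEG segments. The layer sizes, segment length, and class count are illustrative assumptions and do not reproduce any specific architecture from the surveyed papers.

```python
# Minimal sketch: a 1-D CNN for EEG segment classification (PyTorch).
# Segment length (4096 samples), channel widths, and 2 classes are
# illustrative assumptions, not a surveyed paper's exact architecture.
import torch
import torch.nn as nn

class EEG1DCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.BatchNorm1d(16), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.BatchNorm1d(32), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(8),       # fixed-size summary per channel
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.5), nn.Linear(32 * 8, n_classes),
        )

    def forward(self, x):                  # x: (batch, 1, n_samples)
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = EEG1DCNN()
    logits = model(torch.randn(4, 1, 4096))  # 4 random EEG segments
    print(logits.shape)                      # torch.Size([4, 2])
```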
6 Conclusion The objective of this work is to review different methods for detecting seizures and to provide researchers in the field with useful research guidelines. Various methods for epileptic seizure detection that differ mainly in their feature extraction techniques have been discussed. Most works use temporal features, spectral features, or a combination of both, and in many cases combining features from the two domains yields better results than using features from a single domain. This is because the EEG signal is nonstationary in nature and can differ between patients. The wavelet transform is a very powerful tool in EEG analysis because it can provide time–frequency localization, and the selection of appropriate wavelets with suitable decomposition levels produces promising results. EEG analysis has typically been performed through raw signal pre-processing, feature extraction, feature selection, and feature classification. Most recent works use deep learning approaches for feature extraction; these can remove the separate steps of feature extraction and feature selection and render the whole process fully automatic. Among the various deep learning approaches, one-dimensional CNNs and recurrent neural networks are particularly appropriate for analyzing EEG time-series data. Classifiers such as support vector machines, k-nearest neighbors, artificial neural networks, decision trees, and random forests are found to produce very good results in seizure detection under different circumstances, but the selection of the best algorithm for feature extraction and classification still needs further investigation.
Table 1 Summary of various seizure detection methods

Sl. No. | Author | Year | Feature extraction method | Classifier | Performance metrics | Dataset used
1 | Hernández et al. [25] | 2018 | Time domain, frequency domain | Ten different classifiers | Accuracy: 94.25% (five-class problem with SVM classifier) | Bonn University data
2 | Sriram & Raghu [29] | 2017 | Time domain, frequency domain, information theory, statistical based | Support vector machine (SVM) | Accuracy: 92.15%; Precision: 89.74%; Sensitivity: 94.56% | Bern-Barcelona database
3 | Samiee et al. [31] | 2013 | Rational discrete STFT | Naïve Bayes, logistic regression, SVM, kNN, MLP, etc. | Accuracy: 99.8%; Precision: 99.6%; Sensitivity: 99.9% | Bonn University data
4 | Wang et al. [32] | 2017 | Time domain, frequency domain, nonlinear analysis | Naïve Bayes, logistic regression, SVM, kNN, etc. | Accuracy: 99.25% | Bonn University data
5 | Tzallas et al. [33] | 2009 | Short-time Fourier transform, various time–frequency distributions | Artificial neural network | Accuracy: 89% (five-class problem) | Epileptic seizure recognition dataset
6 | Gajic et al. [34] | 2015 | Time domain, frequency domain, nonlinear analysis | Quadratic classifier | Accuracy: 98.7% | Bonn University data
7 | Li et al. [35] | 2017 | Wavelet transform (DWT with envelope analysis) | Neural network ensemble | Accuracy: 98.78% | Bonn University data
8 | Harpale and Bairagi [36] | 2018 | Pattern-adapted wavelet transform | Fuzzy classifier | Accuracy: 96.48% | CHB-MIT scalp dataset, Boston
9 | Sharma et al. [37] | 2017 | Localized orthogonal wavelet filter banks | Least squares support vector machine | Accuracy: 94.25%; Specificity: 96.56%; Sensitivity: 91.95% | Bern-Barcelona database
10 | Kumar and Kolekar [38] | 2014 | Discrete wavelet transform | Support vector machine | Sensitivity: 98% | Epileptic seizure recognition dataset
11 | Kaleem et al. [39] | 2018 | Discrete wavelet transform | SVM, Naïve Bayes, k-nearest neighbor, linear discriminant analysis, classification tree | Accuracy: 99.6%; Specificity: 99.6%; Sensitivity: 99.8% | CHB-MIT scalp dataset, Boston
12 | Chen et al. [40] | 2017 | Discrete wavelet transform (non-subsampled) | k-nearest neighbor | Accuracy: 100% | Bonn University data
13 | Acharya et al. [41] | 2017 | Deep convolutional neural network | Fully connected layers (3 layers) | Accuracy: 88.7% | Bonn University data
14 | Lu et al. [42] | 2019 | Convolutional neural network with residual blocks | Fully connected layers | Accuracy: 99% (Bonn dataset); 91.8% (Bern-Barcelona dataset) | Bonn University and Bern-Barcelona datasets
15 | Hussein et al. [43] | 2018 | L1-penalized robust regression | Random forest classifier | Accuracy: 100% (ideal conditions); 90–100% (real-life conditions) | Bonn University data
16 | Ullah et al. [44] | 2018 | Pyramidal CNN | Convolutional neural network (CNN) | Accuracy: 99.1 ± 0.9% | Bonn University data
17 | San-Segundo et al. [45] | 2019 | Convolutional layers (two) | Fully connected convolutional layers | Accuracy: 99.5% (two-class problem) | Bern-Barcelona dataset and epileptic seizure recognition dataset
18 | Hussein et al. [46] | 2019 | Deep LSTM networks | SoftMax network layer | Accuracy: 100% | Bonn University data
19 | Chen et al. [47] | 2018 | One-dimensional CNN | Double deep neural network | Accuracy: 97.27% | Bonn University data
20 | Zhang et al. [48] | 2018 | Temporal convolutional neural network (TCNN) | Convolutional neural network | Accuracy: 100% (two-class problem) | Bonn University data

Advantages/limitations (Sl. No. 1–6): Time domain features lack frequency information; frequency domain features lack temporal characteristics. A combination of the two gives better results, but the major drawback is that time–frequency localization is absent.
Advantages/limitations (Sl. No. 7–12): Wavelet methods provide time–frequency localization and produce good results, but signal detection in the presence of noise and artifacts worsens the situation, and the nonstationary nature of the EEG signal makes the results unpredictable; better methods that eliminate these drawbacks are needed.
Advantages/limitations (Sl. No. 13–20): Feature extraction based on deep learning eliminates the need for hand-crafted features, avoids the pre-processing step associated with feature extraction, and makes the whole process fully automatic; one-dimensional convolutional neural networks and recurrent neural networks are found to perform best among the available methods.
References
1. https://www.who.int/news-room/fact-sheets/detail/epilepsy
2. A.T. Tzallas, M.G. Tsipouras, D.I. Fotiadis, Automatic seizure detection based on time-frequency analysis and artificial neural networks. Comput. Intell. Neurosci. 2007(4), 80510 (2007)
3. A.G. Correa, E. Laciar, H. Patiño, M. Valentinuzzi, Artifact removal from EEG signals using adaptive filters in cascade. J. Phys. Conf. Ser. 90, 012081 (2007)
4. A.M. Chan, F.T. Sun, E.H. Boto, B.M. Wingeier, Automated seizure onset detection for accurate onset time determination in intracranial EEG. Clin. Neurophysiol. 119, 2687–2696 (2008)
5. A. Aarabi, R. Fazel-Rezai, Y. Aghakhani, A fuzzy rule-based system for epileptic seizure detection in intracranial EEG. Clin. Neurophysiol. 120, 1648–1657 (2009)
6. V. Srinivasan, C. Eswaran, N. Sriraam, Artificial neural network based epileptic detection using time-domain and frequency-domain features. J. Med. Syst. 29(6), 647–660 (2005)
7. R. Meier, H. Dittrich, A. Schulze-Bonhage, A. Aertsen, Detecting epileptic seizures in long-term human EEG: a new approach to automatic online and real-time detection and classification of polymorphic seizure patterns. J. Clin. Neurophysiol. 25, 119–131 (2008)
8. J. Mitra, J.R. Glover, P.Y. Ktonas, A.T. Kumar, A. Mukherjee, N.B. Karayiannis et al., A multistage system for the automated detection of epileptic seizures in neonatal EEG. J. Clin. Neurophysiol. 26, 218 (2009)
9. G.R. Minasyan, J.B. Chatten, M.J. Chatten, R.N. Harner, Patient-specific early seizure detection from scalp EEG. J. Clin. Neurophysiol. 27, 163 (2010)
10. B. Abibullaev, H.D. Seo, M.S. Kim, Epileptic spike detection using continuous wavelet transforms and artificial neural networks. Int. J. Wavelets Multiresolut. Inf. Process. 8, 33–48 (2010)
11. A.K. Tafreshi, A.M. Nasrabadi, A.H. Omidvarnia, Epileptic seizure detection using empirical mode decomposition, in Proceedings of the IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Sarajevo, December 16–19, 2008, pp. 16–19
12. L. Orosco, E. Laciar, A.G. Correa, A. Torres, J.P. Graffigna, An epileptic seizures detection algorithm based on the empirical mode decomposition of EEG, in Proceedings of the International Conference of the IEEE EMBS, Minneapolis, MN, USA, September 2009, pp. 3–6
13. C. Guarnizo, E. Delgado, EEG single-channel seizure recognition using empirical mode decomposition and normalized mutual information, in Proceedings of the IEEE International Conference on Signal Processing (ICSP), Beijing, October 24–28, 2010, pp. 1–4
14. P.S.C. Heuberger, P.M.J. Van den Hof, B. Wahlberg, Modelling and Identification with Rational Orthogonal Basis Functions (Springer, London, 2005)
15. S. Fridli, L. Lócsi, F. Schipp, Rational function systems in ECG processing. The project is supported and financed by the European Social Fund (grant agreement no. TAMOP 4.2.1./B-09/1/KMR-2010-0003)
16. A. Quintero-Rincón, M. Pereyra, C. D'Giano, M. Risk, H. Batatia, Fast statistical model-based classification of epileptic EEG signals. Biocybern. Biomed. Eng. 38, 877–889
17. M.A. Klados, C. Papadelis, C. Braun, P.D. Bamidis, REG-ICA: a hybrid methodology combining blind source separation and regression techniques for the rejection of ocular artifacts. Biomed. Signal Process. Control 10, 291–300 (2011)
18. D. Safieddine, A. Kachenoura, L. Albera, G. Birot, A. Karfoul, A. Pasniu, Removal of muscle artifact from EEG data: comparison between stochastic (ICA and CCA) and deterministic (EMD and wavelet-based) approaches. EURASIP J. Adv. Signal Process. 2012 (2012)
19. S. Casarotto, A.M. Bianchi, S. Cerutti, G.A. Chiarenza, Principal component analysis for reduction of ocular artefacts in event-related potentials of normal and dyslexic children. Clin. Neurophysiol. 115, 609–619 (2004)
20. R. Vigário, Extraction of ocular artifacts from EEG using independent component analysis. Electroencephalogr. Clin. Neurophysiol. 103, 395–404 (1997)
21. R. Vigário, J. Särelä, V. Jousmäki, M. Hämäläinen, E. Oja, Independent component approach to the analysis of EEG and MEG recordings. IEEE Trans. Biomed. Eng. 47, 589–593 (2000)
22. P. He, G. Wilson, C. Russell, Removal of ocular artifacts from electro-encephalogram by adaptive filtering. Med. Biol. Eng. Comput. 42, 407–412 (2004)
23. B. Somers, T. Francart, A. Bertrand, A generic EEG artifact removal algorithm based on the multi-channel Wiener filter. J. Neural Eng. 15 (2018)
24. D.L. Donoho, Sparse components of images and optimal atomic decompositions. Constr. Approx. 17, 353–382 (2001)
25. D.E. Hernández, L. Trujillo, E.Z. Flores, O.M. Villanueva, O. Romo-Fewell, Detecting epilepsy in EEG signals using time, frequency and time-frequency domain features, in Computer Science and Engineering—Theory and Applications, Studies in Systems, Decision and Control, ed. by M.A. Sanchez et al. (Springer International Publishing AG, part of Springer Nature, Berlin, 2018), p. 143
26. B. Hjorth, EEG analysis based on time domain properties. Electroencephalogr. Clin. Neurophysiol. 29(3), 306–310 (1970)
27. J.M. Hausdorff, A. Lertratanakul, M.E. Cudkowicz, A.L. Peterson, D. Kaliton, A.L. Goldberger, Dynamic markers of altered gait rhythm in amyotrophic lateral sclerosis. J. Appl. Physiol. 88(6), 2045–2053 (2000)
28. P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from EEG using higher order crossings. IEEE Trans. Inf. Technol. Biomed. 14(2), 186–197 (2010)
29. N. Sriraam, S. Raghu, Classification of focal and non-focal epileptic seizures using multi-features and SVM classifier. J. Med. Syst. 41(10) (2017)
30. J. Kennedy, R.C. Eberhart, Particle swarm optimization, in Proceedings of the IEEE International Conference on Neural Networks, vol. IV (IEEE Service Center, Piscataway, NJ, 1995), pp. 1942–1948
31. K. Samiee, P. Kovacs, M. Gabbouj, Epileptic seizure classification of EEG time-series using rational discrete short-time Fourier transform. IEEE Trans. Biomed. Eng. (2013)
32. L. Wang, W. Xue, Y. Li, M. Luo, J. Huang, W. Cui, C. Huang, Automatic epileptic seizure detection in EEG signals using multi-domain feature extraction and nonlinear analysis. Entropy (2017)
33. A.T. Tzallas, M.G. Tsipouras, D.I. Fotiadis, Epileptic seizure detection in EEGs using time–frequency analysis. IEEE Trans. Inf. Technol. Biomed. 13(5) (2009)
34. D. Gajic, Z. Djurovic, J. Gligorijevic, S. Di Gennaro, I. Savic-Gajic, Detection of epileptiform activity in EEG signals based on time-frequency and non-linear analysis. Front. Comput. Neurosci. 9, 38 (2015). https://doi.org/10.3389/fncom.2015.00038
35. M. Li, W. Chen, T. Zhang, Classification of epilepsy EEG signals using DWT-based envelope analysis and neural network ensemble. Biomed. Signal Process. Control 31, 357–365 (2017)
36. V. Harpale, V. Bairagi, An adaptive method for feature selection and extraction for classification of epileptic EEG signal in significant states. J. King Saud Univ. Comput. Inf. Sci. (2018)
37. M. Sharma, A. Dhere, R.B. Pachori, U. Rajendra Acharya, An automatic detection of focal EEG signals using new class of time–frequency localized orthogonal wavelet filter banks. Knowl. Based Syst. 118, 217–227 (2017)
38. A. Kumar, M.H. Kolekar, Machine learning approach for epileptic seizure detection using wavelet analysis of EEG signals, in 2014 International Conference on Medical Imaging, m-Health and Emerging Communication Systems (MedCom) (IEEE, 2014)
39. M. Kaleem, A. Guergachi, S. Krishnan, Patient-specific seizure detection in long-term EEG using wavelet decomposition. Biomed. Signal Process. Control 46, 157–165 (2018)
40. G. Chen, W. Xie, T.D. Bui, A. Krzyzak, Automatic epileptic seizure detection in EEG using nonsubsampled wavelet–Fourier features. J. Med. Biol. Eng. https://doi.org/10.1007/s40846-016-0214-0
41. U.R. Acharya, S.L. Oh, Y. Hagiwara, J.H. Tan, H. Adeli, Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals. Comput. Biol. Med. (2017). https://doi.org/10.1016/j.compbiomed.2017.09.017
42. D. Lu, J. Triesch, Residual deep convolutional neural network for EEG signal classification in epilepsy (2019). arXiv preprint arXiv:1903.08100
43. R. Hussein, M. Elgendi, Z.J. Wang, R.K. Ward, Robust detection of epileptic seizures based on L1-penalized robust regression of EEG signals. Expert Syst. Appl. 104, 153–167 (2018)
44. I. Ullah, M. Hussain, E. Qazi, H. Aboalsamh, An automated system for epilepsy detection using EEG brain signals based on deep learning approach. Expert Syst. Appl. 107, 61–71 (2018)
45. R. San-Segundo, M. Gil-Martín, L.F. D'Haro-Enríquez, J.M. Pardo, Classification of epileptic EEG recordings using signal transforms and convolutional neural networks. Comput. Biol. Med. 109, 148–158 (2019)
46. R. Hussein, H. Palangi, R.K. Ward, Z.J. Wang, Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals. Clin. Neurophysiol. 130, 25–37 (2019)
47. X. Chen, J. Ji, T. Ji, P. Li, Cost-sensitive deep active learning for epileptic seizure detection, in Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (2018), pp. 226–235
48. J. Zhang, H. Wu, W. Su, X. Wang, M. Yang, J. Wu, A new approach for classification of epilepsy EEG signals based on temporal convolutional neural networks, in 2018 11th International Symposium on Computational Intelligence and Design (ISCID), vol. 2 (IEEE, 2018), pp. 80–84
Measurement of Physiological Parameters Using Video Processing M. Spandana, Pavan Arun Deshpannde, Kashinath Biradar, B. S. Surekha, and B. S. Renuka
Abstract This paper proposes a non-intrusive and contactless method for the estimation of physiological parameters such as heart rate (HR), inter-beat interval (IBI), and respiration rate (RR), which are important markers of a patient's physiological state and essential to monitor. However, the majority of measurement methods are contact based: sensors are attached to the body, which is often cumbersome and requires individual assistance. The proposed strategy is a simple, low-cost, non-contact approach for estimating these physiological parameters in real time using a web camera. The respiration rate and heart rate are obtained with the help of facial skin color variation caused by blood circulation. Signal processing techniques such as independent component analysis (ICA), the fast Fourier transform (FFT), and principal component analysis (PCA) are applied to the color channels in the video recordings, and the blood volume pulse (BVP) is extracted from the facial regions. The parameters measured and compared are the IBI, HR, and RR. Good agreement was achieved between the measurements across all physiological parameters. The proposed method has significant potential for advancing telemedicine and personal health care. Keywords Principal component analysis · Independent component analysis · Fast Fourier transform · Eulerian magnification · Inter-beat interval · Blood volume pulse · Heart rate · Respiratory rate
1 Introduction Human physiological signals such as respiration rate, heart rate, heart rate variability, and blood oxygen saturation, and their monitoring, play a role in the diagnosis of health conditions and abnormal events. Routinely, sensors, electrodes, leads, wires, and chest straps are used for monitoring cardiorespiratory activity,
which can cause anxiety and constrain the patient whenever used for long periods of time. Interest in reducing the problems associated with contact-based monitoring systems has motivated research into simple alternative measurement methods for monitoring physiological signals, such as photoplethysmography (PPG), Eulerian video magnification, and independent component analysis [1]. During the cardiac cycle, volumetric changes in the facial blood vessels modify the path length of the ambient incident light, so that the resulting changes in the amount of reflected light indicate the timing of cardiovascular events. The facial video is taken as input, PPG signals are extracted from the captured video, independent component analysis is used to extract the red, green, and blue signals, and filtering techniques are used to isolate the signal of interest, which is then processed and modeled [2]. The respiration rate is measured by counting the number of breaths per minute, i.e., by observing how often the chest rises. Respiration rates may increase with fever, illness, or other medical conditions [3], and the respiratory rate turns out to be an important indicator of illness. Eulerian video magnification (EVM) is a video processing approach that reveals subtle breathing movements within video sequences [4]. It mainly consists of spatial decomposition, phase comparison, signal smoothing, and peak detection.
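As a rough sketch of the amplification idea behind EVM (not the authors' implementation), the snippet below spatially blurs and downsamples each frame, temporally band-passes the resulting low-resolution signal around a plausible respiration band, and adds the amplified result back. The pyramid depth, band edges, and gain are illustrative assumptions.

```python
# Rough sketch of Eulerian-style video magnification (not the authors' code):
# downsample each frame, band-pass the pixel time series around a
# physiological frequency band, amplify, and add back. The 0.2-0.5 Hz band
# (respiration), pyramid depth, and gain are illustrative assumptions.
import cv2
import numpy as np
from scipy.signal import butter, filtfilt

def magnify(frames, fs=30.0, lo=0.2, hi=0.5, levels=3, gain=20.0):
    # frames: float32 array of shape (T, H, W, 3), T large enough to filter
    scale = 0.5 ** levels
    small = np.stack([cv2.resize(f, None, fx=scale, fy=scale)
                      for f in frames])            # spatial decomposition
    b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    band = filtfilt(b, a, small, axis=0)           # temporal band-pass
    out = []
    for f, d in zip(frames, band):
        up = cv2.resize(d, (f.shape[1], f.shape[0]))
        out.append(f + gain * up)                  # amplified variation
    return np.stack(out)
```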
2 Literature Survey In [4], the authors analyzed statistical information from the traffic department showing that physiological and mental factors cause a large number of accidents. To address this, they proposed a technique in which a webcam is used to measure the heart rate of drivers, so as to monitor their actual state in real time. The heart rate is obtained by capturing the color variations arising from blood circulation in the facial skin, and blind source separation, based on the RGB color changes in video tracking, is used for information extraction. Compared with existing commercial detection equipment, the non-contact detection method can conveniently measure the heart rate with acceptable precision, which is particularly useful during driving circumstances that require pre-alerts. In [5], the authors discussed heart rate variability, its causes and effects, and its monitoring. Heart rate variability is a measure of the variation between consecutive heartbeats that reflects the effects of stress on the human body, a concern given the rise of unhealthy dietary habits and sedentary lifestyles across the globe. To monitor heart rate variability, they developed a model that captures video using a camera, separates it into red, green, and blue (RGB) color channels, and converts these to the HSI color model. The cheeks were chosen as the region of interest,
to which brightness preserving bi-histogram equalization (BBHE) was applied. Applying principal component analysis (PCA) to the three color channels extracted new principal components, and with these methods they measure the pulse in order to monitor heart rate variability. In [6], the authors discussed the importance of remote sensors in health care for monitoring heart rate and related applications. They proposed a system that uses wearable wireless multimodal patch sensors designed using off-the-shelf components. These wearable sensors use a low-power nine-axis inertial measurement unit to measure respiratory movement and a MEMS microphone to record sound signals; data processing and fusion algorithms are then used to determine the respiratory rate and coughing events. In [7], the authors noted the drawbacks of conventional methods: for measuring heart rate there are various standard techniques available, such as the electrocardiogram, which is expensive and uncomfortable, and the commercial pulse oximetry sensor, which requires attachment to the fingertips and is likewise inconvenient. They proposed a better approach for heart rate measurement, observing that during the cardiac cycle, volumetric changes in the facial blood vessels change the path length of the incident ambient light, so that the timing of cardiovascular events is demonstrated by the resulting changes in the amount of reflected light. They used signal separation techniques such as independent component analysis (ICA) to isolate the green color band from the red, blue, and green bands; after isolating the green band, they compute the inter-beat interval to determine the heart rate. In [8], the authors describe their observations on diseases relating to the heart and respiration. To monitor these physiological parameters, the proposed technique consists of RGB band separation from the captured video, selecting the green band for pulse rate and the red and blue bands for respiration rate measurement; for calculating the oxygen content of the blood they used an optical sensor placed at the fingertip. In [9], the authors explained the need for contactless estimation of physiological parameters. Heart rate is a basic vital sign for clinical diagnosis, and there is growing interest in extracting it without contact, especially for populations such as premature children and the elderly, for whom the skin is delicate and easily damaged by customary sensors. Hence they proposed a strategy to measure pulse through video processing: they extract pulse rate and beat lengths from recordings by measuring the subtle head
movement caused by the Newtonian reaction to the influx of blood at each beat. The method tracks features on the head and performs principal component analysis (PCA) to decompose their trajectories into a set of component motions. It then picks the component that best corresponds to heartbeats based on its temporal frequency spectrum. Finally, it analyzes the motion projected onto this component and identifies peaks of the trajectories, which correspond to heartbeats. In [10], the authors observe that the human visual system has limited spatio-temporal sensitivity, but various signals that fall below this limit can be informative; for example, human skin color varies slightly with blood circulation. This variation, while imperceptible to the unaided eye, can be exploited to extract the pulse rate by amplifying the captured video. They used the Eulerian video magnification (EVM) method, which takes a typical video sequence as input and applies spatial decomposition followed by temporal filtering to the frames; the resulting signal is then amplified to reveal hidden information. The paper [11] mainly focuses on home wellness and remote health monitoring of vital signs, covering both high-precision diagnostic devices and simple ones accessible to everyone. The authors proposed a simple and effective strategy for estimating the pulse rate: for each ROI channel, pixel values are summed separately for every frame; the signals acquired in this way are filtered using an FIR bandpass filter, independent and principal component analyses are then performed, and finally the parameters are extracted. In [12], the authors presented the accurate estimation and monitoring of physiological parameters such as body temperature, pulse, and respiratory patterns, with respiration rate measurement as their main focus. According to the proposed technique, video is collected at a fixed frame rate of 30 frames per second, which is sufficient to discretize the respiration movements. After recording, the pixel at the level of the pit of the neck is selected in the first frame of the video, and the respiration rate can be extracted in either the frequency or the time domain. The obtained rate can be used to decide on hospitalization and offers the opportunity for early intervention; moreover, the respiratory rate has been found to be a discriminating parameter between stable and unstable patients. Monitoring of human physiological signals such as heart rate, heart rate variability, respiration rate, and blood oxygen saturation plays a role in the diagnosis of medical issues and uncommon events such as tachycardia, bradycardia, bradypnoea, tachypnoea, apnoea, and hypoxemia. Conventionally, monitoring of cardio-respiratory activity is accomplished by using adhesive sensors, electrodes, leads, wires, and chest straps, which may cause discomfort and constrain the patient when used for extended stretches of time. In addition, these sensors may cause skin damage, contamination, or unfavorable reactions in people with delicate skin. In [13], the authors used the Doppler radar effect and a thermal imaging strategy to measure and monitor human physiological signs. The Doppler shift is an estimation procedure capable of detecting invisible movement of the chest wall arising from the
mechanical action of the heart and lungs, thereby revealing the cardio-respiratory signal. Thermal imaging is a passive measurement procedure, which can be used to detect the emitted radiation from specific areas of the body in the infrared (IR) range of the electromagnetic spectrum in order to extract the cardio-respiratory signal.
3 Methodology This section discusses the steps and methods involved in the measurement of heart rate and respiratory rate using video processing.
3.1 Methodology Involved in Measurement of Heart Rate Using Video Processing This section contains the basic building blocks that represent the methods involved in the heart rate and respiratory rate measurement system. Figure 1 shows the building blocks of the heart rate measurement system and the various techniques used in the different stages of the implementation. The system consists of three main modules. 1. Data Collection • Video capturing: For video capture, the Open Computer Vision library (OpenCV) is used to automatically detect the coordinates of the face region in the first frame of the video recording. • ROI selection: Initially, a rectangular face region is chosen in the first frame of the video recording. The selected face region coordinates remain the same for the whole sequence of images. The ROI is a rectangular part of the forehead area. • Extracting the PPG signal: The raw red, green, and blue traces are separated from the selected ROI and normalized. 2. Data Pre-processing • Signal separation: Independent component analysis (ICA) is a computational and statistical technique used to recover independent signals from a mixture of signals. Suppose that we have $n$ mixtures $x_1, \ldots, x_n$ of $n$ independent components:

$$x_j = a_{j1} s_1 + a_{j2} s_2 + \cdots + a_{jn} s_n \qquad (1)$$
Fig. 1 Block diagram for heart rate measurement
The time index $t$ has been dropped in the ICA model, since the individual components are assumed to be random variables rather than time signals; the observed values $x_j(t)$, for example the microphone signals in the cocktail-party problem, are then a sample (realization) of this random variable. The model can be expressed using vector-matrix notation:

$$X = A S \qquad (2)$$

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad S = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}, \quad A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \qquad (3)$$

where $X$ is the random vector whose components are the mixtures $x_1, \ldots, x_n$, $S$ is the random vector whose components are the sources $s_1, \ldots, s_n$, and $A$ is the mixing matrix with elements $a_{ij}$.
• Signal filtering: For each frame, pixel values are summed independently for each ROI channel. Only the green signal is considered, and a fast Fourier transform (FFT) based filtering algorithm is applied to it. 3. Parameter Extraction • Obtain the inter-beat interval (IBI): The peak values are found in the green signal, and a threshold is computed as the average of the signal mean and the maximum power of the signal. The number of peaks exceeding this threshold is counted, from which the pulse rate is obtained and displayed.
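Putting the filtering and peak-counting steps together, the sketch below band-passes a candidate pulse trace around plausible heart-rate frequencies, detects peaks above a simple threshold, and converts the mean inter-peak spacing into beats per minute. The band edges, threshold rule, and minimum peak distance are illustrative assumptions.

```python
# Minimal sketch: heart rate from a pulse trace via band-pass filtering and
# peak counting. Band edges (0.75-3 Hz ~ 45-180 bpm), the threshold rule,
# and the minimum peak spacing are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def heart_rate_bpm(trace, fs=30.0):
    b, a = butter(3, [0.75 / (fs / 2), 3.0 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trace)
    threshold = 0.5 * (filtered.mean() + filtered.max())
    peaks, _ = find_peaks(filtered, height=threshold, distance=int(0.4 * fs))
    if len(peaks) < 2:
        return None                          # not enough beats detected
    ibi = np.diff(peaks) / fs                # inter-beat intervals (seconds)
    return 60.0 / ibi.mean()                 # beats per minute
```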
3.2 Methodology Involved in Measurement of Respiratory Rate Using Video Processing Figure 2 shows the block diagram of respiratory rate measurement. The respiratory rate system consists of the following stages. 1. Data Acquisition: Two kinds of information are acquired; both the video and the breath signals are obtained during a polysomnography. RGB values are converted into grayscale values, and the ROI, i.e., the chest and abdomen area, is then selected. A buffer is created for storing the image pyramid captured during data acquisition; management of this buffer helps to handle the overhead problem. 2. Data Processing: The recorded video data is processed in several steps. First, specific movements are amplified using Eulerian video magnification. The movements are then extracted from the video data using an optical flow algorithm. The output of the optical flow is adjusted so as to obtain a signal whose quality can be assessed in comparison with the control signals. 3. Signal Analysis: By summing the optical flow values of each frame we obtain the frequency content of the signal, which is analyzed with the Fourier transform. The optical flow calculation makes several assumptions: • the pixel intensities of an object do not change between consecutive frames; • neighbouring pixels have similar motion. Consider a pixel $I(x, y, t)$ in the first frame. In the following frame it moves by a distance $(dx, dy)$ after time $dt$. Since the pixels are the same and the intensity does not change, we can write

$$I(x, y, t) = I(x + dx, y + dy, t + dt) \qquad (4)$$
Fig. 2 Block diagram for respiratory rate measurement
Applying a Taylor series approximation to the right-hand side gives the following equation:

$$f_x u + f_y v + f_t = 0 \qquad (5)$$

$$f_x = \frac{\partial f}{\partial x}; \quad f_y = \frac{\partial f}{\partial y} \qquad (6)$$

$$u = \frac{dx}{dt}; \quad v = \frac{dy}{dt} \qquad (7)$$
The above equation is called the optical flow equation. Here $f_x$ and $f_y$ are the image gradients and $f_t$ is the gradient along time, but $(u, v)$ is unknown: solving for two unknown variables with this one equation is not possible. Several techniques exist to handle this problem, and we used the Lucas–Kanade method. The Lucas–Kanade method takes a 3 × 3 window around the pixel, so that all 9 points are assumed to have the same motion, and for these 9 points we find $(f_x, f_y, f_t)$. The problem then becomes solving 9 equations with two unknown variables, which is over-determined; a better solution is obtained with the least-squares fit method. The final solution of this two-equation, two-unknown problem is

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \sum_i f_{x_i}^2 & \sum_i f_{x_i} f_{y_i} \\ \sum_i f_{x_i} f_{y_i} & \sum_i f_{y_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum_i f_{x_i} f_{t_i} \\ -\sum_i f_{y_i} f_{t_i} \end{bmatrix} \qquad (8)$$
The basic idea is to track some given points and obtain the optical flow vectors of those points. However, this works only for small motions and fails when the motion is large. To overcome this problem the pyramid method is used: going up the pyramid, small motions are removed and large motions become small motions, so that, using the Lucas–Kanade method, we obtain the optical flow together with the scale. These signals can have a low signal-to-noise ratio, making breath detection difficult; independent component analysis (ICA) and principal component analysis (PCA) are therefore performed to improve the signal quality. 4. Extract Respiratory Rate: The final output signal is the average interval within the measurement buffer. The resulting signal is obtained using a peak detection technique, which proceeds as follows: a peak detection signal is generated from the image histogram; then, using the zero-crossings of the peak detection signal and the local extrema between the zero-crossings, the histogram peaks are located. The peak detection signal $r_N$ is generated by convolving the histogram $h$ with the peak detection kernel $p_N$, i.e.,

$$r_N = p_N * h \qquad (9)$$
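A compact sketch of this pipeline, assuming OpenCV's pyramidal Lucas–Kanade tracker on chest-region corner points and a Fourier-based rate estimate; the frame rate, feature parameters, and respiration band are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch: respiration rate from chest motion via pyramidal
# Lucas-Kanade optical flow and an FFT of the vertical displacement.
# Frame rate, feature parameters, and the 0.1-0.7 Hz band are assumptions.
import cv2
import numpy as np

def respiration_rate_bpm(gray_frames, fs=30.0):
    # gray_frames: list of grayscale chest-ROI frames (uint8 arrays)
    p0 = cv2.goodFeaturesToTrack(gray_frames[0], maxCorners=50,
                                 qualityLevel=0.01, minDistance=5)
    motion = []
    prev = gray_frames[0]
    for frame in gray_frames[1:]:
        p1, status, _ = cv2.calcOpticalFlowPyrLK(prev, frame, p0, None)
        good = status.ravel() == 1
        # mean vertical displacement of tracked points in this frame
        motion.append(np.mean(p1[good, 0, 1] - p0[good, 0, 1]))
        prev, p0 = frame, p1
    motion = np.asarray(motion)
    spectrum = np.abs(np.fft.rfft(motion - motion.mean()))
    freqs = np.fft.rfftfreq(len(motion), d=1.0 / fs)
    band = (freqs >= 0.1) & (freqs <= 0.7)     # plausible breathing band
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```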
4 Results This section presents the results of the executed project, including the results obtained at each step of the heart rate and respiratory rate measurement.
Fig. 3 Initial application UI
4.1 Heart Rate Measurement Results Figure 3 shows the initial application UI, which contains the key handler menu and two rectangular boxes to detect the face and forehead. The key handler menu offers different key options for different functionalities. To measure heart rate, the region of interest must first be selected and locked, which is done by pressing the key "S". Figure 4 shows the window after selecting and locking the ROI. The ROI window contains the green channel band segregated from the ROI region, along with the key handler menu, which provides options such as restart, display data, and exit. After locking the region of interest, pressing the key "D" plots and displays the data. Figure 5 shows the display window, which displays the heart rate in beats per minute (bpm). Figure 6 shows the command prompt output, which displays the tagged results of each step of the heart rate measurement application.
4.2 Respiratory Rate Measurement

The respiratory rate measurement application starts by executing a Python file. Figure 7 shows the initial application window, which contains the video capture frame, the raw signal display plot, and the processed signal plot.
Fig. 4 Window showing locking of ROI
Fig. 5 Display window
After initialization, the next step is a calibration step to identify the ROI using the image pyramid method, after which the measurement process starts. Figure 8 shows the measurement window, which contains the raw signal display, the processed signal plot, and a digital real-time respiratory rate display.
Fig. 6 Command prompt output
Fig. 7 Initial application window
5 Conclusion

Based on the results, it can be concluded that the measurement of heart rate and respiratory rate can be performed using the proposed non-invasive and inexpensive methodology. Conventionally, monitoring of cardio-respiratory activity is achieved by using adhesive sensors, electrodes, leads, wires, and chest straps, which can cause discomfort and constrain the patient if used for long periods of time; the proposed contactless measurement of physiological parameters avoids this. It provides an easy, user-friendly, and harmless measurement method for every individual.
Fig. 8 Measurement window
5.1 Applications

(a) Heart rate and respiratory rate can be used to diagnose cardiovascular and respiratory diseases.
(b) The proposed method can be used for monitoring the status of a driver, such as drowsiness and mental stability.
(c) The proposed method can also be used to monitor the status of infants without any wearable devices.
(d) The proposed method is helpful in remote areas for immediate health monitoring.
5.2 Future Scope

The goals of this project were purposely kept within what was believed to be attainable within the allotted timeline; as such, many improvements can be made upon this initial design. In future work, the heart rate and respiratory rate results can be utilised as the basic criteria for the detection of bradycardia, tachycardia, and many other cardio-respiratory diseases. Techniques such as PPG and Eulerian magnification can also be used to measure the body temperature and the oxy-haemoglobin and carboxy-haemoglobin content in the blood, based on the movement of the blood in a certain region.
The obtained results can be interfaced with an Android device so that the proposed method is accessible to everyone. Further, the Android application can provide real-time doctor support for the patients who use it: the results of a patient will be accessible to the doctor in real time. For this, an application developed for doctors is needed, which will display all the measurement details of the patients. The doctor can also get alerts about serious patients who require instant medical care. Further, the app can offer a real-time chat with the patient's personal doctor. Other physiological parameters, such as body temperature and SpO2, can also be measured and used for diagnosis.
Low-Dose Imaging: Prediction of Projections in Sinogram Space Bhagya Sunag and Shrinivas Desai
Abstract Computed tomography (CT) is one of the preferred medical diagnostic tools according to medical surveys. In CT, X-ray projections are acquired from different view angles to generate tomographic images of the body. Recent studies have shown that excess radiation exposure has an adverse effect on health. In this context, low-dose imaging is becoming a clinical reality. A low-dose image is achieved by sparse-view CT and usually possesses complex artifacts and noise. Reconstructing high-quality images from a low dose remains a challenging task. To address this issue, a simple-averaging method is presented to estimate the missing projection data in sinogram space. The reconstructed image quality is assessed using parameters such as PSNR, RMSE, and SSIM. Experimental results show that the proposed technique improves the image quality as compared to the conventional low-dose image. Keywords Computed tomography · Low-dose · Sparse-view
1 Introduction

Medical imaging technology has grown rapidly over the past 30 years. It uses imaging modalities and processes such as CT, MRI, ultrasound, and X-ray to get more detailed information on the human body, which helps doctors to diagnose and treat patients effectively.
B. Sunag (B) · S. Desai School of Computer Science & Engineering, KLE Technological University, Hubli, Karnataka 580031, India e-mail: [email protected] S. Desai e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_43
The usage of CT has increased rapidly all over the world in the last two decades. Image reconstruction in CT is a mathematical process in which X-ray projections from different angles are taken to generate tomographic images of the human body for proper diagnosis of disease. From sufficient projection angles, a high-quality image can be acquired. However, repeated scans or a high radiation dose can damage body cells, including DNA molecules, which may lead to radiation-induced cancer. Therefore, the radiation dose is one of the critical issues in the medical field. Two promising schemes have been introduced to reduce the radiation dose [1]. One is low-dose CT, which changes the hardware condition to lessen the tube current [2]. The other is sparse-view CT, which reduces the number of projections, producing complex artifacts and noise in the reconstructed image. Therefore, to improve the image quality under sparse views, several conventional algorithms have been proposed, namely analytical and iterative reconstruction algorithms [3]. The proposed methodology uses the analytical filtered back projection (FBP) algorithm to reconstruct a sparse-view CT image, called a low-dose image. The main objective of the experiment is to generate a high-quality image from a low dose. To realize this, the missing projection data due to the sparse view are filled in sinogram space with simple averaging of neighboring projection values. The paper is organized as follows: Sect. 2 surveys the related literature; Sect. 3 discusses the mathematical background, including sparse-view reconstruction; Sect. 4 presents the proposed methodology; Sect. 5 compares and analyzes the results for sparse-view projections; and lastly, Sect. 6 gives the conclusion and future work.
2 Literature Survey

This section includes a survey of state-of-the-art techniques to improve low-dose CT image quality for medical diagnostics. In [5], the authors worked on 197 trauma patients of a young age group to evaluate abdominal organ injuries and graded them on ASST scales. They compared the image quality and noise ratio of low-dose CT using ASIR-V and FBP with routine-dose CT. ASIR-V, with a significant reduction in radiation dose, performs better in assessing multi-organ abdominal injury without harming image quality. The ASIR-V hybrid iterative algorithm [6, 7] was used to study human lung specimens and phantoms for lesion detection, image noise, resolution, and dose-reduction potential. ASIR-V with a low radiation dose minimizes the image noise and improves the image quality, but did not considerably influence airway quantification values, while variation in measurements such as %WA and WT slightly increased with dose reduction. In [8], the authors considered 59 children for their experiment. MBIR is an iterative algorithm that performs better than ASIR in improving the image quality of contrast-dose abdominal and low-radiation-dose CT to meet diagnostic requirements.
In [9], brain 3D-CTA axial and volume-rendered (VR) images of 21 patients were reconstructed from the 3D-CTA data using ASIR and MBIR. MBIR provides better visibility of small intracranial arteries such as AChoA and TPA without increasing the radiation dose. In [10, 11], the authors compared deep learning reconstruction (DLR) images with images reconstructed by state-of-the-art techniques for phantom, submillisievert chest, and abdominopelvic CT in terms of image quality, noise, and detectability. In the recent literature, some authors have worked on the reconstruction of sparse-view CT images using DenseNet and deconvolution [12], GoogLeNet [13], and convolutional neural networks [14, 15]. Most of the literature has not explored sinogram space for addressing low-dose imaging; hence, based on this research gap, our proposed methodology is designed.
3 Mathematical Background

3.1 Radon Transform

Radon and inverse radon transform mechanisms are used in the reconstruction of CT images from projections. The radon transform produces a line integral of the object f(x, y) as a projection, as shown in Fig. 1. The projection of a 2D object at a given angle is one dimensional, and a series of 1D projections at different angles are stacked together to form a sinogram. The mathematical equation of the radon transform is given below [4]. In CT, the 1D projection of an object f(x, y) at an angle is given by Eq. (1).
Fig. 1 Radon transform maps f on the (x, y)-domain into Rf on the (α, s)-domain
Fig. 2 Radon transform (sinogram p)
g(s, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) δ(x cos θ + y sin θ − s) dx dy   (1)
where θ is the angle of the line and s is the perpendicular offset of the line. The collection of g(s, θ) at different angles θ is called the radon transform of the image f(x, y). Figure 2 presents the sinogram p obtained from the image f using the radon function:

p = radon(f, θ)

The inverse radon transform is used in CT to reconstruct a 2D image from a sinogram. The reconstructed image f(x, y) is given by

f(x, y) = B{g(s, θ)} = ∫_0^π g(x cos θ + y sin θ, θ) dθ   (2)
Figure 3 presents the reconstructed image f obtained from the projection data using the iradon function:

i = iradon(p, θ)
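The radon(f, θ) and iradon(p, θ) calls above follow MATLAB-style naming; an equivalent sketch in Python using scikit-image (assumed here, with its built-in Shepp–Logan phantom) is:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

f = shepp_logan_phantom()                       # test object f(x, y)
theta = np.arange(0.0, 180.0, 1.0)              # projection angles in degrees
p = radon(f, theta=theta)                       # sinogram: one column per angle
i = iradon(p, theta=theta, filter_name=None)    # unfiltered back projection (blurred, cf. Fig. 3b)
```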
3.2 Filtered Back Projection

Filtered back projection is an analytical reconstruction algorithm and one of the fastest methods to perform the inverse radon transform. Hence, it is widely used in CT to
Fig. 3 a Original image f, b unfiltered reconstructed image
overcome the limitations of back projection. The only tunable parameter in FBP is the filter, which is used to remove blurring (artifacts) in the reconstructed image. Using a ramp filter is the optimal way to suppress complex artifacts, so FBP is a combination of ramp filtering and back projection. The ramp filter H is defined in the Fourier domain by

\widehat{H[h]}(w) = |w| ĥ(w)   (3)

where ĥ denotes the Fourier transform of h. Applying the projection-slice theorem and changing the integration variables, it can be observed for f that

f = (1/4π) R* H[p]   (4)
This means that the original image f can be reconstructed from sinogram p as shown in Fig. 4.
Fig. 4 Image reconstruction from projections: a original image f, b sinogram p, c reconstructed image
Fig. 5 a Conventional CT, b sparse-view CT
3.3 Sparse-View CT

In conventional CT, the X-ray source positions are distributed uniformly over the angular range of 0°–360°, and 1000–2000 projection data are measured. In sparse-view CT, the projections g(s, θ) are known only for a small set of increment angles θ distributed over 0–π, as shown in Fig. 5b, and the number of X-ray projections is reduced to fewer than 100. By implementing a sparse view, there is a substantial reduction in the radiation dosage, and the images generated are hence called low-dose images. They suffer from complex artifacts and noise but carry a lower radiation risk for the human body.
3.4 Sparse-View CT Reconstruction with FBP

The experiment is carried out by varying the number of views. The objective is to analyze how the image quality is affected by drastically reducing the number of views. The proposed research work has considered 0°–360° with an increment of 1° (360 views) as complete data and 0°–360° with increments of 2°, 4°, 6°, and 8° (i.e., 180, 90, 45, 22 views) as incomplete data. In Fig. 6, it can be observed that as the radiation dose is reduced, the quality of the reconstructed image degrades; it suffers from artifacts and noise. These images are called low-dose images.
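A minimal sketch of this sparse-view experiment, again assuming scikit-image, reconstructs the phantom with FBP (ramp filter) for each increment angle:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()
reconstructions = {}
for step in (1, 2, 4, 6, 8):                    # increment angle in degrees
    theta = np.arange(0.0, 360.0, step)         # increasingly sparse view angles
    sinogram = radon(image, theta=theta)
    reconstructions[step] = iradon(sinogram, theta=theta, filter_name='ramp')
```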
4 Proposed Methodology

4.1 System Model

The Shepp–Logan phantom head model is used as the original image. The original image is subjected to the radon transform for 0°–360° with increments of 2°, 4°, 6°, and 8° separately. Due to the sparse view, data are acquired at sparser angles. For example,
Fig. 6 Reconstruction of image from sparse projections
with a 4° increment angle, projection data are acquired at 0°, 4°, 8°, 12°, …, and the intermediate angle data are not acquired. An effort is made to predict the projections at these intermediate angles using a simple-averaging method; the detailed process is explained in the next section. After predicting the missing intermediate projection values, the manipulated sinogram is transformed into the image domain using filtered back projection, i.e., the inverse radon transform. As a result, a better quality image is reconstructed with minimal artifacts and noise compared to the image reconstructed with a 4° increment angle without manipulation. Figure 7 represents the proposed model for image reconstruction.
4.2 Filling the Missing Projection Data in Sinogram Space

The missing projection data in sinogram space due to the sparse view are first filled with a NULL (zero) value. In the proposed method, only the alternate degrees between the sparse angles are filled with the value 0; for example, 2° is the alternate degree between the sparse angles 0° and 4°. Figure 8 presents the missing projection values in sinogram space filled with 0.
Fig. 7 Image reconstruction process model
Fig. 8 Sparse projections and predicted values filled with 0 in sinogram space
The Shepp–Logan phantom head model is used for the experiment. Figure 6 represents the reconstructed images for a sparse view. The reconstructed images for sparse angles 4°, 6°, and 8° are called low-dose images, as the number of doses on the object is reduced. These images exhibit complex artifacts and low spatial resolution. The challenge is to improve the quality of the image by applying a suitable technique.
Fig. 9 Estimated projection value using simple average method
4.3 Predicting the Missing Data

The missing projection data due to the sparse views are estimated using simple averaging of the neighboring projection data in sinogram space. In Fig. 9, it can be observed that the missing projection value is estimated and filled in by using the averaging method, as sketched below.
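A minimal sketch of the simple-averaging step, assuming the sinogram is stored as a NumPy array with one column per measured view (here for the alternate-degree case, e.g. predicting the 2° view from the 0° and 4° views):

```python
import numpy as np

def fill_missing_projections(sparse_sino):
    """sparse_sino: (n_detectors, n_views) array measured at every other angle.
    Returns a sinogram where each intermediate view is predicted as the simple
    average of the two neighbouring measured views."""
    n_det, n_views = sparse_sino.shape
    dense = np.zeros((n_det, 2 * n_views - 1))
    dense[:, 0::2] = sparse_sino                                        # measured angles
    dense[:, 1::2] = 0.5 * (sparse_sino[:, :-1] + sparse_sino[:, 1:])   # predicted angles
    return dense
```

The filled sinogram is then passed to FBP (iradon) exactly as before, with the angle vector extended to include the predicted intermediate angles.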
5 Results and Discussion

A comparison of the low-dose image and the image reconstructed using the proposed method is shown in Fig. 10. It depicts that the proposed methodology yields a better quality image compared to the low-dose image for different sparse angles.
5.1 Result Analysis

The performance measuring parameters, namely the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), and root mean square error (RMSE), were calculated and compared between the low-dose image and the image reconstructed using the proposed method. Table 1 presents the recorded values. From Table 1, it is witnessed that the proposed technique produces better SSIM scores, PSNR, and RMSE values compared to the conventional low-dose method, except
Fig. 10 Comparison of low-dose and reconstructed images with proposed method
Table 1 Performance measures

Increment angle in degrees | PSNR (low dose) | PSNR (low dose with correction) | SSIM (low dose) | SSIM (low dose with correction) | RMSE (low dose) | RMSE (low dose with correction)
2° | 79.8 | 78.5 | 0.91 | 0.95 | 0.026 | 0.030
4° | 72.5 | 74.2 | 0.74 | 0.88 | 0.060 | 0.049
6° | 68.0 | 70.5 | 0.67 | 0.83 | 0.101 | 0.075
8° | 65.3 | 68.4 | 0.62 | 0.77 | 0.138 | 0.096
for the least sparse angle, i.e., the 2° increment angle (180 views). This means that the proposed method is not viable for a less sparse view. SSIM up to 0.8 is clinically accepted (from the literature study); hence, a sparse view with a 6° increment is the optimal incremental angle, beyond which the image may not have diagnostic visibility.
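The three measures can be computed as in the following minimal sketch, assuming scikit-image for PSNR and SSIM, with RMSE computed directly in NumPy:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality(reference, reconstructed):
    data_range = reference.max() - reference.min()
    psnr = peak_signal_noise_ratio(reference, reconstructed, data_range=data_range)
    ssim = structural_similarity(reference, reconstructed, data_range=data_range)
    rmse = float(np.sqrt(np.mean((reference - reconstructed) ** 2)))
    return psnr, ssim, rmse
```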
6 Conclusion and Future Scope

Sparse-view CT (low-dose) images usually exhibit low spatial resolution and complex artifacts. To deal with this issue, the proposed research work has presented the simple-averaging method, which predicts the intermediate missing projection data in sinogram space to yield a better quality image. After incorporating sinogram-based correction for low-dose imaging, there are few chances of missing
fine tissue structure details that may affect the diagnostic decision. The proposed method recovers the medical information lost due to the sparse view. From the experimental results, it is concluded that the proposed method performs better compared to the conventional low-dose method, except for the least sparse angle. SSIM up to 0.8 is clinically accepted; hence, a sparse view with a 6° increment is the optimal incremental angle beyond which the image may not have diagnostic visibility. In the future, real-time sparse-view CT images will be considered for experiments, and the performance of FBP will be compared with the proposed model.
References

1. A.C. Kak, Computerized tomography with X-ray, emission, and ultrasound sources. Proc. IEEE 67(9), 1245–1272 (1979)
2. M.K. Kalra, M.M. Maher, T.L. Toth, L.M. Hamberg, M.A. Blake, J.A. Shepard, S. Saini, Strategies for CT radiation dose optimization. Radiology 230, 619–628 (2004)
3. S.D. Desai, L. Kulkarni, A quantitative comparative study of analytical and iterative reconstruction techniques. Int. J. Image Process. (IJIP) 4(4), 307 (2010)
4. S.D. Desai, Reconstruction of image from projections-an application to MRI & CT scanning, in Proceedings of International Conference ICSCI-2005 (2005)
5. N.K. Lee, Low-dose CT with the adaptive statistical iterative reconstruction V technique in abdominal organ injury: comparison with routine-dose CT with filtered back projection. Am. J. Roentgenol. 213, 659–666 (2019)
6. L. Zhang, Airway quantification using adaptive statistical iterative reconstruction-V on wide-detector low-dose CT: a validation study on lung specimen. Jap. J. Radiol. (2019)
7. A. Euler, A third-generation adaptive statistical iterative reconstruction technique: phantom study of image noise, spatial resolution, lesion detectability, and dose reduction potential. AJR 210 (2018)
8. J. Sun, Performance evaluation of two iterative reconstruction algorithms, MBIR and ASIR, in low radiation dose and low contrast dose abdominal CT in children (Italian Society of Medical Radiology, 2020)
9. N. Hamaguchi, Improved depictions of the anterior choroidal artery and thalamoperforating arteries on 3D CTA images using model-based iterative reconstruction (Association of University Radiologists, 2020)
10. T. Higaki, Deep learning reconstruction at CT: phantom study of the image characteristics (The Association of University Radiologists, 2019)
11. R. Singh, Image quality and lesion detection on deep learning reconstruction and iterative reconstruction of submillisievert chest and abdominal CT. AJR (2020)
12. Z. Zhang, A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Trans. Med. Imaging 37(6) (2018)
13. S. Xie, Artifact removal using improved GoogLeNet for sparse-view CT reconstruction. Sci. Rep. (2018)
14. D.H. Ye, Deep back projection for sparse-view CT reconstruction (IEEE, 2018)
15. H. Nakai, Quantitative and qualitative evaluation of convolutional neural networks with a deeper U-Net for sparse-view computed tomography reconstruction (The Association of University Radiologists, 2019)
Transfer Learning for Children Face Recognition Accuracy R. Sumithra, D. S. Guru, V. N. Manjunath Aradhya, and Raghavendra Anitha
Abstract Identifying missing and kidnapped children at a later age is quite a challenging process. To overcome this challenge, this research work proposes a new Children Face Recognition (CFR) application using an Artificial Intelligence (AI) system. To the best of our knowledge, the existence of a children's face image dataset created through a careful, well-documented process has not been reported in the earlier literature. Hence, this research work addresses the problem of developing a children's face recognition model together with a suitable dataset. A model has been proposed using a machine learning pipeline that consists of pre-processing, feature extraction, dimensionality reduction and a learning model. To this end, an attempt has also been made to classify the face images of children by training multi-classification algorithms with ensemble techniques such as bagging and boosting. During the dataset creation, 40,828 longitudinal face images of 271 young children aged from 4 to 14 years were captured over a duration of 30 months. The extensive experimentation shows that, with a few projection vectors, the k-NN classifier achieved a high accuracy of about 93.05%. Keywords Children face recognition · Bagging · Boosting · Longitudinal face image
R. Sumithra (B) · D. S. Guru Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysuru, Karnataka 570006, India V. N. Manjunath Aradhya Department of Computer Application, JSS Science and Technology University, Mysuru, Karnataka, India R. Anitha Department of ECE, Maharaja Institute of Technology, Mysuru, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_44
1 Introduction

Face Recognition (FR) is a challenging area, in which a great deal of work has been accomplished in biometric applications addressing pose variation, partial face identification and illumination variation to support the face recognition process [11]. Face images are beneficial not just for the identification of individuals but also for disclosing other characteristics such as gender, expression, a person's emotional state, age, ethnicity, etc. The facial parameters provide the properties of a desirable biometric modality, such as uniqueness, universality, acceptability and easy collectability. In the field of biometrics, face recognition is a well-studied problem that is still unsolved because of the inherent challenges presented by human faces [10]. Aging recognition is now attracting more attention from the face recognition community, as aging is an inevitable phase and the appearance of human faces changes remarkably with the progression of age [4]. Aging involves differences in the shape, texture and size of the face with regard to the facial recognition system, and these temporal changes trigger performance degradation. Therefore, IDs issued by the government, e.g., driving licenses and passports, have to be updated every 5–10 years. Aging refers to changes over a period of time in a biometric trait, which can theoretically influence a biometric system's accuracy [17]. Several efforts have been made to explain the aging impact on face recognition systems. However, algorithm efficiency with respect to human aging, as a function of the growth and development process of childhood, has only begun to be thoroughly explored by researchers. As the face changes over time, identifying the individual becomes more difficult. Anthropological and forensic research has made a major contribution to demonstrating that children's age-related changes differ from adult facial aging [20]. A model for the extraction of facial landmarks for the face recognition method was introduced in [12]. The author of [3] examined face recognition as a "man-machine" method, where human experts had to manually locate certain facial landmarks on a matching image; matching was then performed automatically based on 20 normalized distances obtained from these facial landmarks. By exploiting both the texture and shape characteristics of face images, the morphable model proposed by Blanz and Vetter [2] improved the use of 3D models in face recognition. Some of the most important developments in the field of face recognition in the last decade are Sparse Representation Coding (SRC) [26] and face recognition based on deep learning [22]. Most facial recognition techniques presume that faces can be located and correctly normalized (both geometrically and photometrically), where the alignment can be performed based on the locations of the two eyes on the face. A face detection model considered a breakthrough was created in [25], because it allows faces to be identified in real time even in the presence of background clutter. In order to extract age-related characteristics, decomposition of facial attributes such as identity, expression, gender, age, race, pose, etc., is necessary. Nevertheless, only a few attempts have been made in the literature to identify a person through photographs of facial aging. The anthropomorphic model [21], the active appearance model [5],
the aging pattern subspace [9], and the aging manifold [8] are some of the typical aging face models. In infants, facial aging primarily involves craniofacial growth and development. The variations of the cranium and face year after year, from childhood to young adulthood, are identified in Karen T. Taylor's book "Forensic Art and Illustration". In non-adults, the rate of facial change is at its maximum, especially between birth and 5 years of age [18]. The precision of identification is highly dependent on the age of the subject; nevertheless, the bulk of the aging studies found in the literature address adult face recognition. Reference [14] concluded that face recognition technology is not yet ready to recognize very young children reliably, especially from face images taken at 3 years and above. As observed in the previous literature, the success of a face recognition system largely depends on the stability of the facial parameters: any highly sophisticated and efficient face recognition system will fail if the parameters of a face change regularly. One of the biggest issues is the vast amount of data that is required to fully understand the human face and its maturation process. To the best of our knowledge, there is no existing children's face image dataset created through a careful, well-documented process. Hence, the problem of developing a children's face recognition model has been addressed by creating a suitable database. During the dataset creation, 40,828 longitudinal face images of 271 young children aged from 4 to 14 years were captured over a duration of 30 months. A model has been proposed using a machine learning pipeline consisting of pre-processing, feature extraction, dimensionality reduction and a learning model. To this end, an attempt has also been made to classify children's face images by training multi-classification algorithms using ensemble techniques such as bagging and boosting. The following are the contributions of this study:
• Creation of a large-sized set of longitudinal face images of young children while maintaining the quality and quantity of our dataset.
• Study of conventional techniques such as feature engineering, feature transformation and supervised classification for the recognition of children.
• Exploitation of a traditional approach for the fusion of multiple classifiers using the OR rule.
• Employment of Ada-Boost and random forest classifiers for the boosting and bagging techniques.
The rest of the paper is organized as follows: Sect. 2 explains the proposed model; Sect. 3 reports the dataset creation and the experimental results with analysis; and Sect. 4 gives the conclusion and future enhancements.
2 Proposed Model

The overview of our proposed model and its pictorial representation is shown in Fig. 1. The model has two stages: the first is feature representation using PCA and transformed FLD, and the second is the decision-level fusion of multiple classifiers, called the ensemble technique, which is depicted in Fig. 2.
Fig. 1 The pictorial representation of our proposed model
Fig. 2 The technical description of our proposed model
2.1 Feature Extraction and Its Representation of Children Face Images

Pre-processing The aim of the pre-processing step is to extract the face region from an image using the Viola–Jones face detection method by fixing the threshold and region size [25]. Only the face part is extracted from each image; the pre-processed images of the same child at different age intervals are shown in Fig. 3.

Handcrafted Feature Computation During feature extraction, the local handcrafted features, namely the Histogram of Oriented Gradients (HOG) and the Multi-scale Local Binary Pattern (M-LBP), are utilized. The adopted features are widely used and highly discriminating in face recognition; hence, they are utilized in our study with hyper-parameter tuning.
Fig. 3 Illustration of pre-processing: a captured face image; b pre-processed image
HOG: The features are extracted by dividing the face image window into small spatial regions. Each detection window is divided into cells of size 32 × 32 pixels, and each group of 4 × 4 cells is integrated into a block in a sliding manner, so blocks overlap with each other. Each cell contributes a 9-bin histogram of oriented gradients, and each block is the concatenated vector of all its cells.

MLBP: This operator works by thresholding a 3 × 3 neighborhood with the value of the center pixel, thus forming a local binary pattern, which is read as a binary number. Features calculated in a local 3 × 3 neighborhood cannot capture large-scale structures because the operator is not very robust against local changes in the texture; therefore, an operator with a larger spatial support area is needed. The operator was extended to facilitate rotation-invariant analysis of facial textures at multiple scales such as 3 × 3, 5 × 5, 7 × 7, and 9 × 9 (Ojala et al. 2002).

Dimensional Reduction Using Subspace Technique To reduce the feature dimension, the subspace methods called Principal Component Analysis (PCA) [24] and Fisher Linear Discriminant (FLD) [15] are used to preserve the most dominating projection vectors.

PCA Let W represent a linear transformation matrix mapping the feature points from m dimensions to p dimensions, where p ≪ m, as follows:

Y_p = W^T X_p   (1)

which is the linear transformation of the extracted features, where {W_i | i = 1, 2, …, m} is the set of n-dimensional projection vectors corresponding to the m largest non-zero eigenvalues in Eq. (1).

FLD An example of a class-specific approach is Fisher's Linear Discriminant (FLD). This method selects W in such a way that it maximizes the ratio of the between-class scatter to the within-class scatter, Eq. (2):

W = arg max_W |W^T S_B W| / |W^T S_W W|   (2)

where S_B is the between-class scatter matrix, S_B = Σ_{i=1}^{C} N_i (μ_i − μ)(μ_i − μ)^T, and S_W is the within-class scatter matrix, defined as S_W = Σ_{i=1}^{C} Σ_{x_k ∈ X_i} (x_k − μ_i)(x_k − μ_i)^T, where μ_i is the mean image of class X_i and N_i is the number of samples in class X_i. Here, {W_i | i = 1, 2, …, m} is the set of generalized eigenvectors of S_B and S_W corresponding to the m largest generalized eigenvalues {λ_i | i = 1, 2, …, m}. The number of images in the learning set is in general much smaller than the number of features in the image. This means that the matrix W can be chosen such that the within-class scatter of the projected samples is exactly zero. This is achieved by using PCA to reduce the dimension of the feature space to N − c and then applying the standard FLD defined by Eq. (3) to reduce the dimension to c − 1 [1].
More formally, W_FLD is given by:

W_FLD^T = W_fld^T W_pca^T   (3)

where W_pca = arg max_W |W^T S_T W| and W_fld = arg max_W |W^T W_pca^T S_B W_pca W| / |W^T W_pca^T S_W W_pca W|. In computing W_pca, only the largest c − 1 projection vectors are selected.
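The PCA-then-FLD projection of Eq. (3) can be sketched with scikit-learn (an assumption; the authors do not name their implementation). PCA first reduces the feature space toward N − c dimensions, and LDA then projects to at most c − 1 discriminant directions:

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pca_then_fld(X_train, y_train, n_classes):
    # X_train: (N, m) HOG or M-LBP feature matrix, y_train: child identity labels
    n_samples, n_features = X_train.shape
    n_pca = min(n_samples - n_classes, n_features)   # aim for N - c dimensions
    pca = PCA(n_components=n_pca)
    X_pca = pca.fit_transform(X_train)
    fld = LinearDiscriminantAnalysis(n_components=min(n_classes - 1, n_pca))
    X_fld = fld.fit_transform(X_pca, y_train)        # at most c - 1 directions
    return pca, fld, X_fld
```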
2.2 Supervised Learning Models

Learning Models In the literature, many learning models are used for face recognition systems. This study has evaluated the quality of our dataset by conducting a number of baseline learning models. Therefore, different classification methods, namely k-Nearest Neighbor (k-NN) [13], Support Vector Machine (SVM) [23], Naive Bayes [19] and decision tree [6], and the fusion of these classifiers using the majority voting technique, are used as shown in Fig. 4 and sketched below.
Fig. 4 Fusion of multi classifier using majority voting
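A minimal sketch of this majority-voting fusion with scikit-learn (assumed), where X_train, y_train are the projected training features and labels from the previous stage and X_test is projected likewise:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

fusion = VotingClassifier(
    estimators=[('knn', KNeighborsClassifier()),
                ('svm', SVC()),
                ('nb', GaussianNB()),
                ('dt', DecisionTreeClassifier())],
    voting='hard')  # hard voting = majority vote over the four base classifiers
fusion.fit(X_train, y_train)
y_pred = fusion.predict(X_test)
```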
Fig. 5 Ensemble technique: bagging and boosting
Ensemble Learning A standard decision fusion technique called ensembling has been adopted to enable a deeper understanding of our dataset. The Ada-Boost classifier for the boosting [7] technique and random forest for the bagging [16] technique are used during transfer learning, as shown in Fig. 5 and sketched below.
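A minimal sketch of the two ensemble learners with scikit-learn, matching the settings reported later (10 trees for bagging, 100 learning cycles for boosting); X_train, y_train, X_test, y_test are assumed from the previous stage:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

bagging = RandomForestClassifier(n_estimators=10)   # bagging: 10 decision trees
boosting = AdaBoostClassifier(n_estimators=100)     # boosting: 100 learning cycles
for model in (bagging, boosting):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```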
3 Experiment Results

3.1 Longitudinal Face Images of Children

This research work has created its own dataset of longitudinal face images of young children aged 4–14 years. In order to capture the face images of school children, permission was obtained from the Director, the Board of Education (BOE), Northern Region, Mysore, India. Our longitudinal face data collection was conducted in different government schools in and around Mysore, India. To capture the children's face images, parents and teachers were required to sign a consent form giving their permission to provide their child's face images. In addition to the images, other metadata were also collected, such as date of birth, gender, date of capture, child's name and father's name. The face images were captured using the 16 MP rear camera of a Canon device. Face images were collected on the school premises with a suitable setup made by the authors, as shown in Fig. 6. To maintain a degree of consistency throughout the database, the same physical setup and location with a semi-controlled environment were used in each session. However, the equipment had to be reassembled for each session; therefore, there is some variation from session to session. Due to the rapid growth in the facial features of young children, and to analyze the minute growth rate more effectively and efficiently, longitudinal face images of
Fig. 6 Process of our dataset creation
Fig. 7 Histogram of our dataset used during experiment
children of 4–14 years old have been created over a period of 30 months, in 10 different sessions at 3–4 month intervals. Our dataset comprises 271 classes, where each class represents a child; hence, the total number of children encountered in our study is 271. The minimum and maximum numbers of images taken per child are 114 and 182, respectively. A total of 40,858 longitudinal face images of the 271 children have been considered for experimentation, among which 137 are male and 134 are female. The histogram of the number of children and images is shown in Fig. 7, and the longitudinal face images are shown in Fig. 8.
3.2 Experimental Analysis

In this section, the recognition rate has been computed by fixing the setup for its maximum performance through extensive experimentation. During HOG feature extraction,
Fig. 8 Longitudinal Face image dataset
blocks of windows represented by a 36-D feature vector normalized to unit length have been considered. Each 64 × 128 detection window is represented by 7 × 15 blocks, giving a total of 4184 features. Similarly, for M-LBP, the rotation-invariant analysis of facial textures is facilitated at multiple scales such as 3 × 3, 5 × 5, 7 × 7, and 9 × 9. With 59 features obtained from each scale, a total of 256 features is computed. Then, the feature transformation is applied using PCA and transformed FLD. After applying PCA and transformed FLD, the features are reduced to 25% of the total number of projection vectors. Hence, after feature transformation, 64 (25% of 256) projection vectors are obtained from the M-LBP features and 1046 (25% of 4184) projection vectors from the HOG features. By adopting these reduction techniques, it has been noticed that there is an improvement in both the recognition rate and the computational time after feature transformation. The classification results in terms of accuracy, f-measure, precision and recall for the HOG features are shown in Fig. 9 and, similarly, for the M-LBP features in Fig. 10. From the computed performance, it has been observed that the recognition rate is highest with the combination of HOG features and the k-NN classifier. Hence, experiments were conducted using the k-NN classifier while cumulatively increasing the number of projection vectors during feature selection. The accuracy increases with the cumulatively increased projection vectors and saturates (curse of dimensionality) at some projection point, as shown in Table 1. The results obtained from the fusion classifiers are also appreciable. Ensemble classifiers, namely Ada-Boost for the boosting technique and random forest for the bagging technique, have been utilized in this study. An ensemble of 10 decision trees has been considered with row and column replacement. For Ada-Boost, the observation weights are equal across the number of samples taken for experimentation. The number
Fig. 9 Classification results for HOG feature by varying different training and testing samples
Fig. 10 Classification results for MLBP feature by varying different training and testing samples
of learning cycles is 100 for all predictor combinations. 10-fold cross-validation with prior probabilities has been done empirically, with a classification margin distribution speed of 0.1 s. In Table 2, the combination of HOG features with the bagging classifier has the highest accuracy of 86.44% at the 70:30 training and testing split. There is a huge variation in the recognition rate between the bagging and boosting techniques; hence, bagging is much superior to boosting for our created dataset, as depicted in Table 2.
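The projection-vector sweep behind Table 1 can be sketched as follows (scikit-learn assumed; X_hog and y stand for the HOG feature matrix and child labels, and fitting PCA on the full data is a simplification of the authors' protocol):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

for n_proj in range(5, 75, 5):                        # 5, 10, ..., 70 projection vectors
    X_red = PCA(n_components=n_proj).fit_transform(X_hog)
    for test_size in (0.5, 0.4, 0.3):                 # 50:50, 60:40, 70:30 splits
        accs = []
        for seed in range(10):                        # repeat to estimate mean ± std
            X_tr, X_te, y_tr, y_te = train_test_split(
                X_red, y, test_size=test_size, random_state=seed, stratify=y)
            accs.append(KNeighborsClassifier().fit(X_tr, y_tr).score(X_te, y_te))
        print(n_proj, test_size, np.mean(accs), np.std(accs))
```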
Table 1 Accuracy for selected projection vectors using HOG features with the k-NN classifier

Projection | Accuracy (50:50) | Accuracy (60:40) | Accuracy (70:30)
5 | 50.80 ± 0.10382 | 51.44 ± 0.38295 | 51.60 ± 0.3977
10 | 81.39 ± 0.07987 | 81.98 ± 0.20951 | 82.39 ± 0.24057
15 | 88.23 ± 0.20792 | 88.46 ± 0.20859 | 88.55 ± 0.15816
20 | 88.63 ± 0.12335 | 88.96 ± 0.16279 | 89.22 ± 0.20576
25 | 90.11 ± 0.12572 | 90.33 ± 0.17087 | 90.53 ± 0.16804
30 | 91.23 ± 0.18625 | 91.28 ± 0.16475 | 91.24 ± 0.18816
35 | 91.87 ± 0.09057 | 91.98 ± 0.14208 | 92.04 ± 0.18402
40 | 92.64 ± 0.09848 | 92.71 ± 0.13732 | 92.64 ± 0.17166
45 | 93.05 ± 0.13014 | 93.07 ± 0.08559 | 93.13 ± 0.1488
50 | 91.68 ± 0.14302 | 91.71 ± 0.15777 | 91.82 ± 0.16175
55 | 90.03 ± 0.14787 | 90.32 ± 0.21474 | 90.41 ± 0.21342
60 | 90.25 ± 0.13816 | 90.43 ± 0.17375 | 90.52 ± 0.15426
65 | 88.75 ± 0.16017 | 89.15 ± 0.16972 | 89.38 ± 0.28191
70 | 89.01 ± 0.15279 | 89.35 ± 0.20781 | 89.66 ± 0.24056

Table 2 Results for ensemble classification recognition

Classifier | Features | Split | Accuracy | F-Measure | Precision | Recall
Boosting | HOG | 50-50 | 18.24 | 13.44 | 13.5 | 17.46
Boosting | HOG | 60-40 | 17.6 | 13.1 | 14.12 | 16.79
Boosting | HOG | 70-30 | 17.9 | 13.93 | 14.26 | 17.17
Boosting | MLBP | 50-50 | 2.64 | 0.61 | 0.47 | 2.46
Boosting | MLBP | 60-40 | 2.69 | 0.57 | 0.56 | 2.48
Boosting | MLBP | 70-30 | 2.51 | 0.49 | 0.71 | 2.32
Bagging | HOG | 50-50 | 71.39 | 77.42 | 71.43 | 90.11
Bagging | HOG | 60-40 | 86.16 | 86.21 | 86.22 | 86.5
Bagging | HOG | 70-30 | 86.44 | 86.5 | 86.5 | 86.95
Bagging | MLBP | 50-50 | 36.4 | 35.99 | 36.19 | 38.14
Bagging | MLBP | 60-40 | 37.76 | 37.23 | 37.6 | 39.41
Bagging | MLBP | 70-30 | 38.8 | 38.62 | 38.65 | 40.91
4 Conclusion and Future Enhancement

The aim of this study is to find the recognition rate of young children using aging face images. Due to the drastic growth pattern in young children, recognition across aging is an esoteric task. To address this problem, a simple and efficient model consisting of feature representation and transfer learning models has been built. During feature extraction, local features such as M-LBP and HOG, and feature
transformations such as PCA and FLD are adopted. For the transfer learning model, different baseline models and their fusion using the majority voting technique have been adopted. An attempt has also been made with ensemble techniques such as bagging (random forest) and boosting (Ada-Boost) for further analysis. For experimentation, our own dataset of 40,828 longitudinal face images of 271 young children aged from 4 to 14 years was created over a duration of 30 months. It is concluded that a high recognition rate is obtained with the combination of HOG features and the k-NN classifier by varying the projection vectors. In the future, the Deep Convolutional Neural Network (D-CNN) technique can be adopted to enhance the performance of the model.
References

1. P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)
2. V. Blanz, T. Vetter, Face recognition based on fitting a 3D morphable model. IEEE Trans. Pattern Anal. Mach. Intell. 25(9), 1063–1074 (2003)
3. W.W. Bledsoe, Man-machine face recognition. Technical report PRI 22, Panoramic Research, Inc.
4. D. Deb, N. Nain, A.K. Jain, Longitudinal study of child face recognition. arXiv preprint arXiv:1711.03990 (2017)
5. G.J. Edwards, T.F. Cootes, C.J. Taylor, Face recognition using active appearance models. Image Analysis Unit, Department of Medical Biophysics, University of Manchester, Manchester M13 9PT, UK
6. M.A. Friedl, C.E. Brodley, Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61(3), 399–409 (1997)
7. J.H. Friedman, Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002)
8. Y. Fu, T.S. Huang, Human age estimation with regression on discriminative aging manifold. IEEE Trans. Multimedia 10(4), 578–584 (2008)
9. X. Geng, Z.H. Zhou, K. Smith-Miles, Automatic age estimation based on facial aging patterns. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2234–2240 (2007)
10. A.K. Jain, A. Ross, K. Nandakumar, Introduction to Biometrics (Springer, Berlin, 2011)
11. A.K. Jain, K. Nandakumar, A. Ross, 50 years of biometric research: accomplishments, challenges, and opportunities. Pattern Recogn. Lett. (2011). https://doi.org/10.1016/j.patrec.2015.12.013
12. T. Kanade, Picture processing system by computer complex and recognition of human faces. Ph.D. thesis, Kyoto University (1973)
13. J.M. Keller, M.R. Gray, J.A. Givens, A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 4, 580–585 (1985)
14. B.-R. Lacey, Y. Hoole, A. Jain, Automatic face recognition of newborns, infants, and toddlers: a longitudinal evaluation, in 2016 International Conference of the Biometrics Special Interest Group (BIOSIG) (IEEE, 2016)
15. S. Mika, G. Ratsch, J. Weston, B. Scholkopf, K.R. Mullers, Fisher discriminant analysis with kernels, in Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (IEEE, 1999), pp. 41–48
16. M. Pal, Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005)
17. K. Ricanek, S. Bhardwaj, M. Sodomsky, A review of face recognition against longitudinal child faces, in BIOSIG 2015 (2015)
18. K. Ricanek, Y. Karl, Y. Wang, Y. Chang, C. Chen, Demographic analysis of facial landmarks. U.S. Patent 9,317,740, issued April 19, 2016
19. I. Rish, An empirical study of the Naive Bayes classifier, in IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, vol. 3(22) (2001), pp. 41–46
20. L. Rowden, Y. Hoole, A. Jain, Automatic face recognition of newborns, infants, and toddlers: a longitudinal evaluation, in 2016 International Conference of the Biometrics Special Interest Group (BIOSIG) (IEEE, 2016)
21. A.S. Sohail, P. Bhattacharya, Detection of facial feature points using an anthropometric face model. Concordia University, 2-9525435-1 SITIS (2006), pp. 656–664
22. Y. Sun, X. Wang, X. Tang, Deep learning face representation from predicting 10,000 classes, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898
23. J.A. Suykens, J. Vandewalle, Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
24. M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'91) (IEEE, 1991)
25. P. Viola, M.J. Jones, Robust real-time face detection. Int. J. Comput. Vision 57(2), 137–154 (2004)
26. J. Wright, A.G. Yang, S. Sastry, Y. Ma, Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31, 361–369 (2009)
A Fact-Based Liver Disease Prediction by Enforcing Machine Learning Algorithms Mylavarapu Kalyan Ram, Challapalli Sujana, Rayudu Srinivas, and G. S. N. Murthy
Abstract Health is prominent in human well-being and economic progress across the world. Being healthy and active is also very important to the human lifestyle. Numerous major problems like liver disease, heart disease, and diabetes may decrease one's activity level. One of the major issues is liver abnormality. In India, many individuals suffer from deficiencies of the liver. Most liver diseases occur due to human faults like heavy alcohol consumption, consuming fast foods, usage of identical needles while injecting drugs, etc., which may lead to many complications in the future. Due to liver malfunctioning, many disorders may take place, such as liver cancers, hepatic angiosarcoma, pediatric hepatoblastoma, hepatitis B, and hepatitis C. Machine learning techniques have been applied to numerous liver disorder datasets using the Python programming language to find an accurate and exact analysis of the disease. The ultimate goal of this investigation is to obtain the best accuracy after successfully executing the machine learning algorithms on various liver disease datasets. Keywords BUPA · Classification algorithms · Feature selection · ILPD · Liver disorder · Performance measures · Prediction · Machine learning

1 Introduction

The liver is a major organ that plays a massive role in the human body. It acts as a real filter that retrieves and eradicates many toxins. The liver ensures the metabolism
1 Introduction The liver is the major organ that acts a massive role in the human body. It acts as a real filter that retrieves and eradicates many toxins. The liver ensures the metabolism M. K. Ram (B) · R. Srinivas · G. S. N. Murthy CSE Department, Aditya Engineering College (A), Surampalem, A.P., India e-mail: [email protected] R. Srinivas e-mail: [email protected] C. Sujana IT Department, Pragati Engineering College (A), Surampalem, A.P, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_45
Fig. 1 Various liver disorders
of carbohydrates and fat and accomplishes several activities like the secretion of bile, the decomposition of RBCs, etc. Bile is an essential element for the human digestive system. The toxin-handling function of the liver is degraded by human faults or by an increasing content of ammonia in the human body. A worrying trend in medical management is the increasing count of liver disorder patients. In India, patients suffering from liver disorders undergo high morbidity and mortality rates. Liver tissue is composed of many lobules, and each lobule is organized with internal cells, the elemental metabolic cells of the liver. Several actions can influence and escalate liver disease risks. Figure 1 depicts the major liver disorders. Popular liver disorders are fatty liver, liver cirrhosis, hepatitis B and hepatitis C, and liver cancer. The major origins of liver diseases are genetic liver disorders, smoking, consumption of alcohol, consuming fast foods, obesity, and diabetes. This article aims to develop the best model to assist medical practitioners in finding the finest treatment at the preliminary stages of liver disorders. Numerous medical diagnoses can be predicted using machine learning techniques. Various liver complications were not resolved in prior days because of the absence of adequate tools and a lack of awareness of a peculiar disease. The layout of the research work is described in the following manner: Sect. 2 describes the literature survey, and Sect. 3 presents the operational requirements. Section 4 illustrates the proposed methodology that considers different algorithms for predicting liver disorders. Section 5 exhibits the experimental results, and finally, Sect. 6 proposes the conclusion and future work.
2 Literature Survey

In the past few years, plenty of research has been performed on liver disorder prediction using machine learning techniques throughout the world. This section summarizes the contributions of various authors who used machine learning algorithms to predict liver disorders.

Wu et al. [1] used various classification algorithms, such as RF, NB, ANN, and LR, to predict fatty liver disease early with accurate results. A comparison of these four
algorithms has been considered based on their classification accuracy. To evaluate the performance, they used the ROC of all models with cross-validation. Finally, they concluded that random forest is the best algorithm for the prognosis of fatty liver abnormality, with the highest accuracy of 87.48 among the compared algorithms.

Kumar et al. [2] described a new hybrid fuzzy-ANWKNN classification approach to predict liver disease with the highest accuracy in an efficient manner, since the existing fuzzy-ADPTKNN approach was not satisfactory for predicting liver disease. The approach was implemented on different datasets, namely the Indian liver dataset and the Madhya Pradesh liver dataset. Finally, the authors concluded that the new hybrid approach gave the best performance compared with the existing approach.

Sivakumar et al. [3] analyzed liver prediction based on some human quality factors. Chronic liver disease was predicted using machine learning algorithms such as k-means clustering and statistical decision tree classifiers. After taking all measurements, they concluded that the C4.5 algorithm is superior to the other classification approaches.

Idris et al. [4] analyzed early liver disease prediction by implementing various supervised learning algorithms such as LR, SVM, and RF. The above algorithms obtained different accuracies, but the authors increased the accuracy using new algorithms, i.e., the AdaBoost classifier with logistic regression and the bagging random forest algorithm, which obtained accuracies of 74.35 and 72.64, respectively.

Nahar et al. [5] worked on liver disease prediction with various decision tree techniques like RF, J48, random tree, REPTree, decision stump, LMT, and Hoeffding tree. The authors used the Indian liver dataset to evaluate these techniques and obtained the performance of all these algorithms. From the result analysis, the decision stump performs best among the algorithms, with an accuracy of 70.76%.

Arshad et al. [6] used data mining classification algorithms such as Naïve Bayes, SMO, Bayes Net, and J48 to predict liver disease. The authors applied the WEKA tool to the BUPA dataset and concluded that SMO gained the highest accuracy of 97.3%, whereas Naïve Bayes gained the lowest accuracy of 70.72%.

Kumar et al. [7] implemented various classification algorithms, namely Naïve Bayes, C5.0, k-means, random forest, and C5.0 with adaptive boosting, for liver disease prediction. Based on vulnerability, the latter classification algorithm was introduced, and it proved to predict the disease more accurately. Among the base algorithms, random forest secured the highest accuracy of 72.10%, but C5.0 with adaptive boosting gave a more accurate result of 75.19%.

Kuzhippallil et al. [8] described a new approach, namely the XGBoost classifier combined with a genetic algorithm, compared various algorithms, and introduced visualization procedures for liver disorder prediction. The authors also explained anomaly detection techniques. Finally, they concluded that the new algorithm obtained good accuracy and reduced the classification time.

Pathan et al. [9] proposed a methodology using several classification algorithms like RF, Naive Bayes, J48, AdaBoost, and bagging. The authors implemented all algorithms on the ILPD using the WEKA tool. Finally, they compared all results and concluded that random forest provides an accurate result of 100%.
Muthuselvan et al. [10] proposed liver disease prediction based on computer-aided diagnosis. Using computers, all liver data were analyzed and all images were stored in a database, which was very useful for further prediction. The authors compared four algorithms, namely K-star, Naïve Bayes, J48, and random tree, and concluded that random tree secured the highest accuracy of 74.2% with a minimum time of 0.05 s.

Shapla Rani Ghosh and Sajjad Waheed [11] proposed liver disease prediction with various machine learning algorithms, namely logistic tree, REPTree, Naive Bayes, bagging, and K-star. The authors implemented them on the ILPD using the WEKA tool and evaluated the accuracy, precision, sensitivity, and specificity. Finally, they revealed that, compared with Naïve Bayes, K-star obtained an efficient accuracy of 100%.

Thirunavukkarasu et al. [12] illustrated several classification algorithms, such as KNN, SVM, and logistic regression, utilized for liver disorder prediction. Based on severity, the classification algorithm has become a solution to predict the disease in advance.

Dr. Vijayarani et al. [13] proposed two classification algorithms, namely SVM and Naïve Bayes, for predicting liver disease. When these two algorithms were compared based on performance and execution time, SVM generated higher accuracy.

Dhamodharan [14] proposed two classification algorithms for predicting liver disorders, namely Naïve Bayes and the FT tree, to uncover diseases like liver cancer, cirrhosis, and hepatitis. After comparing these algorithms with the WEKA tool, Naïve Bayes obtained better accuracy than the FT tree.

Bahramirad [15] proposed 11 classification algorithms: K-star, Gaussian processes, logistic, linear logistic multilayer perceptron, RIPPER, rule induction, support vector machine, regression, logistic model trees, neural net, and classification and regression trees. The author used two datasets, namely the AP liver dataset and the BUPA liver dataset. After observing the results, the accuracy performance on the AP liver dataset was better than on the BUPA liver dataset.

Venkata Ramana et al. [16] reported that four classification algorithms, namely Naïve Bayes, SVM, backpropagation NN, and C4.5, were used to predict liver disease problems early. These algorithms were implemented on various datasets and evaluated based on accuracy.

Venkata Ramana et al. [17] reported that the common attributes in both the NL and INDIA datasets are ALKPHOS, SGPT, and SGOT [22]. Two methods, ANOVA and MANOVA, were applied for the analysis of these attributes. When the two methods were compared, MANOVA performed better than ANOVA.

Venkata Ramana et al. [18] proposed that the combination of Bayesian classification with bagging and boosting techniques obtained the best accuracies of 97.91 for male and 91.16 for female patients for liver disease prediction [19].
3 Operational Requirements

In this article, two liver patient datasets were used to estimate the performance:

1. The Indian Liver Patient Dataset (ILPD), retrieved from the UCI ML repository, consists of 583 records and 11 attributes. Out of the 583 records, 416 are liver disorder records and 167 are non-liver disorder patient records; the dataset covers 441 male patients and 142 female patients [20]. Table 1 displays the particulars of the ILPD dataset.
2. The BUPA dataset, from the UC Irvine Machine Learning Repository [20, 21] (California, USA), consists of 345 records and 7 attributes. Table 2 displays the particulars of the BUPA dataset. The first five variables are acquired from blood tests and may reflect the consumption of too much alcohol [21].

Table 1 Indian liver data with attributes

Index | Attribute | Description | Range | Type
1 | Age | Patient's age | 4–90 | Real numbers
2 | TB | Total bilirubin | 0.4–75 | Real numbers
3 | DB | Direct bilirubin | 0.1–19.7 | Real numbers
5 | ALB | Albumin | 10–2000 | Real numbers
6 | A/G ratio | Albumin and globulin ratio | 10–4929 | Real numbers
7 | SGPT | Alamine aminotransferase | 2.7–9.6 | Integer
8 | SGOT | Aspartate aminotransferase | 0.9–5.5 | Integer
9 | ALP | Alkaline phosphotase | 0.3–2.8 | Real numbers
10 | Gender | Gender of patient | F/M | Categorical
11 | Selector field | Diseased or not | 0/1 | Binominal
Table 2 BUPA liver dataset and attributes

Index | Attribute | Description | Range | Type
1 | Mcv | Mean corpuscular volume | [65, 103] | Integer
2 | Alkphos | Alkaline phosphatase | [23, 138] | Integer
3 | Sgpt | Alamine aminotransferase | [4, 155] | Integer
4 | Sgot | Aspartate aminotransferase | [5, 82] | Integer
5 | Gammagt | Gamma-glutamyl transpeptidase | [5, 297] | Real numbers
6 | Drinks | No. of alcoholic drinks per day | [0.0, 20.0] | Real numbers
7 | Selector | Diseased or not | {1, 2} | Binominal
4 Proposed Methodology

4.1 Structure for Classification of Liver Patient Dataset

The flowchart below represents the classification process for the Indian Liver Disease Dataset and the BUPA dataset. The process begins with dataset selection: in Fig. 2, the flow starts with data selection from both the ILPD and BUPA datasets in CSV file format; thereafter, preprocessing is done to transform the raw data into a useful format. The repositories were classified based on parameters such as liver and non-liver patients. Both datasets were individually imported into a Jupyter notebook; for ILPD, training and test sets of 451 and 113 samples were considered, respectively, and for the BUPA dataset, 310 samples were used for training and 27 for testing. After the selection of these boundaries, the algorithms were set up for execution on the current dataset. The outcomes of this classification model were isolated, and the output can be distinctly analyzed as classifier output.
Fig. 2 Flowchart for classification and prediction of the liver patient dataset
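A minimal sketch of the import-and-split step just described is given below; the CSV file name and the use of the 'Selector field' as the class label are assumptions, while the 113-sample test size follows the ILPD split stated above.

```python
# A minimal sketch of the dataset import and split (file name and label
# column are assumptions; 113 test samples follow the ILPD split above).
import pandas as pd
from sklearn.model_selection import train_test_split

ilpd = pd.read_csv("ilpd.csv")               # hypothetical file name
X = ilpd.drop(columns=["Selector"])          # 'Selector' assumed as class field
y = ilpd["Selector"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=113, random_state=42, stratify=y)
print(len(X_train), len(X_test))             # -> 470 113 for 583 records
```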
Fig. 3 Histogram—numerical features for ILPD dataset attributes (No. of records on X-axis and range of features on Y-axis)
Fig. 4 Histogram—numerical features for BUPA dataset attributes (Number of records on X-axis and range of features on Y-axis)
Figures 3 and 4 illustrate the pre-processed, visualized histograms for the ILPD and BUPA repositories after successful completion of the data regularization and conversion; this procedure brings all the different values onto one scale. In the implementation phase, accurate classification algorithms for the proposed liver disease prediction were considered and applied to the above two datasets. Logistic regression, decision tree induction, Naive Bayes, SVM, RF, KNN, and gradient boosting were selected for implementation on the datasets. The execution was performed using Python in Jupyter. After successfully training all the supervised machine learning algorithms, the confusion matrix is generated, analyzed, and discussed for all the considered algorithms.
4.2 Feature Selection

Feature selection is a key concept in machine learning that strongly impacts the performance of the classification model. It is an activity where one selects, manually or in an automated way, the features that best contribute to the prediction variable. The performance of the designed model may diminish if the data contain irrelevant features. Feature selection methods are classified into three types: filter-based, wrapper-based, and embedded, as illustrated below.
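As an example of the filter-based family, a minimal sketch follows; the number of retained features is illustrative, and the inputs are assumed to be the training matrices from the split above with categorical fields (such as Gender) already encoded.

```python
# A minimal filter-based feature selection sketch (k is illustrative).
from sklearn.feature_selection import SelectKBest, f_classif

selector = SelectKBest(score_func=f_classif, k=5)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)
print(X_train.columns[selector.get_support()])   # names of retained features
```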
4.3 Normalization

Normalization is one of the most useful techniques in machine learning. Using this technique, all numeric attribute values are rescaled into the same range. In the current research, normalization is applied to both datasets (ILPD and BUPA), as sketched below.
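A minimal min-max normalization sketch follows; fitting the scaler on the training split only avoids leaking test statistics into training.

```python
# A minimal min-max normalization sketch (rescales each feature to [0, 1]).
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_norm = scaler.fit_transform(X_train_sel)
X_test_norm = scaler.transform(X_test_sel)
```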
4.4 Classification Using Machine Learning Algorithms

Machine learning (ML) is one of the eminent research domains, utilizing various algorithms to construct as well as analyze models. Figure 5 depicts the categories of machine learning: supervised, unsupervised, semi-supervised, and reinforcement learning techniques [1]. In this study, different strategies applied to liver disorder prognostics are demonstrated; several machine learning algorithms like multilayer perceptron, SVM, logistic regression, random forest, KNN, Naïve Bayes,
Fig. 5 Categories of machine learning
AdaBoost, XGBoost, decision tree, gradient boosting, Bayesian and bagging, etc., are implemented [2]. The main tasks of this research are:

1. Deploying distinct classification algorithms for liver disease forecasting.
2. Comparing the different algorithms.
3. Finding the best algorithm for liver disease prediction.
In the current scenario, several classification algorithms are available to forecast the disease at earlier stages and increase the life span of the sufferer [6]. For every algorithm, ten-fold cross-validation is performed, as sketched below. The algorithms are discussed in the following subsections.
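A minimal sketch of the ten-fold cross-validation over a few of the named classifiers follows; the model choices and hyperparameters are illustrative, not the paper's exact configuration.

```python
# A minimal ten-fold cross-validation sketch (model list is illustrative).
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(),
    "GB": GradientBoostingClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X_train_norm, y_train, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```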
4.4.1 Multilayer Perceptron Classifier

This classifier originates from the Weka library and is closely related to the neural network operator. An MLP has three kinds of layers: an input layer, an output layer, and hidden layers in between, as depicted in Fig. 7. MLPs can solve complex problems effectively; some applications of MLP are health prediction, speech recognition, NLP, and image processing.
4.4.2 K-Nearest Neighbor (KNN) Classifier

The k-nearest neighbor is a statistical method used for both classification and regression. For classification, the output is a class membership, whereas for KNN regression the output is a value for the object. KNN applications include gene expression analysis, imputing missing values, protein-protein interaction prediction, pattern recognition, etc. KNN calculates the Euclidean distance to predict the class, as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$$
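A minimal sketch of this distance computation is:

```python
# A minimal sketch of the Euclidean distance used by KNN.
import numpy as np

def euclidean_distance(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

print(euclidean_distance([1, 2], [4, 6]))   # -> 5.0
```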
4.4.3 Logistic Regression Classifier

Logistic regression is a machine learning algorithm used to solve classification problems such as prediction, detection, or diagnosis of an issue. It is a predictive algorithm based on probability. The categories of logistic regression include binary, multinomial, and ordinal logistic regression. Binary logistic regression is used for liver disease prognosis, where the target variable concerns whether the disease is present or not. In logistic regression, the cost function is built on the sigmoid function.
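A minimal sketch of the sigmoid building block mentioned above is:

```python
# A minimal sketch of the sigmoid function underlying logistic regression;
# it maps any real score to a probability in (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # -> 0.5, the decision boundary
print(sigmoid(3.0))   # -> ~0.953
```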
4.4.4 Decision Tree Classifier

In machine learning, a decision tree is a supervised algorithm that solves classification and regression problems. The categories of decision trees are the categorical variable decision tree and the continuous variable decision tree. In the categorical variable decision tree, splitting is done through the Gini method, whereas the continuous variable decision tree handles a continuous target variable.
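As an illustration of the Gini method mentioned above, a minimal sketch of the Gini impurity of a node is:

```python
# A minimal sketch of the Gini impurity used for categorical splits:
# gini = 1 - sum(p_c^2) over the class proportions p_c of a node.
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 1, 1]))  # -> 0.5, a maximally mixed binary node
```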
4.4.5 Random Forest Tree Classifier
Random forest tree is used to solve classification and regression problems. Each decision tree trains on different observations. Random forests are very functional and give good accuracy in prediction.
4.4.6 Gradient Boosting Classifier
Gradient boosting is a distinct machine learning approach that solves various classification and regression problems. The algorithm typically uses CART trees as its base learners.
4.4.7 Support Vector Machine (SVM) Classifier
The SVM classifier was coined by Vapnik in 1979 [13]. SVM is one of the best supervised learning algorithms in machine learning, and different prediction problems can be solved using it. For the currently deployed model, SVM obtained the highest accuracy at more than 85%. SVM is based on statistical learning theory and is useful in ML. Figure 6 represents the performance of the SVM classification algorithm, and Fig. 8 represents a 2D visualization of linearly separable data points in SVM. There are three important concepts in SVM:

1. Support Vectors: data points near the hyperplane.
2. Hyperplane: a decision plane that divides a set of classes.
3. Margin: the space between two lines through the nearest data points of the classes.

The SVM algorithm works as follows for a linearly separable dataset.
Pseudo Code for Support Vector Machine (SVM)

1. Initialize the dataset D as (X_1, y_1), (X_2, y_2), ..., (X_n, y_n), where X_i is a training tuple, X_i ∈ R^d, y_i is the class label corresponding to X_i, and y_i ∈ {+1, −1}.
Fig. 6 System architecture
Fig. 7 Multilayer perceptron
2. Corresponding to each training tuple X_i, let W_i be the weight vector; then the optimal hyperplane is computed as

   W · X + b = 0    (1)

   where b is the bias.

3. If any point lies above the separating hyperplane, then that point satisfies the inequality
Fig. 8 Two-dimensional case where the data points are linearly separable
   W · X + b > 0

4. If any point lies below the separating hyperplane, then that point satisfies the inequality W · X + b < 0.

if rand > r_i then
    Evaluate the solutions, sort the population and save the best location
    Perform the exploration by utilizing Eq. (7)
end if
Create new random individuals by utilizing Eq. (11)
if f(x_i) < f(x*) and rand < A_i then
    The updated position is obtained
    Update the values of A_i and r_i by using Eq. (8)
end if
end for
Detect the fittest individual x*
end while
Output the fittest solution
Every population member is represented as a one-dimensional vector in which the weights and biases are stored, so the whole population is a set of candidate network structures. As stated before, the experiment utilized an MLP with one hidden layer. The length of the one-dimensional vector is calculated using the following formula:

$$n_w = (n_x \times n_h + n_h) + (n_h \times n_o + n_o) \qquad (12)$$

where $n_w$, $n_x$, $n_h$, and $n_o$ are the vector length, the size of the input feature, the number of hidden units in the hidden layer, and the number of units (neurons) in the output layer, respectively. The authors have used the mean squared error (MSE) function, computed as the mean of the squared differences between predicted and actual values; the mathematically perfect value is zero:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (13)$$

where $y$ and $\hat{y}$ denote the actual value and the predicted value, and $n$ is the total number of values. The process of weight optimization is visualized in Fig. 2.
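A minimal sketch of Eqs. (12) and (13) follows; the hidden-layer width in the example call is illustrative, not a value reported by the experiment.

```python
# A minimal sketch of Eq. (12) (solution-vector length for a one-hidden-layer
# MLP) and Eq. (13) (the MSE fitness).
import numpy as np

def vector_length(n_x, n_h, n_o):
    return (n_x * n_h + n_h) + (n_h * n_o + n_o)   # weights + biases

def mse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

print(vector_length(9, 10, 1))   # Saheart has 9 inputs; 10 hidden is assumed
print(mse([1, 0, 1], [0.9, 0.2, 0.8]))   # -> 0.03
```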
Fig. 2 MLP training process by BAEABC (start → initialize weights and biases randomly → set the weights and biases to the MLP and calculate the accuracy on the training dataset → while the termination criterion is not met, optimize the vector of weights and biases by BAEABC → calculate the metrics on the test dataset → end)
The process of training an ANN using the hybridized bat algorithm can be described with the following pseudocode (Algorithm 2). This experiment utilized two well-known datasets. The first dataset, Saheart, is composed of 462 male samples concerning heart disease in South Africa; roughly two controls were taken per heart disease case. The second dataset is the Vertebral dataset; Dr. Henrique da Mota gathered the data for this set, and it contains 310 patients in total. The first dataset has nine features and the second dataset six.
Algorithm 2 Pseudo-code of the ANN training process with hybridized Bat algorithm

Define population size n
Initialize the weights and biases x_i (i = 1, 2, ..., n)
Define the fitness function f(x)
Define maximum number of iterations MaxIter
Define counter t = 0
while t < MaxIter do
    for each solution do
        Set weights and biases of an ANN with the solution's decision variables
        Calculate fitness for the current solution
        Optimize decision variables using the hybridized Bat algorithm
    end for
    Sort population by fitness function
    Save the current best solution
end while
Output the solution with the minimal fitness function value from all iterations and test it on the test dataset
The datasets are split into train and test sets: the training set covers two-thirds of the total samples, and the test set covers the remaining one-third. The features are normalized utilizing the following formula:

$$X_{\text{norm}} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \qquad (14)$$
X norm represents the normalized value, and the ith input feature can be labeled as X i , while X min and X max represent the minimum and the maximum value of that particular feature, respectively. Each element of the vector is initialized between −1 and 1. The number of population members is set to 50, and the whole process is iterated for 250 iterations. 30 independent runs were made to obtain unbiased results. The control parameters of BAEABC are outlined in Table 1.
Table 1 Control parameters of BAEABC

Parameter | Notation | Value
Size of the population | N | 50
Maximum number of iterations | MaxIter | 250
Minimum frequency | f_min | 0
Maximum frequency | f_max | 2
Loudness | A | 0.9
Pulse rate | r | 0.5
Loudness adaption parameter | α | 0.9
Pulse rate adaption parameter | γ | 0.9
For performance measurement, five different metrics are used:

1. Area under the curve (AUC): $\mathrm{AUC} = \frac{1}{(TP + FP)(TN + FN)} \int_0^1 TP \, \mathrm{d}FP$
2. Accuracy: $\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$
3. Specificity: $\mathrm{specificity} = \frac{TN}{TN + FP}$
4. Sensitivity: $\mathrm{sensitivity} = \frac{TP}{FN + TP}$
5. Geometric mean (g-mean): $\text{g-mean} = \sqrt{\mathrm{specificity} \times \mathrm{sensitivity}}$
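A minimal sketch of these metrics computed from a confusion matrix is given below; the labels and scores are illustrative, and AUC is taken from scikit-learn rather than evaluated as the integral above.

```python
# A minimal sketch of the five metrics (illustrative y_true / y_score).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + tn + fn)
specificity = tn / (tn + fp)
sensitivity = tp / (fn + tp)
g_mean = np.sqrt(specificity * sensitivity)
auc = roc_auc_score(y_true, y_score)
print(accuracy, specificity, sensitivity, g_mean, auc)
```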
In the metrics equations, TN represents the true negative and TP the true positive value, while the false negative and false positive are denoted by FN and FP. The collected empirical results are matched against other metaheuristic methods (GOA, PSO, GA, BAT, ABC, FF, MBO, BBO, and FPA), whose results are taken from [25]; to make the comparison fair, the simulation configuration is done as described in [25]. The results on the Saheart dataset are presented in Table 2, and the results on the Vertebral dataset in Table 3. Figure 3 presents a comparison of the best results of the proposed BAEABC and the other metaheuristic methods on the five metrics. The accuracy convergence graph is depicted in Fig. 4, which shows the accuracy over the course of iterations; the best accuracy on the Saheart and Vertebral datasets is achieved after 100 iterations. Taking the simulation outcomes into account, the suggested BAEABC method achieved high performance and accuracy compared to the other metaheuristic approaches. BAEABC produced the best statistical result on all five metrics in the Saheart test, while in the Vertebral test the algorithm yielded the highest AUC and accuracy. For the worst and average statistics on the Saheart evaluation, the proposed method has the highest results on the accuracy and AUC metrics; the Vertebral evaluation yielded the highest values on all metrics for the worst and average statistics. The proposed method achieved the best AUC of 81.312% on the Saheart dataset and 95.658% on the Vertebral dataset.
5 Conclusion In this work, the authors wanted to show that a combination such as a hybridized algorithm together with a neural network is very competitive compared to other optimization systems. The comparative analysis suggests that such a system provides
Table 2 MLP training results on Saheart dataset

Algorithm | Metric | AUC | Accuracy | Specificity | Sensitivity | G-Mean
BAEABC | Best | 0.81312 | 0.79125 | 0.95198 | 0.92645 | 0.72217
BAEABC | StdDev | 0.00385 | 0.07168 | 0.02798 | 0.01685 | 0.03179
BAEABC | Mean | 0.78112 | 0.73853 | 0.83712 | 0.85254 | 0.65425
BAEABC | Worst | 0.76128 | 0.79382 | 0.77982 | 0.78152 | 0.78066
GOA | Best | 0.78793 | 0.79114 | 0.57407 | 0.91346 | 0.71238
GOA | StdDev | 0.01379 | 0.02378 | 0.03809 | 0.02645 | 0.02879
GOA | Mean | 0.75555 | 0.73122 | 0.49383 | 0.85449 | 0.64913
GOA | Worst | 0.72685 | 0.67722 | 0.42593 | 0.79808 | 0.58653
PSO | Best | 0.79897 | 0.77848 | 0.95192 | 0.61111 | 0.68990
PSO | StdDev | 0.02382 | 0.02768 | 0.05845 | 0.06018 | 0.02874
PSO | Mean | 0.76022 | 0.72658 | 0.85096 | 0.48704 | 0.64126
PSO | Worst | 0.71546 | 0.64557 | 0.70192 | 0.38889 | 0.57689
GA | Best | 0.78241 | 0.75949 | 0.94231 | 0.55556 | 0.67792
GA | StdDev | 0.01242 | 0.01757 | 0.03645 | 0.03587 | 0.01702
GA | Mean | 0.76671 | 0.71814 | 0.82372 | 0.51481 | 0.65030
GA | Worst | 0.73326 | 0.68354 | 0.75962 | 0.40741 | 0.61048
BAT | Best | 0.78846 | 0.75949 | 0.89423 | 0.59259 | 0.67779
BAT | StdDev | 0.01025 | 0.01840 | 0.03292 | 0.03248 | 0.01780
BAT | Mean | 0.76642 | 0.72405 | 0.83654 | 0.50741 | 0.65086
BAT | Worst | 0.74252 | 0.68987 | 0.76923 | 0.42593 | 0.61715
ABC | Best | 0.81250 | 0.76582 | 0.95192 | 0.61111 | 0.71909
ABC | StdDev | 0.03388 | 0.02120 | 0.05373 | 0.07355 | 0.03546
ABC | Mean | 0.74454 | 0.71160 | 0.82276 | 0.49753 | 0.63644
ABC | Worst | 0.66560 | 0.67722 | 0.72115 | 0.29630 | 0.53109
FF | Best | 0.77902 | 0.74051 | 0.55556 | 0.84615 | 0.68172
FF | StdDev | 0.00402 | 0.01259 | 0.01709 | 0.01650 | 0.01288
FF | Mean | 0.77276 | 0.71730 | 0.51667 | 0.82147 | 0.65137
FF | Worst | 0.76086 | 0.69620 | 0.48148 | 0.79808 | 0.62361
MBO | Best | 0.79790 | 0.76582 | 0.89423 | 0.62963 | 0.70408
MBO | StdDev | 0.02234 | 0.02126 | 0.04101 | 0.05246 | 0.02611
MBO | Mean | 0.76113 | 0.72932 | 0.83782 | 0.52037 | 0.65881
MBO | Worst | 0.71599 | 0.68354 | 0.75000 | 0.40741 | 0.59706
BBO | Best | 0.78775 | 0.75316 | 0.91346 | 0.61111 | 0.69696
BBO | StdDev | 0.00905 | 0.01395 | 0.02976 | 0.04508 | 0.02235
BBO | Mean | 0.77369 | 0.72911 | 0.83910 | 0.51728 | 0.65776
BBO | Worst | 0.75036 | 0.69620 | 0.78846 | 0.40741 | 0.60359
FPA | Best | 0.80520 | 0.76582 | 0.93269 | 0.64815 | 0.71050
FPA | StdDev | 0.02171 | 0.02307 | 0.04802 | 0.05728 | 0.02813
FPA | Mean | 0.76480 | 0.72869 | 0.84231 | 0.50988 | 0.65336
FPA | Worst | 0.72489 | 0.68354 | 0.77885 | 0.38889 | 0.59377
Table 3 MLP training results on Vertebral dataset

Algorithm | Metric | AUC | Accuracy | Specificity | Sensitivity | G-Mean
BAEABC | Best | 0.95658 | 0.91114 | 0.92428 | 0.96587 | 0.88312
BAEABC | StdDev | 0.00485 | 0.00822 | 0.02896 | 0.03585 | 0.05897
BAEABC | Mean | 0.94138 | 0.87195 | 0.88517 | 0.90534 | 0.85638
BAEABC | Worst | 0.93385 | 0.85385 | 0.87895 | 0.88568 | 0.84125
GOA | Best | 0.95140 | 0.88679 | 0.90667 | 0.87097 | 0.87547
GOA | StdDev | 0.00501 | 0.00885 | 0.00948 | 0.02553 | 0.01307
GOA | Mean | 0.94060 | 0.86321 | 0.88444 | 0.81183 | 0.84723
GOA | Worst | 0.93204 | 0.84906 | 0.86667 | 0.77419 | 0.82540
PSO | Best | 0.94108 | 0.90566 | 0.90323 | 0.97333 | 0.87203
PSO | StdDev | 0.01703 | 0.03349 | 0.07023 | 0.05628 | 0.03254
PSO | Mean | 0.91911 | 0.85031 | 0.78925 | 0.87556 | 0.82942
PSO | Worst | 0.87570 | 0.77358 | 0.61290 | 0.76000 | 0.73441
GA | Best | 0.94495 | 0.88679 | 0.83871 | 0.92000 | 0.87203
GA | StdDev | 0.00465 | 0.01049 | 0.02198 | 0.01706 | 0.01026
GA | Mean | 0.93508 | 0.85786 | 0.82151 | 0.87289 | 0.84664
GA | Worst | 0.92602 | 0.83962 | 0.77419 | 0.84000 | 0.82540
BAT | Best | 0.94796 | 0.89623 | 0.90323 | 0.93333 | 0.87841
BAT | StdDev | 0.04299 | 0.05471 | 0.05272 | 0.08269 | 0.04071
BAT | Mean | 0.91209 | 0.83553 | 0.81075 | 0.84578 | 0.82617
BAT | Worst | 0.77247 | 0.68868 | 0.67742 | 0.62667 | 0.68533
ABC | Best | 0.95441 | 0.90566 | 0.93548 | 0.94667 | 0.87375
ABC | StdDev | 0.01995 | 0.03265 | 0.05522 | 0.05216 | 0.02786
ABC | Mean | 0.92189 | 0.84371 | 0.81828 | 0.85422 | 0.83483
ABC | Worst | 0.87484 | 0.77358 | 0.67742 | 0.74667 | 0.76031
FF | Best | 0.94968 | 0.87736 | 0.90667 | 0.83871 | 0.86559
FF | StdDev | 0.00434 | 0.00810 | 0.01259 | 0.01794 | 0.00860
FF | Mean | 0.94072 | 0.86384 | 0.87911 | 0.82688 | 0.85250
FF | Worst | 0.92946 | 0.84906 | 0.85333 | 0.77419 | 0.82540
MBO | Best | 0.94882 | 0.90566 | 0.87097 | 0.98667 | 0.89515
MBO | StdDev | 0.01766 | 0.02796 | 0.06460 | 0.05083 | 0.02750
MBO | Mean | 0.92346 | 0.85660 | 0.78387 | 0.88667 | 0.83203
MBO | Worst | 0.87957 | 0.76415 | 0.64516 | 0.73333 | 0.77598
BBO | Best | 0.95097 | 0.88679 | 0.87097 | 0.93333 | 0.87547
BBO | StdDev | 0.00660 | 0.01106 | 0.02896 | 0.01836 | 0.01285
BBO | Mean | 0.93875 | 0.86730 | 0.82473 | 0.88489 | 0.85403
BBO | Worst | 0.92645 | 0.84906 | 0.74194 | 0.84000 | 0.83163
FPA | Best | 0.94065 | 0.88679 | 0.83871 | 0.96000 | 0.86559
FPA | StdDev | 0.00836 | 0.01506 | 0.03425 | 0.02351 | 0.01676
FPA | Mean | 0.92433 | 0.86730 | 0.77742 | 0.90444 | 0.83816
FPA | Worst | 0.90538 | 0.83962 | 0.70968 | 0.86667 | 0.80215
Fig. 3 Algorithm comparison of best result on five metrics
optimal results, better than the other proposed solutions. The main goal of this algorithm was to improve and speed up the process of training an ANN by using optimized values of weights and biases instead of letting the network produce them through backpropagation, which is a much slower process, as already mentioned in the paper. It is confirmed that swarm intelligence algorithms (metaheuristics) can indeed speed up the learning process of artificial neural networks, with the same degree of success as in other optimization problems.
Fig. 4 Convergence graph
References 1. W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943) 2. M.S. Shanker, Using neural networks to predict the onset of diabetes mellitus. J. Chem. Inform. Computer Sci. 36(1), 35–41 (1996) 3. J.J. Palop, L. Mucke, Amyloid-β-induced neuronal dysfunction in alzheimer’s disease: from synapses toward neural networks. Nature Neurosci. 13(7), 812–818 (2010) 4. O. Er, F. Temurtas, A.Ç. Tanrıkulu, Tuberculosis disease diagnosis using artificial neural networks. J. Med. Syst. 34(3), 299–302 (2010) 5. Y. Lu, S. Yi, N. Zeng, Y. Liu, Y. Zhang, Identification of rice diseases using deep convolutional neural networks. Neurocomputing 267, 378–384 (2017) 6. B. Liu, Y. Zhang, D. He, Y. Li, Identification of apple leaf diseases based on deep convolutional neural networks. Symmetry 10(1), 11 (2018)
7. J. Orbach, Principles of neurodynamics. Perceptrons and the theory of brain mechanisms. Arch. General Psychiatry 7(3), 218–219 (1962) 8. Y. Freund, R.E. Schapire, Large margin classification using the perceptron algorithm. Mach. Learn. 37(3), 277–296 (1999) 9. R. Hecht-Nielsen, Theory of the backpropagation neural network, in Neural Networks for Perception (Elsevier, Amsterdam, 1992) 10. F.-C. Chen, Back-propagation neural networks for nonlinear self-tuning adaptive control. IEEE Control Syst. Mag. 10(3), 44–48 (1990) 11. M. Dorigo, M. Birattari, T. Stutzle, Ant colony optimization. IEEE Comput. Intell. Mag. 1(4), 28–39 (2006) 12. X.-S. Yang, A new metaheuristic bat-inspired algorithm, in Nature Inspired Cooperative Strategies for Optimization (NICSO 2010) (Springer, Berlin, 2010), pp. 65–74 13. D. Karaboga, B. Basturk, Artificial bee colony (abc) optimization algorithm for solving constrained optimization problems, in International Fuzzy Systems Association World Congress (Springer, Berlin, 2007), pp. 789–798 14. A.A. Heidari, S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja, H. Chen, Harris hawks optimization: algorithm and applications. Future Gener. Comput. Syst. 97, 849–872 (2019) 15. T. Bezdan, M. Zivkovic, E. Tuba, I. Strumberger, N. Bacanin, M. Tuba, Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm, in International Conference on Intelligent and Fuzzy Systems (Springer, 2020), pp. 718–725 16. T. Bezdan, M. Zivkovic, M. Antonijevic, T. Zivkovic, N. Bacanin, Enhanced flower pollination algorithm for task scheduling in cloud computing environment, in Machine Learning for Predictive Analysis, ed. by A. Joshi, M. Khosravy, N. Gupta (Springer, Singapore, 2021), pp. 163–171 17. N. Bacanin, T. Bezdan, E. Tuba, I. Strumberger, M. Tuba, M. Zivkovic, Task scheduling in cloud computing environment by grey wolf optimizer, in 2019 27th Telecommunications Forum (TELFOR) (IEEE, 2019), pp. 1–4 18. M. Zivkovic, N. Bacanin, E. Tuba, I. Strumberger, T. Bezdan, M. Tuba, Wireless sensor networks life time optimization based on the improved firefly algorithm, in 2020 International Wireless Communications and Mobile Computing (IWCMC) (IEEE, 2020), pp. 1176–1181 19. N. Bacanin, E. Tuba, M. Zivkovic, I. Strumberger, M. Tuba, Whale optimization algorithm with exploratory move for wireless sensor networks localization, in International Conference on Hybrid Intelligent Systems (Springer, Berlin, 2019), pp. 328–338 20. M. Zivkovic, N. Bacanin, T. Zivkovic, I. Strumberger, E. Tuba, M. Tuba, Enhanced grey wolf algorithm for energy efficient wireless sensor networks, in 2020 Zooming Innovation in Consumer Technologies Conference (ZINC) (IEEE, 2020), pp. 87–92 21. T. Bezdan, M. Zivkovic, E. Tuba, I. Strumberger, N. Bacanin, M. Tuba, Glioma brain tumor grade classification from MRI using convolutional neural networks designed by modified FA”, in International Conference on Intelligent and Fuzzy Systems (Springer, Berlin, 2020), pp. 955–963 22. E.T.I.S. Nebojsa Bacanin, T. Bezdan, M. Tuba, Optimizing convolutional neural network hyperparameters by enhanced swarm intelligence metaheuristics. Algorithms 13(3), 67 (2020) 23. N. Bacanin, T. Bezdan, E. Tuba, I. Strumberger, M. Tuba, Monarch butterfly optimization based convolutional neural network design. Mathematics 8(6), 936 (2020) 24. T. Bezdan, E. Tuba, I. Strumberger, N. Bacanin, M. Tuba, Automatically designing convolutional neural network architecture with artificial flora algorithm, in ICT Systems and Sustainability, ed. 
by M. Tuba, S. Akashe, A. Joshi (Springer, Singapore, 2020), pp. 371–378 25. A.A. Heidari, H. Faris, I. Aljarah, S. Mirjalili, An efficient hybrid multilayer perceptron neural network with grasshopper optimization. Soft Comput. 23(17), 7941–7958 (2019) 26. J.C. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011) 27. M. D. Zeiler, Adadelta: an adaptive learning rate method (2012) 28. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization (2014)
29. D. Karaboga, B. Akay, A modified artificial bee colony (ABC) algorithm for constrained optimization problems. Appl. Soft Comput. 11(3), 3021–3031 (2011) 30. M. Tuba, N. Bacanin, Hybridized bat algorithm for multi-objective radio frequency identification (RFID) network planning, in 2015 IEEE Congress on Evolutionary Computation (CEC) (2015), pp. 499–506
Bayes Wavelet-CNN for Classifying COVID-19 in Chest X-ray Images S. Kavitha and Hannah Inbarani
Abstract In digital image processing, removing noise from images is essential for better classification, especially for medical images. A novel hybrid approach of the Bayes wavelet transform and a convolutional neural network (BayesWavT-CNN) is used for classifying chest X-ray images into normal and COVID-19 images. The Bayes wavelet transform denoising method is used to denoise the images before classification, and a simple eight-layer CNN is developed for the classification of the denoised images. The proposed model achieved the highest test accuracy of 97.10% for 20 epochs, and it is compared with the SVM and CNN developed on the same images without denoising. The results indicate that the developed hybrid model provides excellent performance in accuracy and receiver operator characteristic (ROC) curve analysis. Keywords Denoising · COVID-19 image data · Bayes wavelet transform · Simple CNN · Support vector machine (SVM) · Receiver operator characteristic (ROC) curve
S. Kavitha (B) · H. Inbarani, Department of Computer Science, Periyar University, Salem, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_55

1 Introduction

The coronavirus infection is spreading very rapidly all over the world through droplets in the air. As of November 9, 2020, more than 50,000,000 confirmed cases of coronavirus disease had been reported globally to the World Health Organization. The increasing amount of COVID-19 patient data creates a challenge for research in medical image processing to detect the disease at the earliest. So far, several research works have been done to expose the applications of artificial intelligence techniques in the detection of COVID-19 cases [1, 2]. Recently, CNN has been used in many research works to detect the coronavirus infections present in clinical images [3, 4]. In this work, a deep learning CNN model is proposed, which is
integrated with wavelet transform to detect the infections in the chest X-ray images with less computational power and higher speed. Usually, digital images such as medical images are affected by noise from the environment, the transmission media, etc. [5]. In digital image processing, it is required to pre-process the images to remove the noise present in them; this is an important step to be taken before images are analyzed for feature extraction and classification [6]. Obtaining high-quality images from noisy images is a classical denoising problem, as it is difficult to distinguish noise, edges, and texture in noisy images [7]. Many researchers have proposed different denoising methods, such as smoothing filters, frequency-domain denoising, and wavelet transform methods, to remove the noise from different types of image datasets [8–11]. This article presents the Bayes wavelet transform to remove the noise in the chest X-ray images, since its performance is desirable based on the peak signal-to-noise ratio (PSNR), mean squared error (MSE) metrics, and visual image quality. Deep learning in artificial intelligence plays an important role in processing images in a better way. Though many algorithms exist for classification, such as logistic regression, decision tree, and support vector machines, CNN is a well-known deep learning network used by many researchers for image classification and recognition problems. By applying relevant filters, a CNN can capture the spatial and temporal dependencies in an image [12]. CNN is a deep learning technique well suited to dealing with complex datasets compared to traditional neural networks [13]. The objective of this paper is to present a novel hybrid approach combining the Bayes wavelet transform denoising method with a simple CNN to classify the COVID-19 X-ray image dataset. The remaining part of the paper is structured as follows: Sect. 2 describes the literature study, followed by the motivation of the research in Sect. 3. Section 4 explains the dataset used for analysis, and the proposed methodology is presented in Sect. 5. Section 6 discusses the findings, and finally, Sect. 7 concludes the research work.
2 Related Work

Various studies on pre-processing, focusing on different denoising techniques, are reported in Table 1, along with research works on the classification of diseases present in medical images using machine learning and deep learning algorithms.
Table 1 Related works for this study

Authors | Techniques | Result
Vikas Gupta et al. [14] | BayesInvariant wavelet, VisuShrink and SureShrink methods to denoise the images | The BayesInvariant wavelet-based method performs best compared with the VisuShrink and SureShrink methods
Gurmeet Kaur et al. [15] | Wavelet transform and filtering techniques | The wavelet transform method is good compared with the filtering methods
Jyotsna Patil et al. [16] | Wavelet transform and spatial filtering | It is concluded that the wavelet transform is better suited than spatial filters, as these filters cause over-smoothing and blurring of the image
Suvajit Dutta et al. [23] | Feed-forward neural network, deep neural network, CNN | DNN performs better than the other networks
Nitish Srivastava et al. [24] | Neural network with dropout function | Dropout overcomes the problem of overfitting
Dominik Scherer et al. [25] | Max pooling, sub-sampling | It is concluded that the max-pooling operation is better than sub-sampling for capturing invariance in images
Sathish et al. [26] | Fuzzy C-means clustering and CNN | Achieved 87.11% accuracy for classifying tumors in brain MRI images
El Boustani et al. [27] | Probabilistic neural network and CNN along with the root mean square propagation (RMSprop) optimizer | It is concluded that the RMSprop optimizer performs well with the highest accuracy
Abbas et al. [3] | CNN with decomposition and transfer learning to classify COVID-19 in chest X-ray images | An accuracy of 93.1% is achieved
Wang et al. [4] | CNN with transfer learning to detect COVID-19 infections on CT scan images | Achieved a total accuracy of 89.5%
Liu et al. [28] | Ensemble of the bagged tree algorithm with statistical textural features to detect COVID-19 | A classification accuracy of 94.16% is attained
Perumal et al. [29] | Feature extraction using Haralick features with transfer learning to detect COVID-19-like pneumonia in CT images | An accuracy of 93% is achieved
3 Research Motivation

Since COVID-19 infections are rapidly increasing every day, it is a challenge for researchers to find a better approach for classifying the COVID-19 image dataset. Due to the noise present in digital images, it is required to denoise the images as a pre-processing step before using them for classification. In this work, the Bayesian approach [6] of wavelet transform denoising is used because it gives better-quality images by removing data that is irrelevant for feature extraction and classification. Hence, in this methodology, the Bayes wavelet model is integrated with a CNN with a small number of layers to classify the images in a better way, with less computational power and higher speed.
4 COVID-19 Chest X-Ray Dataset

The dataset has nearly 350 chest X-ray scans, categorized into two classes, COVID and normal. Among them, 234 are normal X-ray images and 110 are COVID-19 X-ray images. The images are grayscale, and their dimensions are 256 × 256. The image dataset is split in the ratio of 80 to 20% for the training and testing process, respectively.
5 Proposed Method

The novel hybrid approach of combining the Bayes wavelet transform denoising method with a simple CNN is proposed in this work for the better classification of images. Figure 1 shows the proposed hybrid model, which consists of three phases:

(i) In the first phase, Gaussian noise is added to the images with a sigma value of 0.15, because it has been evaluated that the wavelet-based denoising method provides higher-quality images for Gaussian noise than for speckle noise.
(ii) In the second phase, a Bayes wavelet transform method is adopted to eliminate the noise added to the images. In this method, the noisy image signal is first decomposed using the discrete wavelet transform (DWT) into the various wavelet coefficients. These coefficients are thresholded using the BayesShrink thresholding function, and the denoised image is then obtained by taking the inverse transform of the thresholded coefficients.
(iii) In the third phase, a simple CNN architecture is developed with three convolutional layers, two max-pooling layers, two dense layers, and a dropout function. Finally, the denoised images are classified by the CNN into normal and COVID-19 images, and the results are compared with the SVM [30] and a CNN developed without applying the Bayes wavelet transform.
Fig. 1 Methodology of proposed method (COVID-19 chest X-ray image dataset (344 × 256 × 256) → read the images (grayscale) → add Gaussian noise → Bayes wavelet transform → CNN architecture: convolutional layer (16 filters) → pooling layer (2 × 2) → convolutional layer (32 filters) → pooling layer (2 × 2) → convolutional layer (64 filters) → flatten layer → dense layer → dense layer → classification (COVID or normal))
5.1 Adding Gaussian Noise to the Images

During the first phase, the input images from the dataset are corrupted with Gaussian noise. It is an additive noise, i.e., the sum of random Gaussian-distributed noise and the true pixel values [17]. It is represented using the probability density function

$$p(g) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(g-\mu)^2}{2\sigma^2}} \qquad (1)$$

where g is the Gaussian random variable (gray level), μ is the mean (average) value of g, σ is the standard deviation, and σ² is the variance of g; the Gaussian distribution is characterized by the mean μ and the variance σ².
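A minimal sketch of this noise-addition phase follows, assuming `xray` is a grayscale image scaled to [0, 1] and using the sigma value of 0.15 stated in the methodology.

```python
# A minimal sketch of adding Gaussian noise with scikit-image
# (sigma = 0.15, so var = 0.0225; 'xray' is a float image in [0, 1]).
from skimage.util import random_noise

noisy = random_noise(xray, mode="gaussian", mean=0.0, var=0.15 ** 2)
```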
5.2 Denoising the Images Using Bayes Wavelet Transform

The BayesShrink [18] data-driven soft-thresholding method is applied with a three-level wavelet decomposition, which sets a different threshold for each sub-band to denoise the chest X-ray images corrupted with Gaussian noise. This method estimates a threshold value that minimizes the Bayesian risk by assuming a Gaussian distribution for the wavelet coefficients in each detail sub-band [6]. The threshold value is given by the following equation:
$$T_b = \begin{cases} \dfrac{\sigma_v^2}{\sigma_x} & \text{if } \sigma_v^2 < \sigma_x^2 \\ \max_j |W_j| & \text{otherwise} \end{cases} \qquad (2)$$
where $W_j$ denotes the wavelet coefficients at each scale $j$, $\sigma_v^2$ is the noise variance, estimated from the sub-band by the median estimator, and $\sigma_x^2$ is the original image variance. The variance of the degraded image for each sub-band can be calculated as

$$\sigma_y^2 = \frac{1}{J} \sum_{j=1}^{J} W_j^2 \qquad (3)$$
where $J$ is the total number of coefficients in the sub-band, and $\sigma_x$ is calculated by the following equation:

$$\sigma_x = \sqrt{\max(\sigma_y^2 - \sigma_v^2, 0)} \qquad (4)$$
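As an illustration, BayesShrink soft thresholding with multi-level decomposition is available in scikit-image; a minimal sketch of this denoising phase, assuming `noisy` is the Gaussian-corrupted image from the previous phase, is:

```python
# A minimal sketch of the denoising phase: scikit-image's denoise_wavelet
# implements BayesShrink soft thresholding; three levels match the text.
from skimage.restoration import denoise_wavelet

denoised = denoise_wavelet(noisy, method="BayesShrink", mode="soft",
                           wavelet_levels=3, rescale_sigma=True)
```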
5.3 Proposed Simple Convolutional Neural Network

A simple CNN is adopted in this model with three convolutional layers, two pooling layers, and two dense layers for classifying the denoised images. Each convolutional layer has several filters that perform the convolution operation on the input image to extract features; an activation function is then applied to the output to obtain a feature map [19]. The proposed CNN has 16, 32 and 64 filters in the first, second, and third convolutional layers, respectively. Each convolutional layer has a 3 × 3 kernel size. The proposed model uses the ReLU activation function [20] in each convolutional layer, defined by the following equation:

$$g(x) = \max(0, x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \qquad (5)$$
The above equation states that the output is zero if the input is less than or equal to zero, and a linear function of x if x is greater than zero. Two max-pooling layers [21] are used in this model with 2 × 2 filters to downsample the images without losing important information. Max pooling decreases the dimensions of the activation maps and increases the strength of the feature extraction by selecting the highest value from a group of neurons in the previous layer. A flattening layer is placed between the convolution layers and the dense layers; it transforms the two-dimensional matrix into a vector, which is fed into the dense layer for classification. This model adopts two dense layers for the classification; a dense layer connects every neuron of the previous layer with every neuron of the current layer. The last
fully connected layer is an output layer which gives the output from the number of classes. The Softmax regression [22] is used for the classification task as it generates a well-performed probability distribution of the outputs.
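A minimal Keras sketch of the simple CNN described above follows; the three convolutional layers (16/32/64 filters, 3 × 3 kernels, ReLU), two max-pooling layers, flatten, two dense layers, dropout, and softmax output match the text, while the hidden dense width and the dropout rate are assumptions.

```python
# A minimal sketch of the simple CNN (input 256x256x1 per the dataset text).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(256, 256, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),    # hidden dense width is assumed
    layers.Dropout(0.5),                    # dropout rate is assumed
    layers.Dense(2, activation="softmax"),  # two classes: COVID / normal
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```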
6 Result Analysis

6.1 Findings of Bayes Wavelet Transform

In the first and second phases of the model, the images are pre-processed for noise reduction. Figure 2 shows a sample image from the chest X-ray dataset, the Gaussian noisy image, and the image denoised using the Bayes wavelet transform. The figure shows that the quality of the Bayes wavelet transform method is visually good, which is also evaluated using PSNR and MSE; the results are given in Table 2. Table 2 shows the average PSNR and MSE values of the Bayes wavelet transform method applied to the noisy images. The wavelet-transform-denoised images provide a higher PSNR and the lowest MSE compared with the PSNR and MSE values between the original and noisy images. If the PSNR value is high after denoising, the quality of the image reconstructed from the noisy image is high, which is good for better classification.
Fig. 2 Sample image of Gaussian noise and wavelet transform
Table 2 Average PSNR and MSE values of Bayes wavelet transform

Method | PSNR/dB | MSE
Before denoising (between original and noisy images) | 16.47 | 0.0225
Wavelet transform-BayesShrink method (between original and denoised images) | 29.51 | 0.0011
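The PSNR and MSE values in Table 2 can be reproduced with standard library calls; a minimal sketch follows, assuming `original` and `denoised` are float images scaled to [0, 1].

```python
# A minimal sketch of the PSNR/MSE evaluation reported in Table 2.
from skimage.metrics import peak_signal_noise_ratio, mean_squared_error

psnr = peak_signal_noise_ratio(original, denoised, data_range=1.0)
mse = mean_squared_error(original, denoised)
print(f"PSNR = {psnr:.2f} dB, MSE = {mse:.4f}")
```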
6.2 Performance Measurements of Bayes Wavelet-CNN

In the third phase of the model, the denoised images are classified using the simple CNN, and the results are compared with the SVM and the CNN developed without denoising the images. The performance of the proposed approach is analyzed through accuracy, the classification report, and the ROC curve. Figure 3 illustrates the accuracy results of the simple CNN and BayesWavT-CNN. As shown in Fig. 3, the proposed BayesWavT-CNN attained the highest test accuracy of 97.10% over 20 epochs, which represents the rate of correct classification, whereas the CNN developed without the wavelet transform achieved 91.30% test accuracy for the same number of epochs. Table 3 illustrates the accuracy results of SVM, CNN before denoising, and CNN after denoising (BayesWavT-CNN). As illustrated in Table 3, the proposed BayesWavT-CNN achieves the highest training and testing accuracy compared with the SVM and the CNN developed without denoising the images. Another metric is the classification report, given in Table 4, which measures the predictions of the classification algorithm using precision, recall, and F1-score, calculated from the true positives, false positives, true negatives, and false negatives. For a class, the precision is computed as the number of true positives divided by the total number of items labeled as positive, whereas the recall measures the fraction of positives that are correctly identified. The F1-score is the harmonic mean of precision and recall.
Fig. 3 a Accuracy results of simple CNN and b accuracy results from BayesWavT-CNN
Table 3 COVID-19 accuracy results (training and testing accuracy) of SVM, simple CNN and BayesWavT-CNN

Method | Train accuracy (%) | Test accuracy (%)
Support vector machine (SVM) | 94.90 | 89.85
Simple CNN | 97.45 | 91.30
BayesWavT-CNN (proposed method) | 98.18 | 97.10
Table 4 Classification report of BayesWavT-CNN, CNN, and SVM

Class | Classification report | SVM | Simple CNN | BayesWavT-CNN (proposed method)
Class 0 (COVID) | Precision | 0.95 | 1.00 | 1.00
Class 0 (COVID) | Recall | 0.75 | 0.75 | 0.92
Class 0 (COVID) | F1-score | 0.84 | 0.86 | 0.96
Class 1 (Normal) | Precision | 0.88 | 0.88 | 0.96
Class 1 (Normal) | Recall | 0.98 | 1.00 | 1.00
Class 1 (Normal) | F1-score | 0.93 | 0.94 | 0.98
As given in Table 4, the precision value of 1.00 for class 0 (COVID) denotes that every image labeled as class 0 belongs to class 0. The recall value of 1.00 means that every image from class 1 is labeled as belonging to class 1. Hence, the quality of the predictions made by BayesWavT-CNN, based on precision, recall, and F1-score, is better than that of the SVM and CNN. Another evaluation, the ROC curve, is given in Fig. 4; it shows the false positive and true positive rates of CNN, BayesWavT-CNN, and SVM. From the ROC curve analysis, the proposed BayesWavT-CNN classifies the images well compared with SVM and CNN. Figure 5 shows the comparison of the proposed method with SVM and CNN; it shows that the performance of Bayes Wavelet-CNN is better than that of SVM and CNN.
Figure 4 a ROC curve of BayesWavT-CNN and CNN and b ROC curve of SVM
Fig. 5 Performance comparisons of SVM, CNN, and BayesWavT-CNN
7 Conclusion

In this article, a novel classification approach based on the Bayes wavelet method and a simple CNN (BayesWavT-CNN) is proposed for classifying chest X-ray images into normal and COVID-19 images. Initially, the images are pre-processed for noise reduction using the Bayes wavelet transform. Next, the simple CNN is developed on the denoised images to create the classification model. Finally, the proposed BayesWavT-CNN is compared with the SVM and the simple CNN for performance analysis. The experimental outcomes show that the proposed model performs better than the SVM and CNN. In the future, the model can be improved to achieve even higher accuracy.
References 1. L. Li, L. Qin, Z. Xu, Y. Yin, X. Wang, B. Kong, J. Bai, Y. Lu, Z. Fang, Q. Song, Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology 200905 (2020) 2. F. Shi, J. Wang, J. Shi, Z. Wu, Q. Wang, Z. Tang, K. He, Y. Shi, D. Shen, Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19. IEEE Rev. Biomed. Eng. (2020) 3. A. Abbas, M.M. Abdelsamea, Classification of COVID-19 in chest X-ray images using DeTraC deep convolutional neural network. Appl. Intell. (2020) 4. S. Wang, B. Kang, J. Ma, X. Zeng, M. Xiao, J. Guo, M. Cai, J. Yang, Y. Li, X. Meng, A deep learning algorithm using CT images to screen for corona virus disease (COVID-19), in MedRxiv (2020) 5. J.M. Sanches, J.C. Nascimento, J.S. Marques, Medical image noise reduction using the SylvesterLyapunov equation. IEEE Trans. Image Process. 17(9), 1522–1539 (2018) 6. P.B. Alisha, K. Gnana Sheela, Image denoising techniques—an overview. IOSR J. Electron. Commun. Eng. (IOSR-JECE) 11(1) (2016). e-ISSN: 2278-2834, ISSN: 2278-8735 7. L. Fan, F. Zhang, H. Fan, C. Zhang, Brief review of image denoising techniques. Visual Comput. Ind. Biomed. Art 2, Article number: 7 (2019)
8. S.G. Chang, B. Yu, M. Vetterli, Spatially adaptive wavelet thresholding with context modeling for image denoising. IEEE Trans. Image Process. 9(9), 1522–1531 (2000) 9. A. Pizurica, W. Philips, I. Lemahieu, M. Acheroy, A joint inter- and intrascale statistical model for Bayesian wavelet based image denoising. IEEE Trans. Image Process. 11(5), 545–557 (2002) 10. L. Gondara, Medical image denoising using convolutional denoising autoencoders, in IEEE 16th International Conference on Data Mining Workshops (2016), pp. 241–246 11. L. Zhang, P. Bao, X. Wu, Multiscale LMMSE-based image denoising with optimal wavelet selection. IEEE Trans. Circ. Syst. Video Technol. 15(4), 469–481 (2005) 12. S. Saha, A comprehensive guide to convolutional neural networks—the ELI5 way, towards data science (2015) 13. M. Xin, Y. Yong Wang, Research on image classification model based on deep convolution neural network. EURASIP J. Image Video Process. 40 (2019) 14. V. Gupta, R. Mahle, R.S. Shriwas, Image denoising using wavelet transform method, in Tenth International Conference on Wireless and Optical Communications Networks (WOCN) (2013), pp. 1–4 15. G. Kaur, R. Kaur, Image de-noising using wavelet transform and various filters. Int. J. Res. Comput. Sci. 2(2), 15–21 (2012) 16. J. Patil, S. Jadhav, A comparative study of image denoising techniques. Int. J. Innov. Res. Sci. Eng. Technol. 2(3) (2013) 17. R.C. Gonzalez, R.E. Woods, Digital Image Processing, 2ns edn. (Pearson Education, 2005) 18. K. Tharani, C. Mani, I. Arora, A comparative study of image denoising methods using wavelet thresholding techniques. Int. J. Eng. Res. Appl. 6(12) (2016). ISSN: 2248-9622 19. S. Albawi, T.A. Mohammed, S. Al-Zawi, Understanding of a convolutional neural network, in International Conference on Engineering and Technology (ICET), Antalya (2017), pp. 1–6 20. V. Nair, G.E. Hinton, Rectified linear units improve restricted Boltzmann machines, in International Conference on Machine Learning (2010), pp. 807–814 21. Y. Boureau, J. Ponce, Y. Le Cun, A theoretical analysis of feature pooling in visual recognition, in International Conference of Machine Learning (2010), pp. 111–118 22. T. Guo, J. Dong, H. Li, Y. Gao, Simple convolutional neural network on image classification, in IEEE 2nd International Conference on Big Data Analysis (2017) 23. S. Dutta, B.C.S. Bonthala, S. Rai, V. Vijayarajan, A comparative study of deep learning models for medical image classification, in IOP Conference Series: Materials Science and Engineering 263 (2017) 24. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929−1958 (2014) 25. D. Scherer, A. Muller, S. Behnke, Evaluation of pooling operations in convolutional architectures for object recognition, in 20th International Conference on Artificial Neural Networks (ICANN), Lecture Notes in Computer Science, vol. 6354 (Springer, Berlin, 2010) 26. P. Sathish, N.M. Elango, V. Thirunavukkarasu, Piecewise fuzzy C-means clustering and deep convolutional neural network for automatic brain tumour classification using MRI images, Test Eng. Manage. 83, 3729–3736 (2020) 27. A. El Boustani, M. Aatila, E. El Bachari, A. El Oirrak, MRI brain images classification using convolutional neural networks, in Advanced Intelligent Systems for Sustainable Development (AI2SD’2019). AI2SD 2019. Advances in Intelligent Systems and Computing, vol. 1105, ed. by M. Ezziyyani (Springer, Berlin, 2020) 28. C. Liu, X. 
Wang, C. Liu, Q. Sun, W. Peng, Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning. BioMed. Eng. OnLine 19, 66 (2020) 29. V. Perumal, V. Narayanan, S.J.S. Rajasekar, Detection of COVID-19 using CXR and CT images using transfer learning and Haralick features. Appl. Intell. (2020) 30. X. Sun, L. Liu, H. Wang, W. Song, J. Lu, Image classification via support vector machine, in 2015 4th International Conference on Computer Science and Network Technology (ICCSNT), Harbin (2015), pp. 485–489
Survey of Color Feature Extraction Schemes in Content-Based Picture Recovery System Kiran H. Patil and M. Nirupama Bhat
Abstract The method of retrieving the most visually similar pictures from a large database or group of picture files is called a content-based picture or image recovery (CBIR) system. It is one of the challenging research areas of multimedia computing and information recovery. In the past few decades, many different picture matching, indexing and recovery algorithms have been developed. In CBIR systems, recovery is based on matching the visual content or characteristics of a query picture with a picture database using a picture-to-picture similarity calculation. The term "content" in CBIR refers to the visual content of a picture, meaning texture, shape, color, etc., or any other feature/descriptor that can be acquired from the picture itself. In this paper, a survey of feature recovery techniques using the color feature over the last 10 years is presented and summarized year-wise. Keywords Color moments · Color correlogram · Color histogram · Feature recovery
K. H. Patil (B) · M. N. Bhat, CSE Department, VFSTR Deemed To Be University, Vadlamudi, Guntur, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_56

1 Introduction

Image processing is one of the significant research areas in computer science and engineering. Recently, image processing has also had a large impact during the COVID pandemic: image, picture, and color detection have become especially important for lung images, as the COVID disease spreads in the lungs. Image processing also has a number of applications, such as scanners for product identification and identification of human beings using thumb impressions or iris scans. Content-based picture/image recovery (CBIR) is the application of computer vision to picture recovery problems. Nowadays, there is tremendous use and generation of digital pictures by common users of electronic devices such as smartphones, the Internet, medical devices, e-commerce, academia, etc. Therefore, searching
and recovery of desired pictures in a large database are very important for users from various fields like academia, marketing, hospitals, military tasks, crime prevention, geography, etc. CBIR systems draw on two main areas: computer vision and database systems. Computer vision involves image processing techniques such as obtaining the picture characteristics or descriptors and picture matching. Image processing and image transformations are used to extract picture characteristics or descriptors; certain steps are involved, like importing the image for analysis using computer tools and then extracting the necessary information. The database system side includes database indexing, searching, and recovery techniques. The knowledge discovery in databases (KDD) process mainly focuses on data cleaning, integration, and relevant data selection. Afterwards, it consolidates the data in aggregate form for pattern evaluation, and then, based on this information, standard techniques are used to mine the data. In this article, techniques based on content-based image retrieval are reviewed, in which an input image and image datasets are taken for the extraction of features such as texture, color, and shape. Hence, the article focuses on a set of images and its processing, obtaining the best information from the set of images to retrieve similar images.
2 Working of CBIR

A typical CBIR system consists of the following main modules: feature recovery, feature storage, similarity/distance measure, and indexing and recovery.

(1) Feature Recovery: Analyze the picture database to extract feature-specific information.
(2) Feature Storage: Provide sufficient storage for the extracted information, which also helps to improve the searching speed.
(3) Similarity/Distance Measure: The difference between the picture database and the query picture, used for determining the relevance between pictures.
(4) Indexing and Recovery: The indexing scheme gives an efficient searching method for a picture database.
The working of CBIR is shown in Fig. 1. The characteristics of the pictures in the picture database are identified and extracted, and can be expressed as multidimensional feature vectors. The user supplies a query picture to the CBIR system; the characteristics of the query picture are also extracted and expressed as feature vectors. Using these feature vectors, the distance measure between the query picture and the picture database is calculated. Finally, an indexing scheme [1] is used to retrieve the output pictures. Performance Evaluation and Datasets Used by the Various Researchers: A number of datasets are available for CBIR; these include the Oxford Buildings and INRIA Holidays datasets, which are used as small-scale datasets, and there is also the Kentucky
Fig. 1 Working of CBIR system
dataset, which is easy to use. The TRECVID dataset is used for image and video retrieval in large-scale applications. In the literature review, details have been added regarding the various datasets used by the corresponding researchers. It is important to note that performance evaluation is carried out using precision- and recall-based measures: precision is the fraction of retrieved images that are relevant, while recall is the fraction of relevant images in the dataset that are retrieved. Another important parameter is the error count or rate, in which performance is evaluated using the numbers of correctly and incorrectly retrieved images relative to the total number of images supplied in the dataset. Retrieval efficiency is another evaluation parameter, indicating the ratio of the number of correctly retrieved images to the number of correct images present in the dataset. Graphical representation of the output is also used in evaluating the performance on various datasets.
3 Feature Recovery

Every picture has two types of content: visual and semantic. Further, visual content is of two types, general or domain-specific. Low-level characteristics like structure, size, color, texture, etc. are included under general visual content. Domain-specific visual content may cover face recognition, handwriting, and fingerprints, and is application dependent; it also involves domain knowledge. Based on the visual content, the semantic content is achieved by textual annotation
or inference procedures [1]. Generally, low-level characteristics are more popular for describing pictures using picture signatures or descriptors. Low-level characteristics present various ways to describe any picture despite various picture classes and types [2]. Some descriptions of the terms related to color feature recovery are given below:
(1) Color: For picture recovery, color is one of the most commonly used visual contents. Picture comparison is based on matching pictures by their color distribution. Different techniques like the color coherence vector, color histogram, color correlogram, color moments and invariant color characteristics have been used in past decades. A color space must be chosen first to use any of these techniques.
(2) Color Space: A color space is a specific organization of colors. Each pixel of a picture can be represented as a point in a 3D color space. Generally, the following color spaces are used for picture recovery: RGB, HSV, CIE L*u*v*, CIE L*a*b*, Munsell, etc. RGB color space is most commonly used for picture display. It consists of red, green and blue components, known as additive primaries because in RGB space a color is obtained by combining the three components.
(3) Color Histogram: The color histogram represents the distribution of colors in a picture. A color histogram can be built for any type of color space, like RGB or HSV. It is a smooth function defined over the color space that approximates the pixel counts.
(4) Color Moments: Color moments are used to discriminate pictures based on their color distribution, in the same way that central moments describe a probability distribution. Once these moments are calculated, they provide a measure of the similarity between two pictures. Generally, the first three color moments are used as characteristics in the picture recovery process: mean, standard deviation and skewness.
(5) Color Correlogram: This feature was defined by Huang et al. [3] in 1997 for picture indexing and comparison. It is a three-dimensional table indexed by color and the distance between pixels, and it describes how the spatial correlation of color pairs changes with distance in a picture. The color correlogram is thus a table indexed by color pairs. It is easy to calculate and quite small in size.
4 Literature Review There are two important issues that need to be addressed in a content-based image retrieval system: first, the image dataset is quite large in number, and second, image retrieval depends on a number of uncertainties and subjects, i.e., image types. So, image processing through CBIR needs to add more features to handle the varieties of images. Image retrieval depends on the accuracy of the extraction process. Hence, through this review article, current research trends are reviewed, covering low- to high-level feature extraction through various
techniques. The following literature review describes different techniques used for color feature recovery. Duanmu [4] proposed a recovery method for color pictures using color moment invariants in 2010. Color characteristics are computed from the individual picture, and the method uses small picture descriptors. The proposed method was implemented on the object image library COIL-100 of Columbia University. Average precision and recall rates are better than those of state-of-the-art image recovery techniques. Using the picture color distribution, Chen et al. [5] in 2010 proposed a method for color feature recovery. To extract the color characteristics, fixed cardinality (FC) and variable cardinality (VC) recovery methods are proposed, which utilize the binary quaternion-moment-preserving (BQMP) thresholding technique. They also devised comparing histograms by clustering (CHIC) along with earth mover's distance (EMD) measures. A database of a total of 1200 pictures was used for the experimentation, collected from image albums of Corel Corporation and DataCraft Corporation, respectively. The proposed color feature recovery scheme shows an improvement in the recovery results by a factor of 25% over traditional methods. Kekre et al. [6], in 2010, proposed a new algorithm using the fast Fourier transform (FFT) of each R, G and B component of an image separately. Each FFT component of an image was divided into 12 sectors, and the mean values of the 6 upper half sectors are used to produce the feature vectors. Euclidean distances between the database pictures and the feature vectors of a query picture are determined. To evaluate the effectiveness of the proposed algorithm, a database of 249 pictures of 10 different classes is used. Rao et al. [7] in 2011 proposed a texture and dominant color-based picture recovery system. In this method, three characteristics of a picture, dynamic dominant color (DDC), motif co-occurrence matrix (MCM) and the difference between pixels of scan pattern (DBPSP), are considered to retrieve a color picture. Using a fast color quantization algorithm, the picture is separated into eight coarse partitions, and eight dominant colors are derived from the eight partitions. MCM and DBPSP are used to represent the texture distribution of a picture, and DDC represents the color characteristics of the pixels in a picture. They experimented on the Wang dataset. In 2011, Yue et al. [8] proposed a method based on the fusion of color and texture attributes and on constructing weights of feature vectors. Initially, the HSV color space is quantized to derive feature vectors. Then, based on a co-occurrence matrix, color and texture attributes are extracted. Afterward, characteristics of the global and local color histograms and texture attributes are compared and analyzed for recovery. Picture recovery accuracy is improved, as shown by experimentation. They used a car picture database from web sources. Afifi et al. [9] in 2012 proposed a new CBIR system using the Ranklet transform (RT) and color features to represent an image. For image enhancement operations and image invariance to rotation, the Ranklet transform is used as a preprocessing step. The K-means clustering algorithm is used to cluster the pictures according to their features to enhance the recovery time. They used the Wang database for experimentation. In 2013, Subrahmanyam et al. [10] proposed a new method, which integrates the modified color motif co-occurrence matrix (MCMCM) and the difference between the pixels of a scan pattern (DBPSP) with equal weights for an effective CBIR system. The
effectiveness of the proposed recovery method is verified on two different benchmark databases, MIT VisTex and Corel-1000. Walia et al. [11], in 2014, proposed a novel fusion framework for the recovery of color pictures. The proposed framework fuses a modified color difference histogram (CDH), which extracts the color and texture characteristics of an image, and the angular radial transform (ART), which extracts the shape characteristics of an image globally or locally. A variety of databases are considered to test the effectiveness of the applied fusion framework. In 2015, Guo et al. [12] proposed an image recovery system using ordered dither block truncation coding (ODBTC) to produce a picture content descriptor. Two picture characteristics are generated from the ODBTC encoded data streams: a color co-occurrence feature (CCF) and a bit-pattern feature (BPF), obtained by involving a visual codebook. They used 13 different databases for testing. Shama et al. [13] in 2015 proposed an efficient indexing and recovery technique for plant pictures. The 2D-OTSU threshold-based segmentation technique is used to separate the object from the background. A modified color co-occurrence matrix (MCCM) and Gabor filter are used for feature recovery. Then, the Euclidean distance measure is used to find the similarity between pictures. An R* tree structure is used for better indexing and fast searching of pictures. They created a database of 300 plant pictures. The experimental results showed better recovery accuracy with reduced recovery time. In 2016, Li et al. [14] implemented an improved algorithm that combines the fuzzy color histogram and block color histogram. It considers both local and global color information for color feature recovery and helps to decrease the color feature dimension. Experiments are conducted using the Corel 1000 dataset to check the effectiveness of the improved algorithm. In 2016, Somnugpong et al. [15] proposed a novel method combining the color correlogram and edge direction histogram (EDH) to obtain robustness to spatial changes in an image. Spatial color correlation information is processed by the color correlogram, while the EDH provides the geometrical information when the same picture appears in different colors. Experimental results on the Wang dataset showed that the combination of spatial color correlation and texture semantics works better than the combination of the traditional histogram and its texture. To improve picture recovery precision, in 2017 Fadaei et al. [16] developed a new CBIR scheme which combines optimized color and texture characteristics. Dominant color descriptor (DCD) characteristics are extracted using a uniform partitioning scheme applied in HSV color space. To overcome the problems of noise and translation, various wavelet and curvelet characteristics are used as texture characteristics. The color and texture characteristics of a picture are optimized by the particle swarm optimization (PSO) algorithm, and the scheme is applied on the Corel dataset. Khwildi et al. [17] in 2018 proposed a set of global descriptors for modeling high dynamic range (HDR) pictures and displaying the results on standard dynamic range (SDR) devices. It uses a vector of characteristics which is a combination of two color attributes: a color histogram based on HSV space and color moments. The Manhattan distance measure is used to obtain the dissimilarity between pictures.
The proposed method accurately retrieved similar HDR pictures and was tested on the LDR version of the same
dataset. In 2019, Unar et al. [18] addressed the problem of combining both low-level visual characteristics and color information. Feature vectors are obtained for the low-level visual characteristics and color information; the similarity is computed for the obtained feature vectors and combined. The top-ranked pictures are retrieved using a distance metric. The experimentation was performed on the Corel-1K and Oxford Flowers datasets. Ahmed et al. [19] in 2020 proposed a novel method to retrieve pictures by representing shapes, texture, objects and spatial color information. L2 normalization applied on the RGB channels is used for the spatial arrangement. For efficient recovery and ranking, combined feature vectors are transformed to a bag of words (BoW). The efficiency of the proposed approach is verified on nine standard picture datasets: Caltech-101, 17-Flowers, Columbia object picture library (COIL), Corel-1000, PictureNet, Corel-10000, Caltech-256, tropical fruits and the Amsterdam Library of Textures (ALOT). In 2020, Sathiamoorthy et al. [20] proposed a new variant of the multi-trend structure descriptor (MTSD). It encodes a feature matrix of color, edge orientation and texture quantized values versus the orientation of equal, small and large trends. Using the discrete Haar wavelet transform, the picture is decomposed to a fine level to reduce the time cost while preserving the accuracy of the proposed variant of MTSD. To find the similarity, the Euclidean distance measure is used. Considerable improvement is achieved as compared to state-of-the-art descriptors on seven different datasets. Mehmood et al. [21] reported a new technique to overcome the issue of the semantic gap, using a combination of speedup techniques with a histogram analyzer. Alshehri [22] used aerial satellite images for a detailed analysis of the vegetation present in the Riyadh region, applying a hybrid technique of fuzzy logic and artificial neural networks for image retrieval; compared with the existing techniques, the proposed technique shows better efficiency. Singh et al. [23] proposed a wavelet transform-based method for the processing of satellite-based images. A summary of the literature review described so far is given in Table 1.
5 Conclusion and Future Directions This article reviewed the basic information about the content-based picture recovery model. Various techniques for color feature recovery from the last 10 years are summarized, covering color, texture and shape, with the help of the literature survey. Recovery efficiency can be further improved by combining the color feature with shape, texture or any other low-level feature of a picture. Hence, combining local and global characteristics can be a topic for future research to improve the performance of recovery systems. Furthermore, machine learning and deep learning approaches can be used for feature extraction to obtain improved results and fast processing.
Table 1 Summary of the literature review

1. Duanmu [4]. Technique: color moment invariant. Proposed method: small picture descriptors; color characteristics computed from the individual picture. Outcome/merits: better average precision and recall rates. Demerits/future directions: the color moment invariant cannot deal with occlusion very well. Dataset used: COIL-100.

2. Chen et al. [5]. Technique: BQMP thresholding technique, fixed cardinality, variable cardinality, earth mover's distance. Proposed method: FC and VC methods with the BQMP thresholding technique using the picture color distribution; devised a new distance measure, comparing histograms by clustering (CHIC). Outcome/merits: recovery precision rate is enhanced by 25% over traditional methods; the CHIC distance method reduces execution time compared to the EMD measure. Demerits/future directions: –. Dataset used: Corel Corporation, DataCraft Corporation.

3. Kekre et al. [6]. Technique: fast Fourier transform, Euclidean distance. Proposed method: FFTs of each R, G, B component are divided into 12 sectors, and the 6 upper half sectors are used to produce feature vectors. Outcome/merits: overall average performance of precision and recall of each class has a cross-over point at 50%. Demerits/future directions: –. Dataset used: 249 pictures of 10 different classes.

4. Babu Rao et al. [7]. Technique: dynamic dominant color, motif co-occurrence matrix, DBPSP picture characteristics. Proposed method: using a fast color quantization algorithm, a picture is divided into eight partitions, from which eight dominant colors are obtained. Outcome/merits: it outperforms Hung's and Jhanwar's methods. Demerits/future directions: –. Dataset used: Wang dataset.

5. Yue et al. [8]. Technique: HSV color space, co-occurrence matrix and Euclidean distance. Proposed method: based on a co-occurrence matrix, color and texture characteristics are extracted to obtain feature vectors; the HSV color space is quantized. Outcome/merits: recovery accuracy is improved. Demerits/future directions: other low-level features like shape and spatial location can be incorporated. Dataset used: car picture database from web sources.

6. Afifi et al. [9]. Technique: Ranklet transform, K-means clustering algorithm. Proposed method: the Ranklet transform is applied to each picture layer (R, G, B), and color moments are calculated using the K-means clustering algorithm. Outcome/merits: a maximum average precision of 0.93 at a recall value of 0.05 is achieved, performing better than the other 4 systems used for comparison. Demerits/future directions: combining shape or texture features along with color can give good results. Dataset used: Wang dataset.

7. Subrahmanyam et al. [10]. Technique: modified color motif co-occurrence matrix (MCMCM), difference between pixels of scan pattern (DBPSP). Proposed method: nine color patterns are generated from the separated R, G, B planes of a color picture; MCMCM and DBPSP characteristics are integrated with equal weights. Outcome/merits: improvement in average precision, recovery rate and recall on DB1 and DB2 compared with traditional methods. Demerits/future directions: –. Dataset used: MIT VisTex (DB1), Corel-1000 (DB2).

8. Walia et al. [11]. Technique: modified color difference histogram (CDH), angular radial transform (ART). Proposed method: the modified CDH algorithm and ART methods are used to extract the color, shape and texture characteristics of a color picture. Outcome/merits: improved average recovery accuracy by approx. 16% and 14% over CDH and ART, respectively. Demerits/future directions: –. Dataset used: Wang's picture database, Olivia and Torralba (OT) scene database, VisTex database.

9. Guo et al. [12]. Technique: ordered dither block truncation coding (ODBTC). Proposed method: color co-occurrence feature and bit-pattern characteristics of a picture are generated from ODBTC encoded data streams. Outcome/merits: it provides the best average precision rate compared to earlier schemes. Demerits/future directions: can be applied to video retrieval. Dataset used: 13 databases such as Corel, VisTex-640, UKBench, etc.

10. Shama et al. [13]. Technique: 2D-OTSU threshold-based segmentation, modified color co-occurrence matrix (MCCM), Gabor filter, Euclidean distance, R* tree indexing. Proposed method: to separate the object from the background, 2D-OTSU threshold-based segmentation is used; MCCM and Gabor filters are used for feature recovery. Outcome/merits: improved recovery accuracy with reduced recovery time. Demerits/future directions: larger datasets can be used for further experimentation. Dataset used: 300 plant picture database.

11. Li et al. [14]. Technique: fuzzy color histogram (FCH), block color histogram (BCH). Proposed method: considers local and global color information by combining FCH and BCH, decreasing the color feature dimension. Outcome/merits: best recovery accuracy and low feature dimension. Demerits/future directions: –. Dataset used: Corel 1000.

12. Somnugpong et al. [15]. Technique: color correlogram, edge direction histogram (EDH), Euclidean distance. Proposed method: spatial color correlation information is processed by the color correlogram, and the EDH provides the geometrical information. Outcome/merits: it works better than the combination of the traditional histogram and its texture. Demerits/future directions: the proposed technique can be practically implemented. Dataset used: Wang dataset.

13. Fadaei et al. [16]. Technique: dominant color descriptor (DCD), particle swarm optimization (PSO) algorithm. Proposed method: extracts dominant color descriptor characteristics with a uniform partitioning scheme in HSV color space; texture and color characteristics are combined by the PSO algorithm. Outcome/merits: higher average precision of 76.5% compared to state-of-the-art techniques. Demerits/future directions: segmentation is applied to the whole image for feature extraction instead of the main regions. Dataset used: Corel database.

14. Khwildi et al. [17]. Technique: color histogram, color moments, Manhattan distance. Proposed method: for modeling HDR pictures, a set of global descriptors is used, combining an HSV space-based color histogram and color moment attributes. Outcome/merits: better performance regarding accuracy and efficiency. Demerits/future directions: other descriptors can be combined on a large HDR image dataset. Dataset used: 100-picture database collected from websites.

15. Unar et al. [18]. Technique: K-means clustering algorithm, Euclidean distance. Proposed method: color information is extracted and segmented in the nonlinear L*a*b* color space to get a feature vector. Outcome/merits: improved efficiency and 85% accuracy over other methods. Demerits/future directions: machine learning techniques can be applied for better results. Dataset used: Corel 1000, Oxford Flowers.

16. Ahmed et al. [19]. Technique: Gaussian smoothing, Hessian blob detector, bag of words (BoW). Proposed method: color characteristics are extracted using the L2 norm, and high-variance coefficients are transformed to BoW for effective recovery and ranking. Outcome/merits: higher precision in average, mean and average recovery, and also for recall rates, when tested against other picture groups of standard datasets. Demerits/future directions: convolutional neural networks can be used for improvement. Dataset used: 9 datasets: Caltech-101, Corel-1000, Columbia object picture library, 17-Flowers, PictureNet, Corel-10000, Caltech-256, ALOT.

17. Sathiamoorthy et al. [20]. Technique: multi-trend structure descriptor (MTSD), discrete Haar wavelet transform, Euclidean distance measure. Proposed method: MTSD encodes orientation details of local-level structures, and the Haar wavelet transform is used to decompose the picture to a fine level. Outcome/merits: considerable improvement is achieved in precision and recall. Demerits/future directions: the proposed technique can be applied to medical images using a machine learning approach. Dataset used: 7 databases: Corel 1K, 5K, 10K, Caltech 101, LIDC-IDRI.
References

1. L. Fuhui, Z. Hongjiang, F. David, Fundamentals of content-based picture recovery, in Multimedia Information Recovery and Management (Springer, Berlin, 2003), pp. 1–26. ISSN: 1860-4862
2. S. Aghav-Palwe, D. Mishra, Color picture recovery using compacted feature vector with mean count tree. Procedia Comput. Sci. 132, 1739–1746 (2018)
3. J. Huang, S. Ravi Kumar, M. Mitra, W.-J. Zhu, R. Zabih, Picture indexing using color correlograms, in IEEE International Conference on Computer Vision and Pattern Recognition (1997), pp. 762–768
4. D. Xiaoyin, Picture recovery using color moment invariant, in Proceedings of Seventh International Conference on Information Technology: New Generations (ITNG) (IEEE, 2010), pp. 200–203
5. C. Wei-Ta, L. Wei-Chuan, C. Ming-Syan, Adaptive color feature recovery based on picture color distribution. IEEE Trans. Picture Process. 19(8) (2010)
6. H. Kekre, D. Mishra, CBIR using upper six FFT sectors of color pictures for feature vector generation. Int. J. Eng. Technol. 2(2) (2010)
7. M.B. Rao, B.P. Rao, A. Govardhan, CTDCIRS: content-based picture recovery system based on dominant color and texture characteristics. Int. J. Comput. Appl. 18(6) (2011)
8. J. Yue, Z. Li, L. Liu, Z. Fu, Content-based picture recovery using color and texture fused characteristics. Math. Comput. Model. 54 (2011)
9. A.J. Afifi, W.M. Ashour, Picture recovery based on content using color feature. ISRN Comput. Graph. 2012, Article ID 248285 (2012)
10. M. Subrahmanyam, Q.M. Jonathan Wu, R.P. Maheshwari, R. Balasubramanian, Modified color motif co-occurrence matrix for picture indexing and recovery. Comput. Electr. Eng. 39(3) (2013)
11. E. Walia, A. Pal, Fusion framework for effective color picture recovery. J. Visual Commun. Picture Represent. 25(6) (2014)
12. J.M. Guo, H. Prasetyo, Content-based picture recovery using characteristics extracted from halftoning-based block truncation coding. IEEE Trans. Picture Process. 24(3) (2015)
13. P.S. Shama, K. Badrinath, T. Anand, An efficient indexing approach for content-based picture recovery. Int. J. Comput. Appl. 117(15) (2015)
14. M. Li, X. Jiang, An improved algorithm based on color feature recovery for picture recovery, in 8th IEEE International Conference on Intelligent Human-Machine Systems and Cybernetics (2016), pp. 281–285
15. S. Somnugpong, K. Khiewwan, Content-based picture recovery using a combination of color correlograms and edge direction, in 13th International Joint Conference on Computer Science and Software Engineering (IJCSSE) (2016)
16. S. Fadaei, R. Amirfattahi, A.M. Ahmadzadeh, New content-based picture recovery system based on optimised integration of DCD, wavelet and curvelet characteristics. IET Picture Process. 11(2) (2017)
17. R. Khwildi, A.O. Zaid, Color-based HDR picture recovery using HSV histogram and color moments, in IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA) (2018)
18. S. Unar, X. Wang, C. Wang, M. Wang, New strategy for CBIR by combining low-level visual characteristics with a colour descriptor. IET Picture Process. 13(7) (2019)
19. K.T. Ahmed, H. Afzal, M.R. Mufti, A. Mehmood, G.S. Choi, Deep picture sensing and recovery using suppression, scale spacing and division, interpolation and spatial color coordinates with bag of words for large and complex datasets. IEEE Access 8 (2020)
20. S. Sathiamoorthy, M. Natarajan, An efficient content-based picture recovery using enhanced multi-trend structure descriptor. Appl. Sci. 2, 217 (2020)
21. Z. Mehmood, F. Abbas, M. Mahmood, M.A. Javid, A. Rehman, T. Nawaz, Content-based image retrieval based on visual words fusion versus features fusion of local and global features. Arab. J. Sci. Eng. (2018)
22. M. Alshehri, Content-based image retrieval method using neural network-based prediction technique. Arab. J. Sci. Eng. (2020)
23. D. Singh, D. Garg, P.H. Singh, Efficient land satellite image fusion using fuzzy and stationary discrete wavelet transform. Imaging Sci. J. 65(2) (2017)
A New Method of Interval Type-2 Fuzzy-Based CNN for Image Classification P. Murugeswari and S. Vijayalakshmi
Abstract Over the last two decades, neural networks and fuzzy logic have been successfully implemented in intelligent systems. The fuzzy neural network (FNN) system framework refers to the union of fuzzy logic and neural network ideas, which consolidates the advantages of both. FNNs are applied in many scientific and engineering areas. Wherever there is uncertainty associated with data, fuzzy logic plays a vital role, and fuzzy sets can represent and handle uncertain information effectively. The main objective of an FNN system is to achieve a high level of accuracy by including fuzzy logic in either the neural network structure, the activation function, or the learning algorithms. In computer vision and intelligent systems, the convolutional neural network is among the most popular architectures, and its performance is excellent in many applications. In this article, fuzzy-based CNN image classification methods are analyzed, and an interval type-2 fuzzy-based CNN is proposed. From the experiments, it is identified that the proposed method performs well. Keywords CNN · FCNN · Fuzzy logic · Interval type-2 fuzzy logic · Feature extraction · Computer vision · Image classification
1 Introduction In computer vision, image classification is the task of assigning a given image to one of a set of pre-defined classes. Conventional image classification consists of feature extraction and classification modules. Feature extraction involves extracting a higher level of information from the raw pixels that captures the distinctions among the classes. Normally, this process P. Murugeswari (B) Department of Computer Science and Engineering, Karpagam College of Engineering, Coimbatore, Tamil Nadu, India S. Vijayalakshmi Department of Computer Applications, NMS S. Vellaichamy Nadar College, Madurai, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_57
is done in an unsupervised manner, wherein the classes of the image have nothing to do with the information extracted from pixels. Some of the typical and widely used feature extractors are GIST, HOG, SIFT, LBP, and so on. After feature extraction, a classification module is trained with the images and their associated labels. A few examples of such modules are SVM, logistic regression, random forests, decision trees, and so on. Different types of neural network architectures, such as recurrent neural networks (RNN), long short-term memory (LSTM), artificial neural networks (ANN) and convolutional neural networks (CNN), have been analyzed. CNNs are the most popular architecture and are well suited to image databases. They work excellently on computer vision tasks like image classification [1, 2], object detection [3], image recognition [4], etc. In a CNN, there is no separate feature extractor; it is built in, so the feature extraction and classification modules form an integrated framework, and the network learns to extract representations from the images and classify them based on supervised data. CNNs are utilized in various tasks and show extraordinary performance in many applications. CNNs have provided a workable class of models for better understanding of the content present in images, thereby producing better image recognition, segmentation, detection, and retrieval. CNN structures are productively and successfully utilized in many pattern and image recognition applications [5], for instance, gesture recognition [4, 6], face recognition [7, 8], object classification [9, 10], and generating scene descriptions [11]. Zadeh [12] presented the idea of fuzzy logic (type-1 fuzzy) for tackling control-system-related problems. Later, researchers contributed many fascinating applications in the field of computer vision. Conceptually, the type-2 fuzzy set (T2FS) was introduced by Zadeh in 1975 [13], and it was further developed by Jerry M. Mendel. In a type-1 fuzzy set (T1FS), the degree of membership is determined by a crisp number belonging to the interval [0, 1]; in a T2FS, the degree of membership is itself fuzzy, and it is indicated by a secondary membership function. If the secondary membership function is at its limit of 1 at each point, the set is called an interval type-2 fuzzy set (IT2FS) [13–15]. The T2FS incorporates a third dimension and a footprint of uncertainty, as shown in Fig. 1, which gives an additional degree of freedom to handle uncertainty. This additional level of fuzziness allows a more capable way of dealing with uncertainty. Figure 2 shows the secondary membership functions (MFs) (third dimension) of the T1FS (Fig. 2a), the IT2FS (Fig. 2b), and the general T2FS (Fig. 2c) as induced by the same input p shown in Fig. 1. Over the years, type-1 FCM has become the most notable algorithm in cluster analysis. Many researchers have demonstrated that there are constraints in the capacity of T1FSs to model and limit the effect of uncertainties, since their membership grades are crisp. The T2FS is represented by membership functions (MFs) that are themselves fuzzy. The IT2FS [16], a special case of T2FS, is at present the most widely utilized owing to its reduced computational cost. An IT2FS is bounded by two T1FSs above and below, which are called the upper MF (UMF)
Fig. 1 An example of the three kinds of fuzzy sets. The same input p is applied to each fuzzy set. a T1FS, b IT2FS, and c T2FS
Fig. 2 A view of the secondary membership functions (third dimension) induced by an input p for a T1FS, b IT2FS, and c T2FS
and the lower MF (LMF), respectively, and the region between the UMF and LMF is the footprint of uncertainty (FOU). The T2FS is able to model various uncertainties, yet it increases the computational complexity due to its extra dimension of secondary grades for every primary membership. Example applications are type-2 fuzzy clustering [17], Gaussian noise filtering, classification of coded video streams, medical applications, and color image segmentation.
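A minimal Python sketch of an interval type-2 Gaussian membership function follows: an UMF and an LMF bound the footprint of uncertainty. The center, spread and blur values are illustrative assumptions, not parameters from this paper.

import numpy as np

def it2_gaussian(x, center=0.0, sigma=1.0, blur=0.3):
    # UMF: wider Gaussian; LMF: scaled, narrower Gaussian below it
    umf = np.exp(-0.5 * ((x - center) / (sigma * (1 + blur))) ** 2)
    lmf = (1 - blur) * np.exp(-0.5 * ((x - center) / (sigma * (1 - blur))) ** 2)
    return lmf, umf  # the membership of x lies anywhere inside [lmf, umf]

x = np.linspace(-3, 3, 7)
lmf, umf = it2_gaussian(x)
print(np.all(lmf <= umf))  # True: the band between the two curves is the FOU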
2 Literature Survey CNN is a type of neural network which has shown commendable performance on several challenges related to computer vision and image processing. Some of
the notable application areas of CNNs include image classification and segmentation [18], object detection [19], video processing [20], natural language processing [21, 22], and speech recognition [23, 24]. The learning capacity of deep CNNs is primarily a result of the use of multiple feature extraction stages that can automatically learn representations from the data. The availability of large amounts of data and improvements in hardware technology have accelerated research in CNNs, and recently attractive deep CNN models have been reported. Several trending designs for achieving advances in CNNs have been investigated [7], for example, the utilization of various activation and loss functions, parameter optimization, regularization, and architectural innovations. Mendel [25] stated that using a T1FS to model a word is scientifically incorrect because a word is uncertain, whereas a type-1 FS is certain. He therefore carried out in-depth research in type-2 fuzzy logic and contributed many papers [26–28] on the topic. Based on this, many researchers have contributed algorithms for their applications, for example, the classification of coded video streams, diagnosis of diseases, pre-processing of radiographic images, medical image applications, transport scheduling, forecasting of time series, learning linguistic membership grades, inference engine design, control of mobile robots, and so on. The computational complexity of type-2 fuzzy logic is high. Therefore, the type-2 fuzzy set is simplified into the interval type-2 fuzzy set, whose computational complexity can be significantly reduced in appropriate applications. Recently, fuzzy logic and neural networks have been widely applied to solve real-world problems. Fuzzy logic is a set of mathematical principles for knowledge representation based on degrees of membership as opposed to binary logic. It is a powerful tool to handle imprecision and uncertainty, and it was introduced to gain robustness and low-cost solutions for real-world problems [29]. Type-1 fuzzy logic frameworks have been implemented in many systems at a wider scale, some of which include approximation and forecasting systems, control systems, databases, healthcare clinical diagnosis, and so on. Researchers have combined neural networks and fuzzy logic and implemented them successfully in intelligent systems. The fuzzy neural network (FNN) framework brings together fuzzy logic and neural network concepts, incorporating the benefits of both. FNNs have been applied in many scientific and engineering areas: text sentiment analysis [30], object classification with small training databases [19], emotion feature extraction from text [31], emotion understanding in movies [32], real-world object and image classification [20, 33], recognition of Marathi handwritten numerals [34, 35], traffic flow prediction [36], electric load forecasting [37], and recognition of handwritten digits [38]. Keller et al. [39] proposed a hierarchical deep neural network fuzzy system which obtains information from both fuzzy and neural representations. Price et al. [40] proposed introducing fuzzy layers for deep learning; fuzzy approaches to deep learning have explored the use of different fusion procedures at the decision level to aggregate outputs from state-of-the-art pre-trained models, e.g., AlexNet, VGG16, GoogLeNet, Inception-v3, ResNet-18, and so on.
Fig. 3 Structure of a convolutional neural network (CNN)
Generally, a CNN architecture consists of two phases: feature extraction and classification. The FCNN is the combination of a CNN and fuzzy logic; therefore, the fuzzy logic may be included in either the feature extraction phase or the classification phase. Depending on the application, researchers have proposed various FCNN architectures that include fuzzy logic in the feature extraction phase or the classification phase. Here, two FCNN architectures for image classification are compared; in both, fuzzy logic is included in the classification phase. Hsu et al. [19] integrated a convolutional neural network with a fuzzy neural network (FCNN Model 1), where the FNN summarizes the feature information from every fuzzy map. Korshunova [9] (FCNN Model 2) proposed a CFNN architecture which includes a fuzzy layer situated between the convolutional network and the classifier (Fig. 3).
3 Proposed Method The new interval type-2 fuzzy CNN architecture integrates features from the CNN and the FNN. The new architecture integrates the interval type-2 fuzzy rectifying unit (IT2FRU) [41] activation function in the convolutions for feature extraction in the CNN, and interval type-2 fuzzy-based classification in the fuzzy layer. This method combines the advantages of both network architectures and interval type-2 fuzzy logic. The IT2FCNN architecture has four types of layers: (i) convolutional layer with IT2FRU; (ii) pooling layer; (iii) fuzzy layer; and (iv) a fuzzy classifier. The convolutional neural network takes an input image and applies a sequence of convolutional and pooling layers. The fuzzy layer performs grouping using the interval type-2 fuzzy clustering algorithm. The outputs of the fuzzy layer neurons represent the values of the membership functions for the fuzzy clustering of the input data. Each input point's cluster is chosen depending on its membership grade. These values are passed to the input of a classifier, whose output is that of the full
Fig. 4 Outline of the proposed method IT2FCNN: feature extraction by a CNN with IT2FRU (input image, convolution using IT2FRU, pooling), followed by classification using a fuzzy layer with interval type-2 fuzzy membership functions and fuzzy classification using IT2FCM (output)
IT2FCNN: the class scores for the picture. Let C be the number of neurons of the fuzzy layer (the number of clusters). The activation functions of the fuzzy layer neurons are IT2FRUs expressing the membership of the input vector x to each of the C clusters (Fig. 4). The IT2FRU employs the following equalities: Z = 0 guarantees that σ = 0 ⇒ ϕo = 0. Additionally, the heights of the LMFs are set as m2 = α, m1 = m3 = 1 − α, as suggested in [26]. The resulting IT2-FM ϕo(σ) for σ ∈ [0, 1] can be formulated as

ϕo(σ) = P σ k(σ)    (1)

where k(σ) is defined as

k(σ) = (1/2) [ (−1 + α)/(α + σ − ασ) + 1/(−1 + ασ) ]    (2)

Similarly, for the input interval σ ∈ [−1, 0], the IT2-FM can be derived as

ϕo(σ) = N σ k(−σ)    (3)

The activation unit can be formulated by arranging Eqs. (1) and (3) as follows:

f(σ) = { P σ k(σ), if σ > 0;  N σ k(−σ), if σ ≤ 0 }    (4)
The parameter P controls the slope of the function in the positive quadrant, while the parameter N controls the slope in the negative quadrant. The resulting output of the IT2-FRU can be a linear or nonlinear activation depending on the selection of the parameters. The IT2FRU has three learnable parameters: P, N, and α.
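The sketch below implements the IT2FRU activation of Eq. (4) using the reconstruction of k(σ) in Eq. (2) above; the exact grouping and sign conventions of k(σ) should be verified against [41], and the parameter values here are placeholders.

import numpy as np

def k(sigma, alpha):
    # Eq. (2) as reconstructed above; verify the fraction grouping against [41]
    return 0.5 * ((-1 + alpha) / (alpha + sigma - alpha * sigma)
                  + 1.0 / (-1 + alpha * sigma))

def it2fru(sigma, P=1.0, N=0.1, alpha=0.5):
    # Eq. (4): positive inputs are scaled by P, negative ones by N
    sigma = np.clip(sigma, -1.0, 1.0)   # the derivation assumes sigma in [-1, 1]
    kv = k(np.abs(sigma), alpha)        # both branches evaluate k on |sigma|
    return np.where(sigma > 0, P, N) * sigma * kv

print(it2fru(np.linspace(-1, 1, 5)))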
The vector x = (x1, x2, …, xj, …, xn) is fed to the input of the framework, and the fuzzy layer forms the degrees of membership of x to each of the specific cluster centers v1, v2, …, vC. The memberships are determined using Eq. (5), fulfilling the normalization condition of Eq. (6), for each training sample vector x(k), k = 1, …, K, where K is the number of vectors in the training set. The outputs of the neurons of the fuzzy layer are used as inputs of the classifier.

μ̃i(x(k)) = f( Σ_{j=1}^{n} x_j(k) )    (5)

Σ_{i=1}^{L} μ̃i(x(k)) = 1    (6)
The interval type-2 fuzzy memberships become

(a) ū_j(x_i) = 1 / Σ_{k=1}^{C} ((d_ji/d_ki) + α(d_ji/d_ki)δ)^{2/(m1−1)}, if 1 / Σ_{k=1}^{C} (d_ji/d_ki) < 1/C; otherwise ū_j(x_i) = 1 / Σ_{k=1}^{C} ((d_ji/d_ki) + α(d_ji/d_ki)δ)^{2/(m2−1)}

(b) u̲_j(x_i) = 1 / Σ_{k=1}^{C} ((d_ji/d_ki) + α(d_ji/d_ki)δ)^{2/(m1−1)}, if 1 / Σ_{k=1}^{C} (d_ji/d_ki) ≥ 1/C; otherwise u̲_j(x_i) = 1 / Σ_{k=1}^{C} ((d_ji/d_ki) + α(d_ji/d_ki)δ)^{2/(m2−1)}

where d_ji is the distance between cluster center v_j and sample x_i, and m1 and m2 are the two fuzzifiers.
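For orientation, the sketch below computes the standard interval type-2 FCM memberships with two fuzzifiers m1 and m2 (the widely used Hwang and Rhee form); the variant in (a) and (b) above additionally weights the distance ratios with the α and δ terms, which are omitted here.

import numpy as np

def it2fcm_memberships(d, m1=1.5, m2=2.5):
    # d: (C, N) matrix of distances d_ji from cluster j to point i (strictly positive)
    ratios = d[:, None, :] / d[None, :, :]          # entry (j, k, i) = d_ji / d_ki
    u1 = 1.0 / (ratios ** (2.0 / (m1 - 1))).sum(axis=1)
    u2 = 1.0 / (ratios ** (2.0 / (m2 - 1))).sum(axis=1)
    return np.minimum(u1, u2), np.maximum(u1, u2)   # lower and upper memberships

d = np.random.rand(3, 10) + 0.1   # 3 clusters, 10 points, synthetic distances
lower, upper = it2fcm_memberships(d)
print(lower.shape, bool(np.all(lower <= upper)))    # (3, 10) True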
18–40 and 41–65 or greater) where the current status is deceased, hospitalized, or recovered, with spreading-mode linkage. This study analyzes gender independently for the various current statuses. The current status, age group, and gender relations are dependent on age; however, linkage problems are extremely significant for the spreading ratio. Keywords Epidemiological data · SARS-CoV-2 · Lethality · Galton · GP-ARIMA
1 Introduction COVID-19 is an infection caused by a virus from the order Nidovirales, which comprises the Roniviridae, Coronaviridae, and Arteriviridae families, and is responsible for respiratory sickness in the human community, ranging from the common cold to more serious diseases like Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) [1]. The common symptoms of COVID-19 are fever, cold, dry cough, nasal blocking, tiredness and pain, tender throat, and running nose. An infected person may not show any symptoms or feel unwell. People of all age groups with a medical history like blood pressure, cardiovascular disease, or diabetes K. M. Baalamurugan (B) · T. Yaqub · A. Shukla · Akshita School of Computing Science and Engineering, Galgotias University, Uttar Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_59
are prone to infection, and anyone found to have fever, cough, or breathing issues needs to seek immediate medical attention. It is a transmissible disease and passes through droplets from the mouth and nose when infected people breathe out or cough. Individuals need to maintain a distance of 1 m from an affected person. Various investigations examine whether COVID-19 spreads through airborne transmission. Many individuals experience only mild symptoms, so there is a high likelihood of getting COVID-19 from an individual who does not feel sick. Protection from and prevention of COVID-19 spreading comes down to simple-to-adopt safety measures [2] in daily habits, which include thoroughly cleaning hands with alcohol-based hand rub or washing them with water and soap; not touching the nose, eyes, and mouth, as hands touch surfaces and act as carriers of COVID-19 infection into the body; staying at home if one feels unwell; and, in particular, abstaining from travel as much as reasonably possible. National and local authorities should be followed, as they have up-to-date information about the situation. India reported its first coronavirus case in Kerala, in a traveler returning from Wuhan (the epicenter of the coronavirus), and since then the number of cases has been expanding dramatically [3]. At present, no vaccine or medication is available specifically for COVID-19; treatments remain under scrutiny. This work analyzes the COVID-19 pattern using exploratory data analysis; GP-ARIMA is the chosen approach to investigate the data by extracting valuable information, which is an essential step in any kind of examination.
2 Data Analysis For the data analysis, a COVID-19 data report from Tamil Nadu, India is considered. The dataset is downloaded from Kaggle. It provides a list of COVID-19 cases for all states; however, many of the attribute values are missing. For this analysis, three attributes are concentrated on: age, gender, and the present state of the patient (deceased, hospitalized, or recovered). The dataset comprises various missing values, so applying analysis directly to the dataset does not provide appropriate outcomes. Hence, to carry out data pre-processing, initial validation is performed for missing values based on the state, and there are some missing values for certain time intervals. Based on these conditions, the Tamil Nadu region is considered. Once the target data is chosen, the data is thoroughly analyzed to address the research problems.
2.1 Dataset Here, the Kaggle dataset is considered, with a total of 17 attributes. All attributes are strings except for age; however, the SPSS model cannot carry out any kind of analysis on the string data type. Therefore, the values of transmission type, gender, and status are substituted by a nominal data list.
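A hypothetical pandas sketch of this pre-processing step follows: string-valued attributes are replaced with integer nominal codes before analysis. The column names and example values are assumptions, not the dataset's actual schema.

import pandas as pd

df = pd.DataFrame({
    "age": [34, 61, 25],
    "gender": ["M", "F", "M"],
    "current_status": ["Recovered", "Deceased", "Hospitalized"],
    "transmission_type": ["Local", "Imported", "Local"],
})

for col in ["gender", "current_status", "transmission_type"]:
    # map each distinct string to an integer nominal code (0, 1, 2, ...)
    df[col] = df[col].astype("category").cat.codes

print(df)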
3 Background Preliminaries for Data Interpretation The COVID-19 outbreak stimulates data analysis over available datasets, scraped from different sources like the "Ministry of Health and Family Welfare", the "COVID-19 India website", "Kaggle", "Worldometer" and "Wikipedia" using "Python" [4], to analyze the spread and trends of COVID-19 over India. The comparison is done with analyses from neighboring countries all over the world. Here, the Kaggle dataset is used, with a normalization process for selecting appropriate columns, filtering, column derivation, and data visualization in graphical format. This work uses MATLAB for web scraping and pre-processing. Some libraries are included for processing and extraction of information from the available dataset. The graphs are generated for the finest visualization using MATLAB tools. The flow of the survey analysis is portrayed in Fig. 1. Fig. 1 Flow of data interpretation
(Fig. 1 stages: proportion of lethality; incubation period; mathematical prototype: GP-ARIMA forecasting model; linkage investigation and arrangement of diseased cases)
Fig. 2 Depiction of Country-wise COVID-19 Infection Cases
3.1 Proportion of Lethality It is defined as the ratio of the number of deaths to the number of diagnosed cases [5]. A Poisson regression model allowing for over-dispersion is therefore fitted, with the logarithm of diagnosed cases as an offset. It is expressed in Eq. (1):

log(M(d_t)) = μ0 + μ1 t + μ2 t² + μ3 t³ + μ4 t⁴ + log(g_t)    (1)
Here, d_t is the number of deaths/day, and g_t is the number of diagnosed cases/day. The lethality proportion is computed for people of the same age group. It is not possible to compute an exact estimate of the lethality proportion because of the under-reporting of cases in the official statistics [6]. However, monitoring and estimating the lethality proportion helps to track the current epidemic scenario. Country-wise COVID-19 infection cases are depicted in Fig. 2 based on the proportion of lethality.
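A sketch of the model in Eq. (1) with statsmodels follows: Poisson regression of daily deaths on a 4th-degree polynomial in time, with log(diagnosed cases) as an offset. The series here are synthetic placeholders, and the quasi-likelihood adjustment for over-dispersion is omitted for brevity.

import numpy as np
import statsmodels.api as sm

t = np.arange(1, 31, dtype=float)                # day index
g = 50 + 5 * t + np.random.randint(0, 10, 30)    # diagnosed cases/day (synthetic)
d = np.random.poisson(0.03 * g)                  # deaths/day (synthetic)

X = sm.add_constant(np.column_stack([t, t**2, t**3, t**4]))
model = sm.GLM(d, X, family=sm.families.Poisson(), offset=np.log(g))
print(model.fit().params)                        # estimates of mu_0 ... mu_4 in Eq. (1)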
3.2 Incubation Period The COVID-19 incubation period is estimated from the interval between exposure to SARS-CoV-2 and the diagnosis date, following the approach of [7], which recently examined the incubation period of COVID-19 with a group of symptomatic patients. For all patients, the interval between exposure to SARS-CoV-2 and the appearance date of the symptoms is evaluated. The assumption is that the incubation time of viral respiratory tract infections follows a lognormal (Galton) distribution, as expressed in Eq. (2):
Galton(μ, ω²) = Galton(1.621, 0.418)    (2)
The distribution is applied to the diagnosed cases to approximate the exposure dates to SARS-CoV-2, as in Eq. (3):

p(j) = Σ_{i=1}^{14} Q(i) × C_{i+j}    (3)
Here, C is the number of diagnosed cases/day; p(j) is the probability of infected cases/day; i = 1, 2, …, 14 (2 weeks) is the maximal time considered for disease progression; and Q(i) is the probability of symptoms being identified per day based on the Galton probability distribution with the parameters explained above. For estimating the occurrence over 14 days, the information on diagnosed cases is not accessible for all successive days; here, a 4th-degree polynomial model is used to interpolate the diagnosed cases. These are shown in different colors when displaying the applications.
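A sketch of Eqs. (2)-(3) in Python follows: the Galton distribution is discretized over 14 days and applied to a case series. The case counts are synthetic placeholders; note that in [7] the value 0.418 is reported as the standard deviation of the log, so scipy's shape parameter s is set to 0.418 directly (if 0.418 were the variance ω², s would be its square root instead).

import numpy as np
from scipy.stats import lognorm

# Galton(1.621, 0.418): scipy parameterizes the lognormal with
# s = log-standard deviation and scale = exp(log-mean)
dist = lognorm(s=0.418, scale=np.exp(1.621))
days = np.arange(1, 15)                   # i = 1..14 (2 weeks)
Q = dist.pdf(days)
Q = Q / Q.sum()                           # probability of symptom onset on day i

C = np.random.randint(50, 200, 40).astype(float)  # diagnosed cases/day (synthetic)

def p(j):
    # Eq. (3): p(j) = sum_{i=1..14} Q(i) * C_{i+j}
    return sum(Q[i - 1] * C[i + j] for i in days if i + j < len(C))

print([round(p(j), 1) for j in range(5)])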
3.3 Forecasting Model The graph shown in Fig. 3 is obtained by analyzing the dataset as a time series with the Genetic Programming based ARIMA (GP-ARIMA) model, which helps in predicting the spread of COVID-19 [8]; the number of increased cases over India is provided with the equalized bar, as in Fig. 3. The dataset holds reports till 30 October 2020; therefore, the predicted graph is shown for November 2020. The X-axis depicts three-day intervals and the Y-axis gives the anticipated cases in thousands, predicted till approximately November 2020. However, 5000 cases are reported in Tamil Nadu. When the comparison is made with the report till October 2020, roughly 2500 cases are predicted per three-day interval until November 2020. Simultaneously, the number of reported cases reaches 2600 in Tamil Nadu, which is a nightmare for Tamil Nadu but not when compared with other countries over the world. The prediction with GP-ARIMA is given in Table 1.

Fig. 3 Computation of COVID-19 Forecast: Tamil Nadu

Table 1 Computation of COVID-19 forecast: Tamil Nadu
Date | Anticipated cases
04.10.2020 | 2500
07.10.2020 | 2580
10.10.2020 | 2510
13.10.2020 | 2600
16.10.2020 | 2590
19.10.2020 | 2595

Algorithm: Forecast Depiction using GP-ARIMA Model
D ← days between November 4 and November 19
C ← cases on November 4
Prototype ← GP-based ARIMA(a, g, c, D)
Predict ← Prototype(D)
if Predict(1) > C then
    coef ← Predict(1) − C
else
    coef ← C − Predict(1)
return Predict
3.4 Linkage Investigation and Arrangement on Diseased Cases Based on the information available from the crowdsourced database of the COVID19-India dot org network, a patient network with demographic details was created. Patient IDs, countries of travel, and mass events were taken as nodes, and the relationships among patients and their travel history were taken as network edges. The network comprises 551 nodes, which are not detailed in this work; however, the degree centrality of significant nodes was determined, as given in Table 2 and shown in Fig. 4. The degree centrality of a node is determined as the strength of connections
Table 2 Degree centrality based on nodes
Nodal depiction | Degree centrality
India: Religious event in Delhi | 0.102775941
India: Mumbai | 0.005948447
Italy | 0.030072703
Gulf | 0.011566642
United Kingdom | 0.007270324
Saudi Arabia | 0.004296100
Fig. 4 Computation of degree centrality
with a specific node divided by the total number of edges over the network. The top nodes based on degree centrality were the religious event in Delhi, Gulf, Italy, the United Kingdom, Mumbai, and Saudi Arabia. The religious event in Delhi is found to be a significant hotspot for the quick spread of the coronavirus [9]. Based on the patient information database, a classification model (GP-ARIMA) is created to observe whether, based on demographic features, a patient has a likelihood of dying. Different patterns were mined for COVID-19 patients; however, this is not conclusive for positive cases, as information on negative cases is not accessible. The analysis of COVID-19 infected patients found that the majority of cases in India fall in the 31–40 years age group, as shown in Fig. 5. A significant group of patients is over 60 years old, and the greater part of infected individuals had traveled back from Italy.
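The sketch below computes degree centrality for a toy patient/travel network with networkx; the edges are illustrative, not the 551-node network of the study. Note that networkx normalizes by (n − 1) rather than by the total number of edges described above, so absolute values differ by a constant factor while the ranking of hubs is the same.

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("P1", "Religious event in Delhi"), ("P2", "Religious event in Delhi"),
    ("P3", "Religious event in Delhi"), ("P4", "Italy"), ("P5", "Italy"),
    ("P6", "Gulf"), ("P2", "P3"),
])

centrality = nx.degree_centrality(G)   # node degree divided by (n - 1)
for node, c in sorted(centrality.items(), key=lambda kv: -kv[1])[:3]:
    print(node, round(c, 3))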
Fig. 5 Age-wise dissemination
A classification model is built for patients infected with COVID-19. A decision tree classification model [10] was used to group COVID-19 infected patients before they face critical conditions. The features discovered to be very significant for grouping include the patient's age, the patient's gender, and the patient's state. Around 5000 patients have been infected with COVID-19 so far, yet the demographic details of all patients are not known. Table 3 depicts the state-wise anticipated growth rate till mid-November 2020.
4 Conclusion The proposed work contributes an extensive analysis of the COVID-19 outbreak conditions over India. The infectious cases are rising gradually, and the country needs aggressive control of the condition from India's administrative units. Diverse factors deal with this condition, related to the growing trends of infectious cases over India. The impact of mass events on the number of cases is shown by linkage analysis and pattern mining of those who suffer from the coronavirus; this supports the decision of a complete lockdown of the country. The present study applies various approaches to offer an analysis whose outcomes help to bridge the gap left by existing drawbacks. This work is relevant for the Indian Government and the various states of India, scientists, researchers, healthcare sectors of India, and the administrative units of India. It is particularly favorable for the administrative units to deal with the factors related to COVID-19 control in the corresponding regions.
Table 3 State-wise AGR till November 15th, 2020
States | Number of infected cases as of 30th October 2020 | Anticipated growth rate (AGR) till 15th November 2020
Tamil Nadu | 738 | 215.38
Delhi | 576 | 278.95
Maharashtra | 1135 | 238.81
Telangana | 453 | 256.69
Rajasthan | 363 | 202.50
Uttar Pradesh | 361 | 208.55
Andhra Pradesh | 348 | 213.51
Kerala | 345 | 30.19
Madhya Pradesh | 290 | 195.92
Gujarat | 186 | 126.83
Karnataka | 181 | 64.55
Haryana | 167 | 288.37
Jammu and Kashmir | 158 | 154.84
Punjab | 106 | 140.91
West Bengal | 99 | 266.67
Odisha | 42 | 950.00
Bihar | 38 | 65.22
Uttarakhand | 33 | 371.43
Assam | 28 | 75.00
Himachal Pradesh | 27 | 575.00
Chandigarh | 18 | 38.46
Ladakh | 14 | 7.69
Andaman and Nicobar Islands | 11 | 10.00
Chhattisgarh | 10 | -44.44
Goa | 7 | 16.67
Puducherry | 5 | 66.67
Jharkhand | 4 | 300.00
Manipur | 2 | 100.00
Arunachal Pradesh | 1 | Nil
Dadra and Nagar Haveli | 1 | Nil
Mizoram | 1 | Nil
Tripura | 1 | Nil
Total | 5749 | 180.85
References

1. M. Varshney, J.T. Parel, N. Raizada, S.K. Sarin, Initial psychological impact of COVID-19 and its correlates in Indian Community: an online (FEEL-COVID) survey. PLoS ONE 15(5), e0233874 (2020)
2. J. Qiu, B. Shen, M. Zhao, Z. Wang, B. Xie, Y. Xu, A nationwide survey of psychological distress among Chinese people in the COVID-19 epidemic: implications and policy recommendations. Gen. Psychiatry 33(2) (2020)
3. B.L. Zhong, W. Luo, H.M. Li, Q.Q. Zhang, X.G. Liu, W.T. Li, Y. Li, Knowledge, attitudes, and practices towards COVID-19 among Chinese residents during the rapid rise period of the COVID-19 outbreak: a quick online cross-sectional survey. Int. J. Biol. Sci. 16(10), 1745 (2020)
4. J.Z. Huang, M.F. Han, T.D. Luo, A.K. Ren, X.P. Zhou, Mental health survey of 230 medical staff in a tertiary infectious disease hospital for COVID-19. Zhonghua Laodong Weisheng Zhiyebing Zazhi (Chin. J. Ind. Hyg. Occup. Dis.) 38, E001 (2020)
5. Press Release. https://pib.gov.in/PressReleasePage.aspx?PRID1539877
6. M. Battegay, R. Kuehl, S. Tschudin-Sutter, H.H. Hirsch, A.F. Widmer, R.A. Neher, 2019-novel coronavirus (2019-nCoV): estimating the case fatality rate—a word of caution. Swiss Med. Wkly. 150, w20203 (2020)
7. S.A. Lauer, K.H. Grantz, Q. Bi, F.K. Jones, Q. Zheng, H.R. Meredith et al., The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Ann. Intern. Med. 172(9), 577–582 (2020)
8. A.S.S. Rao, J.A. Vazquez, Identification of COVID-19 can be quicker through artificial intelligence framework using a mobile phone-based survey when cities and towns are under quarantine. Infect. Control Hosp. Epidemiol. 41(7), 826–830 (2020)
9. T. Singhal, A review of coronavirus disease-2019 (COVID-19). Indian J. Pediatr. 1–6 (2020)
10. WHO World Health Organization. https://www.who.int/emergencies/diseases/novel-coronavirus-2019 (accessed 31 March 2020)
Artificial Intelligence and Medical Decision Support in Advanced Healthcare System Anandakumar Haldorai and Arulmurugan Ramu
Abstract The improvements in deep learning (DL) and machine learning (ML), together with the enhanced availability of medical information, have stimulated renewed interest in computerized clinical decision support systems (CDSSs). These systems have indicated significant capability to enhance medical provisions, patient privacy, and service affordability. Nonetheless, the usage of these systems does not come without problems, since a faulty or inadequate CDSS might deteriorate the quality of medical provisions and put patients at potential risk. Moreover, CDSS adoption might fail because the intended users ignore CDSS outputs as a result of a lack of actionability, relevancy, and trust. The main purpose of this research is to provide guidance, based on the literature, for various aspects of CDSS adoption, with a critical focus on DL- and ML-centered systems: quality assurance, commissioning, acceptance, and selection. Keywords Clinical decision support systems (CDSSs) · Deep learning (DL) · Machine learning (ML)
1 Introduction The prominence of machine learning (ML) and artificial intelligence (AI) over the past few decades has been connected to the advancing volume of medical data. As such, this has amounted to an enhancement of AI applications in general, which includes computing clinical decision support systems (CDSSs). Computing CDSSs are a particular application formulated to assist patients and clinicians in medical decision-making, illustrated as an active-knowledge framework. These systems utilize two or more items of patient information to produce case-specific advice (Spiegel
A. Haldorai (B) Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India A. Ramu Presidency University, Yelahanka, Bengaluru, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_60
CDSSs may apply expert knowledge or models learned from clinical data sets using ML and statistics. Decades ago, CDSSs were sometimes regarded as capable of replacing decision-making in the medical sector. The modern, more nuanced perspective of the CDSS's purpose is to help medical practitioners make better decisions than either clinician or CDSS could alone. The system makes this possible by processing the massive amount of data available in medical databases. Normally, a modern CDSS makes recommendations to medical practitioners, and the experts are expected to make their own decisions and overrule CDSS projections they consider inappropriate. Computerized CDSSs have evolved significantly since their first advent, which featured computer-assisted diagnosis in the rule-centered MYCIN, the HELP alert system and the Leeds abdominal pain framework. One way in which these systems have transformed is their incorporation into the medical workflow and other medical information systems. At the start, they were standalone frameworks into which medical practitioners had to enter the patient's data before reading and interpreting the results. Starting from 1967, CDSSs began to be incorporated into medical information management frameworks, which brought two major advantages: users no longer had to enter data manually, and CDSSs could be proactive, recommending or alerting on required actions without users explicitly seeking assistance from the CDSS [1]. Later came the development and adoption of standards to represent, store and share medical information, which permitted the separation of knowledge content from software code in CDSSs. As of 2005, medical information systems began offering application programming interfaces (APIs) through which they could interact with CDSSs, permitting looser and more dynamic integrations [2]. This transformation has produced a wide variety of CDSS forms that can be categorized along a number of characteristics. CDSSs can provide support on demand or unprompted, as in alert systems. Moreover, CDSSs can be categorized by underlying technology: rules, probabilistic models, genetic algorithms, deep learning and reinforcement learning, among others. According to their functions, CDSSs are categorized into supportive diagnosis, treatment planning, outcome projection, management of medications, chronic disease management, image representation, image segmentation, pathology detection and preventive care [3]. Systematic literature suggests that the use of CDSSs minimizes unwarranted practice variation and waste in medical frameworks, enhances the quality of medical care, and reduces the risk of burnout and overload among medical practitioners. However, CDSSs can have fundamentally negative results, since a faulty CDSS, or ineffective usage of one, may degrade the quality of healthcare provision. Critical ethical questions and patient safety concerns are still pending. The obligation of CDSSs has initially been to support users (patients and clinicians), who remain liable for medical decisions. Since the advent of DL, CDSSs have been attaining human-level performance on some tasks, most notably image analysis, while acting
as black boxes whose reasoning for a given projection cannot be inspected [4]. This raises novel questions concerning liability and responsibility. Regulatory procedures are adapting, in particular by categorizing some CDSSs as clinical devices (with the associated legal effects), while excluding others from that definition, such as systems that do not evaluate images or that permit users to review the basis of a recommendation. Nonetheless, not even regulatory approval guarantees the absence of negative complications. CDSSs can inadvertently increase the workload of medical practitioners. For instance, a well-known effect of CDSS alert systems in patient monitoring is alert fatigue, which happens when medical practitioners start ignoring messages as a result of overwhelming false-alarm frequencies. Another possible risk arising from CDSS adoption is medical practitioners losing the capacity to make proper decisions on their own, or to determine when it is appropriate to override the CDSS. The present gains in AI make it likely that CDSSs will be used ever more deeply in human decision-making, which makes these risks increasingly pertinent [5]. This can become crucial in the event of computing system downtime, or when patients with unfamiliar medical conditions are admitted for diagnosis and treatment. It is therefore fundamental to stay alert to both the negative and the positive potential implications of CDSSs for medical decision-making processes. Several CDSSs have been in application for a few decades. However, their application has not yet spread widely, owing to a number of issues connected to the design and implementation process, such as medical practitioners not using them for lack of time or of confidence in the CDSS output. At the same time, there exists an immense potential need for CDSSs as a result of the increased volume of available information, the advancing diversity of treatment options and the rapid transformation of clinical technologies. CDSSs can be valuable as a means of delivering clinical care tailored to patients' preferences and biological features. Patients may benefit from the accumulated human skill and clinical professionalism embodied in condition monitoring, treatment and diagnosis. There is a growing international need for high-quality personalized medicine, which is meant to enhance patient outcomes, minimize financial burdens and eliminate unwarranted practice deviations. ML-centered CDSSs are projected to help alleviate present knowledge gaps and the connected variation in quality of healthcare across regions and countries [6]. Therefore, the question of designing, developing, presenting, implementing, evaluating and maintaining all forms of medical decision support capacity for consumers, patients and clinicians remains a crucial segment of research in the field of modern medicine. The purpose of this research is to provide explanation and guidance on the stages needed for successful and safe adoption of a CDSS, as shown in Table 1. This paper explains how a CDSS can be selected, provides recommendations for acceptance testing and commissioning of a CDSS, describes how clinicians can roll out a CDSS and provides guidelines for CDSS quality assurance.
A rigorous selection process will aid in identifying the CDSS that fits the requirements and preferences of the local medical site.
Table 1 Summary of the stages of CDSS adoption
Selection: Select the best CDSS based on matched targets and medical workflows, the five rights, user acceptability and performance.
Acceptance testing: Test that the CDSS satisfies the safety, privacy and security requirements applied to clinical devices, covering normal, error, exception and unforeseen scenarios.
Commissioning: Prepare the CDSS for optimized use at the medical facility (including possible personalization and safety tests), and verify its performance in the local context.
Implementation: Roll out the CDSS and transition from the initial workflow to the new one after training the users and managing their expectations.
Quality assurance: Ensure that CDSS quality remains fit for purpose, monitoring for both external and internal updates, including context drift.
Acceptance testing ensures that the chosen CDSS fulfills its stated specifications and satisfies the safety guidelines. The commissioning process prepares the CDSS for safe medical use at the local site [7]. The implementation phase culminates in the rollout of the CDSS to trained end-users whose expectations have been properly managed. Quality assurance is used to ensure that CDSS performance is properly maintained and that any issues are quickly noticed and mitigated. This article concludes that a systematic approach to adopting a CDSS will help avoid pitfalls, enhance patient safety and improve the prospects of success.
2 Selection Methods
The number of commercial CDSSs for medical applications has been growing over the past few decades. Choosing an effective CDSS from those available is therefore not easy, yet it is a critical step in effective CDSS implementation. User acceptance of a CDSS is fundamental, and various implementation studies have indicated that acceptance depends strongly on how beneficial medical practitioners and allied medical professionals perceive the system to be. A good initial step in the process is to form a multi-disciplinary steering board that includes key stakeholders, for example patient representatives, clinician champions, IT experts and department administrators, who are willing to be accountable and make decisions for the CDSS implementation. For over three decades, studies have indicated that the likelihood of user acceptance increases when CDSS implementation includes the end-users, rather than forcing the CDSS on them [8]. For a CDSS to be effective, it should be conceived as a segment of a wider, department-wide and coherent quality-improvement strategy in which the medical
quality gap between the present patient process and the desired end-state has been identified and critically measured. Two fundamental aspects to investigate when choosing a CDSS are its quality and how effectively it addresses the medical quality gap. CDSS quality needs to be considered at two levels: that of the technological platform and that of the knowledge or data utilized to structure it. The CDSS framework is an application, possibly a medical device, to be designed, tested, documented and implemented using recognized quality assurance techniques for developing software applicable in clinical domains. The clinical knowledge utilized in constructing a CDSS may not be medically or objectively provable; however, it should attempt to capture the present state of scientific or professional opinion. Moreover, it should be possible to verify that the encoded clinical knowledge satisfies requirements such as being unbiased, complete and consistently interpretable. For CDSS models learned through statistical evaluation or ML, the evaluation of source data quality is fundamental. Data quality is essential since the principle of garbage in, garbage out applies to ML. Data is principally defined to be of high quality if it fits closely the projected purpose; certainly, it has to be an unbiased and representative sample of the medical domain (medical or patient conditions) being modeled. Appropriate procedures for anomaly detection, handling of incomplete data and data cleansing have been widely applied to databases, and the presence of potential biases should be assessed and rectified [9]. A major indicator of CDSS quality is its performance metrics. Performance measures vary depending on the form of CDSS. For instance, in CDSSs undertaking outcome prediction, the area under the receiver operating characteristic (ROC) curve and the c-index are commonly used as performance metrics. In other instances, performance is evaluated in terms of time saved. Nonetheless, performance evaluation can be complicated, mostly when a standard of medical performance is not available, as in therapy advice frameworks where medical practitioners may disagree. In the end, the most challenging to evaluate, yet most valuable, performance metric is the influence of the CDSS on medical processes and outcomes. Systematic reviews of evaluations of the usability and efficiency of a CDSS can facilitate the selection decision. However, it must be borne in mind that experiments done by CDSS developers may overestimate potential benefits, so a third-party evaluation is advisable. A critical hazard evaluation, yielding an exhaustive list of possible risks and consequences alongside mitigation plans for those risks, is a segment of the regulatory procedure and can provide fundamental insights into CDSS desirability [10]. During the selection process, CDSS acceptability has to be considered and evaluated alongside the performance metrics. For users to accept CDSS output, the evidence supporting the medical recommendations delivered by the CDSS has to be transparent to the user.
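As a concrete illustration of the area-under-the-ROC-curve metric mentioned above, the following is a minimal sketch using scikit-learn; the labels and scores are illustrative placeholders, not data from any real CDSS.

```python
# Minimal sketch: area under the ROC curve for an outcome-prediction CDSS.
# y_true are ground-truth outcomes; y_score are the CDSS risk scores.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # illustrative labels
y_score = [0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2]  # illustrative scores

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 1.0 is perfect discrimination, 0.5 is chance
```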
The transparency of frameworks based on hand-engineered features and of simple models (e.g., decision trees) is typically greater than that of frameworks based on advanced techniques such as DL and random forests. As indicated earlier, it is fundamental to choose a CDSS that fits the necessities of the local site. Foremost, following the population, intervention, comparison and outcome (PICO) framework, the choice procedure should be restricted to CDSSs that target the suitable population, with relevant comparators and interventions, focusing on the outcomes of interest [11]. Choosing a CDSS can then be framed around five essential rights that the CDSS has to accomplish: delivering the right information (what), to the right people (who), in the right format (how), through the right channel (where) and at the right time (when). Delivering the right information means that the CDSS output (clinical assessments and recommendations) has to be medically actionable, unambiguous, brief and relevant. The CDSS has to fit the present workflow of the users as closely as possible, for instance by being incorporated into the electronic health record (EHR), limiting the effort required for users to access and act on its recommendations. For a CDSS to fit the workflow of a certain clinic, customization is fundamental; the customization functionality provided by each CDSS therefore has to be considered during selection. Another consideration connected to the local workflow is whether the information essential for the precise functioning of the CDSS is available at the relevant workflow point. A further element to consider when choosing a CDSS is its usability, in particular how much training is required to be capable of using it [12]. Vendors should be clear about the expertise necessary for utilizing the framework. An essential consideration when choosing a CDSS is its cost, measured and contrasted against alternative CDSSs or other clinical investments (such as new equipment). Nonetheless, it is challenging to demonstrate the return on investment of a CDSS, particularly against the many competing priorities evident at the delivery-system level. A comprehensive evaluation of the overall costs of CDSS acquisition should be undertaken before purchase, covering not only one-off costs (purchase, implementation and training, among others) but also expenses incurred over time, such as maintenance costs and resource utilization (users' time). The expenses have to be weighed not just against the estimated enhancement in medical outcomes, but also against the projected savings from efficiencies facilitated by the CDSS. Other factors to consider include CDSS maturity, compatibility with legacy applications and the availability of upgrades.
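To make the five rights concrete, they can be recorded as a simple structured checklist for each candidate CDSS during selection. The following sketch is purely illustrative; the field names and example values are hypothetical and not part of any standard.

```python
# Illustrative data structure for the "five rights" checklist used during
# CDSS selection; all field names and values are hypothetical.
from dataclasses import dataclass

@dataclass
class FiveRightsChecklist:
    right_information: str  # what: actionable, unambiguous, relevant output
    right_people: str       # who: the intended clinical end-users
    right_format: str       # how: alert, order set, report, dashboard
    right_channel: str      # where: EHR, mobile app, standalone client
    right_time: str         # when: the workflow point where it fires

candidate = FiveRightsChecklist(
    right_information="dose-adjustment recommendation with rationale",
    right_people="ward pharmacists and prescribing physicians",
    right_format="interruptive alert with one-click accept/override",
    right_channel="embedded in the EHR ordering screen",
    right_time="at medication order entry",
)
```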
3 Acceptance Testing
For acceptance testing, a CDSS can effectively be treated as a clinical tool, for which established procedures are normally in place at medical providers. Acceptance tests for clinical tools verify that all defined specifications are met and that the device satisfies pertinent privacy requirements. These tests are typically defined by the CDSS vendor but have to be run in the presence of local site representatives. On successful completion of the acceptance tests, the
reports are signed, which typically facilitates approval of payment. Consequently, the collection of test cases has to be comprehensive, covering the edge cases of the CDSS domain as well as the expected cases. The technical part of the acceptance test has to be done by technology representatives, whereas clinically oriented tests and usability tests have to be done by a sub-group of users constituting a representative sample of the expected end-user population. The acceptance test plan has to address the aspects below:
(a) Setup and installation of devices.
(b) The effective functioning of the APIs provided by the CDSS.
(c) Complete walkthroughs of the user interface, operating the CDSS as a segment of the prevailing workflow.
(d) Medical completeness, comprehensibility, consistency, repeatability and relevance of the CDSS output.
(e) Security, privacy and auditing functions.
(f) Normal error cases, such as incorrect, incomplete or unexpected input information, and closure cases (such as power outages) leading to incomplete transactions. A CDSS should not output invalid projections in the presence of inaccurate or incomplete data; it should handle such conditions by enforcing internal consistency, giving appropriate error messages and, if essential, proceeding to a graceful shutdown.
On top of the aspects mentioned above, acceptance testing of a CDSS has to evaluate the accuracy of the CDSS and its recommendations, as an incorrect projection might endanger the privacy or condition of patients. The tests have to compare the outputs of the CDSS against expected outcomes on a restricted, small and fixed but representative sample of actual cases. The accuracy estimated from the acceptance test findings then needs to be checked against the accuracy claimed by the vendor, testing statistically whether it lies within the specified error tolerance or not. The same applies to other qualitative and quantitative estimates provided by the vendor. To test whether the actual accuracy of the CDSS (for its various parameters) lies within a certain error tolerance based on the test sample, statistical tests have to be applied to evaluate the probability that the accuracy seen in the sample belongs to the probability distribution determined by the claimed accuracy and error tolerance. If this probability falls below a predefined threshold, the hypothesis that the real accuracy lies within the error tolerance is rejected. Lastly, an assessment of the accessibility and completeness of the CDSS user manual is a segment of the acceptance test, which is fundamental for novice users and for unusual situations or emergencies.
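One way such a statistical check could be implemented is with a one-sided binomial test, sketched below with SciPy; the claimed accuracy, sample size and significance level are illustrative assumptions, not values prescribed by the text.

```python
# Hypothetical acceptance check: given k correct outputs out of n test cases,
# test whether we can reject the vendor's claim that true accuracy is at
# least `claimed`. A small p-value means the observed accuracy is unlikely
# under the claim, so the claim fails the acceptance test.
from scipy.stats import binomtest

def claim_holds(k_correct, n_cases, claimed, alpha=0.05):
    result = binomtest(k_correct, n_cases, p=claimed, alternative="less")
    return result.pvalue >= alpha  # True: claim not rejected at level alpha

# Example: vendor claims 95% accuracy; the CDSS got 44 of 50 cases right.
print(claim_holds(44, 50, claimed=0.95))  # False: observed 88% rejects 95%
```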
4 Commissioning
Commissioning is the procedure of preparing a CDSS for safe medical use at the local site, meeting local requirements and end-user expectations. Commissioning thus establishes whether the CDSS has been incorporated into the
local site according to the agreed requirements, has been effectively handed over by the vendor and operates properly. It is advisable to prepare for this stage by creating a commissioning plan that defines the tasks, schedule and necessary equipment and resources, including the support needed from the CDSS vendor. The foremost step in the commissioning plan is incorporation into the local site, since the CDSS will almost inevitably require some configuration and customization. Customization may be essential for safety or technical purposes: for instance, to ensure that CDSS parameters are precisely connected to the local EHR and that the interpretation of medical terms is in sync between the local EHR and the CDSS. Customization is a powerful means of ensuring that CDSS output is safe, useful and relevant for users. Qualitative evaluations have found that successful sites devote enough staff time to CDSS customization. One example of customization is assessing and improving the effectiveness of alerts to eliminate alert fatigue. To verify that the installed CDSS operates effectively at the local site, a test plan has to be created and executed. To start with, the CDSS implementation is likely to entail some transformation of the end-users' workflow. In that regard, the data essential to support the future workflow requires identification, and the new workflow itself requires testing. Once the new workflow is structured, the purpose is to make sure that the CDSS operates effectively by properly testing fundamental, medically relevant cases. The steering board formed by IT experts, administrators and clinicians has to be included in identifying essential cases and corner cases in which the installed CDSS might fail in the local site ecosystem, leading to poor reliability and quality. Rare and difficult cases, alongside representative samples of the local case population, can be tested retrospectively when databases of past cases exist. In this scenario, the recommendations of the CDSS are evaluated by a board of medical experts in a blinded study, in which medical practitioners are kept unaware of the CDSS output, and the output is contrasted with the decisions that were actually taken. Nonetheless, it is fundamental that the CDSS also be evaluated on real-world scenarios from the users' own medical practice before implementation. One option is to evaluate the CDSS prospectively by running a pilot program in which the CDSS is used in parallel with the prevailing workflow, or in which the CDSS is used under supervision with the present workflow as a fallback. Techniques to cover representative samples of rare and usual cases include random sampling, control-flow testing and input selection. During piloting, it is essential to perform an initial evaluation of the medical relevance of the CDSS and of user acceptance, considering CDSS projections and their effects on medical decisions and, ultimately, on health and patient outcomes. Fundamental deviations in the estimated performance of the CDSS during this stage, as compared with the vendor's performance claims and the acceptance testing results, including the error tolerance, have to be assessed with the vendor.
Failure-mode evaluation is a fundamental segment of commissioning tests, whereby errors in data entry are systematically simulated and the CDSS is tested and analyzed for consistency. Testing in commissioning
is fundamental to building local physicians' confidence that the support framework operates correctly in the local setting [13].
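A sketch of how the failure-mode simulation just described could be automated is shown below. The `query_cdss` callable, the record structure and the `status` field are hypothetical stand-ins for whatever interface the installed CDSS actually exposes.

```python
# Hypothetical failure-mode check: feed deliberately corrupted records to a
# CDSS wrapper and verify it fails safely instead of returning a prediction.
import copy

def failure_mode_suite(query_cdss, baseline_record, fields_to_corrupt):
    failures = []
    for field in fields_to_corrupt:
        for bad_value in (None, "", -1, "UNEXPECTED"):
            record = copy.deepcopy(baseline_record)
            record[field] = bad_value  # simulate a data-entry error
            try:
                output = query_cdss(record)
            except ValueError:
                continue  # explicit rejection of bad input is safe behaviour
            if output.get("status") != "error":  # assumed output schema
                failures.append((field, bad_value, output))
    return failures  # non-empty list means unsafe handling was observed
```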
5 Implementation
The implementation process is a fundamental factor in CDSS success and includes the design and execution of a wide-ranging rollout plan through which the transition occurs from the old workflow to the new one, including deployment of the CDSS at the local site. Effective CDSS implementation necessitates the preparation of both the local site infrastructure and the individual users for routine use of the CDSS. The infrastructure work will depend on the local site and the CDSS; however, there are common patterns in how users can be prepared for the application of a new CDSS. It is fundamental to educate and communicate with users [14]. Good training of the relevant stakeholders is critical for successful implementation and has to cover various aspects of when to use the CDSS and when not to [15]. Apart from that, how it has to be used, how the CDSS output has to be interpreted and how the system can be overridden are further factors to consider. This also incorporates helping users understand how the CDSS will affect their normal activities and how feedback can be provided [16]. It is critical, as a major segment of the training, to manage users' expectations of effectiveness and efficiency, which includes ensuring that users comprehend the weaknesses and strengths of the CDSS. Different key stakeholders may have different expectations [17]: some may primarily consider the CDSS a channel for promoting standardization, safety and quality, whereas medical practitioners may see it differently. The training should also serve the major purpose of getting users ready for the essential process; a CDSS will only be utilized if the people using it consider it essential. Hands-on training is an essential tool, and users may require hand-holding at first, so on-site support from the vendor is required to aid in mitigating immediate problems that may arise. The rollout or deployment of a CDSS can be done incrementally; for instance, rolling out at one post or facility first to work out the kinks can inform preparation for the wider deployment [18].
6 Quality Assurance
Before deploying a CDSS fully, it is fundamental to formulate a quality assurance system to ensure that CDSS performance and safety are effectively maintained. This process ensures that quality is maintained throughout the life cycle. As a fundamental segment of the quality assurance
program for a CDSS, performance has to be defined using a wide range of metrics focusing on efficacy and efficiency. These allow the effects of the CDSS to be evaluated over a particular time frame. Efficacy measures range from CDSS-specific ones, such as the sensitivity and specificity of diagnostic frameworks, to generic ones such as patient safety and potential patient outcomes, e.g., life expectancy. Efficiency can be evaluated in resource terms, such as productivity and costs. To evaluate the performance of a CDSS, it is fundamental to quantify baseline performance levels before CDSS implementation and to set the targeted performance upfront. The quality assurance plan should guarantee that malfunctions are detected and resolved in the shortest duration possible. To make the discovery of CDSS issues possible, techniques should be put in place using quantitative and qualitative analysis, such as monitoring how often CDSS alerts and recommendations are overridden. Visual evaluation, statistical procedures and statistical process control have shown good results in detecting possible malfunctions. In addition to detecting malfunctions, it is essential to track or log the cases where the CDSS has not been followed, i.e., whenever alerts are ignored or recommendations overridden. Knowing whether the CDSS has been overridden, and why, yields valuable and sometimes urgent insights and can reveal malfunctions that would otherwise go unnoticed. Tracking changes to the installed application and its environment is also essential, since such changes may lead to reduced performance. Models can be updated by various means: shifting the baseline level or the cut-off for binary outcomes; computing new values for the present parameters; or retraining the framework on expanded data, which yields new parameters, coefficients and potential cut-offs for the binary outcomes. The best safeguard against external and internal drift is to create and systematically analyze logs of incorrect and inappropriate feedback from the CDSS. At the same time, local validation cohorts should be gathered repeatedly from time to time to fundamentally re-evaluate the kinds of tests conducted during the commissioning phase. This form of repetition can help ensure that the CDSS remains medically valid irrespective of transformations in local practice or in evidence-based guidelines. Such a continuous local validation system will be essential when updates are applied to the CDSS. Lastly, it is essential to stress that no CDSS can be considered perfect; however, at a minimum the quality assurance framework should verify that CDSS performance continues to meet the criteria benchmarked by the commissioning findings. Among the steering board's top priorities should be to establish and maintain update-management protocols. CDSSs, like clinical software generally, have traditionally been updated in offline mode: through a user-instigated or vendor-instigated transition, the CDSS is temporarily taken out of medical use and placed in a maintenance state. The subsequent transformations are done in the maintenance state, such as the
application of software version corrections and error-function updates. By analogy with the maintenance and quality assurance of other medical systems, the medical handover, i.e., the return of the system to the medically operational state after these upgrades, should only be permitted after verification of CDSS performance on the changed system. The essential tests have to be specified by the vendor or by a risk evaluation based on the maintenance manual, but it may be necessary to incorporate several tests from the acceptance testing process, to certify that the fundamental functionalities of the CDSS survive the update. With the shift of clinical software frameworks to the cloud, increasing system automation and mathematical algorithms capable of learning raise the prospect of updating a CDSS in online mode. Such CDSSs may be permitted to change in real time based on the interactions between users and system projections, such that the state of the CDSS changes with some forms of interaction. An update-management protocol might explicitly allow online updates, which pose interesting and novel challenges based on the trade-off between the possibility of continuous performance enhancement and the risk of undetected performance degradation, such as systematic biases in inputs. Another essential priority is the implementation of a routine quality assurance plan which specifies the kinds of tests that have to be undertaken, when they should be undertaken and by whom. As a major segment of the quality assurance test, the different functional aspects of the CDSS are evaluated against acknowledged ground truth. As a rule, the quality assurance obligations are derived from the checks performed in the commissioning stage. In that regard, the documented commissioning findings can be reused at specified time intervals to certify that CDSS performance has not drifted over time. Statistical anomaly-detection frameworks applied to CDSS monitoring over time have been described and compared in the literature; the appropriate choice varies depending on the type of CDSS. The frequency and form of CDSS quality assurance tests depend on the likelihood of unwanted deviations of CDSS performance and their possible effects. Quality assurance tests have to be undertaken frequently for likely failures or non-conformance events with severe consequences; unlikely events and failures which do not have fundamental medical consequences require checking less frequently. An essential effort has to be directed to the procedural mitigation of failures carrying severe consequences, because these can be challenging to intercept in the routine quality assurance system. To effectively create and execute the quality assurance plan, the users in charge should acquire the necessary data-evaluation skills through training. In-depth knowledge of how the underlying CDSS technologies operate will permit clinicians to identify potential malfunctions and comprehend their root causes. This form of training can be provided by the vendors themselves or by any third party providing customer training services.
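As an illustration of the statistical process control mentioned above, the following sketch flags periods whose alert-override rate drifts outside 3-sigma limits around a baseline rate. All numbers are illustrative; in practice the baseline would be set from commissioning data.

```python
# Minimal sketch of a p-chart applied to CDSS monitoring: flag weeks whose
# alert-override rate falls outside 3-sigma control limits around baseline.
import math

def override_p_chart(overrides, alerts, baseline_rate):
    flagged = []
    for week, (k, n) in enumerate(zip(overrides, alerts)):
        sigma = math.sqrt(baseline_rate * (1 - baseline_rate) / n)
        upper = baseline_rate + 3 * sigma
        lower = max(0.0, baseline_rate - 3 * sigma)
        rate = k / n
        if not (lower <= rate <= upper):
            flagged.append((week, rate))
    return flagged  # weeks needing investigation for drift or malfunction

# Example: baseline override rate of 20%; week 2 spikes and gets flagged.
print(override_p_chart([22, 25, 48], [100, 110, 105], baseline_rate=0.20))
```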
Eventually, AI and ML will be covered in medical and physics training, which will lead to a deeper and wider understanding of these computing systems.
7 Conclusion and Future Scope
CDSSs have shown significant potential for enhancing medical services and patient safety, including the minimization of unwarranted variation, resource usage and costs. AI-centered CDSSs have recently been recognized for their capability to leverage the growing availability of medical information to help patients and medical practitioners in various hospital conditions (such as providing personalized evaluations of medical results or proposing the correct diagnoses), based on both structured data (e.g., EHRs) and unstructured data (e.g., clinical imaging). However, an inappropriate or inaccurate CDSS may deteriorate the quality of medical services and place patients at risk. AI-centered CDSSs have exhibited pitfalls which have to be covered in future research. In particular, researchers should address overfitting to the limitations of the information used in AI training; such overfitting can cause the CDSS to fail to generalize beyond the training data and to underperform at local sites. Considerable precautions therefore have to be put in place to control the possible effects of a CDSS. It is fundamental to choose with care the CDSS matching the medical requirements of the local site. As with any clinical device, a CDSS necessitates rigorous acceptance testing, commissioning and quality assurance by the local site. Apart from that, an effective implementation plan is key to overcoming the potential barriers to CDSS success.
References
1. C. Bennett, K. Hauser, Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 57(1), 9–19 (2013). https://doi.org/10.1016/j.artmed.2012.12.003
2. P. Lucas, Dealing with medical knowledge: computers in clinical decision making. Artif. Intell. Med. 8(6), 579–580 (1996). https://doi.org/10.1016/s0933-3657(97)83108-9
3. H.N. Dinh, Y.T. Yoon, Two novel methods for real time determining critical clearing time: SIME-B and CCS-B based on clustering identification, in 2012 Third International Conference on Intelligent Systems Modelling and Simulation, February 2012
4. Z. McLean, Formulating the business case for hospital information systems: analysis of Kaiser Permanente investment choice. J. Med. Image Comput. 42–49 (2020, July)
5. B. Srpak, N. Campbell, Analysis of biological framework and incorporating physiological modelling. J. Med. Image Comput. 34–41 (2020, July)
6. W. Horn, Artificial intelligence in medicine and medical decision-making Europe. Artif. Intell. Med. 20(1), 1–3 (2000). https://doi.org/10.1016/s0933-3657(00)00049-x
7. I. Rábová, V. Konečný, A. Matiášová, Decision making with support of artificial intelligence. Agric. Econ. (Zemědělská Ekon.) 51(9), 385–388 (2012). https://doi.org/10.17221/5124-agricecon
8. R. Yager, Generalized regret-based decision making. Eng. Appl. Artif. Intell. 65, 400–405 (2017). https://doi.org/10.1016/j.engappai.2017.08.001
9. N.M. Hewahi, A hybrid architecture for a decision-making system. J. Artif. Intell. 2(2), 73–80 (2009). https://doi.org/10.3923/jai.2009.73.80
10. C. Gonzales, P. Perny, J. Dubus, Decision making with multiple objectives using GAI networks. Artif. Intell. 175(7–8), 1153–1179 (2011). https://doi.org/10.1016/j.artint.2010.11.020
11. R. Degani, G. Bortolan, Fuzzy decision-making in electrocardiography. Artif. Intell. Med. 1(2), 87–91 (1989). https://doi.org/10.1016/0933-3657(89)90020-1
12. P. Giang, P. Shenoy, Decision making on the sole basis of statistical likelihood. Artif. Intell. 165(2), 137–163 (2005). https://doi.org/10.1016/j.artint.2005.03.004
13. D. McSherry, Conversational case-based reasoning in medical decision making. Artif. Intell. Med. 52(2), 59–66 (2011). https://doi.org/10.1016/j.artmed.2011.04.007
14. T. Leong, Multiple perspective dynamic decision making. Artif. Intell. 105(1–2), 209–261 (1998). https://doi.org/10.1016/s0004-3702(98)00082-4
15. R. Chan, A. Morse, Artificial intelligence-enabled technologies and clinical decision making. Univ. West. Ont. Med. J. 87(2), 35–36 (2019). https://doi.org/10.5206/uwomj.v87i2.1425
16. P. Mallapur, J. Shiva Krishna, U. Hosmani, K. Kodancha, Design of gear tooth rounding and chamfering machine. Trends Mach. Des. 4(3), 38–44 (2018)
17. T.K. Araghi, Digital image watermarking and performance analysis of histogram modification based methods. Intelligent Comput. 631–637 (2018, November)
18. G. Yu, Z.W. Xu, J. Xiong, Modeling and safety test of safety-critical software, in 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (2010, October)
Survey of Image Processing Techniques in Medical Image Assessment Methodologies
Anandakumar Haldorai and Arulmurugan Ramu
Abstract Medical image assessment is the procedure of forming pictures of different parts of the body to study or identify a particular disease. Many medical image assessment procedures are carried out every week around the globe, and the sector is growing rapidly as a result of constant advancement in image processing methodologies such as image enhancement, analysis and recognition. This article presents a critical survey of medical image assessment based on the application of image processing methodologies. It also provides a summary of how image interpretation issues can be addressed using various image analysis algorithms, such as ROI-centered segmentation, k-means and watershed methodologies. Keywords Image processing methodologies · Medical image assessment · Medical image processing
1 Introduction
Medical image assessment is the procedure of producing visible pictures of inner body structures for medicinal and scientific evaluation and treatment, to get a clear view of the interior parts. This procedure supports the management and identification of disorders of the body and also creates a bank of information on the regular functions and structures of body organs, for easy recognition of anomalies. It incorporates both radiological and organic pictures that utilize electromagnetic energy (gamma rays), magnetic scopes, sonography, isotopes and thermal imaging. There are various technologies utilized to record data on the functions and locations of certain body parts. In a single year, thousands of pictures are produced around the world for various diagnostic aims.
A. Haldorai (B) Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India A. Ramu Presidency University, Yelahanka, Bengaluru, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_61
Roughly half of them utilize ionizing or non-ionizing radiation modalities. Medical image assessment projects images of internal structures without invasive procedures. These body pictures are produced by fast processors that convert energy, logically and arithmetically, into signals. The signals are then transformed into digital pictures representing various forms of body tissue. Healthcare image processing denotes how digital pictures are handled in a computer system. It incorporates various operations and techniques, including image acquisition, storage, presentation and communication. The basic viewed element of an image represents measured features such as colour and illumination. Digital pictures have several advantages, such as affordability and speedy processing, which help to cut costs, along with easy storage, quality evaluation and adaptability. Their demerits include the exploitation of copyrights, the difficulty of resizing while preserving quality and the massive storage volumes required. Image processing methodology is the application of computers to the manipulation of digital pictures. It brings advantages such as communication, information storage, adaptability and elasticity. With the advancement of various protocols and image-resizing methodologies, pictures can be maintained efficiently. The methodology incorporates several protocols meant to synchronize pictures, and both two-dimensional and 3D pictures can be processed in multi-dimensional aspects. Image processing methodologies were established in the early 1960s and utilized in various fields such as clinical purposes, TV image enhancement, space exploration and artwork. In the late 1960s and early 1970s, with the rapid advancement of computer technology, image processing became gradually faster and cheaper. By the 2000s, the processing of pictures had become faster, cheaper and simpler to use. The human visual framework is one of the most capable systems that has ever existed. It allows organisms to comprehend and organize the various complex aspects of the exterior ecosystem. The visual framework includes the eye, which transmits light to neural elements and signals before relaying them to the brain, which extracts the required information. The human eye is located in the anterior segment of the skull and is about 2.5 cm in length and crosswise diameter [1]. At the center of the human eyeball is a dark structure known as the pupil. This part allows light to penetrate the eye and narrows whenever exposed to too much light, which diminishes the amount of light reaching the retina and aids visual image assessment. There are many muscles around the human eye which control the pupil's widening, and the eye includes supportive elements, notably the sclera. The lens lies behind the cornea, held by ligaments, and its shape changes considerably as a result of muscle contraction. Light is concentrated from the cornea and lens onto the retina.
The fovea is the region of the retina on which pictures are focused most sharply [2]. Lastly, the brain
formulates the colours and details through the application of multiple image assessment processes. Against this background, this paper evaluates image processing methodologies, image classification and segmentation, with a critical focus on the human eye.
2 Image Classification and Image Segmentation
Image segmentation methodologies can be categorized by both the features and the techniques utilized. Features include edge information, pixel intensity and texture. Techniques built on these features can be grouped into structural methods and statistical methods.
2.1 Structural Methods
In this category, the spatial elements of the image, regions and edges, are considered. Different edge detection methods and algorithms have been applied to extract boundaries between brain structures; however, these algorithms are sensitive to noise and artifacts. Region growing is another key structural methodology. With this technique, one starts by grouping the picture into smaller segments that can be viewed as seeds. The boundaries between the various adjacent parts are then evaluated: firm boundaries, judged on several properties, are maintained, whereas weak boundaries are eliminated and the adjacent parts are merged. The procedure is repeated iteratively to eliminate the boundaries that are weak enough to warrant elimination. Nonetheless, the performance of the methodology depends on the selection of seeds and on whether the regions are defined robustly. A minimal sketch of a simplified variant appears below.
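The following sketch implements a simplified, seeded variant of the region-growing idea described above: starting from a single seed pixel, 4-connected neighbours are absorbed while their intensity stays close to the running region mean. The split-and-merge boundary evaluation described in the text is omitted for brevity.

```python
# Minimal seeded region-growing sketch (a simplification of the structural
# approach described above). `image` is a 2D numpy array of intensities.
import numpy as np
from collections import deque

def region_grow(image, seed, tol=10.0):
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    mean, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(image[ny, nx]) - mean) <= tol:
                    mask[ny, nx] = True
                    mean = (mean * count + float(image[ny, nx])) / (count + 1)
                    count += 1
                    queue.append((ny, nx))
    return mask

# Usage (illustrative): mask = region_grow(slice_2d, seed=(120, 80), tol=12)
```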
2.2 Statistical Methods
Statistical methods treat pixels in terms of probability values determined from the image's intensity distribution. In this methodology, image structures are labelled by comparing grey-level values against chosen intensity thresholds. A single threshold segments the image into two separate parts, foreground and background. In many cases, the job of choosing the threshold is easy, since there is a clear separation between the grey levels of the objects requiring segmentation.
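A minimal sketch of single-threshold segmentation is shown below, using Otsu's method via scikit-image; Otsu's method is one common way of choosing the threshold automatically from the grey-level histogram, though the survey does not prescribe a particular choice.

```python
# Minimal sketch of single-threshold (foreground/background) segmentation.
import numpy as np
from skimage import data, filters

image = data.camera()                      # sample 8-bit grey-level image
threshold = filters.threshold_otsu(image)  # threshold from intensity histogram
foreground = image > threshold             # boolean mask: True = foreground
print(f"threshold={threshold}, foreground fraction={foreground.mean():.2f}")
```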
2.3 Mathematical Methods
Mathematical methods and frameworks establish the foundation of medical image processing. These methodologies for extracting information from pictures provide fundamental techniques for attaining scientific progress in behavioral, biomedical, clinical and experimental research. Currently, medical and healthcare pictures are acquired using a wide range of methodologies across various biological scales, far beyond the visible-light microscopy and photography used in the twentieth century. A modern medical picture can be viewed geometrically as a collection of data samples that quantify diverse physical aspects of tissue, which may vary over time, such as haemoglobin concentration. This widened scope dramatically enhances the capacity to apply novel processing methodologies and to link many channels of sampled data into complex and sophisticated mathematical frameworks of both physiological function and dysfunction.
3 Key Issues in Image Classification and Image Segmentation
To comprehend the fundamental role of image assessment in therapy analysis, it helps to appreciate how pictures are applied before, during and after treatment. Our evaluation focuses on four essential elements of Image-Guided Therapy (IGT) and Image-Guided Surgery (IGS): control, monitoring, targeting and localization [3]. Correspondingly, in the sector of medical image assessment, four significant issues are identified:
(a) Segmentation: automated methodologies that form patient-specific frameworks of the essential anatomy from the pictures.
(b) Registration: automated methodologies that align multiple sets of data with each other.
(c) Visualization: the technological ecosystem in which image-guided processes are displayed.
(d) Simulation: software which can be utilized to plan and rehearse processes, evaluate access techniques and simulate projected treatments.
3.1 Modality of Pictures
Various forms of image assessment techniques have been developed to suit different clinical applications. Practically, these are complementary technologies that provide more insight into the underlying reality. In the sector of medical image assessment, these various image assessment methodologies are referred to as modalities. The anatomic
modalities provide insight into anatomic morphology and include magnetic resonance imaging (MRI), ultrasound (US), radiography and digital subtraction imaging, among others. The functional modalities represent the metabolism of the underlying organs and tissues. They incorporate the three nuclear medicine modalities, scintigraphy, positron emission tomography (PET) and single photon emission computed tomography (SPECT), as well as functional magnetic resonance imaging (fMRI) [4]. This list of modalities is not exhaustive, since other novel methodologies are introduced every year. Indeed, most pictures are now acquired digitally and incorporated into computerized image archives and the various communication frameworks.
3.2 Issues in Medical Image Assessment
There are a significant number of issues in medical image assessment and processing. These incorporate:
(a) Image restoration and enhancement.
(b) Accurate and automated segmentation of feature elements.
(c) Accurate and automated fusion and registration of multimodality pictures.
(d) Categorization of picture features and typing of structures.
(e) Quantitative evaluation of image assessment features and interpretation of the measurements.
(f) Enhancement of integrated frameworks in the medical sector.
Digital pictures fall into two broad forms. Raster pictures consist of a rectangular grid of regularly sampled values referred to as pixels. Raster pictures can reproduce multifaceted colour variations, but they have a fixed resolution determined by the size of the pixels, and they lose quality when resized because some information is missing. Raster pictures are used extensively in photography and image assessment as a result of their effective shading of image colours; image-acquisition tools control the image resolution. Raster pictures come in several formats, such as Tagged Image File Format (TIFF), Portable Network Graphics (PNG) and Paintbrush (PCX), among others. In image assessment, vector pictures are described by curves and lines specified with mathematical precision. These vectors incorporate multiple qualities such as hue, dimension and line width. Apart from that, vectors are scalable: the reproduced pictures can be rendered at various magnitudes without transforming or altering their quality. Vectors are also effective for use in diagrams, line drawings and design.
4 Digitalized Image Processing Applications
Digital image processing technology is applied in various sectors. These include:
4.1 Medicine
In the medical sector, many methodologies are utilized for segmentation and texture evaluation, which are used for disorder identification and cancer treatment. Image registration and fusion techniques are heavily utilized in the modern age, mostly in hybrid modalities such as PET-MRI and PET-CT [5]. In the sectors of bio-data and telemedicine, compression formats and methods are utilized to access pictures remotely.
4.2 Forensics
The most common methodologies utilized in this sector include pattern matching, edge detection, de-noising, and biometric and security applications such as face and fingerprint documentation and personal identification. Forensics is centered on databases of information related to individuals: a forensic system matches input information (photograph, eye and fingerprints) against these databases to establish individual identity. Several diagnostic healthcare image assessment modalities are applied in probing the human body. The interpretation of the resulting pictures necessitates sophisticated image assessment and processing methodologies that enhance the interpretation and analysis of pictures, providing both fully automated and semi-automated frameworks for tissue detection and improving the characterization and measurement of pictures. Generally, several transformations are required to extract the information of interest from pictures: a hierarchy of steps for data enhancement that facilitates feature extraction, image classification and subsequent analysis [6]. Typically, these are performed sequentially; more sophisticated obligations necessitate parameter feedback between steps, initiating iterative loops. Various ongoing research areas have been chosen to highlight new enhancements in the display and analysis of data, in the expectation that these methodologies will carry over to other selected applications.
4.3 Mammography
Mammography is one of the most essential techniques for analyzing and identifying breast cancer, a common form of malignancy in women. It can detect the illness at its initial stages, when surgery and therapy are most effective. Nonetheless, screening mammogram interpretation is a repetitive task involving subtle signs, and it suffers from high rates of false positives and false negatives. Computer-Aided Diagnosis (CAD) aims to increase the predictive value of the methodology by identifying on mammograms the areas of suspicious abnormality and evaluating their features, as an assistance framework for radiologists. Approximately 90% of breast cancers arise in the cells lining the milk ducts of the breast; while confined there, the disease is referred to as ductal carcinoma in situ. Whenever a tumour breaches the duct lining, it is considered invasive and may metastasize to other body parts. The radiographic indications are grouped into two broad sections: lesions and micro-calcifications. Micro-calcifications are a major sign for detecting 'in situ' carcinomas, which are the ones found in the milk ducts; they are of very small diameter, on the order of microns. Many lesions are ill-defined in shape, typically with spiculations and strands that radiate into the surrounding tissue, and are similar in radio-opacity to normal tissue. The image assessment requirements in mammography are stringent in both contrast and spatial resolution. The reliability and performance of CAD depend on several elements, including feature selection, lesion segmentation optimization, computational efficiency, and the relationship between the visual image similarity and the healthcare relevance of the CAD findings [7]. Segmentation of the breast region limits the search region for micro-calcifications and lesions. Moreover, it is essential to transform the grey values of the pictures to compensate for the varying thickness of tissues. One of the best methods to execute this is to adjust grey values according to a Euclidean distance map, which maps distances to the skin line over a smoothed version of the mammogram. The noise in these pictures can be minimized by median filtration; however, this may disturb the contrast and shapes of minor body structures. An enhanced methodology combines the findings of morphological dilation and erosion using multiple structuring elements. To enhance the reliability and accuracy of breast mass segmentation, a wide range of computing algorithms has been projected, tested and developed. These include active contour models, adaptive region-growing algorithms, multiple-layer topographical algorithms and dynamic-programming boundary algorithms. As a result of the diversity of breast masses and the overlapping of breast tissues in two-dimensional projection pictures, it is challenging to contrast the robustness and performance of the image segmentation techniques. The features that are essential for characterizing lesions incorporate the measure of spiculation, texture and shape. Spiculation elements are typically computed from the image gradient, e.g., using Sobel masks, as sketched below.
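A rough sketch of this Sobel-based spiculation measure, elaborated in the next paragraph as gradient magnitudes accumulated into a histogram over edge orientations, could look as follows; it is an illustrative reading of the technique, not a validated CAD feature.

```python
# Illustrative sketch of the cumulative edge-gradient idea: Sobel filters
# give gradient magnitude and phase; accumulating magnitude into a histogram
# over edge orientations indicates how widely the edge directions spread,
# which relates to the degree of spiculation of a mass margin.
import numpy as np
from scipy import ndimage

def edge_orientation_histogram(roi, bins=36):
    gx = ndimage.sobel(roi.astype(float), axis=1)  # horizontal gradient
    gy = ndimage.sobel(roi.astype(float), axis=0)  # vertical gradient
    magnitude = np.hypot(gx, gy)
    phase = np.arctan2(gy, gx)  # edge orientation in [-pi, pi]
    hist, _ = np.histogram(phase, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    return hist / hist.sum()  # flatter histogram suggests spiculated margins
```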
cumulated from the Sobel magnitude of image edges, might be plotted as a histogram of the radial angles derived from the Sobel edge phases to determine the degree of spiculation. The Full Width at Half Maximum (FWHM) of this gradient histogram is capable of distinguishing spiculated masses from smooth masses. Other features have utilized multiple-scale line detectors to measure and detect spiculated masses [8]. The middle of a mass lesion responds like a circular region to specified filters, while the boundary of the lesion might be unwrapped, meaning it is compared against a smoothed version in order to characterize the degree of spiculation. Other essential features incorporate symmetry, which might involve automated registration of the right and left breast pictures, and change over time. Gabor and wavelet filters have been extensively compared and investigated in the literature; the Gabor filter performs better and corresponds more closely to human vision, particularly in edge detection sensitivity. Other texture elements retrieved from the co-occurrence transformation have also been tested. Over the past few years, the fractal dimension has been indicated to be efficient and effective as a metric for evaluating texture in the classification and detection of suspicious breast mass regions. The fractal dimension might be utilized to compare benign and malignant breast masses and shows high correlation with visual similarity. Since the fractal dimension is a feature computed in the frequency domain, it incorporates the merit that the lesion position is invariant to scale and rotation. Several researchers have extracted various characteristics and utilized principal component analysis to detect the most essential combinations. Various methodologies might be analyzed using Receiver Operating Characteristics (ROC) evaluation; however, they might not be contrasted with one another unless a common image database is utilized. ROC curves show the performance of computer classification methodologies and radiologists in the task of discriminating between benign and malignant lesions; artificial neural networks (ANNs) have been applied using cumulative edge gradient features, and hybrid frameworks using several features. Micro-calcifications might be analyzed using the morphology (brightness, area and shape) of individual calcifications and the spatial distribution and heterogeneity of calcifications within a specific cluster. They might be enhanced through thresholding of pictures and morphological opening based on the application of structural elements, to effectively eliminate minor objects while preserving the shapes and sizes of calcifications. An isolated calcification has minor clinical relevance, which is why researchers applied clustering algorithms to the categorization system, whereby the clusters include more than a chosen number of micro-calcifications within regions of chosen sizes. These frameworks are implemented with the use of k-Nearest Neighbour (k-NN) algorithms [9]. The spatial distribution and heterogeneity of these features within a single cluster can be utilized in evaluating the possibility of malignancy; discriminant analysis, Bayesian methodologies, rule-centered methodologies and genetic algorithms are essential in classification. CAD systems do not necessarily have to be flawless, since they are typically utilized alongside radiologists. Because
the cost of a missed cancer is significant compared to the misclassification of a benign finding, it is essential to minimize false negatives, which means accepting a higher rate of false positives in order to attain high sensitivity.
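The ROC comparison described above can be sketched with standard tooling. The following is a minimal illustration, not the authors' implementation; the classifier scores and lesion labels are synthetic stand-ins for real mammography data.

```python
# Hypothetical sketch: comparing two CAD-style classifiers with ROC analysis.
# Scores and labels are synthetic, not real mammography data.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)            # 1 = malignant, 0 = benign

# Two fake classifier score sets, one slightly more discriminative than the other.
scores_ann = labels * 0.6 + rng.normal(0.0, 0.4, size=500)     # e.g. an ANN on edge-gradient features
scores_hybrid = labels * 0.9 + rng.normal(0.0, 0.4, size=500)  # e.g. a hybrid multi-feature model

for name, scores in [("ANN", scores_ann), ("hybrid", scores_hybrid)]:
    fpr, tpr, _ = roc_curve(labels, scores)
    print(f"{name}: AUC = {auc(fpr, tpr):.3f}")
```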
4.4 Bone Osteoporosis and Strength Osteoporosis represents a prevalent bone illness featured by the loss of bone firmness and a consequent risk of fractures. Because it tends to be asymptomatic until fractures occur, few individuals are diagnosed early enough for effective therapy analysis and administration. Medically, bone mineral density is utilized to assess and diagnose osteoporosis, and transformation in bone mass is typically utilized as a surrogate for estimating future risk. Even though Bone Mineral Density (BMD) is considerably utilized clinically, it does not capture the internal trabecular bone architecture, which is a fundamental determinant of the healthcare strength of the bones and might enable more accurate osteoporosis diagnosis [10]. The limited resolution of commercial CT scanners precludes effective resolution of the trabecular features. Nonetheless, CT pictures retain a degree of architectural data, although it is diminished by the insufficient Modulation Transfer Function (MTF) of the image assessment framework. This architecture can be featured by the fractal dimension of the trabecular bone and by lacunarity, which represents a measure of the gap distribution in pictures. The fractal dimension illustrates the manner in which an object occupies space and is connected to structural complexity. The dimension of a fractal structure is connected to the radial Fourier energy spectrum of pictures as a result of applying fractional Brownian motion as a framework for natural fractals. Fractal signature estimates that are independent of the CT scanner, as scanners utilize a number of settings, might be retrieved through correction of the energy spectrum for the degradation of pictures caused by noise and blurring, using the scanner MTF. Nonetheless, transformations in fractal dimension require careful interpretation: the globalized fractal dimension does not transform monotonically with decalcification.

$$L(r) = 1 + \frac{\mathrm{variance}(r)}{\mathrm{mean}^2(r)} \quad (1)$$

Lacunarity evaluates the sizes and distribution of gaps in the information: the higher the heterogeneity, the higher the lacunarity. A firm algorithm for evaluating lacunarity measures the deviation from translational invariance of the image brightness distribution utilizing a gliding box sample. Lacunarity might be defined based on the application of localized moments computed for every neighbourhood size r of the pixel image, i.e. the mean(r) and the variance(r) of the pixel values within neighbourhoods of size r. The mean lacunarity
value can be evaluated and calculated over a wide range of scales for the bone pictures to show the average marrow size and the degree of heterogeneity.
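The gliding-box lacunarity of Eq. (1) can be sketched directly in NumPy. This is a minimal illustration of the formula, assuming a 2-D grayscale array; box placement and normalization conventions vary across the literature, and the input here is a random placeholder, not a bone CT region.

```python
# Minimal gliding-box lacunarity sketch for a 2-D grayscale image,
# following L(r) = 1 + variance(r) / mean(r)^2 from Eq. (1).
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lacunarity(image: np.ndarray, r: int) -> float:
    # Box "mass" = sum of pixel values inside each r x r gliding box.
    boxes = sliding_window_view(image, (r, r))
    masses = boxes.sum(axis=(-2, -1)).ravel()
    mean = masses.mean()
    return 1.0 + masses.var() / (mean ** 2)

rng = np.random.default_rng(1)
bone_like = rng.random((64, 64))  # placeholder for a trabecular bone ROI
for r in (2, 4, 8, 16):
    print(f"r={r:2d}  L(r)={lacunarity(bone_like, r):.4f}")
```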
5 The Medical Image Assessment System Medical image assessment systems utilize signals from patients to project pictures. These systems involve both ionizing and non-ionizing sources.
5.1 X-Ray Image Assessment System Since the advent of the X-ray system by Roentgen, a German scientist, this system has been utilized to picture the various body parts for diagnostic purposes. In an X-ray tube, electrons are produced at the cathode via thermal processes and effectively accelerated through a Potential Difference (PD) of 50–150 kV. The electrons hit the anode target to produce the X-rays; however, only about 1% of the power is transformed into X-rays while the remaining energy is transformed into heat [11].
5.1.1 X-Ray Tube
In an X-ray machine, the pictures are projected as a two-dimensional design of the evaluated part of the body. The fluoroscopy framework is utilized to effectively scan moving organs. The obtained image can then be communicated, stored and displayed via various machines. Computed Radiography (CR) utilizes image receptors to effectively project pictures: a screen enclosed in a housing with phosphor devices. Mammography pictures are utilized to examine the various illnesses of the breast tissue; image assessment in mammography makes use of minimal energy contrasted with imaging of bony structures, with applied PD in the range of 15–40 kV.
5.2 Computing Tomography (CT) In this modality, pictures are projected in multi-dimensional aspects instead of the two dimensions of conventional radiography. The CT scanner projects multiple slices of body tissues in various directions. In the CT scanner, the patient is placed within the scanner aperture, and an X-ray tube rotates around the patient in various directions, as shown in Fig. 1.
Fig. 1 Image of the computed tomography scanner
5.3 Nuclear Medicine This form of image assessment modality makes use of radioisotopes to project pictures of the functions of various structures like the liver, kidney and heart. Radioisotopes are bound to pharmaceutical materials that are taken up by the various body organs. The photons produced by the patient are collected by the detectors and changed into signals, which are then converted into interpretable digitalized pictures. There are various forms of nuclear medicine scanner modalities, such as planar, tomographic and positron emission. Planar emission projects a two-dimensional image, in contrast to tomographic and positron emission [12], as shown in Fig. 2.
5.4 Ultrasound Ultrasound represents a method that uses high-frequency sound waves to effectively produce an image of internal body structures from the returning echoes. This method is the same as the echo-location method used by animals such as whales and bats in the ecosystem. With this method, a high-frequency pulse is transmitted through the body using transducers as a travelling wave via the tissues of the body. Some of these waves are reflected back, and others are absorbed. The reflected waves are received by the transducers and changed into electrical signals. These electric signals are changed into a digitalized form and transferred to the main computer system. The computer system utilizes logic and arithmetic calculation to project two-dimensional pictures of every scanned structure. In the ultrasound system, a wide range of pulses is communicated per millisecond [13]. There are various image assessment methodologies utilized to boost the ultrasound pictures, shown in Fig. 3.
Fig. 2 Nuclear medicine image
Fig. 3 Ultrasound image diagram
6 The Methodological Basics of Digital Processing of Pictures Digitalized pictures are categorized with reference to their individual qualities such as signal-to-noise ratio, entropy, contrast and illumination ratio. A histogram provides a simple method of processing pictures. With this method, pictures are
displayed without changing the quality of the projected pictures. The grayscale histogram is the most basic form and is utilized to enhance the quality of the projected pictures. The method produces a scheme that indicates the pixel values and the extent of their regions; the grey level indicates whether the image is bright or shady. The mean pixel value can be found by summing the products of the pixel values and the bin heights, then dividing by the total count. The method is effective through transformation of the histogram towards being balanced, identical and smooth. The mean of the centralized pixel intensity is designated as the ideal brightness; image intensity either below or above it makes the picture darker or brighter. The Signal-to-Noise Ratio (SNR) of a picture is utilized to relate the level of the anticipated signal to the contextual level of noise in the image [13]. SNR is considered as the ratio of signal intensity to the intensity of noise, and it characterizes pictures in a straightforward manner. The intensity of a picture is calculated as the mean square of the pixel values, as seen in the expression below.

$$\mathrm{SNR} = \frac{P_{\text{signal}}}{P_{\text{noise}}}$$
(2)
where P is considered as the mean power.
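As a quick illustration of Eq. (2), SNR can be estimated from a clean picture and its noisy version. This is a hedged sketch using mean-square intensities as the power terms, which matches the expression above; the images are synthetic.

```python
# SNR sketch per Eq. (2): ratio of mean signal power to mean noise power.
import numpy as np

rng = np.random.default_rng(2)
signal = rng.random((128, 128))                 # placeholder "clean" picture
noise = rng.normal(0.0, 0.05, signal.shape)     # additive noise
noisy = signal + noise

p_signal = np.mean(signal ** 2)                 # mean square of pixel values
p_noise = np.mean((noisy - signal) ** 2)        # mean square of the noise
snr = p_signal / p_noise
print(f"SNR = {snr:.1f} ({10 * np.log10(snr):.1f} dB)")
```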
6.1 Image Assessment Enhancement The enhancement of pictures represents a technique utilized to improve the perceptibility and quality of pictures through the application of computer-aided software. The methodology incorporates both subjective and objective advancements and includes point and localized operations; in a localized operation, the output depends on a neighbourhood of input pixels and values. The enhancement of pictures is performed in two different domains: spatial and transform domain technologies. The spatial methodology operates directly on the pixel level, whereas the transform methodology operates on a transform of the image, such as the Fourier transform.
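A common spatial-domain enhancement of the kind described here is histogram equalization. The sketch below assumes an 8-bit grayscale array and uses only NumPy; library routines such as skimage.exposure.equalize_hist behave similarly.

```python
# Spatial-domain enhancement sketch: histogram equalization of an 8-bit image.
import numpy as np

def equalize_hist_u8(img: np.ndarray) -> np.ndarray:
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize CDF to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)            # remap every pixel

rng = np.random.default_rng(3)
dark = (rng.random((64, 64)) * 80).astype(np.uint8)     # low-contrast picture
eq = equalize_hist_u8(dark)
print("before:", dark.min(), dark.max(), " after:", eq.min(), eq.max())
```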
6.2 Image Assessment Segmentation The segmentation of pictures represents the methodology for segregating pictures into various segments. The basic purpose of this segregation is to present pictures in a manner that can easily be understood and interpreted while maintaining their quality. This methodology labels pixels with reference to characteristics such as intensity. The segments together constitute the complete original picture and retain its features, such as similarity and intensity. Segmentation techniques are utilized in the creation of 3D contours of body structures for medical
purposes. Segmentation is utilized in machine perception, tissue volume analysis, malignant illness analysis, anatomical and functional evaluation, virtual reality, 3D rendering methods, and anomaly evaluation in the detection of objects and pictures. The segmentation of pictures is grouped into localized segmentation and globalized segmentation. Localized segmentation deals with a single subdivision of a picture and incorporates fewer pixels than the globalized form. Globalized segmentation operates on the complete image as a single unit and includes a larger number of pixels that can effectively be manipulated. Segmentation is further classified into boundary, edge and regional methods.
6.3 Image Segmentation Considering a Threshold Threshold segmentation depends on a thresholding value to transform grey levels to either white or black. There are various techniques employed in radiology to replace and rebuild pictures, such as k-means and Otsu's methodologies. The threshold technique is essential for establishing the borders of solid objects against a dark background; these techniques require a measurable variation between the background and object intensities [14]. Threshold techniques appear in three forms: histogram-based, adaptive and global selection, the last of which is the broadest and is utilized in the various segmentation methodologies. The globalized threshold ∅ is calculated in the binary process according to the expression below.

$$g(m, n) = \begin{cases} 1 & \text{if } f(m, n) \ge \emptyset \\ 0 & \text{otherwise} \end{cases}$$
(3)
In fixed or adaptive threshold segmentation, processing is fast when the parts of interest have intensities that clearly differ from the image background. The limitation of this technique is its inability to process pictures with multiple channels.
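Otsu's method, mentioned above, picks the global threshold ∅ automatically by maximizing the between-class variance of the two resulting pixel groups. A minimal sketch, assuming scikit-image is available and using a synthetic bimodal picture:

```python
# Global thresholding sketch per Eq. (3), with the threshold chosen by Otsu's method.
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(4)
# Synthetic bimodal picture: dark background plus a brighter square "object".
img = rng.normal(50, 10, (100, 100))
img[30:70, 30:70] += 100

t = threshold_otsu(img)                 # the global threshold (the ∅ of Eq. (3))
binary = (img >= t).astype(np.uint8)    # 1 where f(m, n) >= threshold, else 0
print(f"threshold = {t:.1f}, object fraction = {binary.mean():.2f}")
```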
6.4 Image Segmentation Considering Edge Detection Edge detection defines the segmentation technique that recognizes the borders of closely connected regions and objects. This method detects the discontinuities between regions and objects. It is used critically in image evaluation and in the identification of image parts where a significant variation of image intensity occurs.
6.4.1 Types of Edge Detection
Robert’s Kernel Robert’s Kernel is a methodology utilized to determine the variation between two image pixels. Precisely known as forwarding variation and used to identify edges in highly noised pictures. It is evaluated using the first-order fraction derivative and the cross gradient operators as shown in the equation below. ∂ f ∂ = f (i, j− f (i+1, j+1)∂ f ∂ x = f i j − f i + 1 j + 1
(4)
$$\frac{\partial f}{\partial y} = f(i+1, j) - f(i, j+1)$$
(5)
The partial derivatives can also be applied as two 2 × 2 matrices. In this condition, Robert's masks are evaluated as shown in the equation below:

$$G_x = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad G_y = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
(6)
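The Roberts masks of Eq. (6) can be applied by 2-D convolution. A small sketch on a synthetic step edge, assuming SciPy is available:

```python
# Roberts cross edge detection sketch using the masks of Eq. (6).
import numpy as np
from scipy.ndimage import convolve

gx = np.array([[-1, 0], [0, 1]], dtype=float)
gy = np.array([[0, -1], [1, 0]], dtype=float)

img = np.zeros((32, 32))
img[:, 16:] = 1.0                                        # vertical step edge
edges = np.hypot(convolve(img, gx), convolve(img, gy))   # gradient magnitude
print("columns with strong response:", np.unique(np.where(edges > 0.5)[1]))
```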
Prewitt Kernel This methodology is centered on the idea of the central difference and is more effective than Robert's operator. Assuming the pixels of a 3 × 3 neighbourhood around [i, j] are arranged as below, Eq. (7) is produced.

$$\begin{bmatrix} a_0 & a_1 & a_2 \\ a_7 & [i,j] & a_3 \\ a_6 & a_5 & a_4 \end{bmatrix}$$
(7)
The partial derivative of the Prewitt operator is expressed as shown in the equation below.

$$G_x = (a_2 + c\,a_3 + a_4) - (a_0 + c\,a_7 + a_6) \quad (8)$$

In the above equation, c is a constant weighting the pixels nearest the neighbourhood center, and $G_x$ and $G_y$ represent the derivatives at [i, j]. Whenever c is equated with 1, the Prewitt operator masks are expressed as in the equation below.

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad G_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
(9)
Sobel Kernel The Sobel Kernel relies on a central difference but gives greater weight to the pixels adjacent to the central pixel of the neighbourhood. This method can be expressed as 3 × 3 matrices approximating a first Gaussian kernel derivative. The methodology is calculated as in Eqs. (10), (11) and (12).

$$G_x = (a_2 + 2a_3 + a_4) - (a_0 + 2a_7 + a_6)$$
(10)
$$G_y = (a_6 + 2a_5 + a_4) - (a_0 + 2a_1 + a_2)$$
(11)
The Sobel masks are expressed as shown below.

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
(12)
The Sobel model is considered more effective than Prewitt in the reduction of noise. This methodology is utilized in functional image modalities, i.e. nuclear medicine [15]. In the study of red blood cell pictures, the detection of nearly adjacent cells is challenging as a result of background noise; this influences the interpretation procedure, making effective diagnosis challenging for physicians. The process of segmentation might also mitigate the challenges identified in red blood cell analysis.
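For completeness, here is a minimal sketch of the Sobel masks of Eq. (12) in use for gradient-magnitude edge detection, assuming SciPy; the edge phase computed at the end is the quantity used for the spiculation histograms discussed in Sect. 4.3.

```python
# Sobel edge detection sketch using the masks of Eq. (12).
import numpy as np
from scipy.ndimage import convolve

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T                                    # the vertical-gradient mask

img = np.zeros((32, 32))
img[16:, :] = 1.0                                      # horizontal step edge
gx, gy = convolve(img, sobel_x), convolve(img, sobel_y)
magnitude = np.hypot(gx, gy)                           # edge strength
direction = np.arctan2(gy, gx)                         # edge phase (radial angle)
print("rows with strongest response:", np.unique(np.where(magnitude == magnitude.max())[0]))
```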
7 Conclusion and Future Scope The proposed work provided an analysis of medical image assessment based on the application of image processing methodologies. Pictures provide a method of expressing information in a pictographic manner. They consist of minor elements known as image pixels, whereby every pixel has a certain value and position. Geometric pictures represent a picture arithmetically with geometrical primitives such as lines. Every image is saved in a certain file format that incorporates two segments, namely data and heading. Image processing methods represent a classification of techniques used in the process of image handling using a computer system. The main purpose of image segmentation is the partition of pictures into essential image portions. Localized segmentation handles the partition of these pictures into minor segments within the individual image, whereas globalized
segmentation handles the assembly of the partitions. In the future, medical experts and researchers should focus more on applying artificial intelligence in the image assessment of body parts. This technological aspect is useful for mitigating a wide range of image assessment problems and enhancing the delivery of patient results. Sophisticated software should be used to evaluate significant amounts of pictures and data to create algorithms based on image segmentation and processing. Artificial intelligence can effectively be applied to retrieve novel patient data and predict potential body disorders in patients. As a result, it allows radiologists to easily identify the body organs that require further analysis.
References
1. K. Aoyagi, Medical image assessment apparatus, ultrasonic image assessment apparatus, magnetic resonance image assessment apparatus, medical image processing apparatus, and medical image processing method. J. Acoust. Soc. Am. 133(5), 3220 (2013). https://doi.org/10.1121/1.4803793
2. B. Lelieveldt, N. Karssemeijer, Information processing in medical image assessment 2007. Med. Image Anal. 12(6), 729–730 (2008). https://doi.org/10.1016/j.media.2008.03.005
3. C. Hung, Computational algorithms on medical image processing. Curr. Med. Image Assess. Former. Curr. Med. Image Assess. Rev. 16(5), 467–468 (2020). https://doi.org/10.2174/157340561605200410144743
4. F. Aubry, V. Chameroy, R. Di Paola, A medical image object-oriented database with image processing and automatic reorganization capabilities. Comput. Med. Image Assess. Graph. 20(4), 315–331 (1996). https://doi.org/10.1016/s0895-6111(96)00022-5
5. Y. Chen, E. Ginell, An analysis of medical informatics and application of computer-aided decision support framework. J. Med. Image Comput. (July), 10–17 (2020)
6. M. Heng Li, M. Yu Zhang, Computational benefits, limitations and techniques of parallel image processing. J. Med. Image Comput. (July), 1–9 (2020)
7. A. Khusein, U.A, Clinical decision support system for the activity of evidence based computation. J. Med. Image Comput. (September), 50–57 (2020)
8. P.L. Aaron, S. Bonni, An evaluation of wearable technological advancement in medical practices. J. Med. Image Comput. (September), 58–65 (2020)
9. K.K, P.E.P, Web based analysis of critical medical care technology. J. Med. Image Comput. (September), 66–73 (2020)
10. A. Haldorai, S. Anandakumar, Image segmentation and the projections of graphic centred approaches in medical image processing. J. Med. Image Comput. (September), 74–81 (2020)
11. P. Jannin, J. Fitzpatrick, D. Hawkes, X. Pennec, R. Shahidl, M. Vannier, Validation of medical image processing in image-guided therapy. IEEE Trans. Med. Image Assess. 21(12), 1445–1449 (2002). https://doi.org/10.1109/tmi.2002.806568
12. K. Drukker, Applied medical image processing, second edition: a basic course. J. Med. Image Assess. 1(2), 029901 (2014). https://doi.org/10.1117/1.jmi.1.2.029901
13. P. Jannin, Validation in medical image processing: methodological issues for proper quantification of uncertainties. Curr. Med. Image Assess. Rev. 8(4), 322–330 (2012). https://doi.org/10.2174/157340512803759785
14. M. Goris, Medical image acquisition and processing: clinical validation. Open J. Med. Image Assess. 04(04), 205–209 (2014). https://doi.org/10.4236/ojmi.2014.44028
15. T. Aach, Digital image acquisition and processing in medical x-ray image assessment. J. Electron. Image Assess. 8(1), 7 (1999). https://doi.org/10.1117/1.482680
An Analysis of Artificial Intelligence Clinical Decision-Making and Patient-Centric Framework
Anandakumar Haldorai and Arulmurugan Ramu
Abstract The smart decision-making support framework is typically referred to as artificial intelligence (AI). The clinical decision framework can transform the process of decision-making using various technologies, incorporating framework engineering and information technology. Ontology-centered automatic reasoning, incorporated with machine learning methodologies, has been established over present patient databases. The approach evaluated in this paper supports interoperability between various health information systems (HIS). It has been evaluated in a sample implementation that links up three separate databases: drug prescription guidelines, drug-to-drug interactions and patient information, which are used to showcase the efficiency of an algorithm that provides effective healthcare decisions. Generally, the potential of artificial intelligence was evaluated in the process of supporting tasks that are essential for medical experts, including coping with noisy and missing patient information and enhancing the utility of various healthcare datasets. Keywords Artificial intelligence (AI) · Health information systems (HIS) · Medical decision support system (MDSS) · Intelligent decision support system (IDSS)
1 Introduction The medical decision support system (MDSS) maps patient data in the process of determining diagnostic and treatment pathways. The technological segment of this system has been showcased in different medical settings, which will be discussed in detail in this research paper. The aspect of smart decision-making is gradually expanding because of advances evident in artificial intelligence and
A. Haldorai (B) Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India
A. Ramu Presidency University, Yelahanka, Bengaluru, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_62
system-centric ecosystems, which have the capacity to deliver the fundamental technologies required in decision-making. The coordination and communication between dispersed frameworks can deliver actual-time information, timely processing, collaborative ecosystems and globally updated data to enhance the process of decision-making. Over this period, the technology of artificial intelligence and its fundamental techniques have proved to be vital in ensuring computational aid to users of practical applications. In this analysis, a system is presented that can initiate medical decisions based on partial information through the application of artificial intelligence. The properties of the information, which include heterogeneity, representation, interoperability and availability, are significant in assuring the applicability of MDSS [1]. The process of decision-making must apply various essential datasets from various distributed frameworks, rather than a single information source, to maximize its efficiency. However, actual-time healthcare decisions are normally centered on incomplete data because of the potential issues posed by the features or properties of data synthesis. Various artificial intelligence (AI) methods, which include learning-centered methods and knowledge-centered methods, have been applied to handle such data challenges, potentially creating a practical and robust MDSS. Even though the past methods have been reported to have partial significance, just a few of them have been deemed successful in actual-world healthcare settings. The knowledge-centered system is vulnerable to losses when evaluating patients, since the data might be incomplete, for instance through the elimination of patient data or accessibility restrictions on visualizing healthcare records. Conversely, the decisions of the learning-centered system cannot be easily explained, and there are issues in differentiating causation from correlation in these systems when making proper healthcare decisions [2]. The system analyzed in this research combines the advantages of machine learning, logic-centered inference and structural representation, together with the actual-world data which is essential for the provision of intelligent and robust decisions in the healthcare sector, irrespective of the complexity of healthcare relationships and the dependencies evident in healthcare decisions. Although past forms of machine learning have been deemed to have some disadvantages, our hybrid machine learning architecture provides decisions that can be explained and verified critically even with missing data. To evaluate the framework effectively, the raw patient data has been represented based on the application of ontological aspects backed by structured triple stores. Inference guidelines have been established by the domain professionals and incorporated through the application of semantic reasoning, meant to develop proper decisions in the healthcare sector. As a result, the decisions produced by the framework are easy to explain and validate. However, the resultant knowledge-centered framework necessitates complete data, which limits its effectiveness in the actual world. These limitations are overcome through the administration of the semantic reasoning using machine learning methods that compute values of missing information.
The imputed values are produced in the pre-processing stages before being integrated into the ontological framework, hence allowing the system to perform
in actual time. In that case, patient-centric and evidence-centered decisions are made in the healthcare support system. The proof-of-concept implementation system incorporates three vital sources of data, namely a massive actual-world dataset of patient healthcare data, a medicinal interaction registry and a gathering of healthcare prescription protocols [3]. The preliminary findings evaluated in this paper ascertain that in practical healthcare cases, where the data from patients are incomplete or missing, the hybrid design system dependent on machine learning outperforms purely knowledge-centered methods. The process of decision-making in the healthcare sector is as critical as any human procedure in the process of interacting with patients in the actual world. Healthcare practitioners make both poor and good decisions, which makes it essential for researchers to debate the most applicable means of supporting the healthcare sector in making effective decisions. One of the best ways of characterizing decisions, in order to comprehend how healthcare practitioners can be assisted, is to categorize decisions as unstructured, structured or semi-structured. Structured decision issues are known to have a specific optimal remedy, which means that they do not require a specifically designed decision framework. For instance, an exact remedy can be applied by practitioners to make a proper decision regarding the shortest path between two points. As for the unstructured decision issues, there are no certain solutions or criteria to apply, which means that the kind of preferences considered is entirely up to the decision-makers. For instance, determining an individual's partner might be deemed an unstructured decision. Between the two forms of problems, it is possible to encounter a wide range of semi-structured issues that are known for their acknowledged parameters but might as well require the preference and input of humans in the process of making proper decisions using a specific method. For instance, an organizational decision concerning the choice of expanding the organization to an international market might be considered a semi-structured decision. Semi-structured decision issues are therefore amenable to decision-making frameworks that attach user interaction to analytical techniques meant to develop alternatives concerning the optimal remedies and methodologies. When artificial intelligence methods are applied to the enhancement of fundamental healthcare alternatives, the resultant system is considered an IDSS. Data analysts have also reminded us that comprehensive acknowledgment of decisions is required for the efficient utility of artificial intelligence in the healthcare sector. Moreover, AI is focused on supporting human decisions and developing them in some capacity through fundamental advances. With that regard, this paper will focus on decision-making protocols and DSS, centered on the comprehension and concomitant applications of artificial intelligence methodologies to establish a firm IDSS. Several literature assumptions have been made about the human decision-making process which critically recognize and explain the key steps in the process of decision-making.
In the DSS, effective decisions are featured by an element of critical reasoning, applying distinctive human characteristics to select optimal alternatives based on a
specific criterion. Many forms of reasoning are showcased through analytical methodologies and, in that regard, might be embedded into an intelligent decision support system (IDSS). Actually, not all segments of decision-making frameworks are embedded in the IDSS system [4]. In some other segments, the recognition of stimuli might result in a specific decision action as a mastered response, without any form of identifiable reasoning framework. The process of comprehending the decisions in response to some healthcare systems does not entirely focus on the aspect of reasoning. Several research findings have concluded that such decision-making is mostly exhibited by emergency response departments or healthcare emergency services.
2 Literature Review The literature focuses on open matching conditions in applications, where the experience of human beings and their immediate responses are required to enhance effective decisions in the healthcare sector. In such scenarios, decision support, where applicable, should provide the essential data that feeds the critical element of human processing for decision-making. Physiologically, the capability for decision-making is attributed to the prefrontal lobes of the brain, where critical decisions are made; any faults in this area amount to irrational decisions and ineffective evaluation of risks. The process of decision-making is also considered to be affected by emotions inscribed in the neural system, which might be both unconscious and conscious. In the most recent analyses, IDSS have lacked the capacity to model affective features like the emotions required to enhance the process of decision-making, even though the effective inclusion of such emotions in machine learning is a future concern. Some researchers have also pointed out that working memory for cognitive functions is essential in the process of decision-making. Since intelligent data processing frameworks such as IDSS harbor memory technologies, symbolic reasoning, and the capability to interpret and capture stimuli, IDSS has the fundamental capacity to emulate the decisions made by humans [5].
2.1 System Evaluation and Implementation The system of human decision-making underpins the fundamental frameworks that have been proposed for IDSS and DSS. An initial framework for decision-making has been offered by several researchers, mostly referred to as the subjective expected utility framework or the expected utility framework. Researchers assume that if decision-makers' preferences over outcomes satisfy certain postulates, then the decision-makers act so as to maximize the expected utility of outcomes for a specific form of decision probability.
The researcher’s theory typically is utilized to suggest actions and decisions that will be maximized in the application of decisions provided that probabilities have been provided for specific events. In that case, the researcher’s advanced framework of decision-making has some uncertainties. There are a lot of criticisms about such theories; certainly, the conclusion that decision-makers might evaluate critical consequences of their actions acquires knowledge about the future happenings without any form of connected probabilities. Nonetheless, one of the contributions of the theory was to enhance the separation of actions, outcomes and events. To effectively evaluate the recommended system formulated a proof of concepts and implementation concentrated on the knowledge management wiry and component execution from the present ontological decision systems. The insomnia is cured by selecting the channel of inquiry and utilized some actual-world datasets. 1.
2.
3.
The records of patients extracted from the center of disease control are based on the behavioral risk factors (BRF) and surveillance system for 2010. The behavioral risk factors include a wide assumption of respondent data such as location sex, rest and age including information about medical statuses such as asthma, diabetes, mental illness and cancer. Several behavioral risk elements including the consumption of alcohol, sleep deprivation and drug usage were evaluated. All the crucial data stored and recorded were done in our structured format in a relation and database. The application protocols extracted from the Mayo clinic were also utilized as professional decision-making protocols that correspond to the prescription protocols for various medical issues such as sleep disorders. Drug interaction registration was done to identify the interaction of drug-to-drug medical application. Practitioners use an ontological graph based on the record and information of patients. The figures included in the ontological framework represent the vital concept used in the evaluation of the patient relationships. The figures also demonstrated the application of inference protocol to effectively map raw information into ontological concepts.
2.2 Skill Management Concept Researchers argue that to effectively instantiate the skill management concept in the design of the decision-making system, it is critical to establish a simplified ontological model that defines the essential key concepts and the different patient relationships. Inference protocols were formulated over the BRF codes, defined by the semantic values of the various data attributes, for the process of transforming numerical BRF information into patient concepts. The protocols have been applied to the recorded data to formulate a semantic knowledge store of BRF data.
2.3 Query Execution Concept To effectively instantiate the query execution concept, researchers linked a semantic reasoning segment known as the "Euler Proof Mechanism" together with the WEKA machine learning system to establish the execution component. Semantic reasoning provides the critical mechanism for logical decision-making in the system, whereas WEKA supports the mandate to impute missing information. A subset of sleeping aids was identified by the researchers to be applied with the Mayo clinic protocol to effectively establish the conditions that should be checked when prescribing medicines. Utilizing the ontological framework, this data has been transformed into inference protocols for the process of decision-making. A nearby family physician aided in the selection of different drugs, which have been validated against the Mayo clinic sleeping aid prescription framework [6]. Even though the inference protocols have been kept simple and do not reflect the full actual healthcare considerations for sleeping aid prescription, the generic segments of the resultant protocols are explained below, and several interactions have been verified by global healthcare practitioners. These include the following (a procedural sketch follows the list):

1. Drug-to-drug interaction protocol: In case a patient is presently on a specific drug D1 and that drug cannot be given together with drug D2, the patient cannot be given D2.
2. Drug-to-condition interaction protocol: In case a patient is reported with a specific medical condition C, and a specific drug D is reported to contradict that condition, the patient cannot be given drug D.
3. Drug-to-disease interaction protocol: In case a patient is reported with the illness E and drug D is reported to have contraindications for that disease, the patient might not be given drug D.
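A minimal sketch of how rules 1–3 might be encoded procedurally. The actual system encodes them as semantic inference rules over an ontology; the registries and the second drug name below are hypothetical illustrations only (Estazolam is the sleeping aid named in the text).

```python
# Hypothetical procedural sketch of the three interaction protocols above.
# The real system uses semantic reasoning; registry contents are made up.
DRUG_DRUG = {("estazolam", "ketoconazole")}       # pairs that must not co-occur
DRUG_CONDITION = {("estazolam", "pregnancy")}     # drug contraindicated by condition
DRUG_DISEASE = {("estazolam", "sleep_apnea")}     # drug contraindicated by disease

def may_prescribe(drug, current_drugs, conditions, diseases):
    if any((drug, d) in DRUG_DRUG or (d, drug) in DRUG_DRUG for d in current_drugs):
        return False                               # rule 1: drug-to-drug
    if any((drug, c) in DRUG_CONDITION for c in conditions):
        return False                               # rule 2: drug-to-condition
    if any((drug, e) in DRUG_DISEASE for e in diseases):
        return False                               # rule 3: drug-to-disease
    return True

print(may_prescribe("estazolam", {"ketoconazole"}, set(), set()))  # False
```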
To effectively mitigate the problem of missing values in the records of patients, classifiers were formulated based on the application of machine learning in the process of predicting values for the missing information fields. These classifiers were trained to predict every attribute, considering all the complete information from the BRF datasets during training. For instance, since the sleeping aid Estazolam cannot be prescribed to elderly patients, healthcare practitioners need to determine the ages of patients. To address this, all the patient records with confirmed ages were extracted from the BRF set before partitioning the subset into training data and validation data [7]. The classifiers were then built and assessed on the validation data, which provides an estimated accuracy of the classifiers for future decision-making. In future instances where the age of a patient is missing, practitioners apply the classifiers to label the ages of patients as young or old. The projected value is then substituted into the record of the patient, and semantic reasoning proceeds as usual. The confidence of decision-making through semantic reasoning is centered on the point estimates evaluated in this article.
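A hedged sketch of this imputation step, using scikit-learn in place of WEKA (the described system used WEKA); the feature matrix and age labels below are synthetic stand-ins for BRF attributes.

```python
# Sketch of the missing-age imputation: train a classifier on records with a
# confirmed age, then predict "young"/"old" for records where age is missing.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.random((1000, 8))                                    # stand-in BRF attributes
y = (X[:, 0] + rng.normal(0, 0.1, 1000) > 0.5).astype(int)   # 1 = "old"

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
clf = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)
print(f"validation accuracy: {clf.score(X_val, y_val):.2f}")

record_missing_age = rng.random((1, 8))                      # a record with no confirmed age
print("imputed age class:", "old" if clf.predict(record_missing_age)[0] else "young")
```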
3 Experiment Evaluation and Comparison To effectively evaluate the operation of the system, the proposed hybrid decision-making framework was reviewed and experimented with. Patients who may be provided with a sleeping aid under the Mayo clinic prescription protocol were labeled as positive exemplars, against the negative exemplars. Given such labels, the system's responses to specific queries are either true negative (TN) or true positive (TP); moreover, false negatives (FN) and false positives (FP) are the erroneous cases produced by the system. The findings are evaluated as follows:

(a) Sensitivity: this applies to the rate of positive exemplars that have been labeled as positive.
(b) Specificity: this applies to the rate of negative exemplars that have been labeled as negative.
(c) Balanced accuracy: this principle computes the simple average of sensitivity and specificity, as shown in Eq. 1.
$$\text{Sensitivity} = \frac{tp}{tp + fn}, \quad \text{Specificity} = \frac{tn}{tn + fp}, \quad \text{BalAcc} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$$
(1)
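Eq. (1) translates directly into code; a minimal sketch with illustrative counts:

```python
# Sensitivity, specificity and balanced accuracy per Eq. (1).
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    sensitivity = tp / (tp + fn)   # positives labeled positive
    specificity = tn / (tn + fp)   # negatives labeled negative
    return (sensitivity + specificity) / 2

# e.g. 80 TP, 90 TN, 10 FP, 20 FN:
print(f"BalAcc = {balanced_accuracy(80, 90, 10, 20):.3f}")  # 0.850
```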
3.1 Machine Learning-Centered Framework To test the effectiveness of our hybrid framework against a purely learning-centered framework, the performance of various machine learning algorithms on the BRF datasets was evaluated. Using four algorithms, AdaBoost, bagging, C4.5-R8 and decision stump, 50 randomly chosen training sets of sizes between 2500 and 5000 exemplars were formulated. This data was utilized to evaluate the framework based on an elementary selection of algorithms over reduced sets of data attributes. For every algorithm, a predictive classifier was trained, one model per sleeping aid, meant to effectively predict whether a patient can be provided with that sleeping aid or not. The classifiers were trained to reproduce the outputs of the skill-centered system on the patient datasets. The fundamental truth for every record of data was obtained through semantic reasoning, since skill-centered decisions on complete data can be considered 100% accurate, and was compared to the predictive accuracy of every ML algorithm. The AdaBoost algorithm showed the best performance of the four; consequently, the hybrid system is compared against the AdaBoost-derived classifier. The general accuracy of the algorithm, when measured against the most precise medical decision on a scale of zero to one, is modest, which is just the same as
for the learning-centered system, suggesting that there might be some forms of missing data in the system. Despite the poorer performance of the machine learning segment, it can be projected to be tolerant of missing information. The implication of missing data on the performance of ML was analyzed through the process of removing known values from the records of patients [8]. Moreover, £, the average number of attribute values eliminated from a patient record, was varied from an average of 1 value eliminated per record up to 6 values removed per record. For every £, analysts trained the AdaBoost-centered classifiers based on the application of 50 sets of five thousand exemplars from the partially deleted information. As a result, the implication of the missing values £ on the performance of the machine learning algorithm was evaluated.
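The £ experiment can be sketched as follows, where £ is simulated by blanking a random subset of attribute values in each test record. This is a hedged stand-in, not the original procedure: the data is synthetic, and the original deleted values from BRF records rather than zeroing them.

```python
# Sketch of the £ experiment: degrade records by blanking attribute values,
# then measure how AdaBoost accuracy falls as £ grows. Data is synthetic.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(6)
X = rng.random((5000, 8))
y = (X[:, :3].sum(axis=1) > 1.5).astype(int)
clf = AdaBoostClassifier(n_estimators=50).fit(X[:2500], y[:2500])

for pounds in range(0, 7):                      # £ = blanked values per record
    X_test = X[2500:].copy()
    for row in X_test:                          # blank £ randomly chosen attributes
        row[rng.choice(8, size=pounds, replace=False)] = 0.0
    print(f"£={pounds}: accuracy = {clf.score(X_test, y[2500:]):.3f}")
```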
3.2 Skill-Centered and Hybrid-Centered Systems Finally, a comparison was performed measuring the hybrid framework against the machine learning system and the skill-centered system without imputation capacity. The skill-centered reasoning is performed using Eulersharp, and the AdaBoost-based classifier is used for the machine learning process and the evaluation of critical elements. Four datasets were chosen corresponding to critical values of £. For every £, the skill-centered decision process was measured, and the AdaBoost-based classifier was trained to predict the missing values of patients through machine learning; the semantic reasoning was then re-evaluated before initiating any decisions in the system. Throughout this process, the hybrid decision-making process experiences only a small degradation in balanced accuracy as £ increases (each increment of £ decreases performance by less than 1% of a point) [9]. Nonetheless, the performance of the skill-centered decision-making framework degrades much faster over the same range: each increase of 0.5 in £ diminishes performance by approximately 4%. Generally, the hybrid framework attains a significantly better balanced accuracy, which shows that its recommendations for the healthcare decision-making process can be considered effective.
3.3 Standardized Imputation Techniques The evaluation of datasets with unverified or missing information is a common issue that is globally studied in the field of statistics. In this field, multiple imputation (MI) methods are normally applied. During the process of performing MI, every unverified or missing value is imputed several times through the process of drawing
feature values from a predictive form of distribution. As such, this amounts to a collection of imputed sets of information. Every imputed dataset has the same shape as the original dataset: its non-missing values are identical to the original information, while the missing and unverified values are computed differently in every imputed version. The grouped imputed datasets might be utilized to project unbiased estimations of summary statistics, such as means and regression coefficients, together with statistically valid confidence intervals. It should be noted that the obligation of producing an accurate summary of these datasets differs from our mission of accurately projecting the unverified and missing data of individuals in the information set. However, for completeness, this article evaluated the application of MI in the decision-making system. Here, the famous MI method, namely Bayesian MI, is known to assume a particular joint probability framework over the feature values before drawing imputed information from the posterior distribution of the missing information given the observed sets of data [10]. This methodology has been applied in medical survey evaluation over the past few decades. The mix open-source package for the R software ecosystem was used to test the off-the-shelf capacity of these techniques. The mix package has several limitations which affect the overall performance: it is not capable of using more than 30 features (whereas BRF is composed of more than 400 features), it operates generally slowly, and it is not capable of using features characterized by massive amounts of missing data in the set. To test mix, features had to be chosen to structure the imputed dataset. It should be noted that, due to the modeling assumptions inherent in Bayesian MI, it is difficult to separate the element of feature selection from the process of predicting various features. It is possible to execute mix on various dataset portions, but the procedure is not straightforward, since it had multiple flaws in translation between various formats of datasets. As a result, it was difficult to effectively evaluate the implications of utilizing mix for imputation in the ecosystem, other than noting that this might be pursued in the future. As such, future research might focus on the construction of effective issue-specific versions of these MI techniques, essential for application in a secure decision-making ecosystem. Moreover, our projected hybrid framework provides substantial performance merits over the alternatives: machine learning frameworks in the presence of unverified and missing datasets, and skill-based systems with MI. Consequent to that, the framework shows a robust remedy to the issue of partially unverified and missing data in decision-making systems in the healthcare sector.
4 Discussion In this article, an actual-world issue was analyzed: helping medical practitioners to execute proper decisions about present patients' data and the application of best practices encoded in protocols, in the face of unverified and missing data. Healthcare experts consider this issue to be a prevailing concern for which solutions should be adopted (especially with patients misrepresenting and omitting their present medical profiles). As a result, AI methodologies can be a significant advantage in addressing this prevailing concern, to categorically yield accurate healthcare advice for patients, compared to traditional probability reasoning in isolation. To initiate the hybrid construction, a hybrid construction framework was evaluated and presented for healthcare decision support engines. Our projected framework is capable of processing the queries of users critically based on the application of logic-centered reasoning, utilizing ML inference frameworks to handle unverified and missing data. This technique has distinctive merits: findings can be explained to the end users and might also be verified for correctness by third parties, because answers are grounded in logical assumptions. Even though our validation approach utilized a certain sleeping aid prescription case, the system is generic enough to be applied in other healthcare facilities. To effectively build a remedy around this issue serving various problem domains, an issue-specific ontological framework for information representation has to be evaluated, including the professional inference protocols for the processes of decision-making. After that, an ML algorithm which operates effectively with the available datasets can be utilized to predict unverified and missing data directly from raw information. Once these founding primitives have been evaluated, the framework construction can be considered uniform with the kind projected in this paper. In related works, there is a significant deal of interest in the application of ML methodologies for medical decision-making support frameworks. For this aim, other methodologies are analogous to our own. These works project a comparative assumption of two ML methods over present decision-making procedures based on the application of a medical assessment protocol [11]. Their findings show a definite merit of utilizing ML algorithms; nonetheless, they showcased that ML methods alone presented a significant number of false negatives and false positives.
4.1 The Decision Support System (DSS) DSS refers to the wide-ranging interactive computing frameworks which aid decision-makers and experts to use models, knowledge and data to mitigate structured, semi-structured and unstructured issues. The individuals initiating the decisions are a part of the system. In that case, DSS incorporates the capacity to permit these decision-makers to choose single or multiple input selections, drill into explanations, query
systems, examine outputs and interact with the various networking devices. Since many DSS systems have been established to mitigate a certain issue or a specific segment of issues, there are various forms of DSS specialized for various forms of problems and users. The system can also be formulated for single or multiple decision-makers, applicable in supporting crucial decisions ranging from creative to managerial problem-solving. Different terms have been applied to DSS, such as group DSS, collaborative DSS, expert DSS, medical DSS, intelligent DSS and adaptive DSS, among others, in the process of attempting to capture the merits of DSS or the individuals requiring it. The single-user DSS is typically considered as a procedure for making proper decisions and incorporates components for designing, imputing and choosing specific decisions. Collaborative or group DSS are still in their early development, because theories of collaborative human decision-making processes are still an emergent issue. As a matter of fact, DSS to support idea generation and system innovation is still a prevailing concern, with fundamental theories of human creativity still in progress. The schematic of a typical DSS structure shown in Fig. 1 incorporates the decision-making process as a key segment of the system. Inputs incorporate the skill base, model base and database aspects. The database holds the data essential for the process of decision-making, whereas the skill base incorporates the guidance required in the selection of proper alternatives. The model base contains the formal frameworks, techniques and algorithms that establish potential outcomes and identify the most effective remedies under specific constraints. The responses from the processing stage might produce further inputs, typically updated in actual time, to enhance the process of solving potential issues. Outputs might also be generated to project explanations and forecasts meant to justify crucial recommendations projecting potential advice. As such, results can be given to decision-makers, who may use the system to pose essential data requests or queries. Over the past few decades, the terminology "decision support" has been used widely to incorporate other decision-making assistance aspects such as skill management systems, organizational intelligence and data analytics, which consider the essential interaction of decision-makers. AI characteristics are typically utilized to initiate observations from a massive, distributed aspect of big data. These frameworks can therefore incorporate the personalization of decision-makers and their preferences, emulating the decisions of humans. Moreover, the frameworks project firm novel tools meant to deal with complex issues and emergent trends for the future of AI.
4.2 Evaluation of IDSS To effectively evaluate IDSS, or typically any form of DSS, it is essential to comprehend the opportunities and advantages for boosting the performance of the system. In this attempt, a primary assumption of the research concerning the fundamental purpose of decision-making was evaluated and presented in a specific healthcare
Fig. 1 Arrangement of DSS
facility. According to the literature research, one system feedback criterion to evaluate DSS success is the enhancement of speed and efficiency of decision-making. The potential outcomes might be linked to tangible advantages such as increases in revenue and reductions in costs. Nonetheless, a closer evaluation of results showed that process feedback has also been considered among the merits of IDSS and DSS. Moreover, according to the theory of Simon, the procedure of decision-making includes intelligence, design, choice and implementation [12]. The IDSS system might, for instance, perform database evaluation and assessment when the system perceives the requirements of decision-makers, or aid the users in the process of choosing proper variables in the designing phase shown in Fig. 1. Even in cases where the general outcomes are not transformed, the decision-makers obtain a more comprehensive understanding of the decision-making issue based on the application of IDSS. In that case, the evaluation of IDSS and DSS might be considered as multi-format and multi-criterion, based on its assessment to improve design and
guide decision-makers in how the system should be used. Multi-criteria assessment of information systems deserves further study in future work, since several DSS and IDSS deployments have reported substantial challenges: users of these systems face trade-offs between performance and goals when applying multiple programming techniques. Several researchers have applied such approaches to DSS and expert systems using multi-criteria schemes that span technical, empirical and subjective techniques. Subjective methods draw on the perspectives of sponsors and users; technical methods use analytical techniques and measure the effectiveness of the system; and criterion methods gauge system performance with and without human involvement. Other researchers have evaluated numerous information systems against two broad categories of performance measures: efficiency and effectiveness. Three crucial criteria recur in this literature: effectiveness, assessing how the output contributes to the accomplishment of fundamental organizational goals; efficiency, assessing how well resources and inputs are converted into outputs; and efficacy, assessing how well the system produces the intended results. Efficacy, in this sense, represents the value-centered perspective.
4.3 The Intelligent Decision Support System (IDSS)

An intelligent DSS, typically known as an IDSS, applies AI methodologies to improve and enhance decision-making in the healthcare sector. AI tools used include fuzzy logic, evolutionary computing, case-based reasoning, intelligent agents and artificial neural networks. Combined with a DSS, these tools form a powerful aid for the healthcare sector and can tackle problems involving massive datasets and complex reasoning in real time [13]. IDSS is becoming increasingly important in practice precisely because it applies such AI methods, with applications ranging from medical support systems to tools that directly improve the quality of decisions. Researchers have pointed out that a DSS influences both the outcomes and the processes of decision-making, and DSS evaluations accordingly draw on both. Process criteria measure, qualitatively, how decisions are made and executed so as to improve their speed and results; outcome criteria are assessed through quantifiable aspects such as reduced costs, improved profits, prediction accuracy and prediction success or failure. Multiple IDSS procedures have been justified on these grounds both in practice and in theory [14], so an IDSS can be evaluated against both the outcomes and the processes of decision-making. The most common way to develop a quantitative framework for the decision values in the system is to apply the analytic hierarchy process (AHP) [15].
AHP is advantageous in that it allows individuals to add components acceptable to the system, and its stochastic extension permits statistical weighting of the contributions used by the system. AHP also provides a methodology for comparing alternatives by structuring the decision into a hierarchy relevant to the system, broken into levels much like the reporting structure of an organization. The assessor supplies pairwise comparisons of the alternatives at the lowest level; AHP computes intermediate comparisons and combines them into a single decision value that compares the alternatives at that level. The criteria are measured through an eigenvalue solution, which is then used in executing the judgment between two fundamental alternatives. AHP is therefore useful for mitigating the decision issues that arise in evaluating an IDSS [17].
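A minimal numerical sketch of the AHP computation described above follows, assuming a hypothetical pairwise comparison matrix for three alternatives on Saaty's 1-9 scale; the matrix entries are invented for illustration.

```python
import numpy as np

# Hypothetical pairwise comparison matrix: entry [i, j] states how strongly
# alternative i is preferred over alternative j (reciprocals below the diagonal).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# AHP takes the principal eigenvector of A as the vector of decision values.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
priorities = principal / principal.sum()

# A consistency ratio guards against contradictory judgments (RI = 0.58 for n = 3).
lam_max = np.max(np.real(eigvals))
cr = ((lam_max - 3) / (3 - 1)) / 0.58
print(priorities, cr)  # priorities roughly [0.65, 0.23, 0.12]; CR well below 0.1
```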
5 Conclusion and Future Scope

The emergent research direction is the use of artificial intelligence to drive decisions in the healthcare sector, with smart adaptive frameworks built to deal with complex issues. Such systems are linked to the preferences of decision-makers and to virtual-human interfaces that facilitate effective interaction between machines and humans. Throughout critical decision-making, these techniques are applied to difficult problems such as big data. Machine learning is integrated with ontological reasoning so that the inherent merits of both decision-making approaches provide accuracy in medical domains. The proposed system operates on real-world datasets and protocols and can, in principle, be applied in any medical context where decisions must be made. Since IDSS still presents open issues for users, further research is fundamental. Future work should focus on designing an effective user interface for DSS and on advancing from emergent applications to deployed ones. Work on smart decision-making points to both challenges and opportunities in the medical field: the opportunities lie in improved decision-making for complex problems that exceed the human capacity to track many dynamic variables, while the challenges for wide-ranging IDSS lie in designing systems that show a clear return on investment, interact with humans so as to earn trust, and cope with potential faults in the system itself. In the future, IDSS can thus lead a new wave of broad and sophisticated decision-making frameworks.
References

1. C. Bennett, K. Hauser, Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach. Artif. Intell. Med. 57(1), 9–19 (2013). https://doi.org/10.1016/j.artmed.2012.12.003
2. P. Lucas, Dealing with medical knowledge: computers in clinical decision making. Artif. Intell. Med. 8(6), 579–580 (1996). https://doi.org/10.1016/s0933-3657(97)83108-9
3. W. Horn, Artificial intelligence in medicine and medical decision-making Europe. Artif. Intell. Med. 20(1), 1–3 (2000). https://doi.org/10.1016/s0933-3657(00)00049-x
4. P.L. Aaron, S. Bonni, An evaluation of wearable technological advancement in medical practices. J. Med. Image Comput. 58–65 (2020)
5. Web based analysis of critical medical care technology. J. Med. Image Comput. 66–73 (2020)
6. A. Haldorai, S. Anandakumar, Image segmentation and the projections of graphic centered approaches in medical image processing. J. Med. Image Comput. 74–81 (2020)
7. I. Rábová, V. Konečný, A. Matiášová, Decision making with support of artificial intelligence. Agric. Econ. (Zemědělská Ekonomika) 51(9), 385–388 (2012). https://doi.org/10.17221/5124-agricecon
8. R. Yager, Generalized regret based decision making. Eng. Appl. Artif. Intell. 65, 400–405 (2017). https://doi.org/10.1016/j.engappai.2017.08.001
9. N.M. Hewahi, A hybrid architecture for a decision making system. J. Artif. Intell. 2(2), 73–80 (2009). https://doi.org/10.3923/jai.2009.73.80
10. C. Gonzales, P. Perny, J. Dubus, Decision making with multiple objectives using GAI networks. Artif. Intell. 175(7–8), 1153–1179 (2011). https://doi.org/10.1016/j.artint.2010.11.020
11. Y. Chen, E. Ginell, An analysis of medical informatics and application of computer-aided decision support framework. J. Med. Image Comput. 10–17 (2020)
12. M. Heng Li, M. Yu Zhang, Computational benefits, limitations and techniques of parallel image processing. J. Med. Image Comput. 1–9 (2020)
13. R. Degani, G. Bortolan, Fuzzy decision-making in electrocardiography. Artif. Intell. Med. 1(2), 87–91 (1989). https://doi.org/10.1016/0933-3657(89)90020-1
14. P. Giang, P. Shenoy, Decision making on the sole basis of statistical likelihood. Artif. Intell. 165(2), 137–163 (2005). https://doi.org/10.1016/j.artint.2005.03.004
15. D. McSherry, Conversational case-based reasoning in medical decision making. Artif. Intell. Med. 52(2), 59–66 (2011). https://doi.org/10.1016/j.artmed.2011.04.007
16. T. Leong, Multiple perspective dynamic decision making. Artif. Intell. 105(1–2), 209–261 (1998). https://doi.org/10.1016/s0004-3702(98)00082-4
17. A. Khusein, Clinical decision support system for the activity of evidence based computation. J. Med. Image Comput. 50–57 (2020)
A Critical Review of the Intelligent Computing Methods for the Identification of the Sleeping Disorders

Anandakumar Haldorai
and Arulmurugan Ramu
Abstract Intelligent computing techniques and knowledge-centered systems are considered in the process of identifying different complications in a clinical setting. This article critically reviews the different intelligent computing techniques used to detect sleeping disorders. The core issue in this contribution is the identification of sleeping disorders such as snoring, parasomnia, insomnia, and sleep apnea. The diagnostic techniques most used by medical researchers are centered on knowledge-based systems (KBSs), rule-based reasoning (RBR), fuzzy logic (FL), case-based reasoning (CBR), artificial neural networks (ANNs), multi-layered perceptrons (MLPs), genetic algorithms (GAs), neural networks (NNs), k-nearest neighbor (k-NN), data mining (DM), Bayesian networks (BNs), and support vector machines (SVMs), among many other methods integrated into the medical sector. Traditionally, questionnaires were used to identify the different forms of disorders; they have now been replaced with the above methods, which is meant to enhance sensitivity, specificity, and accuracy.

Keywords Knowledge-based systems (KBSs) · Rule-based reasoning (RBR) · Fuzzy logic (FL) · Case-based reasoning (CBR) · Artificial neural networks (ANNs) · Multi-layered perceptron (MLP)
1 Introduction

In the field of classical neuroscience, sleep is considered a fundamental therapeutic factor, and the field has become considerably important because sleep problems are so familiar among humans. The research done in [1] has shown that approximately 40% of the medical subjects considered had at least one sign or symptom that was projected

A. Haldorai (B) Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu, India

A. Ramu Presidency University, Yelahanka, Bengaluru, Karnataka, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_63
to be disrupted by sleep. Roughly one out of five adults is normally distressed by sleepiness during the day, and narcolepsy in particular causes extreme daytime sleepiness. Both forms of disorder have a vital but confusing effect on regular daily activities. Similarly, sleep disturbances are produced by sleep-connected breathing disorders known as sleep apnea. Sleep disorders have both short-duration and long-duration dreadful effects. The short-duration effects, which relate directly to impaired sleep and attention, reduce the quality of human life and increase the chance of mishaps. The long-duration effects shift toward increased mortality and morbidity through more mishaps, high blood pressure, cardiovascular illnesses, learning disabilities, and obesity alongside depression. Some sleep disorders, according to the researchers in [2], are severe enough to hamper the psychological, cognitive, physical and motor functioning of a person. Normally, humans pay minimal attention to sleep apnea, snoring, parasomnia, and insomnia, but the persistence of these disorders can be incredibly serious, so their early identification becomes a fundamental task. In the initial stages of identifying these disorders, medical researchers used quantitative methods in the form of questionnaires. However, accuracy is always an issue with this method, since the value of a questionnaire review depends on the total number of medical participants and on the questions designed for the survey. Researchers have therefore shifted toward enhancing accuracy through the application of different intelligent methods, in which artificial intelligence (AI) is vital. Intelligent computing methods such as BN, DM, MLP, NN, SVM, GA, FL, and ANN are all information-controlled rather than knowledge-controlled techniques, although many researchers have implemented different integrations of knowledge-controlled methods in the clinical domain. The basic issue-solving paradigms in AI here are CBR and RBR. The researchers in [3] proposed a more rigid protocol for specifying patients' sleep macro-structure, recently amended by the American Academy of Sleep Medicine. Scoring human snoring by hand is extravagant and tedious, so frequent attempts have been made to formulate frameworks that take the count of records automatically. The R-K method used in constructing a rule-centered sleep-staging framework applies a multi-rule decision tree; the tree technique improved accuracy over a single decision tree, with the multiple-decision-tree system gaining about 7% accuracy over one model. RBR techniques are also time-consuming, as they require signal data and the identification of certain patterns such as k-complexes, rapid eye shifts in the EOG, and sleep spindles in the EEG. Canisius used a bio-signal processing algorithm to identify sleep problems from ECG signals with an approximate accuracy of about 77%.
Considerable time is needed to formulate a framework from RBR that retrieves
the elements from the initial dataset before constructing rules according to human reasoning, whereas numerical classification, by contrast, requires no complicated rule-centered features extracted from the power spectrum. Chromosomes with variable structures and fitness elements are used to find optimal input elements and network-recognition specifications, and numerical categorization techniques require no human knowledge or protocols. Portable, microcontroller-based medical devices have been improved for long-term home monitoring; such a device can provide several outputs, such as the overall snoring count, the average number of snores per hour, and the number of irregular snores. The success rate of these devices was about 85% in a laboratory ecosystem and about 70% in the home [4]. CBR is capable of using the exact knowledge of experienced and confirmed problem cases; it considerably favors learning from experience, since it is easier to learn from real problem-solving episodes. The first case-based reasoning framework was CYRUS, formulated by researchers in [5] as a showcase of Schank's dynamic memory framework. Because of the contrasting merits and demerits of CBR and RBR, it is challenging for either to mitigate an issue independently; when their advantages are exploited and their demerits avoided, their junctions present considerable advantages, as in BOLERO and MIKAS, systems that integrate CBR and RBR as CASEY and PROTOS also do. GREBE integrates MBR, CBR, and RBR with a knowledge base that includes cases and rules connected with the laws on injuries to the workforce. Transforming a hidden skill set into more precise protocols would introduce irregularity and a loss of knowledgeable content. An alternative form of inference is Bayes' theory, which fits probabilistic figures to measured outputs such as MES and ES; it was analyzed in 2007. SAMOA represents an automated sleep apnea sign-diagnosis framework. All of these systems are effective for specific illnesses with self-contained signs, but they can fail when a person has more than a single syndrome or disease, which gives a disease a reason to advance. Moreover, the rule-centered expert framework has two potential demerits:
(a) All situations might not be expressible as protocols for the different conditions.
(b) Experience collected by trial and error might not be captured in the knowledge base without human input. As a result, agreement rates of about 83% are reported [6].
ANNs have therefore been enlisted to apply some form of human intelligence. ANNs have been widely used and acknowledged as a method for the treatment and diagnosis of sleep disorders at the various stages of human development. In some instances, a GA is used to determine the number of neurons in the hidden layers. Advancing the automatic framework is difficult because of the many uncertainties that arise as new issues appear day in, day
out. To mitigate this problem, FL has been analyzed and applied as a suitable framework, as in the ISSSC version 1.0 system used for clinical treatment and diagnosis. DM is a proficient technique and tool for creating novel knowledge from wide-ranging databases, and different DM techniques figure in the collective diagnosis of wide-ranging illnesses, such as the prediction protocol for the obstructive disorder known as sleep apnea. This research centers on analyzing the various techniques for diagnosing and detecting sleep illnesses such as snoring, parasomnia, insomnia, and apnea [7]. Various techniques centered on wide-ranging intelligent computing methods and their combinations, such as DM, FL, GA, ANN, CBR, and RBR/KBS, are discussed; the combined techniques are GA-FL, ANN-DM, ANN-GA, ANN-BN, ANN-FL, RBR-ANN, CBR-FL, and RBR-CBR [8]. The DM techniques and the expert systems proposed for identifying sleeping disorders are widely evaluated in this article. In the broader study of sleep disorders, wireless technology is applicable: patients acquire the merits of treatment and diagnosis without any disturbance of their normal sleep patterns, while healthcare practitioners obtain the necessary information about them [9]. The remainder of the article is organized as follows. Section 2 explains the various intelligent computing methods, Sect. 3 presents the findings obtained from the reviews, Sect. 4 explains the results and discussion, and finally, Sect. 5 concludes the research work along with the future scope.
2 Intelligent Computing Methods

2.1 Knowledge-Based Systems (KBSs) and Rule-Based Reasoning (RBR)

Knowledge-based systems (KBSs) are AI devices that provide smart decisions intended for validation. Here, skill representation and acquisition are structured through various scripts, frames, and rules, and knowledge is represented in two ways: RBR and CBR. The core elements of RBR are the rule base and the inference engine. The rule base comprises several standards, collectively termed the knowledge base; the inference engine infers data from the interaction of the rule base with the input; and a match-resolve-act cycle executes the production-system program. The core merit of RBR is that it defines the data in the form of protocols, compressing the representation of the modalities and the rules [10]. The R-K method is centered on marking events such as k-complexes, sleep spindles, slow delta waves, and rapid eye movements rather than on background signal activity. When no marked instances or events are identified in a sleeping epoch, the event-centered smoothing protocols and classification rules leave the ANNs with
minimal performance. Moreover, the rule-centered expert framework has three fundamental limitations:
(1) All situations might not be definable by protocols.
(2) Experience retrieved by trial and error might not easily be captured in the knowledge base without human effort.
(3) The protocols might be misunderstood rather than properly comprehended.
Neural network systems addressed the shortcomings of the protocol-based expert framework: the limited reliability of automatic sleep scoring was improved by hybrids of neural networks and the rule-centered expert framework, with agreement rates of about 85.9%. CBR represents the procedure of mitigating novel issues with the remedies of similar past issues. CBR-based frameworks provide a remedy for new issues by applying the four R's: Retrieve, Reuse, Revise, and Retain. CBR can effectively convey updates and knowledge to a KBS when novel cases are reported, which also aids in managing unpredicted input. There are, however, limitations, such as knowledge acquisition and the problem of limited or undocumented cases, where the efficiency of inference falls short because explanations cannot be provided straightforwardly [11]. Several research evaluations have used the combined or integrated technique of RBR and CBR to implement frameworks for detecting sleeping disorders; a minimal sketch of the two paradigms follows.
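The contrast between the two paradigms can be seen in a few lines of Python: a rule base with a trivial inference engine on the RBR side, and the nearest-case "Retrieve" step of the four R's on the CBR side. All rules, features and thresholds here are illustrative assumptions, not clinical values.

```python
# All rules, features and thresholds are illustrative, not clinical.
RULES = [
    (lambda p: p["apnea_events_per_hour"] >= 15, "suspect sleep apnea"),
    (lambda p: p["sleep_latency_min"] >= 45,     "suspect insomnia"),
]

CASE_BASE = [  # previously solved cases: (features, confirmed diagnosis)
    ({"apnea_events_per_hour": 22, "sleep_latency_min": 10}, "sleep apnea"),
    ({"apnea_events_per_hour": 2,  "sleep_latency_min": 60}, "insomnia"),
]

def rbr(patient):
    # Inference engine: fire every rule whose condition matches the input.
    return [conclusion for cond, conclusion in RULES if cond(patient)]

def cbr_retrieve(patient):
    # "Retrieve" step: return the diagnosis of the nearest stored case.
    dist = lambda case: sum((case[0][k] - patient[k]) ** 2 for k in patient)
    return min(CASE_BASE, key=dist)[1]

patient = {"apnea_events_per_hour": 20, "sleep_latency_min": 12}
print(rbr(patient), cbr_retrieve(patient))  # ['suspect sleep apnea'] sleep apnea
```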
2.2 Artificial Neural Networks (ANNs)

ANNs, commonly used for pattern classification and recognition, are collections of perceptron-like units linked in layers; they are data-controlled, connectionist techniques in the clinical domain. An ANN has the adaptive capacity to transform its structure during the learning stage and is therefore used to mitigate complicated real-world relationships. It suits problems in which the training information comes from complex, noisy sensor data, as well as problems that are difficult to illustrate symbolically. BP algorithms are the most used ANN training methods and have been applied by many researchers to the classification and detection of hypopnea and SA events. ANN has a few merits over KBS, exhibiting approaches complementary to RBR in terms of knowledge representation: the rule-centered techniques need a long time to construct a framework that extracts features from the initial recordings, such as PSG and EEG, and then structures the protocols according to human knowledge. ANNs have attractive properties for the automatic recognition of sleeping EEG patterns, requiring neither elaborate categorization protocols nor a complex domain skill set. ANNs agreed with manual scoring at about 93% over the wide range of scored epochs, against a manual scoring and arousal procedure that is time-consuming. A semi-automated arousal detection framework has been implemented in a system of
Sorensen, applying an FFNN to overcome the limitations of the manual framework [12]. The repetitiveness and abstraction of the task lead directly to inter-scorer disagreement and inaccuracies, whereas the network requires no knowledge of the probability distribution. A neural network is proficient at evaluating the posterior probability, giving a basis for implementing the category protocols, and the automatic categorizer proved a less time-consuming and less expensive arousal-identification procedure than other methods. In research on automatic sleep-spindle detection, agreement with visual detection findings ranged from 70 to 99%, with false positives ranging from 3 to 4.7%. ANNs are effective at categorizing non-periodic, nonlinear signals such as sleeping EEG patterns. In another analysis, two forms of ANNs, LVQ and MLP, were used to categorize sleeping stages for babies. Automated sleep staging and scoring in humans has been demonstrated with multi-layered FFNNs, with recognition rates varying from 82 to 90%. The BioSleep TM package uses automatic neural-network techniques for sleep staging, giving speed and ease of analysis; not being entirely bound to the R-K technique, it provides merits over manual evaluation. ANNs are insensitive to the data distribution, so the same results are obtained with transformed and raw data; they deftly handle non-Gaussian probability density functions that include extreme values, and a transformation is ineffective at improving their capacity to separate the space into subspaces. The k-NN function is a non-parametric method usable for categorization, assuming no a priori parameter knowledge of the probability structure of the data samples; its best performance is obtained on transformed and homogeneous data, where a considerable number of cells of minimal size can be achieved by LDA as contrasted with ANNs. LVQ, a network with supervised training, showed outstanding adaptation to the training sets, attaining maximum categorization as the number of neurons was enhanced. Normal recordings separate completely from apneic recordings under the NN and k-NN supervised learning classifiers. The merits of RBF networks include architectural simplicity, reduced training time, and the capacity to handle unseen information; RBF networks are applicable to fault identification, face identification, and clinical diagnosis. RBF-FCM networks assure the best categorization accuracy at minimal network complexity, while RBF-KM networks assure the most effective categorization performance compared with the RBF-OLS and RBF-FCM networks, whose performance is somewhat lower than RBF-KM's. Despite these merits, ANNs have some drawbacks: the organization of an NN is rather ambiguous, and the a priori knowledge or data used for initialization might not help toward better initialization of the network parameters or a shorter learning time. Used as a rule-centered expert framework, an ANN might not contain all the essential protocols, giving a minimal agreement rate of 55%
as contrasted with the 83% of the rule-based expert framework. The visual identification and counting of spindles is laborious and time-consuming over complete sleeping EEG recordings, so mimicking the human scorer in an automatic sleep-spindle detection framework is itself a problem. ANN-centered analysis frameworks are not sufficiently effective and accurate for sleep research that uses the R-K categorization framework, and BioSleep RM is constrained to visual inspection between the pseudo-R-K hypnogram and the manual scores. Leave-one-scorer-out and classical cross-validation methods might not be applicable when operating on a massive database; categorization errors reach about 30% and might not improve once the number of items in the sets exceeds 500 samples when using the k-NN classifier or Parzen estimators. k-NN and MLP gave incredible results as contrasted with other methods; however, k-NN relies on majority votes over the remaining classes. Choosing the most effective NN and optimizing the structure of the layers is time-consuming and challenging, and k-NN requires memory space for massive numbers of training vectors. The unsupervised GCS and SOM techniques were considerably poorer. In wide-ranging apnea-screening techniques, about 90% correct categorization on timely analysis can be obtained for patients by extracting spectral elements via the Fourier transform of both the ECG and the RR series alongside morphological features; in our perspective, the major demerit lies in the dimensionality of the feature space, i.e., 88 various features. Selecting too simple a topology yields a network not capable of learning the complex characteristics, whereas too complex a topology leads to a loss of generalization capacity and to overfitting of the training dataset. With a considerably complex architecture, an NN might master the trained sets to the point of projecting future samples inaccurately; early-stopping methods are a known alternative, using a validation set to halt the training algorithm before the network begins learning the noise in the information set, as part of estimating the generalization error. The most effective generalization is attained by networks whose level of complexity is neither too large nor too low, and increasing the size of the hidden layers of the RBF-KM network was not found to substantially enhance accuracy. Apart from that, training an MLP network with BP requires a significant number of user-specified a priori parameters: the training epochs, the momentum, and the learning rate. SVM, as a result of its generalization capacity, is used to solve supervised categorization, binary categorization, and regression problems, including non-parametric statistical tasks; it maximizes the margin between the decision boundary and the training sets of information, which can be cast as a quadratic optimization problem. The machine learning technique was proposed by researchers in 1995, and the idea of SVM is to structure optimized separating hyperplanes. The optimization criterion
of SVM is the width of the margin between categories, i.e., the space around the decision boundary defined by the distance to the closest training patterns. SVM is a supervised learning model whose learning algorithm evaluates data, identifies potential patterns, and maps the information into a high-dimensional space so as to find the separating hyperplane with the maximum margin. The merits of SVM are that it can mitigate nonlinear categorization problems without any need for speed clamping in the construction framework. A kernel element is applied to mitigate the cost of computing inner products in high dimensions, and an effective technique for nonlinear categorization follows from it; the kernel element has to be selected to accomplish the most effective categorization accuracy on unidentified samples. The cross-validation and independent-test accuracies for identifying apneic events are noted to be about 92% and 93%, respectively; for hypopnea events the two accuracies are 90 and 89%, with sensitivity used to optimize the SVM parameters. After evaluating three kernel elements, sigmoid, polynomial, and RBF, the polynomial kernel was identified as showing the highest performance. In contrast to GA, PSO involves less complicated operations, so fewer parameters must be coded into the stochastic procedure; rapid, premature convergence is the one limitation of PSO. The self-advising SVM fundamentally gives better findings than the classical SVM, being purposed to avoid discarding the knowledge retrieved from miscategorized information. SVM is an approximate implementation of structural risk minimization, achieving a low probability of generalization error, and it can minimize both the empirical and the structural risk, yielding effective generalization when categorizing novel forms of data. The categorization performance of the k-NN, linear discriminant, and PNN categorizers on test information was lower than that of SVM. Three kernel functions were used, radial basis, polynomial, and linear: 100% accuracy was obtained with the polynomial kernel using four features, and likewise with the linear kernel using just two feature subsets, whereas PNN and k-NN showed poor categorization performance, 70% and 83%, respectively, on the tested information.
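The kernel comparison reported above can be reproduced in outline with scikit-learn, as in the following sketch. The synthetic features stand in for the PSG- and ECG-derived features used in the actual studies, so the printed accuracies illustrate the procedure rather than the published figures.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                         # 200 epochs, 10 stand-in features
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(int)   # placeholder apnea/normal label

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    acc = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:8s} cross-validated accuracy: {acc:.2f}")
```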
2.3 Fuzzy Reasoning/Logic

Fuzzy set theory plays a fundamental role in handling complexity when drafting the required decisions in the clinical domain. Fuzzy logic is a form of many-valued logic that deals with reasoning that is approximate rather than exact and constant. A fuzzy variable may take a truth value anywhere in the range 0-1; the formalism is thus extended to manage theories of partial truth, where the truth value may range between the completely
true and the completely false. For a linguistic variable, these degrees are handled by specific membership functions. The fuzzy rule-based framework attained accurate results for several samples, but its performance still needs improvement on other samples of the recorded information. It also had to overcome the limitations of epoch-centered sleep staging by tracking the more continuous transitions of the sleeping state in patients. Eliminating binary decisions assures soft transitions and allows concurrent characterization of the various sleeping states. The use of Mamdani fuzzy protocols lets knowledge be applied in the form of linguistic rules close to human language, which facilitates knowledge acquisition and understandability and also provides explanatory capacity. The limitation of the R-K protocols was their unnatural assignment of discrete stages rather than the production of a continuous output. A receiver operating characteristic (ROC) index of 1 was obtained for categorizing events such as hypopneas and apneas, while the manual sleep categorization of patients, structured by various experts, showed inter-rater reliability of approximately 70%. The fuzzy categorization procedure gives considerably better results than discretization processes. It has wide-ranging advantages, such as easy revision of the rule base or the fuzzy datasets, ease of comprehension owing to the linguistic form of the output, ease of design at minimal cost, and provisions that let conflicting inputs be resolved within a short time interval. On the other hand, fuzzy approaches carry limitations such as the challenging construction of the models for the fuzzy framework.
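The soft transitions discussed above amount to replacing a binary stage label with graded memberships, as in this minimal sketch; the membership breakpoints are invented for illustration and are not clinical thresholds.

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def apnea_severity(events_per_hour):
    # One linguistic variable with three overlapping fuzzy sets.
    return {
        "mild":     triangular(events_per_hour, 0, 7, 16),
        "moderate": triangular(events_per_hour, 10, 20, 32),
        "severe":   triangular(events_per_hour, 25, 45, 80),
    }

# A reading of 14 belongs partly to "mild" and partly to "moderate":
print(apnea_severity(14))  # {'mild': 0.22..., 'moderate': 0.4, 'severe': 0.0}
```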
2.4 Genetic Algorithms (GA)

Genetic algorithms are search-heuristic optimization methods that mimic the process of natural evolution. These algorithms have been shown to find near-optimal remedies for various challenging issues, serving as an effective tool built on the principles of evolutionary strategies, and a GA can also be transferred to an existing simulation framework. A standard genetic algorithm requires two elements: a fitness (robustness) element to evaluate points in the remedy domain, and an inherited representation of the remedy domain. This provides automated scoring of the sleeping stages [13]. The major merit of GA is that no mathematical analysis of the problem needs to be understood; it is most effective on complex and huge search domains, in which a gradient optimization technique would become trapped in local minima of the cost element. GA also has several limitations: no assurance of a globally optimal result, no firm bound on the optimization response time, limited control over the GA's operation, and exact optimization issues that GA cannot mitigate.
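The two required elements, a genetic representation and a fitness function, appear in the following toy GA, which evolves a bit-mask of selected input features in the spirit of the feature-selection use noted above. The fitness function is a stand-in for a real classifier's validation accuracy, and all parameters are illustrative.

```python
import random

random.seed(0)
N_FEATURES, POP, GENS = 8, 20, 30
USEFUL = {0, 2, 5}  # hypothetical "truly informative" features

def fitness(mask):
    hits = sum(1 for i in USEFUL if mask[i])
    return hits - 0.1 * sum(mask)  # reward useful features, penalize extras

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                      # selection: keep the fitter half
    children = []
    for _ in range(POP - len(parents)):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_FEATURES)     # one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(N_FEATURES)          # occasional point mutation
        child[i] ^= random.random() < 0.1
        children.append(child)
    pop = parents + children

print(max(pop, key=fitness))  # tends toward a mask selecting features 0, 2 and 5
```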
2.5 Data Mining (DM)

Data mining is the procedure used for knowledge discovery: pulling previously unforeseen connections, layouts, and unidentified relationships out of data. DM offers the means to distinguish productive chunks of data hidden in huge and expensive databases. Statistical techniques provided in DM packages include association rules, data segmentation, k-nearest neighbors, rule induction, and decision trees. A tree-like ensemble architecture is used to enhance the accuracy level over a single tree structure. When the number of neurons in the hidden layers is increased, the mean accuracy improves considerably and the standard deviation is minimized. The most effective performance is attained using LMBP (trainlm) and gradient descent with momentum and an adaptive learning rate (traingdx) as the BP learning elements for training the ANNs; however, trainlm provides better results than traingdx in both training and testing. Insomnia occurred more frequently in OSAS patients and was systematically connected with poor sleep quality, though it does not affect the long-term complications of patients suffering from moderate to severe insomnia syndrome. The agreement rate was extended to between 71 and 80% by extracting features from the EMG and EOG signals, and an automated detection framework was proposed that avoids the time-consuming manual arousal procedure.
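One of the DM techniques listed above, the decision tree, and the tree-ensemble idea that raises accuracy over a single tree, can both be sketched with scikit-learn. The random features are placeholders for the attributes mined from real sleep databases.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))                     # 6 hypothetical sleep-record features
y = (X[:, 0] - 0.5 * X[:, 2] > 0.2).astype(int)   # placeholder OSAS / non-OSAS label

single = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=5).mean()
forest = cross_val_score(RandomForestClassifier(n_estimators=50), X, y, cv=5).mean()
print(f"single tree: {single:.2f}, tree ensemble: {forest:.2f}")  # ensemble usually higher
```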
2.6 Bayesian Network (BN)

A Bayesian network provides a framework for reasoning under uncertainty that is capable of dealing with diagnostic issues. BN is a healthy and strong formalism that permits reasoning under uncertainty, recommending a graphic representation of the statistical dependencies between the variables of the domain. A BN is a directed acyclic graph: a combination of nodes representing random variables, linked by edges that express the conditional probabilistic dependencies between the vertices. The categorization of a node depends on its parent nodes, and a node Y is conditionally independent of X when there is no directed path from X to Y. BN is therefore structured to represent causality, not merely correlation, and it facilitates the visualization of firm links.
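In the two-node case, diagnostic reasoning in a BN reduces to Bayes' rule over the conditional probability table, as this tiny sketch shows; all probabilities are invented for illustration.

```python
# One disorder node ("apnea") with one symptom child ("snoring").
p_apnea = 0.10                  # prior P(apnea)
p_snore_given_apnea = 0.90      # CPT entry P(snoring | apnea)
p_snore_given_healthy = 0.30    # CPT entry P(snoring | no apnea)

# Diagnostic reasoning runs the edge "backwards" with Bayes' rule:
p_snore = p_snore_given_apnea * p_apnea + p_snore_given_healthy * (1 - p_apnea)
p_apnea_given_snore = p_snore_given_apnea * p_apnea / p_snore
print(round(p_apnea_given_snore, 2))  # 0.25: observing snoring raises P(apnea) from 0.10
```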
3 Computational Analysis

Each single-form KBS framework has its individual merits and demerits, such as inference issues and skill-acquisition issues. As dealt with in the segment above,
the demerits of a single-form KBS are limited through the integration of ICT and KBS methods. This segment covers the merits incurred from the integrated frameworks alongside their applications in the treatment and diagnosis of sleeping disorders. The interlinkage of CBR-RBR yields easier knowledge acquisition and gains in accuracy, efficiency, and performance, and the major merits of the connected methods include high performance on sleeping disorders, sleep justification, and learning capacity. The segmentation and scoring of sleep stages is considered to necessitate hybrid reasoning, because human experts use both rule-centered experience and knowledge. Illustrating the merits of CBR in the treatment and diagnosis of OSA, researchers have formulated a minor prototype framework known as Somnus that provides case retrieval and storage for sleeping disorders. To handle the complexity and diversity of the information, the researchers used a model that combines a fuzzy-logic method for modeling the case elements with a semiotic method for modeling the wide-ranging measures. The user interfaces structured in the Somnus prototype are entirely restricted to SQL statements, providing quick access to information but limiting the system to small groups of users familiar with the database schema. The semi-fuzzy method gives users a uniform representation of subjective, objective, qualitative, and quantitative measures, and even with the limited CBR cases in Somnus, it is essential that the measures be usable by the various healthcare providers in the field. Many computational frameworks have been recommended in the sleep segment to handle the detection framework end to end. The first application of an NN to sleep was evaluated by researchers [14]; the BP technique was used, but the categorization rate did not exceed 60%. The researchers in [15] tried three supervised learning techniques: scaled conjugate gradients, Bayesian approaches, and regularized scaled conjugate gradients. SCG was chosen as the learning algorithm for the networks owing to its prompt convergence speed and minimal memory requirement, with the mean squared error (MSE) as the cost function. The second approach differs from plain SCG in that the MSE is augmented with a regularization term, known as weight decay, in the cost element, whereas the best results of all were accomplished with the Bayesian model and the regular cross-entropy element through the reduction of the error function.
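The difference between the two cost functions mentioned above, plain MSE versus MSE with a weight-decay term, is captured by the following short sketch; the weights and targets are placeholders.

```python
import numpy as np

def cost(weights, y_true, y_pred, decay=0.0):
    """Plain MSE when decay == 0; MSE plus the weight-decay (L2) term otherwise."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + decay * np.sum(weights ** 2)

w = np.array([0.5, -1.2, 2.0])                      # placeholder network weights
y_true, y_pred = np.array([1.0, 0.0]), np.array([0.9, 0.2])
print(cost(w, y_true, y_pred))                      # cost used with plain SCG
print(cost(w, y_true, y_pred, decay=0.01))          # regularized cost of the second approach
```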
4 Results and Discussion

Discrete wavelet transformation was used as a preprocessing phase to fix and minimize the number of inputs to the classifiers. The choice of a Bayesian network has the fundamental advantage that the various learning parameters are auto-adaptive, requiring no external human action. In past research, little emphasis was placed on the categorization and detection of the nonlinear features of the EEG signal; the
major causal factor might be the challenging mathematics and the expertise required to interpret them. Many researchers, however, have proposed coupling bispectral evaluation with ANNs. Contrary to the energy spectrum, the bispectrum presents the non-Gaussian and nonlinear data, permitting the identification of nonlinear features and the characterization of nonlinear approaches. Quadratic phase coupling (QPC) is a unique feature of the bispectrum, used to differentiate, quantify and detect normal and OSAS patients. Training is effectively controlled through the cross-validation method. In the training process, chromosomes were structured with a variable-length architecture, and the fitness element was used to identify the optimal input characters and the network specifics; the integrated framework based on a feed-forward NN was transformed, with respect to genetic algorithms, to identify the input features and optimal structures. The massive amount of information, the complexity of categorization evaluation, and the variability among human experts are the main reasons to formulate an automated sleep-categorization framework. A neuro-fuzzy model was selected for structuring the protocols, for the categorization procedure, and for identifying the parameters that express the degrees of absence and presence of patterns. The findings of this analysis were 87.3% for the multilayer perceptron NN and 86.7% for the professionals' rules of sleep categorization, whereas the NF categorizer accomplished about 88%. More time is essential to structure the model from the rule-centered method for feature extraction from the initial recordings, and the R-K method requires a sleeping or waking stage to last for about a single minute. In this research, an attempt was made to evaluate the different methods adopted for sleeping disorders, BN, DM, MLP, NN, SVM, GA, FL, and ANN, all information-controlled techniques, together with their combinations GA-FL, ANN-DM, ANN-GA, ANN-BN, ANN-FL, RBR-ANN, CBR-FL, and RBR-CBR. This paper presents the overall cases in which the different techniques are used for the diagnosis, classification, and detection of sleeping disorders. From the research, it is found that in the standalone class, out of the 79 different cases, 5 BN cases, 7 fuzzy cases, a few GA cases, and a majority of ANN cases were used in this period for sleeping disorders: their diagnosis, classification, and detection. A few instances of the incorporated methods were witnessed in this duration: GA-FL (3), ANN-DM (1), ANN-GA (4), ANN-BN (5), ANN-FL (2), RBR-ANN (2), CBR-FL (1), and RBR-CBR (2). Of the 97 cases connected to the practice of the above methods in sleeping disorders, 59 use a single form of methodology, whereas the remaining 20 cases are positioned by the connected techniques. In Figs. 1 and 2 of this research, the relative application of each technique over the overall cases using a single methodology is denoted by (m, p%), where m is the number of cases using a certain methodology and p is the percentage ratio of m (e.g., 41) to the overall number of cases using a single methodology (41 + 5 + 2 + 7 + 4 = 59), as shown in the seventh row of the table.
Fig. 1 Comparison of computing methods against usage

Fig. 2 Comparison of computing methods against percentage usage

In that case, the relative application of the ANN methodology comes to 69%, with FL at 12%, DM at 7%, BN at 8%, and GA at 3%, as shown in Fig. 1. For the incorporated methods, the same calculation is used and denoted by (i, q%), where i is the number of cases applying a certain connected method and q is the percentage ratio of i to the total number of connected-method cases. In that case, the relative usage is GA-FL (15%), ANN-DM (5%), ANN-GA (20%), ANN-BN (25%), ANN-FL (10%), RBR-ANN (10%), CBR-FL (5%), and RBR-CBR (10%).
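The (m, p%) convention can be checked directly from the counts above; note that the split of the smaller standalone counts between DM (4) and GA (2) is inferred here from the stated 7% and 3% figures.

```python
# Reproducing the (m, p%) figures for the standalone methods.
counts = {"ANN": 41, "FL": 7, "BN": 5, "DM": 4, "GA": 2}
total = sum(counts.values())                  # 41 + 7 + 5 + 4 + 2 = 59 standalone cases
for method, m in counts.items():
    print(f"{method}: ({m}, {round(100 * m / total)}%)")
# ANN: (41, 69%)  FL: (7, 12%)  BN: (5, 8%)  DM: (4, 7%)  GA: (2, 3%)
```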
5 Conclusion and Future Scope

The core rationale of this article is to deliver a critical account of the deployment and development of different intelligent computing methods for sleeping disorders. Various literature sources in the domain of sleeping and waking were analyzed. It is evaluated that ANN methods are used in the sleeping-disorder domain considerably more than any other single methodology. The connected methods are also used in the classification and detection of sleeping disorders, though, contrasted with single methods, their numbers are minimal. Out of the eight connected methods, ANN-GA and ANN-BN have been the most widely used percentage-wise, at 20 and 25% respectively. Over the past few decades, the trend of using fuzzy logic and DM has increased considerably. ANN, fuzzy, and BN are mainly used for diagnosis, categorization, and detection purposes; GA is used for detection, whereas DM serves classification and detection. As such, future researchers, especially novices emerging in the medical field, might find this review relevant.
References

1. K. Aoyagi, Medical image assessment apparatus, ultrasonic image assessment apparatus, magnetic resonance image assessment apparatus, medical image processing apparatus, and medical image processing method. J. Acoust. Soc. Am. 133(5), 3220 (2013). https://doi.org/10.1121/1.4803793
2. B. Lelieveldt, N. Karssemeijer, Information processing in medical image assessment 2007. Med. Image Anal. 12(6), 729–730 (2008). https://doi.org/10.1016/j.media.2008.03.005
3. P.L. Aaron, S. Bonni, An evaluation of wearable technological advancement in medical practices. J. Med. Image Comput. 58–65 (2020)
4. Web based analysis of critical medical care technology. J. Med. Image Comput. 66–73 (2020)
5. A. Haldorai, S. Anandakumar, Image segmentation and the projections of graphic centered approaches in medical image processing. J. Med. Image Comput. 74–81 (2020)
6. C. Hung, Computational algorithms on medical image processing. Current Med. Image Assess. Rev. 16(5), 467–468 (2020). https://doi.org/10.2174/157340561605200410144743
7. F. Aubry, V. Chameroy, R. Di Paola, A medical image object-oriented database with image processing and automatic reorganization capabilities. Comput. Med. Image Assess. Graph. 20(4), 315–331 (1996). https://doi.org/10.1016/s0895-6111(96)00022-5
8. P. Jannin, J. Fitzpatrick, D. Hawkes, X. Pennec, R. Shahidl, M. Vannier, Validation of medical image processing in image-guided therapy. IEEE Trans. Med. Image Assess. 21(12), 1445–1449 (2002). https://doi.org/10.1109/tmi.2002.806568
9. K. Drukker, Applied medical image processing, second edition: a basic course. J. Med. Image Assess. 1(2), 029901 (2014). https://doi.org/10.1117/1.jmi.1.2.029901
10. P. Jannin, Validation in medical image processing: methodological issues for proper quantification of uncertainties. Current Med. Image Assess. Rev. 8(4), 322–330 (2012). https://doi.org/10.2174/157340512803759785
11. M. Goris, Medical image acquisition and processing: clinical validation. Open J. Med. Image Assess. 04(04), 205–209 (2014). https://doi.org/10.4236/ojmi.2014.44028
12. T. Aach, Digital image acquisition and processing in medical x-ray image assessment. J. Electron. Image Assess. 8(1), 7 (1999). https://doi.org/10.1117/1.482680
13. H. Barrett, A. Gmitro, Information processing in medical image assessment. Image Vis. Comput. 12(6), 315 (1994). https://doi.org/10.1016/0262-8856(94)90055-8
14. T.K. Araghi, Digital image watermarking and performance analysis of histogram modification based methods. Intell. Comput. 631–637 (2018)
15. G. Yu, Z. Wei Xu, J. Xiong, Modeling and safety test of safety-critical software, in 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (2010)
Review on Face Recognition Using Deep Learning Techniques and Research Challenges

V. Karunakaran, S. Iwin Thanakumar Joseph, and Shanthini Pandiaraj
Abstract In the research area of object recognition, many researchers have worked on face recognition over the last few decades, and the research is still active because of its applications and the challenges present in the real world. Most recent face recognition techniques offer a better result in a constrained environment but fail in an unconstrained one, in which images are captured under varying conditions, such as different resolutions, different poses, and various expressions, illuminations, and occlusions. In this article, various deep learning techniques used for face recognition are discussed.

Keywords Face recognition · Deep learning techniques · Object recognition · Constrained environment · Unconstrained environment
1 Introduction

Researchers in computer vision have paid increasing attention to deep learning techniques, and most recent research on face recognition has been carried out with various deep learning techniques. The reason for this attention is that deep learning techniques can easily deal with a huge amount of data and provide better classification accuracy, though they depend heavily on high-end machines. Here, the problem is not divided into small sub-problems; deep learning solves the problem end to end. It takes a long time to train the system and takes

V. Karunakaran (B) · S. Iwin Thanakumar Joseph Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India e-mail: [email protected]

S. Iwin Thanakumar Joseph e-mail: [email protected]

S. Pandiaraj Department of Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 S. Smys et al. (eds.), Computational Vision and Bio-Inspired Computing, Advances in Intelligent Systems and Computing 1318, https://doi.org/10.1007/978-981-33-6862-0_64
only a small amount of time to test the data. Face recognition research is still alive owing to applications such as preventing retail crime, unlocking phones, smarter advertising, finding missing persons, helping the blind, protecting law enforcement, aiding forensic investigations, identifying people on social media platforms, tracking school attendance, and so on. The rest of the paper is organized as follows: Sect. 2 presents the challenges in face recognition, Sect. 3 explains face recognition using deep learning algorithms, and finally, Sect. 4 concludes the research work.
2 Challenges in Face Recognition

The challenges in face recognition are as follows: pose variations, the presence or absence of structuring elements or occlusions, facial expression changes, aging of the face, various illumination conditions, image resolution and modality, and the availability and quality of face datasets. This section discusses these challenges.
2.1 Pose Variations

One of the main challenges in face recognition is pose variation; in practical applications, the head pose plays an important role. Many researchers have handled the pose variation problem using the following three approaches:
1. Invariant feature extraction-based approach
2. Multi-view-based approach
3. 3D image-based approach
In the invariant feature extraction approach, face recognition is carried out with features invariant to pose changes. This approach is further classified into:
1. Appearance-based algorithms
2. Geometric model-based algorithms
If the face image dataset is sufficient, the appearance-based algorithm performs well; if the input dataset is insufficient, it does not. This problem is overcome by face synthesis, which creates additional face images from the existing ones and improves the accuracy of the model. The geometric model-based approach provides a promising result under varying face poses and with an insufficient face image dataset, but it incurs more computational cost than the appearance-based algorithm [1]. The same person's face in different poses is shown in Fig. 1.
Fig. 1 Same person face image with different pose
2.2 Presence or Absence of Structuring Elements or Occlusion

When a face recognition algorithm is tested with occluded images, there is definitely a small drop in its performance. Occlusion means the person is wearing sunglasses, a cap or hat, a beard, a scarf, and so on, which degrades performance during face recognition. Many researchers have handled this problem using texture-based algorithms [2, 3]. A sample occluded image is shown in Fig. 2 [4].
Fig. 2 Sample occlusion image
Fig. 3 Same person with different expressions a anger, b disgust, c sadness and d happiness
Fig. 4 Same person image with various illumination conditions
2.3 Facial Expression Changes

When an automatic face recognition algorithm is tested with various facial expressions, such as anger, disgust, happiness, sadness, and so on, there is a small drop in its performance. Figure 3 shows sample images of various facial expressions of the same person [5].
2.4 Various Illumination Conditions

A large variation in the illumination of an image will definitely degrade the performance of the face recognition algorithm. Such variations include low-level lighting in the background or foreground of the image and high-level lighting in the foreground or background. Images under various illumination conditions are shown in Fig. 4. The next section surveys how researchers have handled the above-mentioned challenges using different approaches and algorithms.
3 Face Recognition Using Machine Learning and Deep Learning Techniques

Masi et al. proposed a pose-aware model for tackling pose variation problems using a convolutional neural network with several specific poses. In this article,
3D rendering is used to synthesize multiple face poses from the input image. Training the model with these multiple rendered poses helps to achieve the best accuracy in the test phase. The results clearly show that the proposed method provides better accuracy and is capable of tackling pose variation problems [6]. In a related article, face images were processed across various poses using a deep convolutional neural network; the network layers and pose-specific model selection improved the performance of the system during the recognition phase, which provides better verification and identification results than state-of-the-art methods [7]. Chen et al. evaluated the performance of a deep convolutional neural network on new datasets such as the IARPA Janus Benchmark A (IJB-A), in addition to a traditional dataset, Labeled Faces in the Wild (LFW). The experiment compared two methods, a DCNN and a Fisher vector method; the results clearly show that the DCNN model performed better than the Fisher vector method on the new datasets in both identification and verification tasks [8]. Su et al. proposed a model for detecting sunglasses and scarves: if an image contains a sunglass or scarf, a support vector machine detects the occlusion and regression analysis removes it, producing a reconstructed image. Experiments were conducted on both the reconstructed and the original images, and the results concluded that, compared with the reconstructed image, using the non-occluded part of the face for recognition provides a better result [9]. Wang et al. [10] give an overall view of virtual reality content creation and exploration with deep learning methodology. The rapid growth of deep learning techniques and their advantages across applications have energized the involvement of machine learning paradigms in virtual reality methods. In general, the content creation and exploration of virtual reality map directly to
(a) Analysis of image and video
(b) Appropriate synthesis as well as editing.
Usually, generative adversarial networks are used and modeled for specific applications to manage
(a) Panoramic images
(b) Videos
(c) Virtual 3D scenes.
Golnari et al. [11] developed DeepFaceAR, which performs deep face recognition and displays personal information through augmented reality. Biometric recognition is a popular research topic in machine vision; here, deep learning methodologies combined with augmented reality are used to recognize individual faces and present information about the person. The dataset consists of 1200 facial images of approximately 100 faculty members belonging to the Shahrood University of Technology.
Mostly, augmented reality-based works follow three approaches, namely location-based, marker-based, and motion-based. Shaul Hammed et al. [12] give an insight into facial recognition systems. Facial expression is the major form of non-verbal communication; it expresses
1. A person's feelings
2. Judgment of an individual.
The general facial expression system consists of the following four steps:
1. Signal acquisition
2. Preprocessing
3. Feature extraction
4. Classification.
Hbali et al. [13] developed an augmented reality system for the face and both eyes based on histogram of oriented gradients (HOG) features for object detection. Appropriate machine learning algorithms are utilized for the following tasks (a minimal detection sketch follows the list):
(a) Detecting the face and eyes of a human using the application
(b) Tracking the eyes in real time
(c) Using the eye and face positions to embed an image of glasses on the face.
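The review does not include Hbali et al.'s implementation; as a rough, hypothetical illustration of the same idea, the sketch below uses dlib's pre-trained frontal face detector, which is itself a HOG-plus-linear-SVM pipeline, to obtain the face positions that such an AR overlay would anchor to. The image filename is an assumption.

```python
# Illustrative sketch only: dlib's frontal face detector is a HOG + linear SVM
# detector, similar in spirit to the HOG-based approach of Hbali et al. [13].
import dlib
import cv2  # opencv-python, used here only for image I/O

detector = dlib.get_frontal_face_detector()  # pre-trained HOG + linear SVM

image = cv2.imread("person.jpg")             # hypothetical input file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The second argument upsamples the image once, which helps find smaller faces.
faces = detector(gray, 1)
for rect in faces:
    # Face positions like these are what an AR system would use to anchor
    # a virtual-glasses overlay on the face.
    print(rect.left(), rect.top(), rect.right(), rect.bottom())
```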
Such systems are well suited to checking the quality of glasses without visiting a shop in person. The accuracy of the system is greatly improved by the use of HOG features, and owing to the reduced computational complexity, these kinds of virtual reality-based applications can be implemented effectively on smart gadgets, upgrading customers' shopping behavior through e-commerce. Yampolskiy et al. [14] developed a system for face recognition in the virtual world, especially avatar face recognition. A major problem in the virtual world is that several types of criminal activity require forensic communities to track users accurately in an automated manner. A COTS FR methodology provides the best identification accuracy, and avatar face recognition is introduced mainly for authentication purposes. A FERET-to-avatar face dataset is used to test the efficiency of the algorithm. The psychological, social, and economic position of users and their avatars in the virtual world shows that avatars are mostly mapped to their owners rather than being fully virtual designs, which implies high stability. The general template for recognizing an avatar face requires the following three basic steps:
1. Face detection and image normalization
2. Face representation
3. Matching.
Yong et al. [15] developed deep learning-based emotion recognition for people who wear head-mounted displays. A convolutional neural network (CNN) is trained by concealing the eyebrows and eyes of face images in an available dataset, imitating a worn head-mounted display (HMD). This gives excellent performance in estimating emotions from images of a person wearing an HMD. Lang et al. [16] investigated algorithms for face detection in real-time environments and introduced an AdaBoost-based face detection algorithm that falls under the multi-classifier model. The methodology generates a cascade basis in the training phase; the approach is comparatively robust to illumination and pose and is highly applicable to real-time systems (Table 1).

Table 1 List of deep learning-based face recognition in virtual reality

Authors and year | Description | Inference
Masi et al. [6] | The pose-aware model is used for tackling the pose variation problem by using a convolutional neural network with several specific poses | Better in terms of accuracy during the recognition phase
Almageed et al. [7] | Deep convolutional neural network layers and various pose model selection are used to solve the pose variation problem in face recognition | Better in terms of verification tasks and identification tasks
Chen et al. [8] | DCNN performance was evaluated on new datasets such as the IARPA Janus Benchmark A (IJB-A) | DCNN performed better than the Fisher vector method in both the verification and identification tasks on the new datasets
Su et al. [9] | SVM is used for detecting an occlusion in the image; regression analysis is used for removing the occlusion from the image | Compared to the reconstructed image, the non-occluded part of the face provides a better recognition result
Golnari [11] | Combination of a deep neural network with augmented reality to recognize individual faces | Recognition accuracy improved
Hbali [13] | Hybrid of augmented reality with HOG features for detection of face and eyes | Virtual eyeglass try-on system
Yampolskiy et al. [14] | COTS FR algorithm to verify and recognize avatar faces | The algorithm achieves 99.58% accuracy
Yong et al. [15] | CNN is trained to estimate emotions from facial images with an HMD worn | Accuracy of emotion estimation improved
4 Conclusion

This article gives an insight into facial recognition systems based on machine learning approaches in virtual reality applications, a research direction that paves the way for further work on visual media. The challenges and future scope of research in facial recognition were discussed. The combined approach of deep learning algorithms and augmented reality enhances the identification of an individual's face with improved accuracy.
References

1. S. Du, R. Ward, Face recognition under pose variations. J. Franklin Inst. 343(6), 596–613 (2006)
2. R. Min, A. Hadid, J.L. Dugelay, Efficient detection of occlusion prior to robust face recognition. Sci. World J. (2014)
3. R. Singh, M. Vatsa, A. Noore, Recognizing face images with disguise variations. Recent Adv. Face Recogn. 149–160 (2008)
4. A.A. Yusuf, F.S. Mohamad, Z. Sufyanu, A state of the art comparison of databases for facial occlusion. Jurnal Teknologi 77(13) (2015)
5. F. Prikler, Evaluation of emotional state of a person based on facial expression, in 2016 XII International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH) (IEEE, 2016), pp. 161–163
6. I. Masi, F.-J. Chang, J. Choi, S. Harel, J. Kim, K. Kim, J. Leksut et al., Learning pose-aware models for pose-invariant face recognition in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 41(2), 379–393 (2018)
7. W. Abd-Almageed, Y. Wu, S. Rawls, S. Harel, T. Hassner, I. Masi, J. Choi et al., Face recognition using deep multi-pose representations, in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (IEEE, 2016), pp. 1–9
8. J.C. Chen, V.M. Patel, R. Chellappa, Unconstrained face verification using deep CNN features, in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (IEEE, 2016), pp. 1–9
9. Y. Su, Y. Yang, Z. Guo, W. Yang, Face recognition with occlusion, in 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR) (Kuala Lumpur, 2015), pp. 670–674. https://doi.org/10.1109/ACPR.2015.7486587
10. M. Wang et al., VR content creation and exploration with deep learning: a survey. Comput. Visual Media 1–26 (2020)
11. A. Golnari, H. Khosravi, S. Sanei, DeepFaceAR: deep face recognition and displaying personal information via augmented reality, in 2020 International Conference on Machine Vision and Image Processing (MVIP) (IEEE, 2020)
12. S. Shaul Hammed, A. Sabanayagam, E. Ramakalaivani, A review on facial expression recognition systems. J. Crit. Rev. 7(4) (2019)
13. Y. Hbali, M. Sadgal, A.E. Fazziki, Object detection based on HOG features: faces and dual-eyes augmented reality, in 2013 World Congress on Computer and Information Technology (WCCIT) (IEEE, 2013)
14. R.V. Yampolskiy, B. Klare, A.K. Jain, Face recognition in the virtual world: recognizing avatar faces, in 2012 11th International Conference on Machine Learning and Applications, vol. 1 (IEEE, 2012)
15. H. Yong, J. Lee, J. Choi, Emotion recognition in gamers wearing head-mounted display, in 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR) (IEEE, 2019)
16. L. Yong, W. Gu, Study of face detection algorithm for real-time face detection system, in 2009 Second International Symposium on Electronic Commerce and Security, vol. 2 (IEEE, 2009)
Steganalysis for Images Security Classification in Machine Learning Using SVM P. Karthika, B. Barani Sundaram, Tucha Kedir, Tesfaye Tadele Sorsa, Nune Sreenivas, Manish Kumar Mishra, and Dhanabal Thirumoorthy
Abstract Classification is one of the most important tasks in applications such as text categorization, tone recognition, image classification, microarray gene expression analysis, protein function prediction, and data classification. A significant portion of existing supervised classification methods is based on traditional statistics, which can provide ideal results when the sample size is large. In practice, however, only limited samples can be acquired. In this paper, a learning technique, the support vector machine (SVM), is applied to different datasets (diabetes data, heart data, satellite data, and shuttle data) that have two or more classes. The SVM, a powerful machine learning technique developed from statistical learning theory, has achieved notable successes in several fields. Introduced in the mid-nineties, SVMs inspired a burst of interest in machine learning. Their foundations were laid by Vapnik and, owing to many attractive features and promising empirical performance, they are gaining popularity in the field of machine learning. The SVM technique does not suffer from the limitations of data computational complexity and limited samples.

Keywords Support vector machine · Classification · Machine learning · Security
P. Karthika (B)
Kalasalingam Academy of Research and Education, Krishnankoil, India
B. Barani Sundaram
Computer Science Department, College of Informatics, Bule Hora University, Bule Hora, Ethiopia
T. Kedir · T. T. Sorsa
College of Informatics, Bule Hora University, Bule Hora, Ethiopia
N. Sreenivas
School of Electrical and Computer Engineering, Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa, Ethiopia
M. K. Mishra
Department of Computer Science, University of Gondar, Gondar, Ethiopia
D. Thirumoorthy
Bule Hora University, Bule Hora, Ethiopia
1 Introduction

Vapnik first proposed the support vector machine (SVM), which has since generated serious enthusiasm in the machine learning research community [1]. Several recent studies have reported that SVMs often deliver better classification accuracy than other data classification algorithms [2–5]. SVMs have been used in a wide range of real-world problems such as text categorization, handwritten digit recognition, tone recognition, image classification, object detection, microarray gene expression data analysis, and general data classification. It has been suggested that SVMs consistently outperform other supervised learning methods [6–9]. For certain datasets, however, SVM performance is sensitive to how the cost parameter and kernel parameters are set; accordingly, the user typically needs to perform extensive cross-validation to figure out the optimal parameter setting, a loop usually referred to as model selection. One practical problem with model selection is that this cycle is rather time-consuming. We explored numerous settings of the parameters involved in applying the SVM that can affect the results [10]. These include the choice of kernel function, the standard deviation of the Gaussian kernel, the relative weights attached to slack variables to reflect the non-uniform distribution of labeled data, and the number of training examples. For analysis, we took four specific dataset applications, namely diabetes data, heart data, satellite data, and shuttle data, all of which have different features, classes, numbers of training samples, and numbers of testing samples.
2 Support Vector Machine Algorithm

The support vector machine, or SVM, is one of the most popular supervised learning algorithms and is widely used for image classification; classification techniques of this kind also play a vital role in solving regression problems in machine learning. The SVM approach identifies the best line, or hyperplane in n-dimensional space, that separates the classes, so that new data points can be placed in the correct region without difficulty. This decision boundary is called the hyperplane. The SVM chooses the hyperplane using the extreme points of each class; these extreme points are called support vectors, which is why the estimator is known as a support vector machine (SVM) [11]. Consider the diagram in which two different categories are separated by a decision boundary or hyperplane, as shown in Fig. 1. Example: SVM can be understood with an example similar to the one used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be built using the SVM algorithm [12].
Fig. 1 SVM classification using a decision boundary or hyperplane
We first train the model with lots of images of cats and dogs so that it learns the different features of each, and afterwards we test it with the strange creature. The support vector machine draws a decision boundary between the two classes (cat and dog) and picks the extreme cases (support vectors) of each class [13]. On the basis of the support vectors, it classifies the creature as a cat; consider the chart shown in Fig. 2. The SVM algorithm can be used for face detection, image classification, text categorization, etc.
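As a concrete illustration of this two-class setup, the following minimal sketch trains scikit-learn's SVC on a toy dataset; the feature values and labels are invented and stand in for features extracted from cat and dog images:

```python
# Minimal sketch: training a linear SVM on a toy two-class problem,
# analogous to the cat-vs-dog example above. The data are invented.
from sklearn import svm

# Two features per sample (e.g., two measurements extracted from an image).
X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y = [0, 0, 1, 1]  # 0 = first class ("cat"), 1 = second class ("dog")

clf = svm.SVC(kernel="linear")  # linear kernel: the decision boundary is a hyperplane
clf.fit(X, y)

print(clf.predict([[4.5, 7.0]]))  # predicted class for a new sample
print(clf.support_vectors_)       # the extreme points that fix the hyperplane
```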
Fig. 2 Example of an SVM classifier identifying an original or copied image
3 Implementation of the Hyperplane for Linear and Nonlinear SVM

3.1 Linear SVM

The working of the SVM algorithm can be understood through an example. Suppose we have a dataset with two labels (green and blue) and two features x1 and x2, and we need a classifier that can classify the coordinate pair (x1, x2) as either green or blue; consider the picture shown in Fig. 3. Because this is a 2D space, we can easily separate the two classes with a straight line, but there can be multiple lines that separate them; consider the picture shown in Fig. 4. The SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called the hyperplane. The SVM algorithm finds the points of each class closest to the boundary; these points are referred to as support vectors. The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is recognized as the optimal hyperplane, as shown in Fig. 5.
Fig. 3 Working of SVM algorithm
Fig. 4 SVM algorithm using line or decision boundary
Fig. 5 SVM algorithm goal to maximize margin
Fig. 6 SVM algorithm using nonlinear data
3.2 Nonlinear SVM

If the data are linearly arranged, we can separate them with a straight line, but for nonlinear data we cannot draw a single straight line; consider the picture shown in Fig. 6. To separate these data points, we have to add one more dimension. For linear data we have used two dimensions x and y, so for nonlinear data we add a third dimension z, which can be computed as:

\[ z = x^2 + y^2 \tag{1} \]
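A short demonstration of the mapping in Eq. (1), with synthetic data: two concentric rings of points cannot be split by a line in (x, y), but after appending z = x^2 + y^2 a linear SVM separates them with a plane.

```python
# Sketch of the feature map in Eq. (1): z = x^2 + y^2 turns radially
# separated classes into linearly separable ones. The data are synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0, 1, 100),    # inner ring: class 0
                        rng.uniform(2, 3, 100)])   # outer ring: class 1
X2d = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.array([0] * 100 + [1] * 100)

# Lift to 3-D by appending z = x^2 + y^2; a plane now separates the classes.
z = (X2d ** 2).sum(axis=1, keepdims=True)
X3d = np.hstack([X2d, z])

print(SVC(kernel="linear").fit(X2d, y).score(X2d, y))  # poor: not linearly separable
print(SVC(kernel="linear").fit(X3d, y).score(X3d, y))  # ~1.0 after the mapping
```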
The input vectors of the SVM are mapped to a higher-dimensional space, where a maximal separating hyperplane is constructed. Two parallel hyperplanes, one on each side of the separating hyperplane, are pushed against the data; the separating hyperplane is the one that maximizes the distance between these two parallel hyperplanes. The intuition is that the larger the margin, or separation between the parallel hyperplanes, the smaller the generalization error of the classifier will be [2]. We denote the training data points by \(\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)\}\), where \(y_i \in \{1, -1\}\) is the class label of the point \(x_i\) and \(n\) is the number of samples. Each \(x_i\) is a real p-dimensional vector; scaling the feature vectors (attributes) is important so that attributes with greater variance do not dominate. This training data can be separated by a hyperplane, which satisfies
\[ w \cdot x + b = 0 \tag{2} \]
If the training data are linearly separable, we can choose these hyperplanes so that there are no points between them and then try to maximize their separation:

\[ w \cdot x + b = 1 \quad \text{and} \quad w \cdot x + b = -1 \]

By geometry, the separation between these two hyperplanes is \(2/\|w\|\), so we need to minimize \(\|w\|\). To keep the training points out of the margin, we have to guarantee that for all \(i\) either \(w \cdot x_i - b \ge 1\) or \(w \cdot x_i - b \le -1\). This can be written as

\[ y_i (w \cdot x_i - b) \ge 1, \quad 1 \le i \le n \tag{3} \]
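For completeness, the margin value \(2/\|w\|\) quoted above follows directly from the standard point-to-plane distance formula; a one-line derivation:

```latex
% Distance between the hyperplanes w . x + b = 1 and w . x + b = -1.
% The distance from a point x_0 to the plane w . x + b = c is
% |w . x_0 + b - c| / ||w||, so taking x_0 on the plane w . x + b = 1:
\[
  d = \frac{|(w \cdot x_0 + b) - (-1)|}{\|w\|}
    = \frac{|1 + 1|}{\|w\|}
    = \frac{2}{\|w\|},
\]
% hence maximizing the margin is equivalent to minimizing \|w\|.
```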
The samples lying on the two marginal hyperplanes in Fig. 7 are called support vectors (SVs): they are the training data points closest to the separating hyperplane, whose largest margin is characterized by \(M = 2/\|w\|\). The support vectors satisfy

\[ y_j (w^T x_j + b) = 1 \tag{4} \]

Fig. 7 Maximum margin hyperplanes with samples from two groups trained for an SVM
The optimal canonical hyperplane (OCH) is a canonical hyperplane with a maximal margin. For all the data, the OCH should satisfy the following constraints:

\[ y_i (w^T x_i + b) \ge 1, \quad i = 1, 2, \ldots, l \tag{5} \]

To locate the optimal separating hyperplane with a maximal margin, a learning machine should minimize \(\|w\|^2\) subject to the inequality constraints. This optimization problem is solved at the saddle points of the Lagrangian function

\[ L_P = L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \bigl( y_i (w^T x_i + b) - 1 \bigr) = \frac{1}{2} w^T w - \sum_{i=1}^{l} \alpha_i \bigl( y_i (w^T x_i + b) - 1 \bigr) \tag{6} \]

where \(\alpha_i\) is a Lagrange multiplier. The search for an optimal saddle point \((w_0, b_0, \alpha_0)\) is necessary because the Lagrangian must be minimized with respect to \(w\) and \(b\) and maximized with respect to the nonnegative multipliers (\(\alpha_i \ge 0\)).
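Connecting the Lagrangian formulation to practice: in scikit-learn, a fitted SVC directly exposes the support vectors and the products \(y_i \alpha_i\) for the nonzero Lagrange multipliers. A minimal sketch with invented points:

```python
# Sketch: after fitting, scikit-learn exposes the support vectors and the
# products y_i * alpha_i (the nonzero Lagrange multipliers of the dual problem).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)       # the x_i lying on the margin hyperplanes
print(clf.dual_coef_)             # y_i * alpha_i for those support vectors
print(clf.coef_, clf.intercept_)  # w and b recovered from the dual solution
```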
4 Analysis of the Steganography Method and SVM Classification

To improve the security and the amount of stored data, a new adaptive LSB technique is used. Rather than storing the data only in the least significant bit of each pixel, this technique tries to use more than one bit per pixel in such a way that the change does not affect the visual appearance of the host image. It uses side information from neighboring pixels to estimate the number of bits that can be carried by each pixel of the host image to hide the secret data.
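The adaptive scheme is not specified here in enough detail to reproduce, so the sketch below shows only the basic fixed one-bit-per-pixel LSB embedding that it improves upon; the array shapes and message format are assumptions. The adaptive variant differs in that it consults neighboring pixels to decide how many bits each pixel can safely carry.

```python
# Simplified, non-adaptive LSB embedding, for contrast with the adaptive
# scheme described above: every pixel carries exactly one secret bit.
import numpy as np

def embed_lsb(cover: np.ndarray, bits: list[int]) -> np.ndarray:
    """Hide `bits` in the least significant bit of the first len(bits) pixels."""
    stego = cover.flatten().copy()
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the secret bit
    return stego.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bits: int) -> list[int]:
    return [int(p) & 1 for p in stego.flatten()[:n_bits]]

cover = np.array([[100, 101], [102, 103]], dtype=np.uint8)  # toy grayscale image
secret = [1, 0, 1, 1]
stego = embed_lsb(cover, secret)
assert extract_lsb(stego, 4) == secret  # pixel values change by at most 1
```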
The recent literature emphasizes the impact of machine learning techniques on steganalysis applications, yet only a few works have examined this method, leaving an avenue to explore machine learning-based analysis. It is found that steganalysis using a statistical approach offers promising results when machine learning methods for image classification are applied. The subsequent sections of this paper discuss the methodology and the experimental results of the proposed technique. The following tables present the different experimental outcomes. Table 1 shows the best values of the RBF parameters (C, γ) and the cross-validation rate obtained with five-fold cross-validation using the grid search method [5, 6]. Table 2 shows the total execution time, in seconds, needed to predict accuracy for each dataset.
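The (C, γ) search summarized in Table 1 can be reproduced in outline with scikit-learn's GridSearchCV; the power-of-two grid is the usual convention for RBF parameters, and the dataset here is a random placeholder for the paper's actual data:

```python
# Sketch of the 5-fold grid search over RBF parameters (C, gamma) of the
# kind used to produce Table 1. The dataset is a placeholder assumption.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder: substitute the diabetes/heart/satellite/shuttle data here.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(600, 8)), rng.integers(0, 2, 600)

param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],      # conventional power-of-two grid
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold cross-validation
search.fit(X, y)

print(search.best_params_)                 # best (C, gamma) pair
print(round(search.best_score_ * 100, 2))  # cross-validation accuracy in percent
```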
Table 1 Best values for the various RBF parameters (C, γ)

Applications   | Training data | Testing data | Best C and γ with five-fold (C, γ) | Cross-validation rate
Diabetes data  | 600           | 300          | 2^12 = 3048, 2^-8 = 0.008048       | 85.7
Heart data     | 300           | 70           | 2^6 = 45, 2^-8 = 0.008048          | 92.6
Satellite data | 5435          | 3000         | 2^1 = 2, 2^1 = 2                   | 95.768
Shuttle data   | 15,435        | 5350         | 2^16 = 33,748, 2^1 = 2             | 99.72

Table 2 Execution time using SVM in seconds (complete time to predict, SVM vs. RSES)

Applications   | SVM       | RSES
Diabetes data  | 81        | 24
Heart data     | 33        | 7.5
Satellite data | 84,849    | 95
Shuttle data   | 262,234.1 | 320
Fig. 8 Heart data accuracy with SVM
Fig. 9 Diabetes data accuracy with SVM
Figures 8 and 9 show the accuracy comparison for the heart and diabetes datasets obtained by varying the training set while keeping the full test set, for both procedures: the SVM with the RBF kernel and the RSES rule-based classifier.
5 Conclusion

The online classification task is much more expensive with a linear or nonlinear classifier than with a hierarchical SVM. In fact, each classification task requires computing the Euclidean distance to all points in the dataset, which would be costly in the presence of a huge dataset. In this paper, comparative findings were presented using different kernel functions: Figures 8 and 9 show the results for the different data samples using the linear, polynomial, sigmoid, and RBF kernels. The experimental results are encouraging, and it can be seen that the choice of kernel function and accurate tuning of the kernel parameters are important for a given data size.
References

1. U.H.-G. Kressel, Pairwise classification and support vector machines, in Advances in Kernel Methods: Support Vector Learning, ed. by B. Schölkopf, C.J.C. Burges, A.J. Smola (MIT Press, Cambridge, MA, 1999), pp. 255–268
2. P. Karthika, P. Vidhya Saraswathi, IoT using machine learning security enhancement in video steganography allocation for Raspberry Pi. J. Ambient Intell. Human. Comput. (2020). https://doi.org/10.1007/s12652-020-02126-4
3. S. Koknar-Tezel, L.J. Latecki, Improving SVM classification on imbalanced data sets in distance spaces, in ICDM '09: Proceedings of the Ninth IEEE International Conference on Data Mining, ed. by W. Wang, H. Kargupta, S. Ranka, P.S. Yu, X. Wu (IEEE, Piscataway, NJ, 2010), pp. 259–267
4. X. Liu, Y. Ding, General scaled support vector machines, in ICMLC 2011: Proceedings of the 3rd International Conference on Machine Learning and Computing (IEEE, Piscataway, NJ, 2011)
5. P. Karthika, P. Vidhya Saraswathi, Content based video copy detection using frame based fusion technique. J. Adv. Res. Dyn. Control Syst. 9(Sp-17), 885–894 (2017). Online ISSN 1943-023x
6. Y. Rizk, N. Mitri, M. Awad, A local mixture based SVM for an efficient supervised binary classification, in IJCNN 2013: Proceedings of the International Joint Conference on Neural Networks (IEEE, Piscataway, NJ, 2013), pp. 1–8
7. P. Karthika, P. Vidhya Saraswathi, A survey of content based video copy detection using big data. Int. J. Sci. Res. Sci. Technol. 3(5), 01–08 (2017). Print ISSN 2395-6011, Online ISSN 2395-602X. https://ijsrst.com/ICASCT2519.php
8. Y. Rizk, N. Mitri, M. Awad, An ordinal kernel trick for a computationally efficient support vector machine, in IJCNN 2014: Proceedings of the 2014 International Joint Conference on Neural Networks (IEEE, Piscataway, NJ, 2014), pp. 3930–3937
9. P. Karthika, P. Vidhya Saraswath, Digital video copy detection using steganography frame based fusion techniques, in Proceedings of the International Conference on ISMAC in Computational Vision and Bio-Engineering 2018 (ISMAC-CVB), Lecture Notes in Computational Vision and Biomechanics, ed. by D. Pandian, X. Fernando, Z. Baig, F. Shi, vol. 30 (Springer, Cham, 2019). https://doi.org/10.1007/978-3-030-00665-5_7
10. M. Stockman, M. Awad, Multistage SVM as a clinical decision making tool for predicting post operative patient status, in IKE '10: Proceedings of the 2010 International Conference on Information and Knowledge Engineering (CSREA, Athens, GA, 2010)
11. P. Karthika, P. Vidhya Saraswathi, Image security performance analysis for SVM and ANN classification techniques. Int. J. Recent Technol. Eng. 8(4S2), 436–442 (2019)
12. D.M.J. Tax, R.P.W. Duin, Support vector domain description. Pattern Recogn. Lett. 20, 1191–1199 (1999)
13. P. Karthika, P. Vidhya Saraswathi, Raspberry Pi—a tool for strategic machine learning security allocation in IoT, in Making Machine Intelligent by Artificial Learning (Apple Academic Press/CRC Press, Taylor & Francis Group, 2020)
Author Index
A Abdulbaqi, Azmi Shawkat, 209 Aggrawal, Ritu, 469 Akshita, 771 Al-barizinji, Shokhan M., 209 Ambili, A. V., 341 Amritha Varshini, S., 431 Ancy, C. A., 41 Anitha, Raghavendra, 553 Anitha, S., 747 Anto Sahaya Dhas, D., 507 Aravinth, J., 273, 431 Arun Deshpannde, Pavan, 527 Aseffa, Dereje Tekilu, 401 Athisayamani, Suganya, 95 Ayane, Tadesse Hailu, 401
B Baalamurugan, K. M., 771 Bacanin, Nebojsa, 689 Balamurugan, Varshini, 231 Balasubramanian, Noviya, 231 Baldota, Siddhant, 353, 369 Barani Sundaram, B., 855 Barhate, Deepti, 383 Bezdan, Timea, 689 Bhat, M. Nirupama, 719 Bijeesh, T. V., 303 Biradar, Kashinath, 527 Bodile, Roshan M., 175
C Chandra Sekhar, P. N. R. L, 457
Chaurasia, Rajashree, 57 Chellatamilan, T., 327 Cherian, Aswathy K., 223 Chhajed, Gyankamal J., 587 Chowdary, Ambati Aaryani, 71 Chowday, Jampani Sai Monisha, 71 Cvetnic, Dusan, 689
D Dalal, Vishwas, 165 Desai, Shrinivas, 541 Dhanasekaran, S., 95 Dharneeshkar, J., 231
E Easwarakumar, K. S., 191 El Emary, Ibrahiem M. M., 341 Ellappan, V., 401
G Gajic, Luka, 689 Gaonkar, Manisha Naik, 483 Garg, Bindu R., 587 Ghose, Udayan, 57 Ghosh, Sumantra, 383 Gopi Krishna, T., 401 Govardhan, N., 165 Guru, D. S., 553
H Hada, Nishita, 665
Haldorai, Anandakumar, 781, 795, 813, 829 Hanumantha Rao, T. V. K., 175 Harshita, C., 29 Htoo, Khine, 617
I Ilavarasan, E., 643 Inbarani, Hannah, 707 Indumathi, P., 415 Iwin Thanakumar Joseph, S., 845
J Jain, Sejal, 665 Janapati, Ravichander, 165 Jhala, Dhruvilsinh, 383 John, Jisha, 135 Joseph, S. Iwin Thanakumar, 497 Joshi, Shubham, 665 Joy, K. R., 599 Juyal, Piyush, 675
K Kalla, Harish, 401 Kanade, Vijay A., 449 Kannur, Anil, 627 Karthika, P., 855 Karthika, R., 231, 245, 273 Karunakaran, V., 497, 845 Kavitha, S., 707 Kedir, Tucha, 855 Keerthi, K. V. L., 457 Khaitan, Nimisha, 369 Kohli, Himani, 315 Krishna Teja, V., 457 Krishnan, R., 15 Krithiga, R., 643 Kumar, Manoj, 315
L Lohani, Manoj Chandra, 259
M Madhusudhan, S., 747 Malathy, C., 353 Malviya, Utsav Kumar, 107 Mamta, P., 121 Manjunath Aradhya, V. N., 553 Manubolu, Satheesh, 71 Milosevic, Stefan, 689
Mishra, Manish Kumar, 855 Misra, Rajesh, 147 Murthy, G. S. N., 567 Murugan, A., 15
N Naik, Ashwini Dayanand, 81 Najim, Saif Al-din M., 209 Narasimhamurthy, K. N., 303 Niranjan, D. K., 1
P Pai, Maya L., 41 Pais, Alwyn R., 287 Pal, Saurabh, 469 Pandian, R., 415 Pandiaraj, Shanthini, 845 Panessai, Ismail Yusuf, 209 Pant, Himanshu, 259 Pant, Janmejay, 259 Parameswaran, Latha, 245 Pathak, Aaditya, 383 Patil, Kiran H., 719 Petshali, Prachi, 259 Pinge, Anuja, 483 Poornima, A. R., 231 Poovammal, E., 223, 369 Prasad, S. V. A. V., 121 Pravin, A., 415 Prem Jacob, T., 415
R Rahul, Kamma, 71 Raja Kumar, R., 415 Rajan, Muktha, 231 Rajasekar, V., 497 Rakesh, N., 1 Ram, Mylavarapu Kalyan, 567 Ramu, Arulmurugan, 781, 795, 813, 829 Ramya, P., 327 Rani, Anuj, 315 Rathore, Vivek Singh, 107 Ravi Teja, T., 29 Ravikumar, Aswathy, 135 Raviraja Holla, M., 287 Ray, Kumar Sankar, 147 Renuka, B. S., 527 Robert Singh, A., 95 Robin, Mirya, 135 Rohith Sri Sai, M., 29
S Sagar, Parth, 315 Sai Pavan, E. J., 327 Sai Teja, T., 29 Sankara Narayanan, S., 95 Santhi, K., 327 Sasikala, 199 Savakar, Dayanand G., 627 Sein, Myint Myint, 617 Sengupta, Rakesh, 147 Senthil Kumar, A. V., 341 Shalini, 199 Shankar, T. N., 457
871 Sumithra, R., 553 Sunag, Bhagya, 541 Sunitha, P. J., 599 Supriya, M., 81 Surekha, B. S., 527
T Telsang, Danesh, 627 Thirumoorthy, Dhanabal, 855 Thomas, Vinod J., 507
V Valarmathi, B., 327 Varshaa, K. S., 273
Y Yaqub, Tanya, 771
Z Zeelan Basha, C. M. A. K., 29 Zivkovic, Miodrag, 689