Advances in Intelligent Systems and Computing 1108
S. Smys · João Manuel R. S. Tavares · Valentina Emilia Balas · Abdullah M. Iliyasu
Editors
Computational Vision and Bio-Inspired Computing ICCVBIC 2019
Advances in Intelligent Systems and Computing Volume 1108
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of intelligent systems and intelligent computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, and life science are covered. The list of topics spans all the areas of modern intelligent systems and computing, such as: computational intelligence; soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms; social intelligence; ambient intelligence; computational neuroscience; artificial life; virtual worlds and society; cognitive science and systems; perception and vision; DNA and immune based systems; self-organizing and adaptive systems; e-learning and teaching; human-centered and human-centric computing; recommender systems; intelligent control; robotics and mechatronics including human-machine teaming; knowledge-based paradigms; learning paradigms; machine ethics; intelligent data analysis; knowledge management; intelligent agents; intelligent decision making and support; intelligent network security; trust management; interactive entertainment; and Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and worldwide distribution, which permits a rapid and broad dissemination of research results.

** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink **
More information about this series at http://www.springer.com/series/11156
S. Smys · João Manuel R. S. Tavares · Valentina Emilia Balas · Abdullah M. Iliyasu
Editors

Computational Vision and Bio-Inspired Computing
ICCVBIC 2019
Editors

S. Smys
Department of CSE, RVS Technical Campus, Coimbatore, India

João Manuel R. S. Tavares
Faculdade de Engenharia da Universidade do Porto, Porto, Portugal

Valentina Emilia Balas
Faculty of Engineering, Aurel Vlaicu University of Arad, Arad, Romania

Abdullah M. Iliyasu
School of Computing, Tokyo Institute of Technology, Tokyo, Japan
ISSN 2194-5357  ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-37217-0  ISBN 978-3-030-37218-7 (eBook)
https://doi.org/10.1007/978-3-030-37218-7

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
We are honored to dedicate the proceedings of ICCVBIC 2019 to all the participants and editors of ICCVBIC 2019.
Foreword
It is with deep satisfaction that I write this foreword to the proceedings of ICCVBIC 2019, held at RVS Technical Campus, Coimbatore, Tamil Nadu, on September 25–26, 2019. This conference brought together researchers, academics, and professionals from all over the world, experts in computational vision and bio-inspired computing. The conference particularly encouraged the interaction of research students and early-career academics with the more established academic community in an informal setting, to present and discuss new and current work. The papers contributed the most recent scientific knowledge in the fields of computational vision, soft computing, fuzzy systems, image processing, and bio-inspired computing, and these contributions helped to make the conference as outstanding as it has been. The local organizing committee members and their helpers put much effort into ensuring the success of the day-to-day operation of the meeting. We hope that this program will further stimulate research in computational vision, soft computing, fuzzy systems, image processing, and bio-inspired computing, and provide practitioners with better techniques, algorithms, and tools for deployment. We feel honored and privileged to serve the best recent developments to you through this exciting program. We thank all authors and participants for their contributions.

S. Smys
Conference Chair
Preface
This conference proceedings volume contains the written versions of most of the contributions presented during ICCVBIC 2019. The conference provided a setting for discussing recent developments in a wide variety of topics, including computational vision, fuzzy systems, image processing, and bio-inspired computing, and was a good opportunity for participants coming from various destinations to present and discuss topics in their respective research areas. ICCVBIC 2019 aimed to collect the latest research results and applications on computational vision and bio-inspired computing. This volume includes a selection of 147 papers from the 397 papers submitted to the conference from universities and industries all over the world. All accepted papers were subjected to strict peer review by two to four expert referees. The papers have been selected for this volume because of their quality and relevance to the conference. We would like to express our sincere appreciation to all authors for their contributions to this book. We would like to extend our thanks to all the referees for their constructive comments on all papers, and especially to the organizing committee for their hard work. Finally, we would like to thank Springer for producing this volume.

S. Smys
Conference Chair
Acknowledgments
ICCVBIC 2019 would like to acknowledge the excellent work of our conference organizing committee and the keynote speakers for their presentations on September 25–26, 2019. The organizers also wish to acknowledge publicly the valuable services provided by the reviewers. On behalf of the editors, organizers, authors, and readers of this conference, we wish to thank the keynote speakers and the reviewers for their time, hard work, and dedication to this conference. The organizers wish to acknowledge Dr. S. Smys, Dr. Valentina Emilia Balas, Dr. Abdullah M. Iliyasu, and Dr. João Manuel R. S. Tavares for the discussion, suggestions, and cooperation in organizing the keynote speakers of this conference. The organizers also wish to thank the speakers and participants who attended this conference. Many thanks are given to all persons who helped and supported this conference. ICCVBIC 2019 would like to acknowledge the contribution made to the organization by its many volunteers, who contributed their time, energy, and knowledge at local, regional, and international levels. We also thank all the chairpersons and conference committee members for their support.
Contents
Psychosocial Analysis of Policy Confidence Through Multifactorial Statistics . . . 1
Jesús Silva, Darwin Solano, Claudia Fernández, Ligia Romero, and Omar Bonerge Pineda Lezama

Human Pose Detection: A Machine Learning Approach . . . 8
Munindra Kakati and Parismita Sarma

Machine Learning Algorithm in Smart Farming for Crop Identification . . . 19
N. D. Vindya and H. K. Vedamurthy

Analysis of Feature Extraction Algorithm Using Two Dimensional Discrete Wavelet Transforms in Mammograms to Detect Microcalcifications . . . 26
Subash Chandra Bose, Murugesh Veerasamy, Azath Mubarakali, Ninoslav Marina, and Elena Hadzieva

Prediction of Fetal Distress Using Linear and Non-linear Features of CTG Signals . . . 40
E. Ramanujam, T. Chandrakumar, K. Nandhana, and N. T. Laaxmi

Optimizing Street Mobility Through a NetLogo Simulation Environment . . . 48
Jesús Silva, Noel Varela, and Omar Bonerge Pineda Lezama

Image Processing Based Lane Crossing Detection Alert System to Avoid Vehicle Collision . . . 56
B. S. Priyangha, B. Thiyaneswaran, and N. S. Yoganathan

Stratified Meta Structure Based Similarity Measure in Heterogeneous Information Networks for Medical Diagnosis . . . 62
Ganga Gireesan and Linda Sara Mathew
An Effective Imputation Model for Vehicle Traffic Data Using Stacked Denoise Autoencoder . . . 71
S. Narmadha and V. Vijayakumar

Fruit Classification Using Traditional Machine Learning and Deep Learning Approach . . . 79
N. Saranya, K. Srinivasan, S. K. Pravin Kumar, V. Rukkumani, and R. Ramya

Supervised and Unsupervised Learning Applied to Crowdfunding . . . 90
Oscar Iván Torralba Quitian, Jenny Paola Lis-Gutiérrez, and Amelec Viloria

Contextual Multi-scale Region Convolutional 3D Network for Anomalous Activity Detection in Videos . . . 98
M. Santhi and Leya Elizabeth Sunny
Noise Removal in Breast Cancer Using Hybrid De-noising Filter for Mammogram Images . . . 109
D. Devakumari and V. Punithavathi

Personality Trait with E-Graphologist . . . 120
Pranoti S. Shete and Anita Thengade

WOAMSA: Whale Optimization Algorithm for Multiple Sequence Alignment of Protein Sequence . . . 131
Manish Kumar, Ranjeet Kumar, and R. Nidhya

Website Information Architecture of Latin American Universities in the Rankings . . . 140
Carmen Luisa Vásquez, Marisabel Luna-Cardozo, Maritza Torres-Samuel, Nunziatina Bucci, and Amelec Viloria Silva

Classification of Hybrid Multiscaled Remote Sensing Scene Using Pretrained Convolutional Neural Networks . . . 148
Sujata Alegavi and Raghvendra Sedamkar

Object Detection and Classification Using GPU Acceleration . . . 161
Shreyank Prabhu, Vishal Khopkar, Swapnil Nivendkar, Omkar Satpute, and Varshapriya Jyotinagar

Cross Modal Retrieval for Different Modalities in Multimedia . . . 171
T. J. Osheen and Linda Sara Mathew

Smart Wearable Speaking Aid for Aphonic Personnel . . . 179
S. Aswin, Ayush Ranjan, and K. V. Mahendra Prashanth

Data Processing for Direct Marketing Through Big Data . . . 187
Amelec Viloria, Noel Varela, Doyreg Maldonado Pérez, and Omar Bonerge Pineda Lezama
Brain Computer Interface Based Epilepsy Alert System Using Neural Networks . . . 193
Sayali M. Dhongade, Nilima R. Kolhare, and Rajendra P. Chaudhari

Face Detection Based Automation of Electrical Appliances . . . 204
S. V. Lokesh, B. Karthikeyan, R. Kumar, and S. Suresh

Facial Recognition with Open Cv . . . 213
Amit Thakur, Abhishek Prakash, Akshay Kumar Mishra, Adarsh Goldar, and Ayush Sonkar

Dynamic Link Prediction in Biomedical Domain . . . 219
M. V. Anitha and Linda Sara Mathew

Engineering Teaching: Simulation, Industry 4.0 and Big Data . . . 226
Jesús Silva, Mercedes Gaitán, Noel Varela, and Omar Bonerge Pineda Lezama

Association Rule Data Mining in Agriculture – A Review . . . 233
N. Vignesh and D. C. Vinutha

Human Face Recognition Using Local Binary Pattern Algorithm - Real Time Validation . . . 240
P. Shubha and M. Meenakshi

Recreation of 3D Models of Objects Using MAV and Skanect . . . 247
S. Srividhya, S. Prakash, and K. Elangovan

Inferring Machine Learning Based Parameter Estimation for Telecom Churn Prediction . . . 257
J. Pamina, J. Beschi Raja, S. Sam Peter, S. Soundarya, S. Sathya Bama, and M. S. Sruthi

Applying a Business Intelligence System in a Big Data Context: Production Companies . . . 268
Jesús Silva, Mercedes Gaitán, Noel Varela, Doyreg Maldonado Pérez, and Omar Bonerge Pineda Lezama

Facial Expression Recognition Using Supervised Learning . . . 275
V. B. Suneeta, P. Purushottam, K. Prashantkumar, S. Sachin, and M. Supreet

Optimized Machine Learning Models for Diagnosis and Prediction of Hypothyroidism and Hyperthyroidism . . . 286
Manoj Challa

Plastic Waste Profiling Using Deep Learning . . . 294
Mahadevan Narayanan, Apurva Mhatre, Ajun Nair, Akilesh Panicker, and Ashutosh Dhondkar
Leaf Disease Detection Using Image Processing and Artificial Intelligence – A Survey . . . 304
H. Parikshith, S. M. Naga Rajath, and S. P. Pavan Kumar

Deep Learning Model for Image Classification . . . 312
Sudarshana Tamuly, C. Jyotsna, and J. Amudha

Optimized Machine Learning Approach for the Prediction of Diabetes-Mellitus . . . 321
Manoj Challa and R. Chinnaiyan

Performance Analysis of Differential Evolution Algorithm Variants in Solving Image Segmentation . . . 329
V. SandhyaSree and S. Thangavelu

Person Authentication Using EEG Signal that Uses Chirplet and SVM . . . 338
Sonal Suhas Shinde and Swati S. Kamthekar

Artificial Neural Network Hardware Implementation: Recent Trends and Applications . . . 345
Jagrati Gupta and Deepali Koppad

A Computer Vision Based Fall Detection Technique for Home Surveillance . . . 355
Katamneni Vinaya Sree and G. Jeyakumar

Prediction of Diabetes Mellitus Type-2 Using Machine Learning . . . 364
S. Apoorva, K. Aditya S, P. Snigdha, P. Darshini, and H. A. Sanjay

Vision Based Deep Learning Approach for Dynamic Indian Sign Language Recognition in Healthcare . . . 371
Aditya P. Uchil, Smriti Jha, and B. G. Sudha

A Two-Phase Image Classification Approach with Very Less Data . . . 384
B. N. Deepankan and Ritu Agarwal

A Survey on Various Shadow Detection and Removal Methods . . . 395
P. C. Nikkil Kumar and P. Malathi

Medical Image Registration Using Landmark Registration Technique and Fusion . . . 402
R. Revathy, S. Venkata Achyuth Kumar, V. Vijay Bhaskar Reddy, and V. Bhavana

Capsule Network for Plant Disease and Plant Species Classification . . . 413
R. Vimal Kurup, M. A. Anupama, R. Vinayakumar, V. Sowmya, and K. P. Soman
RumorDetect: Detection of Rumors in Twitter Using Convolutional Deep Tweet Learning Approach . . . 422
N. G. Bhuvaneswari Amma and S. Selvakumar

Analyzing Underwater Videos for Fish Detection, Counting and Classification . . . 431
G. Durga Lakshmi and K. Raghesh Krishnan

Plant Leaf Disease Detection Using Modified Segmentation Process and Classification . . . 442
K. Nirmalakumari, Harikumar Rajaguru, and P. Rajkumar

Neuro-Fuzzy Inference Approach for Detecting Stages of Lung Cancer . . . 453
C. Rangaswamy, G. T. Raju, and G. Seshikala

Detection of Lung Nodules Using Unsupervised Machine Learning Method . . . 463
Raj Kishore, Manoranjan Satpathy, D. K. Parida, Zohar Nussinov, and Kisor K. Sahu

Comparison of Dynamic Programming and Genetic Algorithm Approaches for Solving Subset Sum Problems . . . 472
Konjeti Harsha Saketh and G. Jeyakumar

Design of Binary Neurons with Supervised Learning for Linearly Separable Boolean Operations . . . 480
Kiran S. Raj, M. Nishanth, and Gurusamy Jeyakumar

Image De-noising Method Using Median Type Filter, Fuzzy Logic and Genetic Algorithm . . . 488
S. R. Sannasi Chakravarthy and Harikumar Rajaguru

Autism Spectrum Disorder Prediction Using Machine Learning Algorithms . . . 496
Shanthi Selvaraj, Poonkodi Palanisamy, Summia Parveen, and Monisha

Challenges in Physiological Signal Extraction from Cognitive Radio Wireless Body Area Networks for Emotion Recognition . . . 504
A. Roshini and K. V. D. Kiran

Pancreatic Tumour Segmentation in Recent Medical Imaging – an Overview . . . 514
A. Sindhu and V. Radha

A Normal Approach of Online Signature Using Stroke Point Warping – A Survey . . . 523
S. Prayla Shyry, P. Srivanth, and Derrick Thomson
Spectral Density Analysis with Logarithmic Regression Dependent Gaussian Mixture Model for Epilepsy Classification . . . 529
Harikumar Rajaguru and Sunil Kumar Prabhakar

Comprehensive Review of Various Speech Enhancement Techniques . . . 536
Savy Gulati

An Approach for Sentiment Analysis Using Gini Index with Random Forest Classification . . . 541
Manpreet Kaur

Mining High Utility Itemset for Online Ad Placement Using Particle Swarm Optimization Algorithm . . . 555
M. Keerthi, J. Anitha, Shakti Agrawal, and Tanya Varghese

Classification of Lung Nodules into Benign or Malignant and Development of a CBIR System for Lung CT Scans . . . 563
K. Bhavanishankar and M. V. Sudhamani

Correlation Dimension and Bayesian Linear Discriminant Analysis for Alcohol Risk Level Detection . . . 576
Harikumar Rajaguru and Sunil Kumar Prabhakar

A Smart and Secure Framework for IoT Device Based Multimedia Medical Data . . . 583
Shrujana Murthy and C. R. Kavitha

Exploration of an Image Processing Model for the Detection of Borer Pest Attack . . . 589
Yogini Prabhu, Jivan S. Parab, and Gaurish M. Naik

Measurement of Acid Content in Rain Water Using Wireless Sensor Networks and Artificial Neural Networks . . . 598
Sakshi Gangrade and Srinath R. Naidu

Modified Gingerbreadman Chaotic Substitution and Transformation Based Image Encryption . . . 606
S. N. Prajwalasimha, Sidramappa, S. R. Kavya, A. S. Hema, and H. C. Anusha

Thresholding and Clustering with Singular Value Decomposition for Alcoholic EEG Analysis . . . 615
Harikumar Rajaguru and Sunil Kumar Prabhakar

FPGA Implementation of Haze Removal Technique Based on Dark Channel Prior . . . 624
J. Varalakshmi, Deepa Jose, and P. Nirmal Kumar
Impulse Noise Classification Using Machine Learning Classifier and Robust Statistical Features . . . 631
K. Kunaraj, S. Maria Wenisch, S. Balaji, and F. P. Mahimai Don Bosco

An Online Platform for Diagnosis of Children with Reading Disability . . . 645
P. Sowmyasri, R. Ravalika, C. Jyotsna, and J. Amudha

Review on Prevailing Difficulties Using IriS Recognition . . . 656
C. D. Divya and A. B. Rajendra

Performance Analysis of ICA with GMM and HMM for Epilepsy Classification . . . 662
Harikumar Rajaguru and Sunil Kumar Prabhakar

Evaluating the Performance of Various Types of Neural Network by Varying Number of Epochs for Identifying Human Emotion . . . 670
R. Sofia and D. Sivakumar

Bleeding and Z-Line Classification by DWT Based SIFT Using KNN and SVM . . . 679
R. Ponnusamy and S. Sathiamoorthy

RNA-Seq DE Genes on Glioblastoma Using Non Linear SVM and Pathway Analysis of NOG and ASCL5 . . . 689
Sandra Binoy, Vinai George Biju, Cynthia Basilia, Blessy B. Mathew, and C. M. Prashanth

Evaluation of Growth and CO2 Biofixation by Spirulina platensis in Different Culture Media Using Statistical Models . . . 697
Carvajal Tatis Claudia Andrea, Suarez Marenco Marianella María, W. B. Morgado Gamero, Sarmiento Rubiano Adriana, Parody Muñoz Alexander Elías, and Jesus Silva

Breast Cancer Detection Using CNN on Mammogram Images . . . 708
Kushal Batra, Sachin Sekhar, and R. Radha

Segmentation Based Preprocessing Techniques for Predicting the Cervix Type Using Neural Networks . . . 717
M. B. Bijoy, A. Ansal Muhammed, and P. B. Jayaraj

Design and Implementation of Biomedical Device for Monitoring Fetal ECG . . . 727
Rama Subramanian, C. Karthikeyan, G. Siva Nageswara Rao, and Ramasamy Mariappan

A Comparative Review of Prediction Methods for Pima Indians Diabetes Dataset . . . 735
P. V. Sankar Ganesh and P. Sripriya
Instructor Performance Evaluation Through Machine Learning Algorithms . . . 751
J. Sowmiya and K. Kalaiselvi

Advanced Driver Assistance System Using Computer Vision and IOT . . . 768
M. Hemaanand, P. Rakesh Chowdary, S. Darshan, S. Jagadeeswaran, R. Karthika, and Latha Parameswaran

Segmentation and Classification of Primary Brain Tumor Using Multilayer Perceptron . . . 779
Shubhangi S. Veer (Handore), Anuoama Deshpande, P. M. Patil, and Mahendra V. Handore

Dynamic and Chromatic Analysis for Fire Detection and Alarm Raising Using Real-Time Video Analysis . . . 788
P. S. Srishilesh, Latha Parameswaran, R. S. Sanjay Tharagesh, Senthil Kumar Thangavel, and P. Sridhar

Smart Healthcare Data Acquisition System . . . 798
Tanvi Ghole, Shruti Karande, Harshita Mondkar, and Sujata Kulkarni

A Sparse Representation Based Learning Algorithm for Denoising in Images . . . 809
Senthil Kumar Thangavel and Sudipta Rudra

A Collaborative Mobile Based Interactive Framework for Improving Tribal Empowerment . . . 827
Senthilkumar Thangavel, T. V. Rajeevan, S. Rajendrakumar, P. Subramaniam, Udhaya Kumar, and B. Meenakshi

IoT Cloud Based Waste Management System . . . 843
A. Sheryl Oliver, M. Anuradha, Anuradha Krishnarathinam, S. Nivetha, and N. Maheswari

Vision Connect: A Smartphone Based Object Detection for Visually Impaired People . . . 863
Devakunchari Ramalingam, Swapnil Tiwari, and Harsh Seth

Two-Phase Machine Learning Approach for Extractive Single Document Summarization . . . 871
A. R. Manju Priya and Deepa Gupta

Sensor Based Non Invasive Dual Monitoring System to Measure Glucose and Potassium Level in Human Body . . . 882
K. T. Sithara Surendran and T. Sasikala

A State of Art Methodology for Real-Time Surveillance Video Denoising . . . 894
Fancy Joy and V. Vijayakumar
Disease Inference on Medical Datasets Using Machine Learning and Deep Learning Algorithms . . . 902
Arunkumar Chinnaswamy, Ramakrishnan Srinivasan, and Desai Prutha Gaurang

Artificial Intelligence Based System for Preliminary Rounds of Recruitment Process . . . 909
Sayli Uttarwar, Simran Gambani, Tej Thakkar, and Nikahat Mulla

Eye Movement Event Detection with Deep Neural Networks . . . 921
K. Anusree and J. Amudha

A Review on Various Biometric Techniques, Its Features, Methods, Security Issues and Application Areas . . . 931
M. Gayathri, C. Malathy, and M. Prabhakaran

Blind Guidance System . . . 942
A. S. Nandagopal and Ani Sunny

A Novel Multimodal Biometrics System with Fingerprint and Gait Recognition Traits Using Contourlet Derivative Weighted Rank Fusion . . . 950
R. Vinothkanna and P. K. Sasikumar

Implementation of Real-Time Skin Segmentation Based on K-Means Clustering Method . . . 964
Souranil De, Soumik Rakshit, Abhik Biswas, Srinjoy Saha, and Sujoy Datta

An Image Processing Based Approach for Monitoring Changes in Webpages . . . 974
Raj Kothari and Gargi Vyas

Vehicle Detection Using Point Cloud and 3D LIDAR Sensor to Draw 3D Bounding Box . . . 983
H. S. Gagana, N. R. Sunitha, and K. N. Nishanth

An Experiment with Random Walks and GrabCut in One Cut Interactive Image Segmentation Techniques on MRI Images . . . 993
Anuja Deshpande, Pradeep Dahikar, and Pankaj Agrawal

A Comprehensive Survey on Web Recommendations Systems with Special Focus on Filtering Techniques and Usage of Machine Learning . . . 1009
K. N. Asha and R. Rajkumar

A Review on Business Intelligence Systems Using Artificial Intelligence . . . 1023
Aastha Jain, Dhruvesh Shah, and Prathamesh Churi
Image Completion of Highly Noisy Images Using Deep Learning . . . 1031
Piyush Agrawal and Sajal Kaushik

Virtual Vision for Blind People Using Mobile Camera and Sonar Sensors . . . 1044
Shams Shahriar Suny, Setu Basak, and S. M. Mazharul Hoque Chowdhury

Study on Performance of Classification Algorithms Based on the Sample Size for Crop Prediction . . . 1051
I. Rajeshwari and K. Shyamala

Global Normalization for Fingerprint Image Enhancement . . . 1059
Meghna B. Patel, Satyen M. Parikh, and Ashok R. Patel

Recurrent Neural Network for Content Based Image Retrieval Using Image Captioning Model . . . 1067
S. Sindu and R. Kousalya

Using Deep Learning on Satellite Images to Identify Deforestation/Afforestation . . . 1078
Apurva Mhatre, Navin Kumar Mudaliar, Mahadevan Narayanan, Aaditya Gurav, Ajun Nair, and Akash Nair

An Application of Cellular Automata: Satellite Image Classification . . . 1085
S. Poonkuntran, V. Abinaya, S. Manthira Moorthi, and M. P. Oza

Morlet Wavelet Threshold Based Glow Warm Optimized X-Means Clustering for Forest Fire Detection . . . 1094
B. Pushpa and M. Kamarasan

Advance Assessment of Neural Network for Identification of Diabetic Nephropathy Using Renal Biopsies Images . . . 1106
Yogini B. Patil and Seema Kawathekar

Advanced Techniques to Enhance Human Visual Ability – A Review . . . 1116
A. Vani and M. N. Mamatha

Cross Media Feature Retrieval and Optimization: A Contemporary Review of Research Scope, Challenges and Objectives . . . 1125
Monelli Ayyavaraiah and Bondu Venkateswarlu

Conversion from Lossless to Gray Scale Image Using Color Space Conversion Module . . . 1137
C. S. Sridhar, G. Mahadevan, S. K. Khadar Basha, and P. Sudir

Iris Recognition Using Bicubic Interpolation and Multi Level DWT Decomposition . . . 1146
C. R. Prashanth, Sunil S. Harakannanavar, and K. B. Raja
Retinal Image Classification System Using Multi Phase Level Set Formulation and ANFIS . . . 1154
A. Jayachandran, T. Sreekesh Namboodiri, and L. Arokia Jesu Prabhu

FSO: Issues, Challenges and Heuristic Solutions . . . 1162
Saloni Rai and Amit Kumar Garg

Design of Canny Edge Detection Hardware Accelerator Using xfOpenCV Library . . . 1171
Lokender Vashist and Mukesh Kumar

Application of Cloud Computing for Economic Load Dispatch and Unit Commitment Computations of the Power System Network . . . 1179
Nithiyananthan Kannan, Youssef Mobarak, and Fahd Alharbi

Hamiltonian Fuzzy Cycles in Generalized Quartic Fuzzy Graphs with Girth k . . . 1190
N. Jayalakshmi, S. Vimal Kumar, and P. Thangaraj

Effective Model Integration Algorithm for Improving Prediction Accuracy of Healthcare Ontology . . . 1203
P. Monika and G. T. Raju

Detection of Leaf Disease Using Hybrid Feature Extraction Techniques and CNN Classifier . . . 1213
Vidyashree Kanabur, Sunil S. Harakannanavar, Veena I. Purnikmath, Pramod Hullole, and Dattaprasad Torse

Fish Detection and Classification Using Convolutional Neural Networks . . . 1221
B. S. Rekha, G. N. Srinivasan, Sravan Kumar Reddy, Divyanshu Kakwani, and Niraj Bhattad

Feature Selection and Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) for Chronic Kidney Disease (CKD) Prediction . . . 1232
S. Belina V. J. Sara and K. Kalaiselvi

Swarm Intelligence Based Feature Clustering for Continuous Speech Recognition Under Noisy Environments . . . 1248
M. Kalamani, M. Krishnamoorthi, R. Harikumar, and R. S. Valarmathi

Video-Based Fire Detection by Transforming to Optimal Color Space . . . 1256
M. Thanga Manickam, M. Yogesh, P. Sridhar, Senthil Kumar Thangavel, and Latha Parameswaran

Image Context Based Similarity Retrieval System . . . 1265
Arpana D. Mahajan and Sanjay Chaudhary
Investigation of Non-invasive Hemoglobin Estimation Using Photoplethysmograph Signal and Machine Learning . . . 1273
M. Lakshmi, S. Bhavani, and P. Manimegalai

Sentiment Analysis on Tweets for a Disease and Treatment Combination . . . 1283
R. Meena, V. Thulasi Bai, and J. Omana

Design and Performance Analysis of Artificial Neural Network Based Artificial Synapse for Bio-inspired Computing . . . 1294
B. U. V. Prashanth and Mohammed Riyaz Ahmed

Sentiment Analysis of Amazon Book Review Data Using Lexicon Based Analysis . . . 1303
Fatema Khatun, S. M. Mazharul Hoque Chowdhury, Zerin Nasrin Tumpa, SK. Fazlee Rabby, Syed Akhter Hossain, and Sheikh Abujar

MicroRNA-Drug Association Prediction by Known Nearest Neighbour Algorithm and Label Propagation Using Linear Neighbourhood Similarities . . . 1310
Gill Varghese Sajan and Joby George

RF-HYB: Prediction of DNA-Binding Residues and Interaction of Drug in Proteins Using Random-Forest Model by Hybrid Features . . . 1319
E. Shahna and Joby George

Investigation of Machine Learning Methodologies in Microaneurysms Discernment . . . 1327
S. Iwin Thanakumar Joseph, Tatiparthi Sravanthi, V. Karunakaran, and C. Priyadharsini

Segmentation Algorithm Using Temporal Features and Group Delay for Speech Signals . . . 1335
J. S. Mohith B. Varma, B. Girish K. Reddy, G. V. S. N. Koushik, and K. Jeeva Priya

Efficient Ranking Framework for Information Retrieval Using Similarity Measure . . . 1344
Shadab Irfan and Subhajit Ghosh

Detection and Removal of RainDrop from Images Using DeepLearning . . . 1355
Y. Himabindu, R. Manjusha, and Latha Parameswaran

A Review on Recent Developments for the Retinal Vessel Segmentation Methodologies and Exudate Detection in Fundus Images Using Deep Learning Algorithms . . . 1363
Silpa Ajith Kumar and J. Satheesh Kumar
Smart Imaging System for Tumor Detection in Larynx Using Radial Basis Function Networks . . . 1371
N. P. G. Bhavani, K. Sujatha, V. Karthikeyan, R. Shobarani, and S. Meena Sutha

Effective Spam Image Classification Using CNN and Transfer Learning . . . 1378
Shrihari Rao and Radhakrishan Gopalapillai

Unsupervised Clustering Algorithm as Region of Interest Proposals for Cancer Detection Using CNN . . . 1386
Ajay K. Gogineni, Raj Kishore, Pranay Raj, Suprava Naik, and Kisor K. Sahu

Assistive Communication Application for Amyotrophic Lateral Sclerosis Patients . . . 1397
T. Shravani, Ramya Sai, M. Vani Shree, J. Amudha, and C. Jyotsna

Author Index . . . 1409
Psychosocial Analysis of Policy Confidence Through Multifactorial Statistics

Jesús Silva¹, Darwin Solano², Claudia Fernández², Ligia Romero², and Omar Bonerge Pineda Lezama³

¹ Universidad Peruana de Ciencias Aplicadas, Lima, Peru
  [email protected]
² Universidad de la Costa, St. 58 #66, Barranquilla, Atlántico, Colombia
  {dsolano1,cfernand10,lromero11}@cuc.edu.co
³ Universidad Tecnológica Centroamericana (UNITEC), San Pedro Sula, Honduras
  [email protected]
Abstract. Trust is defined as a widespread belief or rooted value orientation in evaluative standards of technical and ethical competence, and in the future actions of a person (interpersonal trust) or an institution (institutional trust). From a psychosocial perspective, trust transcends positive or negative affectivity: it alludes to the belief that the behavior of others can be predicted, and it implies a positive attitude and expectation regarding the behavior of the person or institution. This belief refers to the likelihood that individuals or institutions will take certain actions, or refrain from inflicting harm, for the sake of personal or collective well-being. The objective of this study is to examine psychosocial factors, related to the interaction between the police and the public, that predict the perception of trust in police groups in Colombia.

Keywords: Trust · Police · Colombia · Honesty · Performance
1 Introduction

In the field of security institutions, the police are one of the closest to the public and one of the most relevant to the perception of insecurity and, consequently, to the maintenance of democracy. Trust in the police strengthens the sense of security in citizens, to the extent that police work is perceived to be part of a set of policy actions planned by the rulers in order to improve citizen security in the community [1]. The link between trust, closeness, and security stimulates the climate of public collaboration with public institutions, such as the police [2, 3]. In this sense, as some authors point out, criminal incidence, violence, and police performance provide information about trust in the police and the democratic governance of a country [4, 5]. On the contrary, high criminality has been found to undermine confidence in political institutions in general and in the police in particular [6]. When this mistrust occurs in communities with high crime, the sense of vulnerability is even higher and, consequently, the loss of confidence tends to worsen, which is evident, among other aspects, in a decrease in reporting, as well as in the procurement of private security or
the exercise of justice outside public proceedings [7–9]. As [10] points out, the inconsistency between democratic norms and poor police action, characterized by poor performance, high corruption, and involvement in crimes, in contexts where criminal incidence has increased, has led to an increase in distrust of the police and a sense of insecurity widely shared by the public. In this regard, [11] states that distrust of the police sharpens resistance to reporting, particularly in communities with a high crime incidence. Considering this background, and given that there is still little work in the Latin American context examining how trust in the police is formed [12] while incorporating psychosocial variables such as the perception of the performance and honesty of the police, the fear of victimization, and the perception of insecurity, this study analyses the role of the perception of the performance and honesty of the police, the perception of insecurity, and the fear of victimization as determinants of trust towards the police in Bogotá, Colombia. In this sense, the experience of victimization, performance, and honesty are expected to be predictors of trust in the police.
2 Methodology

2.1 Data
Random sampling, stratified proportionally to population density, was carried out in order to select participants from all regions and from the eight delegations that make up the Colombian capital (Bogotá) [13]. To do this, the number of participants in each area was proportional to the number of inhabited dwellings, with a minimum of five participants per area established to ensure the representativeness of all areas, especially the least populated. Under these criteria, 12,458 citizens, men and women residing in the city of Bogotá at the time of the study, participated. In addition, the sample size makes it possible to make predictions with the variables selected here, with a coefficient of determination of .05 and a power of .90 [14].
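A minimal sketch of this proportional allocation; the dwelling counts below are hypothetical placeholders, not the study's actual figures:

```python
# Proportional stratified allocation: each zone receives a share of the total
# sample proportional to its number of inhabited dwellings, with a floor of
# five participants so the least populated areas stay represented.
dwellings = {"north": 41_000, "south": 58_500, "east": 23_700, "west": 36_800}
total_sample = 12_458
min_per_zone = 5

total_dwellings = sum(dwellings.values())
allocation = {
    zone: max(min_per_zone, round(total_sample * n / total_dwellings))
    for zone, n in dwellings.items()
}
print(allocation)  # {'north': 3192, 'south': 4555, 'east': 1845, 'west': 2865}
# Rounding can make the allocations differ from total_sample by a participant or two.
```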
2.2 Methods
The battery of questionnaires was administered individually, in interview format, by 102 surveyors previously trained by experts and members of the research group of a university in the region. This application strategy was chosen to ensure that all items were understood by all respondents. The pollsters were randomly assigned, in a balanced way, to the four sectors into which the municipality was conventionally divided (north, south, east, and west). A supervisor and a trainer coordinated each sector and ensured that at least one trainer and one supervisor were involved in each delegation to resolve any doubts. The completion of the instruments was carried out by the pollsters. In addition, the participation of pollsters, supervisors, and participants was voluntary. Participants were informed of the objective of the study, and the confidentiality of the data was ensured. 1.46% of respondents refused to participate in the study; in these cases, other participants were selected according to the same criteria. The time of application of the questionnaire was between 40 and 45 min [15, 16].
3 Results

The following describes the variables used in the study, the selected instruments, and their properties [17].

– Victimization. To assess direct and indirect victimization, respondents were asked whether they or their families had been victims of any crime in Bogotá during the past year (2019). To this end, the question was asked: “In the last year, have you or the people living in your household been the victims of any crime in Bogotá?” The question was coded with two answer options (1 Yes, 2 No).

– Fear of victimization. To assess the fear of being a victim of crime, the adoption of measures to protect against crime was used as an indicator.

– Measures to protect against crime. This dichotomous scale consists of thirteen reagents that assess the frequency of use of different measures to protect against the possibility of crime. An exploratory factor analysis with oblimin rotation yielded a two-factor structure. The first factor, physical protection measures, explains 55.45% of the variance and refers to how often measures such as buying and carrying a weapon, installing alarms at home, hiring personal security, taking joint actions with the neighborhood, hiring private security on the street or in the region, buying a dog, placing fences, and increasing security on doors or windows have been used. The second factor explains 12.12% of the variance and alludes to the control of personal information, such as avoiding giving information by telephone, avoiding giving keys or personal data over the Internet, not providing information to strangers, and using caller ID. The results of the factorial analyses are presented in Table 1.
Table 1. Exploratory and confirmatory factor analysis for the questionnaire on measures to protect against insecurity

Protective measure                               EFA F1   EFA F2   CFA F1   CFA F2
Hire personal security                             .86      .44      .31       –
Hire private security on the street or colony      .82      .39      .37       –
Don't give phone information                       .34      .82       –       .60
Place fences, fences                               .81      .48      .39       –
Get a dog                                          .82      .45      .29       –
Putting extra locks on doors/windows               .79      .57      .39       –
Don't give keys or data online                     .33      .73       –       .73
Take joint action with your neighbors              .85      .44      .51       –
Install alarms at home or work                     .84      .45      .63       –
Not giving information to strangers                .52      .72       –       .59
Buy and carry a gun                                .88      .52      .24       –
Use phone call ID                                  .51      .75       –       .53

* Fit indices of the confirmatory factor analysis (standardized solution): S-B χ² = 375.82, gl = 47, p < .001, CFI = .93, RMSEA = .05 (.04, .05).
*** All coefficients significant (p < .001).
Table 2. Exploratory and confirmatory factor analysis for the crime frequency scale in the colony

Crime                EFA F1   EFA F2   CFA F1   CFA F2
Vandalism              .84     −.39      .75       –
Drug sales             .83     −.52      .81       –
Robbery of people      .76     −.43      .67       –
Murder                 .54     −.82       –       .84
Assault                .76     −.43      .65       –
Fraud                  .44     −.81       –       .76
Sex offences           .44     −.88       –       .78
Kidnapping             .46     −.87       –       .75
Drug use               .81     −.50      .73       –

* Fit indices of the confirmatory factor analysis (standardized solution): S-B χ² = 173.13, gl = 22, p < .001, CFI = .98, RMSEA = .05 (.04, .06).
*** Significant coefficients (p < .001).
– Type of crime in the region. The frequency of different crimes in the region was assessed on a Likert scale consisting of eleven reagents, with a range of five response options (not frequent, rare, regular, frequent, and very frequent). An exploratory factor analysis with oblimin rotation resulted in two factors. The first, violence and drug use, explains 54.22% of the variance and refers to the frequency with which drug dealing, drug use, vandalism, robbery, and assault occur. The second factor, serious crimes, explains 12.31% of the variance and groups together serious criminal acts: sexual offences, kidnapping, fraud, and homicide. The reagents “threats” and “injuries and quarrels” were eliminated because they showed similar saturations on both factors and are therefore complex items. The final scale consists of nine reagents. Cronbach's alpha for the factors is .84 and .88, respectively. Table 2 contains the results of the exploratory and confirmatory factorial analyses.

– Factors that promote crime. This scale is composed of twenty reagents that measure citizens' perception of possible causes that favor the increase of crime. The Likert scale presents a range of five response options (from totally disagree to totally agree). An exploratory factor analysis with oblimin rotation yielded a three-factor structure. The first, psychosocial factors, explains 42.93% of the variance and alludes to the following causes: [crime] is a way to obtain easy resources, bad company, economic needs, loss of values, and lack of empathy or concern for others. The second factor, corruption and institutional impunity, explains 11.62% of the variance and groups reagents such as corruption, impunity, and decomposition of the institutions.
Finally, the third, socialization factors, explains 8.43% of the variance and encompasses the reagents that allude to socialization processes: lack of recreation spaces, learning from childhood, influence of the media, and discrimination. Cronbach's alpha for the factors is .82, .80, and .63, respectively.

– Trust in security institutions. This scale assesses the public's trust in public security institutions. The Likert scale consists of seven items with a range of five answer options (from not at all reliable to very reliable). An exploratory factor analysis with oblimin rotation showed a single-factor structure that explains 61.77% of the variance. Cronbach's alpha is .90.

– Performance and honesty of security institutions. This Likert scale consists of seven reagents that measure the performance of security institutions, with five response options (very poor performance, poor performance, regular, good performance, and very good performance), and seven that evaluate honesty, with five response options (not at all honest, dishonest, regular, honest, and very honest). A factor analysis with oblimin rotation showed the presence of two factors that together explain 66.44% of the variance. The first, which explains 53.80% of the variance, groups the reagents related to the performance of the Preventive Police, the Transit Police, Judicial Agents, Local and Neighborhood Round, State Police, Federal Preventive Police, and Army. The second factor explains 11.64% of the variance and groups the reagents that measure the perception of honesty towards these institutions.

– Involvement in security. This Likert-type scale with five response options (completely disagree, disagree, neutral, agree, and completely agree) is composed of five reagents that assess the weight of the involvement of the following agents in the safety of Bogotá: oneself, neighbors, government, public security institutions, and society. A factor analysis with oblimin rotation showed a one-factor structure that explains 53.67% of the variance. Cronbach's alpha is .80.

– Willingness to participate in actions that promote safety. This Likert scale with five response options (nothing, little, from time to time, much, and always) consists of eight reagents that evaluate availability to participate in the following actions or activities associated with security: neighborhood organization, promoting more values in the family, recovering public spaces, generating spaces of coexistence, promoting sport and recreation, instilling culture, promoting arts and crafts workshops, and conducting community work. A factorial analysis with oblimin rotation showed a one-factor structure explaining 64.89% of the variance. Cronbach's alpha is .96.

– Disposition to do justice outside the law. This dichotomous questionnaire, with two answer options (1 Yes, 2 No), asks for which crimes respondents would be willing to do justice outside the law. It consists of twelve reagents covering crimes such as theft/assault, express kidnapping, abuse of authority, homicide, abuse of trust, threats, extortion, property damage, injury, corruption, fraud, and sexual offense. A factor analysis with oblimin rotation showed a single-factor structure that explains 62.77% of the variance. Cronbach's alpha is .93 (a computational sketch of this type of scale analysis is given below).
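The scale analyses reported above (exploratory factor analysis with oblimin rotation, followed by Cronbach's alpha) can be reproduced along the following lines. This is a sketch under stated assumptions: it uses the third-party factor_analyzer package and synthetic stand-in responses, since the study's item-level data are not available.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

rng = np.random.default_rng(0)
# Stand-in for the questionnaire data: 500 respondents x 12 dichotomous items.
X = rng.integers(0, 2, size=(500, 12)).astype(float)

# Two-factor EFA with oblique (oblimin) rotation, as in Table 1.
fa = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa.fit(X)
print(fa.loadings_)                 # 12 x 2 pattern matrix of loadings
print(fa.get_factor_variance()[1])  # proportion of variance per factor

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Alpha for the items loading on one factor (the column indices are illustrative).
print(cronbach_alpha(X[:, :8]))
```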
Subsequently, a logistic regression was performed with the variables included in the t-test, in order to determine their specific weight in predicting trust towards police groups as security institutions. Trust in the various security institutions that coexist in the region was used as the dichotomous dependent variable. The omnibus test on the coefficients of the model indicates that the variables introduced in the regression improve the fit of the model (χ²(19) = 1365.70; p < .001). The value of the −2 log-likelihood (−2LL) is 644.69. In addition, the Cox and Snell R-squared coefficient (.60) indicates that 60.1% of the variation in the dependent variable is explained by the variables included in the analysis. Similarly, the value obtained with the Nagelkerke R-squared (.83) indicates that the calculated regression explains a variance rate of 80.3%. Both values indicate that the model calculated in the logistic regression shows an appropriate fit and validity. The proportion of cases correctly classified by the regression model is 94.1% for low confidence and 93.3% for high confidence, so the model has high specificity and sensitivity.

The variable that most affects trust in security institutions is the perception of the honesty of these institutions (Wald = 214.56; p < .001; B = 2.80; Exp(B) = 19.23), to the extent that those who consider security institutions to be honest are 18.72 times more likely to present high confidence towards them. Second, the appreciation of the performance of security institutions by citizens predicts confidence, in the sense that those who positively value the performance of these institutions are 4.88 times more likely to have high confidence (Wald = 68.14; p < .001; B = 1.45; Exp(B) = 4.88). Similarly, citizens who are less willing to exercise justice outside the action of security institutions are 2.10 times more likely to have high confidence in the institutions (Wald = 5.44; p < .05; B = .73; Exp(B) = 2.50). In addition, having higher education increases by 1.77 the likelihood of having confidence in institutions (Wald = 3.74; p < .05; B = .53; Exp(B) = 1.45).
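For readers who want to reproduce this kind of model, the sketch below fits a binary logistic regression with statsmodels and computes Exp(B) together with the Cox and Snell and Nagelkerke pseudo-R² coefficients cited above. The predictors and data are synthetic stand-ins, not the survey data:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in data: three predictors playing the role of perceived
# honesty, perceived performance, and disposition to extralegal justice.
rng = np.random.default_rng(1)
n = 2_000
X = rng.normal(size=(n, 3))
true_b = np.array([2.80, 1.45, 0.73])
p = 1.0 / (1.0 + np.exp(-(-0.5 + X @ true_b)))
y = rng.binomial(1, p)

model = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(np.exp(model.params))  # Exp(B): the odds ratio for each predictor

# Pseudo-R-squared coefficients of the kind reported in the paper.
ll_model, ll_null = model.llf, model.llnull
cox_snell = 1.0 - np.exp(2.0 * (ll_null - ll_model) / n)
nagelkerke = cox_snell / (1.0 - np.exp(2.0 * ll_null / n))
print(cox_snell, nagelkerke)
```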
4 Conclusions

The objective of this study has been to examine the psychosocial factors, related to the interaction between police groups and citizens, involved in the process of building trust towards the police in Bogotá. The preceding analyses show a poor assessment of the various security agents, a trend that has been found in previous work [5, 8, 10, 16]. With regard to the determinants of trust in the institutions examined through logistic regression analysis, the results partially confirm the expected relationships. They show that the main predictive factor of trust is honesty; in other words, people who perceive security institutions as honest or very honest show greater confidence in them. In fact, honesty carries more weight than performance in predicting trust in police groups, something that has not been highlighted in previous studies. This finding is suggestive, since it underlines the importance of the values and the ethical-moral dimension of the institutions and their implications, both in corruption and in the relationship with citizens.
Compliance with Ethical Standards
• All authors declare that there is no conflict of interest.
• No humans/animals were involved in this research work.
• We have used our own data.
References

1. Newton, K., Norris, P.: Confidence in public institutions: faith, culture and performance? In: Pharr, S.J., Putnam, R.D. (eds.) Disaffected Democracies, pp. 52–73. Princeton University Press, Princeton (2000)
2. León Medina, F.J.: Mecanismos generadores de la confianza en la institución policial. Indret: Revista para el Análisis del Derecho (2), 15–30 (2014)
3. Lunecke, A.: Exclusión social, tráfico de drogas y vulnerabilidad barrial. In: Lunecke, A., Munizaga, A.M., Ruiz, J.C. (eds.) Violencia y delincuencia en barrios: Sistematización de experiencias, pp. 40–52 (2009). http://www.pazciudadana.cl/wp-content/uploads/2013/07/2009-11-04_Violencia-y-delincuencia-en-barriossistematizaci%C3%83%C2%B3n-deexperiencias.pdf
4. Malone, M.F.T.: The verdict is in: the impact of crime on public trust in Central American justice systems. J. Polit. Lat. Am. (2), 99–128 (2010)
5. Ávila Guerrero, M.E., Vera Jiménez, J.A., Ferrer, B.M., Bahena Rivera, A.: Un análisis psicosocial de la confianza en los grupos policiales: el caso de Cuernavaca (México). Perfiles Latinoamericanos 24(47), 151–174 (2016). https://doi.org/10.18504/pl2447-009-2016
6. Bento, N., Gianfrate, G., Thoni, M.H.: Crowdfunding for sustainability ventures. J. Clean. Prod. 237(10), 117751 (2019)
7. Lis-Gutiérrez, J.P., Reyna-Niño, H.E., Gaitán-Angulo, M., Viloria, A., Abril, J.E.S.: Hierarchical ascending classification: an application to contraband apprehensions in Colombia (2015–2016). In: International Conference on Data Mining and Big Data, pp. 168–178. Springer, Cham (2018)
8. Amelec, V., Carmen, V.: Relationship between variables of performance social and financial of microfinance institutions. Adv. Sci. Lett. 21(6), 1931–1934 (2015)
9. Wang, W., Mahmood, A., Sismeiro, C., Vulkan, N.: The evolution of equity crowdfunding: insights from co-investments of angels and the crowd. Res. Policy 48(8), 103727 (2019). https://doi.org/10.1016/j.respol.2019.01.003
10. Bergman, M., Flom, H.: Determinantes de la confianza en la policía: una comparación entre Argentina y México. Perfiles Latinoamericanos 20(40), 97–122 (2012)
11. Buendía, J., Somuano, F.: Participación electoral en nuevas democracias: la elección presidencial de 2000 en México. Política y Gobierno 2(2), 289–323 (2003)
12. Costa, G.: La inseguridad en América Latina ¿Cómo estamos? Foro Brasileño de Seguridad Pública. Revista Brasileña de Seguridad Pública 5(8), 6–36 (2011)
13. Segovia, C., Haye, A., González, R., Manzi, J.: Confianza en instituciones políticas en Chile: un modelo de los componentes centrales de juicios de confianza. Revista de Ciencia Política 28(2), 39–60 (2008)
14. Tankebe, J.: Police effectiveness and police trustworthiness in Ghana: an empirical appraisal. Criminol. Crim. Justice (8), 185–202 (2008)
15. DANE: Proyecciones de Población [database]. DANE, Bogotá (2019)
16. Oviedo, C., Campo, A.: Aproximación al uso del coeficiente alfa de Cronbach. Revista Colombiana de Psiquiatría 34(4), 572–580 (2005)
17. Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., Zupan, B.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14(Aug), 2349–2353 (2013)
Human Pose Detection: A Machine Learning Approach
Munindra Kakati and Parismita Sarma
Department of Information Technology, Gauhati University, Guwahati, India
[email protected], [email protected]
Abstract. With recent advancements in computer science and information technology, computers can now leverage human-like abilities in their operations. Image processing and computer vision technology have made many practical applications possible and thus make our life more comfortable with machine aid; medical diagnosis and automatic car driving are two prominent examples in this field. Different types of human poses are observed in our day-to-day life, and poses are distinguished from each other by the joints of the body parts. In this paper we discuss methods to identify four fundamental human poses, viz. sitting, standing, handshaking and waving. We propose a framework which can automatically recognize these four human poses at any angle or direction from a digital image. For this work we have created an image database of our own, which is described in a subsequent section. Our working methodology involves image processing and neural network techniques. Finally, the performance of the proposed system is illustrated.
Keywords: Human pose · Computer vision · Neural network · Image processing
1 Introduction
1.1 Importance of Human Pose
Human beings quite often express their mental and physical state with different types of poses framed with different body parts. Proper recognition of those poses helps in understanding the current mental as well as physical condition of the particular person. Human pose detection can be marked as the initial stage of human behavior analysis [1]. Nowadays researchers are interested in recognizing human poses with the help of a computer: if a human pose can be automatically recognized, many other applications, like medical diagnosis and crime detection, become easier to implement fully by machine. This project has many applications in our day-to-day life, and we can consider it an advanced-level reasoning activity in the field of human-computer interaction. This work mainly focuses on localizing the parts or joints of the human body in a digital image.
Pose Estimation: As we have already mentioned, a human pose is formed by the straightening or bending of the different parts of the human body. This topic is a study in the most emerging research
area called computer vision. This area of research studies the imitation and duplication of what the human eye sees and analyses. To achieve this level of reasoning we should make the machine intelligent enough to retrieve posture information from digital image data. Thus, the way of evaluating the posture of a human body from an image or video is termed pose estimation. Pose estimation is generally performed by spotting the location and direction of an object.
1.2 Machine Learning in Computer Vision
Learning the information hidden inside an image and analyzing it is an efficient way of understanding the image [2]. Computer vision technology embraces image understanding and machine learning technologies to understand a digital image. The roots of computer vision lie in pattern recognition and image processing, yielding model-oriented and knowledge-based vision with richer data extraction features. Such a system has the capability of reasoning and the aptitude for learning. Interest in this field is also driven by the commercial products that can be created from computer vision projects. Machine learning can be incorporated into computer vision in two different ways. In supervised learning, the future is predicted from known past facts: after rigorous training, the model compares its result with the correct one, evaluates the errors, and is rectified accordingly. In unsupervised learning, training is done on unlabeled data, and the method tries to find the patterns present in the different unlabeled samples. This paper is organized into five sections: Sect. 2 describes the related study, Sect. 3 the proposed problem and methodology, Sect. 4 the results and discussion, and Sect. 5 the conclusion and future work.
2 Related Study
To get a clear idea of the topic we have gone through a number of journals and research papers; a few important ones are discussed here. According to Chaitra et al. [3], human pose can be detected using a map-based, self-categorizing retrieval manner. In 2018, Liu et al. [4] published a review paper emphasizing deep learning for the estimation of human pose from digital images; they did experimental work on single- and multi-stage convolutional neural networks and also experimented with multi-branch and recurrent neural methods. Belagiannis and Zisserman [5] in 2017 proposed another method, based on heat-map depiction, for human-figure keypoint detection; in this method the system learns to represent body-component appearances and the configuration of the human parts. Another paper, by Chu et al. [6], proposed a new model for structured feature learning which can verify the correlation among body joints. In Zhuang et al. [7], in 2018, a new approach based on a direction map was presented. This approach
rendered directional information on the different human body components. The model was divided into two distinct parts: the first part learns the direction of every component with reference to the central body part, and the second part works as a collector, combining place as well as direction information to acquire an ideal posture. In 2018, Geetha and Samundeeswari [8] published a review paper highlighting human action recognition methods. Recently, new ideas on pose regression were applied by Ionescu et al. [9], who are very much interested in 3D pose.
3 Proposed Problem and Methodology
Human pose detection by machine is helpful in different inspection applications like activity or behaviour analysis; human-machine interaction control systems, such as athlete performance analysis, also take help from pose identification techniques. Figures 1 and 2 show flowcharts of the training and testing phases of the whole human pose detection method. The input digital image is first preprocessed for enhancement; next, the important and necessary features of the regions of interest are extracted; and in the last phase recognition is done from the features of the extracted image.
3.1 Flowchart of the Proposed System
As shown in the flowchart, for training purposes the training images are preprocessed with different techniques like grayscaling, resizing, filtering, gray-to-binary conversion etc. Then, from the preprocessed images, the required features are extracted using Principal Component Analysis and the Discrete Cosine Transform. After that, the extracted features are used to train a neural network classifier.
Fig. 1. Flowchart of the training process
Human Pose Detection: A Machine Learning Approach
11
Fig. 2. Flowchart of the testing process
According to the flowchart for the testing phase, the testing images are first preprocessed with the same preprocessing techniques used for the training images. Then features are extracted from the preprocessed images and matched against the trained neural network classifier. In the last phase, the pose is reported according to the matching percentage.
3.2 Image Database
We have created a database of our own for the implementation, collecting almost 500 images with various human poses in .jpg format. The input images are divided into training images and testing images, and the database created for the recognition of human pose is classified into four classes for distinct training and testing purposes. Almost 500 images are used during training; Fig. 3 shows a preview of the training images in the database.
Fig. 3. Few of the images from the database [6]
3.3 Preprocessing
The aim of preprocessing a digital image is to improve the image data by suppressing unwanted distortions and removing salt-and-pepper as well as Gaussian noise. Preprocessing performs two tasks: calculation of the Region of Interest (ROI) and conversion of the colored image to a grayscale image [10]. Enhancement of the digital image is important, as this process makes the image clear, bright and distinct, though no extra information is added to it. Some preprocessing techniques used in this process are mentioned below.
Grayscaling: A grayscale image is one which has only shades of gray, scaled from 0 to 255. In a black-and-white image, 0 means total black and 1 means absolute white. Using grayscale in digital image processing reduces the amount of information to be processed, which in turn reduces the processing cost. A grayscale value assumes equal intensity for the Red, Green and Blue components of the common RGB scale, hence a single intensity value per pixel is enough for processing. Figure 4 shows one of our example images converted from RGB to grayscale.
Filtering: Filtering is necessary in the digital image enhancement phase. There are three types of filters: high-pass, band-pass and low-pass. A high-pass filter passes high-frequency regions like lines and borders, while a low-pass filter permits low-frequency components like the background or regions with minute variation in gray values. Image smoothing, blurring and contrasting are done with the help of these filters.
Fig. 4. RGB to gray conversion
Fig. 5. Resized image
Image Scaling: Image scaling changes the existing coordinates of the image to new ones. After scaling, a new image is generated, composed of a higher or lower number of pixels in comparison with the original image. Figure 5 shows a resized image prepared for further processing.
Gray Image to Binary Image Conversion: A binary image assigns one of two values to each pixel, 0 and 1 for black and white respectively. The image contains two types of color, foreground and background: the foreground color is used for the object in the image while the rest of the image uses the background color.
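The paper implements these steps in MATLAB; purely as an illustration, the same preprocessing chain can be sketched in Python with OpenCV (the file name and target size below are hypothetical choices, not taken from the paper):

import cv2

img = cv2.imread('pose.jpg')                  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # color to grayscale
gray = cv2.resize(gray, (128, 128))           # image scaling to a fixed size
gray = cv2.medianBlur(gray, 3)                # salt-and-pepper noise removal
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # gray to binary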
3.4 Feature Extraction
In the field of computer vision, features play a very important role. Various image preprocessing methods such as grayscaling, thresholding, resizing and filtering are applied to the image before features are extracted. The features are extracted using specific feature extraction methods and can later be used for classification and recognition of images.
Principal Component Analysis (PCA): PCA reduces the broad dimensionality of the data space to a smaller, fundamental dimensionality of the feature space. PCA is basically used for feature extraction, data compression, redundancy removal, prediction etc. It determines the eigenvectors of the covariance matrix with the largest eigenvalues and uses them to project the data into a new subspace of equal or lower dimension. PCA converts a matrix of n features into a new dataset of fewer than n, minimizing the number of features by forming a smaller number of new variables which capture an important portion of the information found in the original features.
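The projection just described can be written out directly; the following numpy sketch (illustrative only, and written for modest feature dimensions rather than full-resolution images) keeps the k leading eigenvectors of the covariance matrix:

import numpy as np

def pca_features(X, k):
    # X: one flattened, preprocessed image per row; k: components to keep
    Xc = X - X.mean(axis=0)                    # center the data
    C = np.cov(Xc, rowvar=False)               # covariance matrix
    vals, vecs = np.linalg.eigh(C)             # eigen-decomposition (C symmetric)
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # k largest-eigenvalue vectors
    return Xc @ top                            # projection into the subspace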
14
M. Kakati and P. Sarma
Fig. 6. PCA features value of training dataset
Fig. 7. DCT features value of training dataset
As shown in Figs. 6 and 7, the columns of the tables represent the number of features extracted from each image and the rows represent the number of images.
Discrete Cosine Transform (DCT): The DCT is a transform that maps a block of pixel color values in the spatial domain to values in the frequency domain [11]. The DCT expresses a finite sequence of data points as a sum of cosine functions oscillating at various frequencies. Applied to a digital image, the DCT gives a matrix with different coefficients, divided into two kinds of blocks: the white blocks carry the illumination and detail information of the image, whereas the dark blocks represent the edges of the image. Here an 8 × 8 block DCT is used; the reason is to have blocks large enough to provide adequate compression while keeping the transform simple.
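A minimal sketch of an 8 × 8 block DCT in Python (illustrative; the paper's own implementation is in MATLAB) could look like this:

import numpy as np
from scipy.fftpack import dct

def block_dct(gray):
    # apply a 2-D DCT independently to each 8x8 block of a grayscale image
    h, w = gray.shape
    out = np.zeros_like(gray, dtype=float)
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            block = gray[i:i + 8, j:j + 8].astype(float)
            out[i:i + 8, j:j + 8] = dct(dct(block.T, norm='ortho').T, norm='ortho')
    return out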
3.5 Training Neural Network Classifier
For training purposes, we have collected human images in different poses; we are interested in classifying the poses sitting, standing, waving and handshaking. After extracting the input features, the training set is used to train a feed-forward neural network whose output is divided into four classes, i.e., Sitting, Standing, Waving and Handshake.
Feed-Forward Neural Networks: Feed-forward neural networks are the simplest type of neural network, comprising an input layer, one or more hidden layers and an output layer. The signal is only propagated forward through a classifying activation function. Feed-forward neural networks are usually applied in fields where the classification of target classes is complicated, such as speech recognition and computer vision. Such networks not only behave well on noisy data but are also easy to maintain.
Levenberg-Marquardt Algorithm (L-M): The learning phase is one of the most critical phases in building a successful neural network application [12]. In order to reduce the network error, the training algorithm is designed to compute the best network parameters. Various optimization methods can be used to train an ANN model, one of which is the L-M algorithm.
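The paper trains its network in MATLAB, where L-M training is available; purely as an illustration, a comparable four-class feed-forward classifier can be sketched with scikit-learn, which uses Adam/SGD/L-BFGS solvers rather than Levenberg-Marquardt (the variable names and layer size below are hypothetical):

from sklearn.neural_network import MLPClassifier

# X_train, X_test: PCA + DCT feature vectors (assumed already computed)
# y_train: one of the four pose labels per training image
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
clf.fit(X_train, y_train)       # train the feed-forward network
print(clf.predict(X_test))      # e.g. ['standing', 'waving', ...]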
3.6 Testing Image
The trained classifier is then tested with a dataset of images which were not used for training. The test images are first preprocessed by the same method as the training images, then features are extracted using PCA and DCT as in training. These features are finally fed to the previously trained neural network classifier.
3.7 Tools/Environment/Experimental Platform
MATLAB, a fourth-generation programming language, offers features such as implementation of algorithms, matrix manipulation, plotting of functions and data, and creation of user interfaces. The Windows 8 operating system was used throughout the project, and an Android phone was used for capturing images.
4 Result and Discussion
We have trained this system to detect the Sitting, Standing, Waving and Handshake positions of humans using an ANN and have tested the system with random images. Figure 8 shows the interface designed for the proposed system: the Load button loads the test RGB image, the preprocessing step processes the loaded RGB image, the Feature Extraction button extracts the features from the preprocessed binary image, and the ANN Classifier button classifies the testing image. Figure 9 shows a screenshot of a handshaking pose with the calculated PCA and DCT values, which gives an idea of how the recognition system works. Figures 10, 11, 12 and 13 are screenshots of human poses detected by our system, correctly tagged as handshaking, waving, standing and sitting respectively, each with its DCT and PCA values. Table 1 shows the overall performance of our implementation.
Fig. 8. Interface of the proposed system
Fig. 9. The extracted features
Fig. 10. Screenshot of handshake output
Fig. 11. Screenshot of waving pose
Fig. 12. Screenshot of standing pose
Fig. 13. Screenshot of sitting pose
4.1 Accuracy of the Proposed System
We have tested the trained feed-forward neural network classifier with different random images containing frequently used human poses, and an acceptably good detection accuracy was obtained. Table 1 summarizes the accuracy of the system.

Table 1. Accuracy of the proposed system
Class       No. of testing images   Correctly identified   Accuracy
Sitting     30                      22                     73%
Standing    40                      31                     77%
Waving      25                      22                     88%
Handshake   40                      37                     92%
Overall, an accuracy of 82.5% was found when testing with 135 random images.
5 Conclusion and Future Work
We were able to detect four human poses, viz. Sitting, Standing, Handshake and Waving, with an overall accuracy of 82.5%, as shown in Table 1. The main challenge in this work was preparing a good-quality image database; we prepared the database by taking photographs of the human body from different positions, and we were also able to tag the poses successfully. Our system is more accurate for hand-pose identification, as the corresponding accuracy percentages in Table 1 are above 80%. In the future we will train and test the system with more data, try to identify more poses beyond these four, and try to increase the efficiency of our model beyond what is now achieved. We will also look into tagging complex human poses comprising more than one pose, and the work will be extended using CNN-based machine learning techniques in the near future.
Acknowledgement. We have taken photographs of an author and obtained permission to display these pictures. All authors state that there is no conflict of interest. We used our own data.
References
1. Kakati, M., Sarma, P.: Human pose detection from a digital image with machine learning technique. Int. J. Comput. Sci. Eng. 7(5), 1389–1393 (2019)
2. Sebe, N., Cohen, I., Garg, A., Huang, T.S.: Introduction. In: Machine Learning in Computer Vision. Computational Imaging and Vision, vol. 29. Springer, Dordrecht (2005)
3. Chaitra, B.H., Anupama, H.S., Cauvery, N.K.: Human action recognition using image processing and artificial neural networks. Int. J. Comput. Appl. (0975-8887) 80(9), 31–34 (2013)
4. Liu, Y., Xu, Y., Li, S.B.: 2-D human pose estimation from images based on deep learning: a review. In: 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, pp. 462–465 (2018)
5. Belagiannis, V., Zisserman, A.: Recurrent human pose estimation. In: 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 468–475 (2017)
6. Chu, X., Ouyang, W., Li, H., Wang, X.: Structured feature learning for pose estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4715–4723 (2016)
7. Zhuang, W., Xia, S., Wang, Y.: Human pose estimation using direction maps. In: 33rd Youth Academic Annual Conference of Chinese Association of Automation, pp. 977–982 (2018)
8. Geetha, N., Samundeeswari, E.S.: A review on human activity recognition system. Int. J. Comput. Sci. Eng. 6(12), 825–829 (2018)
9. Ionescu, C., Li, F., Sminchisescu, C.: Latent structured models for human pose estimation. In: ICCV, pp. 2220–2227 (2011)
10. Barman, K., Sarma, P.: Glaucoma detection using fuzzy C-means clustering algorithm and thresholding. Int. J. Comput. Sci. Eng. 7(3), 859–864 (2019)
11. Robinson, J., Kecman, V.: Combining support vector machine learning with the discrete cosine transform in image compression. IEEE Trans. Neural Netw. 14(4), 950–958 (2003)
12. Lera, G., Pinzolas, M.: Neighborhood based Levenberg-Marquardt algorithm for neural network training. IEEE Trans. Neural Netw. 13(5), 1200–1203 (2002)
Machine Learning Algorithm in Smart Farming for Crop Identification
N. D. Vindya and H. K. Vedamurthy
Department of CSE, SIT, Tumakuru, India
[email protected], [email protected]
Abstract. Agriculture is the main supporting sector of the Indian economy, and most of the rural population's livelihood depends on it. Crop yield has been decreasing day by day because crop growth mainly depends on the monsoon, which is highly unpredictable in India. Crop growth also depends on soil parameters, which vary due to ground contamination. To increase crop productivity, there is a need to incorporate new technologies into agriculture: utilizing new technologies like Machine Learning and Internet of Things applications in the field of agriculture will enhance the financial growth of the country. Here we present a framework, focused on agriculture in the Tumakuru district of Karnataka, that uses the Naive Bayes classification method to suggest to farmers the crop best suited for sowing in the prevailing environmental conditions.
Keywords: Machine learning · Gaussian Naive Bayes · Data analytics · Prediction
1 Introduction
Agriculture is the backbone of India: around 61.5% of the Indian population in rural areas depends on it, and in crop productivity India is ranked second worldwide. Agriculture contributed 17–18% of GDP and employed 50% of the Indian workforce [1]. In India, crop productivity mainly depends on the monsoon: when the monsoon is good the crop yield is high, and when it is poor the yield is low. Nowadays, however, the monsoon varies, and farmers may go wrong in forecasting it. Crop productivity also depends on soil conditions, but ground contamination, shrinking water bodies and greenhouse emissions all affect productivity. Therefore, integrating new technologies like Machine Learning and the Internet of Things (IoT) makes it possible to predict the monsoon and to decide which crop is best for the present environmental conditions. To develop new openings for recognising the intensity of data and to quantify and unravel its processes in the agriculture sector, machine learning has been merged with IoT, cloud computing and big data. Machine learning is a scientific field which gives machines the ability to learn without being explicitly programmed [2]. This technology is now applied in many fields, such as medicine, meteorology, economic science, robotics, agriculture, food and climate.
The Internet of Things also plays a main part in machine learning, because it helps in gathering data from different fields. Real-time soil moisture, humidity and temperature data are obtained for crop prediction by integrating sensor elements with a Raspberry Pi [3]. Data analytics (DA) is a method for inspecting datasets in order to obtain the hidden information related to the data, increasingly carried out with the support of specific software and systems [4]. Smart farming is an idea that focuses on data involvement and proportionate innovation in the physical, administrative and digital cycles; it helps farmers take good decisions, in order to obtain high-quality output, by applying data analysis and automation in the agricultural field. In agriculture, crops are mainly lost due to variations in the monsoon. Farmers predict the monsoon based on past years' experience, but since it varies continuously, their predictions may also fail. Once crop-related data and rainfall data are gathered, a prediction can be made and a decision given to the farmers. In this paper we focus on agriculture in the Tumakuru district of Karnataka; we collected datasets on crops and rainfall from Krishi Vignana Kendra, Hirehalli, and the Karnataka State Disaster Monitoring Centre, Bangalore. A Gaussian Naive Bayes classifier model is developed to provide information about the best crop for sowing in a particular season in each taluk of the Tumakuru district.
2 Literature Survey
Using sensors, real-time weather and farm-related data are collected, analysis is made on the collected data, and the result is given to the farmers. Data are collected from various sources: irrigation reports, sensor-recorded field data, satellite images, weather data and crop data; any missing values in the collected data are filled in by calculating mean values. A model built with a Naive Bayes classifier on the collected datasets recommends crops to the farmers [5]. Soil parameters like crop rotation, phosphorus, soil moisture, surface temperature, nitrogen and potassium are important for growing crops; using these parameters the agricultural field can be monitored, which increases productivity considerably. Weather data from the IMD (Indian Meteorological Department) and soil parameters were integrated into a model built with a machine learning algorithm, providing crop-related information to the farmers [6]. A model for discovering and controlling soil parameters and five types of diseases in cotton leaves was developed using the Support Vector Machine algorithm; a motor driver was built around a Raspberry Pi with four different sensors (LM35, soil moisture, DHT-22 humidity-temperature and water sensors), which helps manage soil conditions at different places [7]. Increasing the growth of different types of crops in a closed environment, usually a greenhouse, is achieved by controlling changing factors like humidity, soil moisture and temperature through sensors connected to a Raspberry Pi; a mobile application alerts the user when these factors vary, and the sensor values are transferred to the ThingSpeak cloud for analysis [3].
Applying data mining techniques to soil datasets can predict better-yielding crops: Naïve Bayes and k-Nearest Neighbour algorithms have been applied to soil datasets for this purpose [8]. Machine learning is currently used across multidisciplinary agricultural areas using computing technologies and analysis, covering (1) crop management, such as disease detection, crop yield analysis and species identification; (2) soil management; (3) animal welfare management; and (4) water management. Farm management systems are developed so that farmers can identify conditions and take decisions for the farm using data analysis, machine learning and artificial intelligence [9].
3 Process Description
3.1 Modality
Tumakuru is a district in Karnataka state with a population of 2,678,980 as per the 2011 census [10]. It falls under agricultural zones 4, 5 and 6, the Central dry, Eastern dry and Southern dry zones respectively. Six taluks, namely Chikkanayakanahalli, Madhugiri, Tiptur, Sira, Pavagada and Koratagere, come under zone 4; Tumakuru and Gubbi taluks come under zone 5; and zone 6 contains Turuvekere and Kunigal. In Tumakuru, agriculture is considered one of the parameters of economic growth, contributing 6.1% of the GSDP (Gross State Domestic Product) of Karnataka. The district has an average annual rainfall of around 593.0 mm. Three major types of soil are found in Tumakuru: red loamy soil, mixed red and black soil, and red sandy soil. Red loamy soil is found in Koratagere, Tumakuru, Madhugiri and Kunigal; red sandy soil in Tiptur, Turuvekere, Sira, Pavagada and Gubbi; and mixed red and black soil in Chikkanayakana Halli. Many crops are grown in Tumakuru, like paddy, maize, bajra and groundnuts. Paddy, for example, can be grown at temperatures of 21–37, humidity of 60–80, on black soil, red loamy or red soil with pH 4–6. For the proposed model we considered eighteen crops: Paddy, Ragi, Jowar, Maize, Bajra, Groundnut, Mustard, Sunflower, Castor, Green gram, Horse gram, Red gram, Tomato, Beans, Cabbage, Brinjal and Turmeric, and collected the environmental conditions required for each. Rainfall data for each taluk from 2014–2018 were collected from the Karnataka State Disaster Monitoring Centre, and some crop details were taken from Krishi Vignana Kendra Hirehalli, Tumakuru. Along with this, real-time soil moisture, temperature and humidity data are taken from sensors. The attributes considered for sowing are temperature, humidity, rainfall, soil type and soil pH.
3.2 Methodology
Bayesian models (BM) work on the analysis framework of Bayesian inference, a family of conditional probability [11]. For solving regression and classification problems the Naïve Bayes algorithm is used, and it comes under the supervised branch of machine
learning algorithms [12]. Bernoulli, Gaussian and Multinomial are different types of Naïve Bayes algorithm. The algorithm has a low computational cost because it treats each feature as independent of the others given the class, so the maximum-likelihood estimates have closed-form expressions. Initially the classifier model is trained using the training datasets, which contain tuples X with features Y1, Y2, Y3, …, Yn; these tuples are checked against each class. Figure 1 shows the crop identification system, which is trained using crop datasets containing the major crops grown in Tumakuru. Before loading the crop data, packages like numpy, seaborn, pandas and the sklearn libraries are imported: numpy is used for mathematical functions, pandas for analytics, seaborn for plotting graphs, and scikit-learn for creating the different models. The crop dataset, which contains crop name, humidity, rainfall, soil type, temperature and soil pH features with eighteen classes, is then imported using the read_csv() function; a header argument identifies the header in the first row, which in our dataset contains the crop names.
Fig. 1. Crop identification system
After this, missing values are filled by calculating mean values of the data. In the Naïve Bayes method all the data should be in a single numeric format, but our dataset contains thirteen types of soils in string format; therefore One-Hot Encoding [14] is performed, placing each soil type in a separate column with binary values. The datasets are then reshaped into the required format. After conversion into a single format, data pre-processing is done on the updated datasets and they are split into test and train data using the method train_test_split(). After splitting the datasets, a Gaussian Naïve Bayes [13] model is selected using scikit-learn, and the fit() method is used to train the model with the training dataset.
Using the predict() method, predictions are made by passing test data to the trained classifier model, and the accuracy of the model is computed by comparing predicted values with target values via the accuracy_score() method. As shown in Fig. 1, the model is then used for prediction by passing new real-time data, obtained by connecting YL-69 soil moisture and DHT22 humidity-temperature sensor components to a Raspberry Pi 2 Model B. From this sensor unit, real-time data are taken every 5 s and uploaded directly into the ThingSpeak IoT cloud, with the Raspberry Pi acting as a gateway between the sensor unit and the cloud. The stored data, along with the past five years of rainfall data for each taluk, soil pH and soil types, are passed to the model. The final output is the predicted crop, i.e., the crop suitable for sowing in the given environmental conditions.
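Putting the steps above together, a condensed sketch of the described pipeline might look as follows (the file and column names are hypothetical; the paper's exact preprocessing may differ):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

df = pd.read_csv('crops.csv')                      # crop name + five attributes
df = df.fillna(df.mean(numeric_only=True))         # fill gaps with column means
df = pd.get_dummies(df, columns=['soil_type'])     # one-hot encode soil types
X = df.drop(columns=['crop'])                      # feature matrix
y = df['crop']                                     # eighteen crop classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)
model = GaussianNB().fit(X_tr, y_tr)               # train the classifier
print(accuracy_score(y_te, model.predict(X_te)))   # evaluate on held-out data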
4 Results
Figure 2 shows the user selecting Tumakuru taluk and the month of April for crop prediction in that particular region and time.
Fig. 2. Selection of taluk and month
Figure 3 shows that, when the taluk and month are selected, the system displays the humidity, soil pH, soil type, rainfall and temperature of that particular region.
Fig. 3. Displaying of parameter values
Figure 4 shows that if the field has a temperature of 32, rainfall of 38, soil pH of 6, humidity of 41 and black soil, then Horse Gram can be grown in that area.
Fig. 4. Display of predicted crop
5 Conclusion
In this research work we have designed a crop identification system using a Gaussian Naïve Bayes classifier. Based on parameters like humidity, soil pH, temperature, rainfall and soil type, we predict the suitable crop for the given region. When a user requests a crop recommendation for a particular region and month, he gets the real-time humidity and temperature values taken by the sensors deployed in the different regions of each taluk, along with the rainfall prediction and the name of the crop suitable for those particular environmental conditions. The presented work achieves 70% accuracy. In the future we plan to extend our work to identify different crops in various regions of Karnataka.
Compliance with Ethical Standards. All authors state that there is no conflict of interest. We used our own data. Animals/humans are not involved in this work.
References
1. Agriculture in India. https://en.wikipedia.org/wiki/Agriculture_in_India
2. Samuel, A.L.: Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 44, 206–226 (1959)
3. Dedeepya, P., Srinija, U.S.A., Krishna, M.G., Sindhusha, G., Gnanesh, T.: Smart greenhouse farming based on IOT. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1890–1893 (2018)
4. Data analytics. https://searchdatamanagement.techtarget.com/definition/data-analytics
5. Priya, R., Ramesh, D.: Crop prediction on the region belts of India: a Naïve Bayes MapReduce precision agricultural model. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 99–104 (2018)
6. Zingade, D.S., Buchade, O., Mehta, N., Ghodekar, S., Mehta, C.: Crop prediction system using machine learning. Int. J. Adv. Eng. Res. Dev. Spec. Issue Recent Trends Data Eng. 4(5), 1–6 (2017)
7. Sarangdhar, A.A., Pawar, V.R.: Machine learning regression technique for cotton leaf disease detection and controlling using IoT. In: International Conference on Electronics, Communication and Aerospace Technology, ICECA, vol. 2, pp. 449–454 (2017)
8. Paul, M., Vishwakarma, S.K., Verma, A.: Analysis of soil behaviour and prediction of crop yield using data mining approach. In: 2015 International Conference on Computational Intelligence and Communication Networks (CICN), pp. 766–771 (2015)
9. Liakos, K.G., Busato, P., Moshou, D., Pearson, S., Bochtis, D.: Machine learning in agriculture: a review. Sensors 18, 2674 (2018). https://doi.org/10.3390/s18082674
10. Census in Tumakuru. https://www.census2011.co.in/census/district/267-tumkur.html
11. Alipio, M.I., Cruz, A.E.M.D., Doria, J.D.A., Fruto, R.M.: A smart hydroponics farming system using exact inference in Bayesian network. In: 2017 IEEE 6th Global Conference on Consumer Electronics, pp. 1–5 (2017)
12. Chen, J., Huang, H., Tian, S., Qu, Y.: Feature selection for text classification with Naïve Bayes. Expert Syst. Appl. 36(3), 5432–5435 (2009)
13. Tahmassebi, A., Gandomi, A.H., Schulte, M.H.J., Goudriaan, A.E., Foo, S.Y., Meyer-Baese, A.: Optimized Naive-Bayes and decision tree approaches for fMRI smoking cessation classification. 2018, 24 p. Article ID 2740817
14. One-Hot Bit Encoding. https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Analysis of Feature Extraction Algorithm Using Two Dimensional Discrete Wavelet Transforms in Mammograms to Detect Microcalcifications
Subash Chandra Bose1, Murugesh Veerasamy2, Azath Mubarakali3, Ninoslav Marina1, and Elena Hadzieva1
1 University of Information Science and Technology (UIST) "St. Paul the Apostle", Ohrid, Republic of Macedonia
[email protected]
2 Department of Computer Science, Cihan University-Erbil, Kurdistan Region, Iraq
[email protected]
3 College of Computer Science, Department of CNE, King Khalid University, Abha, Saudi Arabia
[email protected]
Abstract. This paper focuses on the analysis of a feature extraction (FE) algorithm using the two-dimensional discrete wavelet transform (DWT) in mammograms to detect microcalcifications. We extract nine features, namely Mean (M), Standard deviation (SD), Variance (V), Covariance (Co_V), Entropy (E), Energy (En), Kurtosis (K), Area (A) and Sum (S). FE is a tool for dimensionality reduction: instead of carrying all the information, the data should be a reduced representation of the input. The extraction is performed to construct the dataset of the segmented tumor in the proper format, so that it can be given to classifier tools to achieve high classification accuracy. The nine statistical features extracted from the mammogram determine the effectiveness of the proposed technique. The experiment has been conducted on 322 mammogram images, of which only a few are listed in this paper. The detection of microcalcifications has been performed with the various algorithms discussed below, and the analysis of the FE algorithm using the two-dimensional DWT in mammograms to detect microcalcifications has been carried out in MATLAB.
Keywords: Breast cancer · Mammogram · Microcalcifications · Feature extraction · Discrete wavelet transform
1 Introduction
In Macedonia, breast cancer is the leading cancer among all other types; its incidence is alarming, and its mortality is higher than that of all other cancer types. According to the latest WHO data published in 2017, breast cancer deaths in Macedonia reached 328, or 1.78% of total deaths. The death rate, based
on age adjustment, is 20.76 per 100,000 population, which ranks Macedonia #50 in the world [1]. Here, a new method for the detection of microcalcifications in mammograms using a soft segmentation technique is presented [2]. Different image processing steps are applied, such as preprocessing and enhancement of the image; segmentation using Fuzzy c-means clustering (FCM); FE using the two-dimensional discrete wavelet transform; and finally classification of normal or abnormal (benign or malignant) images using an artificial neural network (ANN).
2 Literature Survey
Over the last several decades many authors have conducted research in this field; here we list a few breast cancer detection techniques. Li [3] used a Markov random field (MRF) in image analysis for the detection of tumors. Vega-Corona et al. [4] used a CAD system to detect microcalcifications in mammograms using a general regression neural network (GRNN). D'Elia et al. [5] used an MRF and support vector machine (SVM) classification to detect early-stage breast cancer. Fu and Lee [6] used the GRNN and SVM to detect microcalcifications. Osta et al. [7] used wavelet-based FE for dimensionality reduction. Jiji et al. [8] used a Region of Interest (ROI) and a Backpropagation Neural Network (BPNN) for the detection of microcalcification.
3 Image Acquisition
The mammograms are available on the website http://peipa.essex.ac.uk/info/mias.html; this database contains 322 mammograms [9, 11]. Detection of microcalcifications is done in the four stages shown in Fig. 1: preprocessing, segmentation, FE and classification.
Enhancement and Preprocessing: performed using adaptive median filtering for noise removal.
Segmentation: performed using Fuzzy c-means clustering [12].
Feature Extraction: the features are extracted using the two-dimensional discrete wavelet transform; the experimental results are discussed in detail below.
Classification: the extracted features are fed to the ANN and the image is classified as benign or malignant.
Fig. 1. Flow chart
4 Experimental Results
The peak signal-to-noise ratio (PSNR) and Mean Square Error (MSE) values have been evaluated for the 322 mammogram images; a few are shown in Figs. 2 and 3.
Fig. 2. Performance analysis of PSNR
To get a better performance result, the image should be of better quality: if the PSNR is high and the MSE value is low, the image quality is good and the performance will be better.
Fig. 3. Performance analysis of MSE
Here, the larger the PSNR and ASNR values (computed from c and d respectively), the better the enhancement and preprocessing method compared to the existing methods. The values are calculated as
PSNR = (c - a) / σ
ASNR = (d - a) / σ
MSE = (c - d) / (m × n)
where a is the mean gray level value of the original image, c is the maximum gray level value and d is the average gray level value of the enhanced image, m and n are the pixel dimensions of the enhanced image, and σ is the standard deviation of the original image. The enhanced mammogram image is evaluated using the PSNR and ASNR values; the results are shown in Figs. 4 and 5.
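As a small sanity-check utility, these quantities (as defined in the text above, which differ from the more common logarithmic PSNR and squared-error MSE) can be computed in a few lines of Python:

import numpy as np

def enhancement_metrics(orig, enh):
    # a: mean gray level of the original; sigma: its standard deviation
    # c: maximum and d: average gray level of the enhanced image
    a, sigma = orig.mean(), orig.std()
    c, d = float(enh.max()), enh.mean()
    m, n = enh.shape
    psnr = (c - a) / sigma
    asnr = (d - a) / sigma
    mse = (c - d) / (m * n)    # definition used in this paper
    return psnr, asnr, mse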
Fig. 4. Performance analysis of PSNR for Gaussian smoothening, noise equalization, matched filtering, LOG filtering and tracking algorithms.
Fig. 5. Performance analysis of ASNR for Gaussian smoothening, noise equalization, matched filtering, LOG filtering and tracking algorithms.
5 Feature Extraction
FE has been conducted using the discrete wavelet transform. The block diagram for FE is shown in Fig. 6.
Fig. 6. Block diagram for feature extraction
5.1 Discrete Wavelet Transforms
The discrete wavelet transform (DWT) technique has been used in the classification system [10].
Statistical Features Calculation: nine features are extracted from the low-low (LL) band of the discrete wavelet transform: M, SD, V, Co_V, E, En, K, A and S.
Equivalent MATLAB code for the two-level decomposition:
[a1, h1, v1, d1] = dwt2(STI, 'db1');   % level-1 decomposition of the segmented image
[a2, h2, v2, d2] = dwt2(a1, 'db1');    % level-2 decomposition of the approximation band
imshow([a2, h2, v2, d2]);
title('Two dimensional discrete wavelet transform');
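For readers without MATLAB, an equivalent decomposition and feature computation can be sketched in Python with the PyWavelets package (illustrative only; the histogram binning and exact feature definitions below are assumptions, not the paper's code):

import numpy as np
import pywt
from scipy.stats import kurtosis

a1, (h1, v1, d1) = pywt.dwt2(img, 'db1')   # level-1 decomposition
a2, (h2, v2, d2) = pywt.dwt2(a1, 'db1')    # level-2 on the approximation band

ll = a2.ravel()                            # low-low band coefficients
hist, _ = np.histogram(ll, bins=256)
p = hist[hist > 0] / hist.sum()            # empirical distribution of the band
features = {
    'mean': ll.mean(), 'std': ll.std(), 'var': ll.var(),
    'entropy': float(-np.sum(p * np.log2(p))),
    'energy': float(np.sum(ll.astype(float) ** 2)),
    'kurtosis': float(kurtosis(ll)),
    'sum': float(ll.sum()),
}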
1 XM XN pði; jÞ i¼1 j¼1 MN
MATLAB code for calculating M:
mean1 = mean(double(a(:)))
The performance analysis of the mean for various mammogram images is shown in Fig. 7.
Fig. 7. Performance analysis of Mean
b. Standard Deviation (SD)
The standard deviation is evaluated as the square root of the mean square deviation of the pixel values p(i, j) from the mean value M.
MATLAB code for calculating the SD:
SD = std(std(double(a)))
Example: if the variance is 8.7, then SD = sqrt(V) = sqrt(8.7), roughly 2.95.
The performance analysis of the standard deviation for various mammogram images is shown in Fig. 8.
Fig. 8. Performance analysis of Standard deviation
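For reference, the mean-square-deviation description above corresponds to the usual formula (standard notation, not spelled out in the paper):

\sigma = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(p(i,j)-\mu\right)^{2}}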
c. Variance (V)
MATLAB code for calculating the variance:
V = var(double(a(:)))
The performance analysis of the variance for various mammogram images is shown in Fig. 9.
Fig. 9. Performance analysis of Variance
d. Covariance (Co_V)
MATLAB code for calculating Co_V:
Co_V = cov(double(a));
Co_V = sum(sum(Co_V)) / (length(Co_V) * 1000)
The performance analysis of the covariance for various mammogram images is shown in Fig. 10.
Fig. 10. Performance analysis of Co-variance
e. Entropy (E)
MATLAB code for calculating E:
Entropy1 = entropy(a)
The performance analysis of the entropy for various mammogram images is shown in Fig. 11.
Fig. 11. Performance analysis of Entropy
f. Energy (En)
MATLAB code for calculating the energy (h_norm is assumed to be the normalized histogram of the band, computed earlier):
energy = sum(h_norm)
The performance analysis of the energy for various mammogram images is shown in Fig. 12.
Fig. 12. Performance analysis of Energy
g. Kurtosis (K)
MATLAB code for calculating the kurtosis:
kurtosis1 = kurtosis(double(a(:)))
The performance analysis of the kurtosis for various mammogram images is shown in Fig. 13.
Fig. 13. Performance analysis of Kurtosis
h. Area (A)
MATLAB code for calculating the area:
area1 = bwarea(double(a(:)))
The performance analysis of the area for various mammogram images is shown in Fig. 14.
Fig. 14. Performance analysis of Area
i. Sum (S)
MATLAB code for calculating the sum (sze = size(a) is assumed to be computed earlier):
Bags = sum(sum(a)) / (sze(1) * sze(2))
The performance analysis of Bags for various mammogram images is shown in Fig. 15.
Fig. 15. Performance analysis of bag
MATLAB code for combining the features:
data = [ag; ahg; energy; entropy1; standarddeviation; covariance; mean; variance; kurtosis1; area1; bags] / 100
All nine statistical features mentioned above (M, SD, V, Co_V, E, En, K, A and S) are extracted from the mammogram to determine the effectiveness of the proposed technique.
Worked example of calculating V and SD:
(1) Mean: assume a sample of size 4 with values 3, 6, 9, 4; the sum of the numbers is 3 + 6 + 9 + 4 = 22, so Mean = 22/4 = 5.5.
(2) Differences of the numbers from the mean: (3 − 5.5), (6 − 5.5), (9 − 5.5), (4 − 5.5) = −2.5, 0.5, 3.5, −1.5.
(3) Squared differences: 6.25, 0.25, 12.25, 2.25.
(4) Sum of squares: 6.25 + 0.25 + 12.25 + 2.25 = 21.
(5) V: divide the sum of squares by the sample size, V = 21/4 = 5.25.
(6) SD = sqrt(V) = sqrt(5.25) = 2.29 (approximately).
The detection rates of the various methods and their comparison are shown in Table 1 and Fig. 16 below.

Table 1. Performance analysis comparison
Authors                 Methods                                        Detection rate
Lau et al. [20]         Asymmetric measures                            85%
Nishikawa et al. [21]   Linear classifier                              70%
Sallman et al. [22]     Unwrapping technique                           86.60%
Kim et al. [23]         Neural networks                                88%
Cordella et al. [24]    Multiple expert system                         78.6%
Ferrari et al. [25]     Directional filtering with Gabor wavelets      74.4%
Thangavel et al. [26]   MRF-ACO                                        94.8%
Thangavel et al. [27]   Ant colony optimization                        96.48%
Thangavel et al. [28]   Genetic algorithm                              93.21%
Bose et al. [17]        Bilateral subtraction using PSO                94.6%
Bose et al. [18]        MRF-PSO segmentation                           98.3%
Bose et al. [2]         Two dimensional discrete wavelet transforms    97.3%
Karnan et al. (2011)    Particle Swarm Optimization                    96.4%

Fig. 16. Performance analysis comparison
In the discrete wavelet transform (DWT) method, the true positive (TP) and false positive (FP) detection rates for different threshold values of the images are used to measure performance. These rates are assessed using Receiver Operating Characteristic (ROC) curves; the area under the curve (the Az value) lies in the range [0, 1]. If the Az value is 1.0 the diagnosis is perfect: TP is 100% and FP is 0%. For the FE using discrete wavelet transformation (FE-DWT) method, the Az value is 94%. In other methods, like Sallam and Bowyer (1999) and Lau and Bischof (1991), the overlap is about 40% TP, while in the FE-DWT method the overlap is about 75%. In the bilateral subtraction and MRF-ant colony optimization methods, if the threshold value is low the TP regions are merged with FP regions.
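Given per-image labels and classifier scores, the Az value is just the area under the ROC curve, which can be computed, for example, with scikit-learn (the toy data below are purely illustrative):

from sklearn.metrics import roc_auc_score

# toy example: 1 = microcalcification present, 0 = normal
y_true = [0, 1, 1, 0, 1]             # ground-truth labels
y_score = [0.1, 0.8, 0.7, 0.3, 0.9]  # classifier confidence for the positive class
az = roc_auc_score(y_true, y_score)  # Az, the area under the ROC curve
print(az)                            # 1.0 for this perfectly ranked toy set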
6 Conclusion
The analysis of the FE algorithm using the two-dimensional discrete wavelet transform (DWT) in mammograms to detect microcalcifications has been discussed in detail and the results have been displayed. In the future, this method can also be used on mammograms after lossless compression or encryption, and on corrupted mammogram images [13–18]. This microcalcification detection technique can also be used to detect microparticles in firefighter robotics industries [19].
Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. World Health Organization: Cancer incidence, mortality and survival by site for 14 regions of the world
2. Bose, J.S.C., Shankar Kumar, K.R.: Detection of micro classification in mammograms using soft computing techniques. Eur. J. Sci. Res. 86(1), 103–122 (2012)
3. Li, H., et al.: Markov random field for tumor detection in digital mammography. IEEE Trans. Med. Imaging 14(3), 565–576 (1995)
4. Vega-Corona, A., Álvarez, A., Andina, D.: Feature vectors generation for detection of micro calcifications in digitized mammography using neural networks. In: Artificial Neural Nets Problem Solving Methods, LNCS, vol. 2687, pp. 583–590 (2003)
5. D'Elia, C., Marrocco, C., Molinara, M., Poggi, G., Scarpa, G., Tortorella, F.: Detection of microcalcifications clusters in mammograms through TS-MRF segmentation and SVM-based classification. In: 17th International Conference on Pattern Recognition (2004)
6. Fu, J., Lee, S., Wong, S.J., Yeh, J.A., Wang, A., Wu, H.: Image segmentation, feature selection and pattern classification for mammographic micro calcifications. Comput. Med. Imaging Graph. 29, 419–429 (2005)
7. Osta, H., Qahwaji, R., Ipson, S.: Wavelet-based feature extraction and classification for mammogram images using RBF and SVM. In: Visualization, Imaging, and Image Processing (VIIP), Palma de Mallorca, Spain (2008)
8. Dheeba, J., Jiji, W.G.: Detection of microcalcification clusters in mammograms using neural network. Int. J. Adv. Sci. Technol. 19, 13–22 (2010)
9. MIAS database. www.mias.org
10. Juarez, L.C., Ponomaryov, V., Sanchez, R.J.L.: Detection of micro calcifications in digital mammograms images using wavelet transform. In: Electronics, Robotics and Automotive Mechanics Conference, September 2006, vol. 2, pp. 58–61 (2006)
11. Suckling, J., Parker, J., Dance, D., Astley, S., Hutt, I., Boggis, C., et al.: The mammographic images analysis society digital mammogram database. Exerpta Med. Int. Congr. Ser. 1069, 375–378 (1994)
12. Alata, M., Molhim, M., Ramini, A.: Optimizing of fuzzy C-means clustering algorithm. World Acad. Sci. Eng. Technol. 39, 224–229 (2008)
13. Bose, J.S.C., Gopinath, G.: A survey based on image encryption then compression techniques for efficient image transmission. J. Ind. Eng. Res. 1(1), 15–18 (2015)
14. Bose, J.S.C., Gopinath, G.: An ETC system using advanced encryption standard and arithmetic coding. Middle-East J. Sci. Res. 23(5), 932–935 (2015). https://doi.org/10.5829/idosi.mejsr.2015.23.05.22233
15. Bose, J.S.C., Saranya, B., Monisha, S.: Optimization and generalization of Lloyd's algorithm for medical image compression. Middle-East J. Sci. Res. 23(4), 647–651 (2015). https://doi.org/10.5829/idosi.mejsr.2015.23.04.103
16. Sebastian, L., Bose, J.S.: Efficient restoration of corrupted images and data hiding in encrypted images. J. Ind. Eng. Res. 1(2), 38–44 (2015)
17. Bose, J.S.C.: Detection of ovarian cancer through protein analysis using ant colony and particle swarm optimization. Int. J. Multimed. Comput. Vis. Mach. Learn. 1, 59–65 (2010)
18. Bose, J.S.C.: Detection of masses in digital mammograms. Int. J. Comput. Netw. Secur. 2, 78–86 (2010)
19. Bose, J.S.C., Mehrez, M., Badawy, A.S., Ghribi, W., Bangali, H., Basha, A.: Development and designing of fire fighter robotics using cyber security. In: Proceedings of the 2nd International Conference on Anti-Cyber Crimes (ICACC 2017), 26–27 March 2017, pp. 118–122 (2017)
20. Lau, T.-K., et al.: Automated detection of breast tumors using the asymmetry approach. Comput. Biomed. Res. 24, 273–295 (1991)
21. Nishikawa, R.M., et al.: Computer-aided detection of clustered microcalcifications on digital mammograms. Med. Biol. Eng. Comput. 33(2), 174–178 (1995)
22. Sallam, M.Y., et al.: Registration and difference analysis of corresponding mammogram images. Med. Image Anal. 3(2), 103–118 (1999)
23. Kim, J.K., et al.: Statistical textural features for detection of microcalcifications in digitized mammograms. IEEE Trans. Med. Imaging 18(3), 231–238 (1999)
24. Cordella, L.P., et al.: Combining experts with different features for classifying clustered microcalcifications in mammograms. In: Proceedings of the 15th International Conference on Pattern Recognition, pp. 324–327 (2000)
25. Ferrari, R.J., et al.: Analysis of asymmetry in mammograms via directional filtering with Gabor wavelets. IEEE Trans. Med. Imaging 20(9), 953–964 (2001)
26. Thangavel, K., et al.: Ant colony system for segmentation and classification of microcalcification in mammograms. Int. J. Artif. Intell. Mach. Learn. 5(3), 29–40 (2005)
27. Karnan, M., et al.: Ant colony optimization for feature selection and classification of microcalcifications in digital mammograms. In: International Conference on Advanced Computing and Communications (2006)
28. Karnan, M., et al.: Hybrid Markov random field with parallel Ant Colony Optimization and fuzzy C means for MRI brain image segmentation. In: 2010 IEEE International Conference on Computational Intelligence and Computing Research (2010)
Prediction of Fetal Distress Using Linear and Non-linear Features of CTG Signals
E. Ramanujam1, T. Chandrakumar2, K. Nandhana1, and N. T. Laaxmi1
1 Department of Information Technology, Thiagarajar College of Engineering, Madurai 625015, Tamil Nadu, India
[email protected], [email protected], [email protected]
2 Department of Computer Applications, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India
[email protected]
Abstract. Cardiotocography records the fetal heart rate and uterine contractions, and is used to monitor fetal distress during delivery; these signals support physicians in assessing fetal and maternal risk. During the last decade, various techniques have been proposed for the computer-aided assessment of fetal distress. Their drawbacks are complex feature extraction and costly classification algorithms. This paper proposes the feature selection techniques Multivariate Adaptive Regression Splines and Recursive Feature Elimination to evaluate 48 linear and non-linear features, and classifies the signals using decision tree and k-Nearest Neighbor algorithms. Experimental results compare the performance of the proposed approach with state-of-the-art techniques.
Keywords: Cardiotocography · Feature extraction · Feature selection · Decision tree · Fetal heart rate · Uterine contractions
1 Introduction
Electronic Fetal Monitoring (EFM) from Cardiotocography (CTG) signals is a widely used technique to identify fetal distress. CTG comprises two signals, the Fetal Heart Rate (FHR) and the Uterine Contraction (UC). Obstetricians use these recorded signals to assess the condition of the fetus during the antepartum and intra-partum periods [1]. In general, lack of sufficient oxygen during pregnancy results in various fetal distress conditions which can only be diagnosed through CTG signals. Manual interpretation of CTG signals by a gynecologist for analyzing those abnormalities depends on the expertise level and shows a high false positive rate [2], with a real chance of misclassifying normal recordings as pathological. To overcome this issue, computer-aided diagnosis is required to provide better classification of CTG signals. In recent years, various computer-aided systems have been proposed which follow the process of data collection, pre-processing, feature extraction and classification. The data collections are often obtained from benchmark data [3] available online, and preprocessing clears the noise and signal attenuations. Feature extraction is the core of the classification process; the research work in [1, 4–10] has utilized various numbers
of features to demonstrate classification performance. The major challenge for existing techniques is the efficient selection of minimal features and their evaluation. This paper uses the feature selection techniques Multivariate Adaptive Regression Splines (MARS) and Recursive Feature Elimination (RFE) to select a minimal number of the linear and non-linear features collected from state-of-the-art techniques, for better classification performance. The paper is structured as follows: Sect. 2 describes the existing feature extraction techniques with their corresponding linear and non-linear features, Sect. 3 discusses the dataset used, Sect. 4 deals with the features of various state-of-the-art techniques, Sect. 5 proposes the feature selection, Sect. 6 discusses the evaluation of the features using various classification algorithms, and Sect. 7 provides the conclusion.
2 Literature Review
In fetal distress classification, researchers have categorized features as linear and non-linear, and most have used a combination of both. Iraj et al. [4] used a multi-layer sub-ANFIS topology with 21 features of the CTG signal; in addition, deep stacked sparse autoencoders and a deep ANFIS were utilized to achieve an accuracy of 99.503%. Ramla et al. [5] utilized 20 statistical features to predict the CTG signals using the CART decision tree algorithm and achieved an accuracy of 90.12% under 5-fold cross validation. Comert et al. [6] utilized a set of 21 features from CTG signals and achieved high classification performance through ensemble and neural network classifiers. Deb et al. [7] proposed a robust mathematical analysis model applying the Hilbert-Huang Transform (HHT) and extracted six linear statistical features to classify CTG signals; the model achieved 98.5% accuracy with a Multi-Layer Perceptron (MLP) and 98.2% with a Support Vector Machine (SVM). Comert et al. [1] used six linear and four non-linear features and achieved 92.40%, 82.39% and 79.22% accuracy with an Artificial Neural Network (ANN) at three stages of analysis. Karabulut et al. [8] provided a computer-based classification approach for determining whether a fetus is normal or pathological; five linear and non-linear features are extracted and evaluated with an ensemble of decision trees using Adaptive Boosting, a Bayesian network, and an RBFN, achieving a maximum of 92.6% accuracy with the Bayesian network.
State-of-the-art techniques have utilized various linear and non-linear features to demonstrate their classification performance. However, much of the reported performance depends on costly classifiers, as discussed above: most are ensemble-based or neural-network-based techniques, which naturally provide better performance but require much more training time than simpler classification algorithms. To avoid this cost while still providing good accuracy, this paper applies the simple feature selection techniques MARS [11] and RFE [12] to obtain better classification performance through k-NN and a decision tree (Information Gain). The overall architecture of the proposed approach is shown in Fig. 1.
Fig. 1. Overall architecture of the proposed work: CTG signal database → feature extraction → feature selection (MARS and RFE) → classification.
3 Data Set Collection
The fetal CTG dataset used in the proposed evaluation is publicly available and downloadable from PhysioBank [3]. The database contains 552 CTG recordings, comprising both normal and pathological signals, recorded between 2010 and 2012. The signals have a duration of 90 min prior to delivery (some are shorter due to immediate delivery) and are sampled at 4 Hz in digitized form.
4 Feature Extraction
The primary concern of the proposed technique is to select and evaluate the linear and non-linear features so as to achieve higher accuracy using simple classification algorithms. These features were collected through a survey of various papers [1, 4–10] that deal with ECG, EEG and fetal distress classification. The linear features of FHR are listed in Table 1 and the non-linear features in Table 2, with their descriptions.
Table 1. Linear features
1. Mean: Arithmetic average of a range of values or quantities
2. Standard deviation: Measure to quantify the amount of variation of a set of values
3. Variance: Measurement of spread between numbers in a dataset
4. Long term irregularity: Describes the irregularity conditions over a particular period
5. Range: Difference between the maximum and minimum value
6. Interval index: Defines the gross change
7. Short term variability: Variability condition over a short period
8. Median: Midpoint of the range of values
9. Skewness: Extent to which a distribution differs from the normal distribution
10. Kurtosis: Sharpness of the peak of the frequency distribution curve
11. Mean absolute deviation: Average of the absolute deviation from the central point
12. Bandwidth: The range of frequencies within a given band
13. 1st quartile: Middle number between the smallest number and the median of the dataset
14. 3rd quartile: Middle value between the median and the highest value of the dataset
15. Frequency: Number of occurrences in a dataset
16. bw.nrd0: Rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator
17. bw.nrd: More common variation given by Scott with a factor of 1.06
18. Root mean square: Square root of the mean of the squares of the values
19. Harmonic mean: Reciprocal of the mean of the reciprocals
20. Geometric mean: The central number in a geometric progression
21. Sterling: Computes the Sterling ratio of the univariate time series
22. Sharpness: Return per unit risk
23. Roughness: Total curvature of a curve or a time wave of a spectrum
24. Rugosity: Small-scale variations in the height of the surface

Table 2. Non-linear features
1. Approximate entropy: Quantifies the amount of regularity
2. Sample entropy: Measures the complexity of a time series
3. Fast entropy: Calculates ApEn in an efficient way
4. Fast sample entropy: Calculates SampEn in an efficient way
5. 1-norm: Sum of the absolute values of the columns
6. Spectral norm: Maximum singular value of a matrix
7. Amplitude: Angular peak amplitude of the flapping motion
8. Shannon entropy: Information-theoretic formulation of entropy
9. Temporal entropy: Entropy of a temporal envelope
10. Hs: Simple R/S estimation
11. Hrs: Corrected R over S exponent
12. He: Empirical Hurst exponent
13. Hal: Corrected empirical Hurst exponent
14. Ht: Theoretical Hurst exponent
15. Fractal dimension: A dim-dimensional array of fractal dimensions
16. Fractal dimension: A dim-dimensional array of scales
17. Burg$aicc: Autocorrelation or co-variance
18. Fractal dimension: Size of the actual sliding window
19. Burg$phi: Vector of AR coefficients
20. Burg$se.phi: Standard error of the AR coefficients
21. Burg$se.theta: Defaults to 0
22. Burg$theta: Defaults to 0
23. Burg$sigma2: White noise variance
24. SFM: Ratio between the geometric and the arithmetic mean
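To make the extraction concrete, the following Python sketch computes a handful of the features from Tables 1 and 2 for an FHR segment. It assumes the samples are available as a NumPy array; the function names are ours, not the paper's:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def linear_features(fhr):
    """A few of the linear features of Table 1 for one FHR signal."""
    return {
        "mean": np.mean(fhr),
        "std": np.std(fhr),
        "range": np.ptp(fhr),                        # max - min
        "median": np.median(fhr),
        "skewness": skew(fhr),
        "kurtosis": kurtosis(fhr),
        "mad": np.mean(np.abs(fhr - np.mean(fhr))),  # mean absolute deviation
        "rms": np.sqrt(np.mean(fhr ** 2)),
    }

def approximate_entropy(x, m=2, r=None):
    """Approximate entropy (Table 2, row 1): regularity of a time series."""
    x = np.asarray(x, dtype=float)
    r = 0.2 * np.std(x) if r is None else r
    def phi(m):
        emb = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
        dist = np.max(np.abs(emb[:, None] - emb[None, :]), axis=2)
        counts = np.mean(dist <= r, axis=1)          # self-matches included
        return np.mean(np.log(counts))
    return phi(m) - phi(m + 1)

fhr = 140 + 5 * np.sin(np.linspace(0, 20, 400))      # illustrative signal
print(linear_features(fhr)["mean"], approximate_entropy(fhr))
```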
5 Feature Selection/Evaluation
To evaluate the linear and non-linear features collected from the various research works, the proposed method uses MARS and RFE to select the best features. Feature selection is one of the core concepts in machine learning and has a huge impact on the performance of a model. Irrelevant features may degrade the performance of the algorithm and may also increase the computation time of the classification algorithm. Selecting relevant features has the following advantages: it reduces overfitting, improves accuracy, and reduces training time.

5.1 Multivariate Adaptive Regression Spline (MARS)
MARS [11] is a variant and extension of linear regression analysis, also termed a non-parametric regression technique. The model automatically captures non-linearities and interactions between the features using a recursive partitioning method, as in regression trees and decision trees. MARS is more flexible and scalable than linear regression models, is easy to understand and interpret, and can handle both categorical and continuous data.

\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)    (1)

The model is a weighted sum of basis functions B_i(x), where each c_i is a constant coefficient.
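A minimal sketch of this model in Python, assuming simple hinge basis functions; the knots and coefficients below are illustrative, not learned by the MARS procedure:

```python
import numpy as np

def hinge(x, knot, sign=+1):
    """MARS basis function: max(0, x - knot) or max(0, knot - x)."""
    return np.maximum(0.0, sign * (x - knot))

def mars_predict(x, terms, coeffs, intercept=0.0):
    """Eq. (1): f_hat(x) = sum_i c_i * B_i(x), with hinge bases B_i."""
    bases = [hinge(x, knot, sign) for knot, sign in terms]
    return intercept + sum(c * b for c, b in zip(coeffs, bases))

# Two hinge terms at knot 0.5 acting in opposite directions.
x = np.linspace(0, 1, 5)
print(mars_predict(x, terms=[(0.5, +1), (0.5, -1)], coeffs=[2.0, -1.0]))
```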
5.2 Recursive Feature Elimination (RFE)
RFE [12] is a feature selection method that fits a model and uses the ranking of feature coefficients or feature importances to remove the weakest features until the desired number of features is reached. RFE uses the model accuracy to determine the most important features as well as to identify the weak ones; as a result, the least important features are eliminated from the dataset. The RFE procedure is recursively iterated on the pruned dataset until the desired result is obtained. Cross-validation is used with RFE to identify the optimal subset of features.

Table 3. Classification performance (accuracy) of the extracted 48 features
Methods | Decision tree | k-NN
Training-testing (60–40) | 84.14 | 85.41
Training-testing (80–20) | 87.67 | 90.57
10-fold cross validation | 89.03 | 93.46
5-fold cross validation | 90.16 | 94.15
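A hedged illustration of the RFE procedure using scikit-learn's cross-validated RFECV; the synthetic feature matrix merely stands in for the 48 extracted CTG features:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the 48 extracted CTG features (illustrative data only).
X, y = make_classification(n_samples=500, n_features=48, n_informative=10,
                           random_state=0)

# Recursively drop the weakest feature; 5-fold CV picks the optimal subset.
selector = RFECV(DecisionTreeClassifier(criterion="entropy", random_state=0),
                 step=1, cv=5)
selector.fit(X, y)
print("selected features:", selector.n_features_)
print("mask:", selector.support_)
```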
Fig. 2. Plotting of 20 best features by MARS feature selection
Table 4. Classification performance of the best 16 (8 linear and 8 non-linear) features selected through MARS and RFE.
Methods | MARS Decision tree | MARS k-NN | RFE Decision tree | RFE k-NN
Training-testing (60–40) | 89.03 | 90.19 | 88.24 | 90.16
Training-testing (80–20) | 93.91 | 93.67 | 93.35 | 93.67
10-fold cross validation | 96.89 | 95.82 | 97.41 | 94.48
5-fold cross validation | 94.5 | 95.12 | 91.8 | 90.76
6 Results and Discussions
The 48 linear and non-linear features (24 each) of the CTG signals are collected from the fetal distress dataset of Physionet. These features are evaluated and selected using the proposed MARS and RFE approaches to produce efficient accuracy through classification algorithms such as the decision tree using Information Gain and k-NN. The features are evaluated using both training-testing and cross-validated performance of the classification algorithms. Initially, the extracted original 48 features are tested for accuracy using the classification algorithms, and the accuracies are reported in Table 3. The 48 extracted features show the highest accuracies of 93.46% and 94.15% for 10-fold and 5-fold cross validation of the k-NN algorithm. To obtain better accuracy, the 48 features are further reduced by applying the MARS and RFE feature selection techniques. The best 20 features based on the importance measure of the MARS and RFE selection techniques are shown in Figs. 2 and 3. In Fig. 2, the X-axis shows the best features and the Y-axis the repetitive subset of iterations performed to extract the features; in Fig. 3, the X-axis shows the importance measure and the Y-axis the list of best features. For the best accuracy, the 16 best features (top 8 linear and 8 non-linear) from both MARS and RFE are evaluated individually for their performance using the classification algorithms on various training-testing splits (80–20, 60–40) and cross-
validations of 5 and 10 folds, as shown in Table 4. Comparing the performance of the selected 16 best features and the original 48 features of the CTG signals in Tables 3 and 4, the selected features show better performance than the original 48 features. From the observations of Table 4, the MARS and RFE selection techniques with the decision tree achieve the highest accuracies of 96.89% and 97.41%, respectively, for 10-fold cross validation.
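The evaluation protocol above (hold-out splits plus 5- and 10-fold cross validation with a decision tree and k-NN) can be sketched as follows; the synthetic data stands in for the selected 16-feature matrix and labels:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the 16 selected features and labels (illustrative data only).
X, y = make_classification(n_samples=500, n_features=16, n_informative=8,
                           random_state=0)

for name, clf in [("Decision tree", DecisionTreeClassifier(criterion="entropy",
                                                           random_state=0)),
                  ("k-NN", KNeighborsClassifier())]:
    for k in (5, 10):
        scores = cross_val_score(clf, X, y, cv=k)
        print(f"{name}: {k}-fold CV accuracy = {scores.mean():.4f}")
    # 80-20 hold-out split as in Table 4
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)
    print(f"{name}: 80-20 hold-out accuracy = "
          f"{clf.fit(X_tr, y_tr).score(X_te, y_te):.4f}")
```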
Fig. 3. Plotting of 20 best features by RFE feature selection
Table 5. Performance comparison of the proposed method with various state-of-the-art techniques for CTG signal classification.
Reference | Method | # of features | Accuracy (%)
[4] | ANFIS | 21 | 99.503
[5] | CART | 20 | 90.12
[6] | ANN | 21 | 99.73 (Sensitivity)
[7] | HHT + MLP | 6 | 98.5
[1] | ANN | 10 | 92.40
[8] | Adaptive Boosting + Bayesian | 5 | 92.6
Proposed | MARS + Decision Tree | 16 | 96.89
Proposed | RFE + Decision Tree | 16 | 97.81
To justify the performance of the proposed feature selection against the state-of-the-art techniques, a comparison is made in Table 5 with the features and accuracies obtained by the various research techniques. In this comparison the proposed method ranks third, with research work [4] first and research work [7] second. However, those techniques use neural-network-based classifiers and dimensionality reduction techniques to obtain their performance, which makes them costlier in terms of execution than the proposed technique. The proposed method thus achieves competitive performance using a simple feature selection technique with less complex classification algorithms.
7 Conclusion
In this work, the linear and non-linear features used by various state-of-the-art techniques are efficiently selected through the feature selection techniques MARS and RFE to produce better classification performance with minimal features. Out of the 48 extracted features of the CTG signals, the 16 best features are selected through the proposed approach and evaluated using classifiers such as k-Nearest Neighbor and decision tree. The proposed method achieved 96.89% and 97.81% accuracy for MARS and RFE, respectively, combined with the decision tree algorithm on the CTG signals. In the future, the proposed method can be implemented in real-time CTG machines for classification.
Compliance with Ethical Standards. All authors state that there is no conflict of interest. We used our own data. No animals/humans were involved in this work.
References
1. Cömert, Z., Fatih, A.: Evaluation of fetal distress diagnosis during delivery stages based on linear and nonlinear features of fetal heart rate for neural network community. Int. J. Comput. Appl. 156(4), 26–31 (2016)
2. Sameni, R.: A review of fetal ECG signal processing issues and promising directions. Open Pacing Electrophysiol. Ther. J. 3, 4–20 (2010)
3. Goldberger, A.L., Amaral, L.A.N., Glass, L., et al.: PhysioBank, PhysioToolkit, and PhysioNet. Circulation 101(23), E215–E220 (2000)
4. Iraji, M.S.: Prediction of fetal state from the cardiotocogram recordings using neural network models. Artif. Intell. Med. 96, 33–44 (2019)
5. Ramla, M., Sangeetha, S., Nickolas, S.: Fetal health state monitoring using decision tree classifier from cardiotocography measurements. In: 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1799–1803. IEEE (2018)
6. Cömert, Z., Kocamaz, A.F.: Comparison of machine learning techniques for fetal heart rate classification. Acta Phys. Pol. A 132(3), 451–454 (2017)
7. Deb, S., Islam, S.M.R., Johura, F.T., Huang, X.: Extraction of linear and non-linear features of electrocardiogram signal and classification. In: 2017 2nd International Conference on Electrical and Electronic Engineering (ICEEE), pp. 1–4. IEEE (2017)
8. Karabulut, E.M., Ibrikci, T.: Analysis of cardiotocogram data for fetal distress determination by decision tree based adaptive boosting approach. J. Comput. Commun. 2(09), 32 (2014)
9. Padmavathi, S., Ramanujam, E.: Naïve Bayes classifier for ECG abnormalities using multivariate maximal time series Motif. Procedia Comput. Sci. 47, 222–228 (2015)
10. Balayla, J., Shrem, G.: Use of artificial intelligence (AI) in the interpretation of intrapartum fetal heart rate (FHR) tracings: a systematic review and meta-analysis. Arch. Gynecol. Obstet. 300, 1–8 (2019)
11. Friedman, J.H.: Multivariate adaptive regression splines. Ann. Stat. 19(1), 1–67 (1991)
12. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002)
Optimizing Street Mobility Through a NetLogo Simulation Environment
Jesús Silva1(&), Noel Varela2, and Omar Bonerge Pineda Lezama3
1 Universidad Peruana de Ciencias Aplicadas, Lima, Peru [email protected]
2 Universidad de la Costa (CUC), Calle 58 # 55-66, Barranquilla, Atlantico, Colombia [email protected]
3 Universidad Tecnológica Centroamericana (UNITEC), San Pedro Sula, Honduras [email protected]
Abstract. Routes and streets make it possible to drive and travel through cities, but unfortunately traffic, and particularly congestion, causes drivers to lose time while traveling from one place to another, because of the time it takes to transit the roads in addition to the waiting times at traffic lights. This research introduces the extension of an agent-oriented system aimed at reducing driver waiting times at a street intersection. The simulation environment was implemented in NetLogo, which allowed comparing the impact of a smart traffic light versus a fixed-time traffic light.
Keywords: Multi-agent systems · Agent-oriented programming · Traffic · NetLogo
1 Introduction
Various disciplines have studied the problem of mobility in cities in order to understand the phenomenon [1–3] and achieve a viable solution [4–6]. Some authors have found that drivers stuck in high congestion have high levels of stress [7], which can lead them to modify their behavior while driving. As a result, it is important to change the way people drive and to find better ways to deal with traffic, such as smart traffic lights or even cars communicating with each other. The rapid growth of cities, the location of educational institutions and the diversity of jobs lead to an increase in the daily flow of traffic, and hence in the number of vehicles and means of transport on the streets and in cities in general. While several domains have addressed the problem of traffic congestion [8–12], we consider that a simple approach, implementing agent-based modeling and simulation techniques with a low-to-medium implementation difficulty, can be effective and viable for application in real life. There are a large number of tools and platforms, such as NetLogo [13] and Repast, and a large number of applications, where the most common uses take place in social simulations
and optimization problems such as human behavior, urban simulation, and traffic management [14]. In [15], the authors present a traffic simulation in NetLogo at an intersection, optimized through an agent-based approach, which decreases the waiting time of vehicles and prevents them from waiting for an indeterminate or excessively long time, with the ultimate goal of reducing traffic congestion at a two-way, double-lane intersection where vehicles circulate. In this context, the objective of this article is to introduce an extension of [15], adding left-turn traffic lights, taxis and pedestrians to the simulation, and adapting the smart traffic light to the new scenario.
2 Description of Modeled Agents
The implemented agents are described in Tables 1, 2 and 3 by their attributes, the information they sense from the environment, and their behavior.

Table 1. Description of the intersection agent.
Attributes: Traffic light: current state of the traffic lights; Intelligence (Yes or No): defines the algorithm used for handling the traffic lights; Minimum green time; Time since the last light change; Direction; Its turn; Is car: whether it applies to cars or pedestrians.
Information sensed from the environment: Number of cars behind the limits of the intersection in all directions; car wait times.
Behavior: Change traffic light. Without intelligence: it switches at fixed times; the traffic light stays green for a certain time and then changes to red. With intelligence: it switches based on the sensed data of the environment, using an algorithm that counts how many cars are waiting on the other side and how long they have been waiting, and decides whether to change or continue in the current state.
2.1 Smart Traffic Light Algorithm
The developed intelligent traffic light algorithm coordinates two traffic lights (X and Y), one for each direction of the street intersection, with their respective left-turn traffic lights, plus the pedestrian traffic lights. The four pedestrian traffic lights are all green or all red at the same time, so the algorithm treats them as a unit. Five different states are distinguished, described in Table 4. The algorithm turns the traffic light with the greatest weight green, unless the minimum green time set through the user controls has not yet elapsed.
Table 2. Description of the auto agent.
Attributes: Speed: current speed of the car; Patience: the remaining tolerance the driver has for waiting; Direction: where the car is going; Current lane; Current coordinates; Turn right (Yes or No): whether the car will turn right at the intersection; Turn left (Yes or No): whether the car will turn left at the intersection; Waiting time at the traffic light; Acceleration; Deceleration; Taxi (Yes or No): whether it is a taxi; Is free (applies only to taxis): whether it has passengers; Crashed.
Information sensed from the environment: Nearby cars, using a 15° viewing angle and a radius that varies depending on the speed of the car; maximum street speed; color of the traffic light; cars in the adjacent lane; position within the environment; pedestrians waiting for taxis.
Behavior: Accelerate: if no car in its view spectrum is going slower than it, if it does not exceed the speed limit, and if it is not at the pre-traffic-light limit while the light is red. Brake: if a car in its vision spectrum is going slower than it, or if it is at the pre-traffic-light limit while the light is red. Turn right: if it is in the right lane when it reaches the critical area of the intersection, with a probability defined by a variable, and the traffic light is green. Turn left: if it is in the left lane when it reaches the critical area of the intersection, with a probability defined by a variable, and the turn light is green. Change lanes: if the driver's patience reaches 0 (it decreases as the waiting time increases) and there is no car in its possible future position. Pick up a passenger: if it is a free taxi and passes a pedestrian who is waiting for a taxi. Crash: overlapping with another vehicle is considered a crash, and the car stays still in place until it is removed after a time limit.
It is worth noting that the green light corresponds to the direction with the highest weight; the weights P_i are calculated according to (1):

P_i = A_i + T_{sr_i} · F    (1)
where:
• P_i is the weight of each signal group i: the X direction, the Y direction, the X left turn, the Y left turn, and the pedestrians
• A_i is the number of agents (cars or pedestrians) arriving at the intersection at that traffic light (braking and heading towards it)
• T_{sr_i} is the time the light i has been red
• F is the adjustment factor

Table 3. Description of the pedestrian agent.
Attributes: Speed: the pedestrian's current speed; Direction: where the pedestrian goes; Current coordinates; Turn right (Yes or No); Turn left (Yes or No); Turn before: whether to turn before crossing; Turn after: whether to turn after crossing; Waiting time at the traffic light.
Information sensed from the environment: Nearby pedestrians; traffic light color; position within the environment; taxi waiting for the pedestrian to get in.
Behavior: Accelerate: if there is no pedestrian exactly in front. Brake: if a pedestrian in front is going slower, or if at the pre-traffic-light limit while the light is red. Turn right: if at a corner, with the defined turning probability. Wait for a taxi: if the defined probability of waiting for a taxi is met and the pedestrian is in an area where one can stop. Get in a taxi: if waiting for a taxi and one stops to pick the pedestrian up.
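As an illustration, the weight-based selection of Eq. (1) can be re-expressed in Python as the sketch below. The paper's simulation itself is written in NetLogo; the constants, names and five-phase layout (per Table 4) here are assumptions for illustration only:

```python
MIN_GREEN = 30          # minimum green time in ticks (user-controlled)
F = 0.001               # adjustment factor of Eq. (1)

PHASES = ["X", "X-turn", "Y", "Y-turn", "Pedestrians"]

def weight(arrivals, red_time, factor=F):
    """Eq. (1): P_i = A_i + T_sr_i * F."""
    return arrivals + red_time * factor

def next_phase(current, green_time, arrivals, red_times):
    """Return the phase that should be green next."""
    if green_time < MIN_GREEN:          # respect the minimum green time
        return current
    weights = {p: weight(arrivals[p], red_times[p]) for p in PHASES}
    return max(weights, key=weights.get)

# Example: many cars waiting a long time on the Y axis win the green light.
arrivals = {"X": 2, "X-turn": 0, "Y": 9, "Y-turn": 1, "Pedestrians": 3}
red_times = {"X": 0, "X-turn": 40, "Y": 120, "Y-turn": 25, "Pedestrians": 60}
print(next_phase("X", green_time=45, arrivals=arrivals, red_times=red_times))
```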
3 Results
In order to explicitly determine the feasibility of the developed algorithm and demonstrate its emergent behavior, the following six scenarios were defined, in which the frequency of agents and the adjustment factor of the smart traffic light are varied. Adjustment factors of 0.001 and 0.01 were selected; increasing the factor value for the turn traffic lights increases the weight of these lights and thus their priority, since the next traffic light to turn green is the one with the greatest weight. In addition, each scenario is evaluated with turn frequencies of 20 and 80.
• Scenario 1: Same frequency of agents in all directions; same smart traffic light factor (0.001) at all traffic lights.
Table 4. States of the traffic lights (V = green, r = red). Columns: X-axis, X turn, Y-axis, Y turn, Pedestrians.
E1 V r r r r
E2 r V r r r
E3 r r V r r
E4 r r r V r
E5 r r r r V
• Scenario 2: Same frequency of agents in all directions. Smart traffic light factor of 0.001 at the X-axis, Y-axis and pedestrian traffic lights; factor of 0.01 at the turn traffic lights.
• Scenario 3: Agent generation frequency of 10 on the Y axis, and of 3 on the X axis and for pedestrians. Same smart traffic light factor (0.001) at all traffic lights.
• Scenario 4: Agent generation frequency of 10 on the Y axis, and of 3 on the X axis and for pedestrians. Factor of 0.001 at the X-axis, Y-axis and pedestrian traffic lights; factor of 0.01 at the turn traffic lights.
• Scenario 5: Agent generation frequency of 15 on the Y axis, and of 1 on the X axis and for pedestrians. Same smart traffic light factor (0.001) at all traffic lights.
• Scenario 6: Agent generation frequency of 15 on the Y axis, and of 1 on the X axis and for pedestrians. Factor of 0.001 at the X-axis, Y-axis and pedestrian traffic lights; factor of 0.01 at the turn traffic lights.
In scenario 1, the average wait time is greatly reduced with the smart traffic light; the problem is that it increases the maximum wait value. With a turn frequency of 20, a maximum waiting time could not be determined within the 10,010 ticks: with such a low turn factor and so few cars turning, the weight of that traffic light never exceeds the others. With a turn frequency of 80, cars turn faster, but wait longer than cars controlled by a fixed-time traffic light. In scenario 2, with the other traffic light factor, the values change: although the waiting time is not reduced as much as in the previous scenario, the maximum wait takes a bounded value, lower than in scenario 1. For scenario 3, the average wait is significantly reduced with the intelligent system, by more than 69% compared to the fixed-time traffic light. The maximum waiting time increases, because few cars turn, and since the turn factor is equal to the rest, turning takes a long time to occur; within 10,010 ticks it never happened. A higher turning frequency allows a better turning flow, and although the average wait is not reduced as much as with a frequency of 20, the maximum wait value is similar to that of the fixed-time traffic light.
In scenario 4, the smart traffic lights also improve the average wait relative to the fixed-time traffic light for both turning frequencies. The lowest average wait time of all simulations is obtained for a turn frequency of 80: the average wait is reduced by approximately 59% while the maximum waiting time is also reduced by 33%, compared to the fixed traffic light. With a turn frequency of 20, the average wait time is reduced less and the maximum wait time increases too much. In scenario 5, something similar to scenario 3 happens, and the average wait times show an even greater improvement; at the same time, it can be seen that in situations with very different flows the factor 0.0015 can leave a car waiting a long time. Finally, in scenario 6, for a larger car flow gap (1–15), the factor 0.015 also fails to make much difference with respect to the factor 0.001, but it manages to reduce the maximum wait value when the turn flow is low. Overall, the average wait improvement obtained between the smart and the fixed-time traffic light in all scenarios is remarkable (on average 43%, varying between 14% and 69%).
4 Analysis of Scenarios
The smart traffic lights improve the average waiting time of vehicles (cars and taxis) at the traffic light in all scenarios. A better result was obtained with a higher turn frequency (80 compared to 20), preventing vehicles from waiting an excessive time. The 0.0015 factor introduced in the authors' previous article [15] is not recommended for this environment: if there are not many cars that turn, those cars will have a long wait. Setting the factor to 0.01 at the turn traffic lights while keeping 0.0015 at the rest improves the result; the system is optimized when more cars want to turn left and exert more weight to do so. When defining for which kinds of vehicle flows the system works best, one should analyze how the average wait time is affected. In the case of a similar flow in all directions, the average wait time is slightly lower and the cost in extra maximum waiting time is minimal, so at an intersection with this type of flow the system would be convenient if many cars turn. In the case where the flow is higher in one direction, but not excessively so, it is convenient no matter how many cars turn left, because the average improvement is very high (for a turn frequency of 20 the wait drops from 1258 to 686, and for 80 from 914 to 388). In the scenario where the flows are very disparate the system is not convenient: although the average time drops almost three times in both cases, the maximum wait time rises too high (for a turn frequency of 20 from 4978 to 8281, and for 80 from 3312 to 7478). We conclude that the intelligent algorithm manages to reduce the average and maximum waiting times for a medium frequency difference between the two directions and a 0.015 factor. If the frequency of agents in the two directions is similar or greater, the average waiting time with respect to the fixed-time traffic light is reduced, but the maximum waiting time will be somewhat longer for similar agent frequencies and slightly more than double for very different frequencies between the X and Y axes.
5 Conclusions
Traffic is a reality that affects many people, and there is evidence of decreased driver waiting times when dynamic and adaptive traffic control systems are implemented. The agent-oriented approach has essential features that make modeling and simulation easy. There are various applications, systems and simulations that keep the area active, because society's demand for improvement concerns not only infrastructure but also management systems. The described traffic light control system at a complex intersection manages to optimize the average wait time of vehicles at an intersection of double-lane streets, applying an algorithm that evaluates weights for the traffic directions (X and Y), the turns (right and left), and the pedestrians. The reduction of the waiting time at the traffic light is achieved with an adjustment factor of 0.015 for right and left turns, an adjustment factor of 0.0015 for all other directions, and a turn frequency of 80. The best-optimized scenario is achieved with vehicle flows in the X and Y directions that are moderately different. The analysis of the different scenarios makes it possible to understand how the variations of vehicle flow (cars and taxis), pedestrian traffic, and turn frequency affect the system. The agent-oriented environment provides an appropriate context for performing this analysis, and it allows extensions with agents such as bicycles, public transport, ambulances, and motorbikes, among others. We emphasize that the simulation shows it is feasible to achieve a significant reduction in the waiting times at a traffic light at a complex street intersection by applying a simple algorithm, an aspect that has a positive impact on the life of a city.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data.
References
1. Banos, A., Lang, C., Marilleau, N.: Agent-Based Spatial Simulation with NetLogo, vol. 1. Elsevier, Amsterdam (2015)
2. Amelec, V.: Validation process container for distribution of articles for home. Adv. Sci. Lett. 21(5), 1413–1415 (2015)
3. Nagatani, T.: Vehicular traffic through a sequence of green-wave lights. Physica A Stat. Mech. Appl. 380, 503–511 (2007)
4. Bui, K.H.N., Jung, J.E., Camacho, D.: Game theoretic approach on real-time decision making for IoT-based traffic light control. Concurr. Comput. Pract. Exp. 29(11), e4077 (2017)
5. Bui, K.H.N., Camacho, D., Jung, J.E.: Real-time traffic flow management based on inter-object communication: a case study at intersection. Mob. Netw. Appl. 22(4), 613–624 (2017)
6. Rao, A.M., Rao, K.R.: Measuring urban traffic congestion-a review. Int. J. Traffic Transp. Eng. 2(4), 286–305 (2012)
7. Hennessy, D.A., Wiesenthal, D.L.: Traffic congestion, driver stress, and driver aggression. Aggress. Behav. 25(6), 409–423 (1999)
8. Putha, R., Quadrifoglio, L., Zechman, E.: Comparing ant colony optimization and genetic algorithm approaches for solving traffic signal coordination under oversaturation conditions. Comput. Aided Civ. Infrastruct. Eng. 27(1), 14–28 (2012)
9. Teodorović, D., Dell'Orco, M.: Mitigating traffic congestion: solving the ride-matching problem by bee colony optimization. Transp. Plan. Technol. 31(2), 135–152 (2008)
10. Bazzan, A.L., Klügl, F.: A review on agent-based technology for traffic and transportation. Knowl. Eng. Rev. 29(3), 375–403 (2014)
11. Oviedo, C., Campo, A.: Aproximación al uso del coeficiente alfa de Cronbach. Revista Colombiana de Psiquiatría 34(4), 572–580 (2005)
12. Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., Zupan, B.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14(1), 2349–2353 (2013)
13. Amelec, V., Alexander, P.: Improvements in the automatic distribution process of finished product for pet food category in multinational company. Adv. Sci. Lett. 21(5), 1419–1421 (2015)
14. Tan, F., Wu, J., Xia, Y., Chi, K.T.: Traffic congestion in interconnected complex networks. Phys. Rev. E 89(6), 062813 (2014)
15. Battolla, T.F., Fuentes, S., Illi, J.I., Nacht, J., Falco, M., Pezzuchi, G., Robiolo, G.: Sistema dinámico y adaptativo para el control del tráfico de una intersección de calles: modelación y simulación de un sistema multi-agente. In: Simposio Argentino de Inteligencia Artificial (ASAI), Jornadas Argentinas de Informática, Universidad de Palermo, September 2018
Image Processing Based Lane Crossing Detection Alert System to Avoid Vehicle Collision
B. S. Priyangha1(B), B. Thiyaneswaran2, and N. S. Yoganathan2
1 Communication Systems, Sona College of Technology, Salem, India [email protected]
2 Department of Electronics and Communication, Sona College of Technology, Salem, India
Abstract. In recent years, careless lane crossing of vehicles has caused major accidents on highway roads. In India in particular, different lane systems are in use: 4-way, 6-way, and 8-way lane systems. The proposed work helps to identify unwanted lane crossing activities and sends an alert to the vehicle driver. A smartphone is fixed on the dashboard and captures the lane video. The video is converted into frames, and the lane is detected using a median filter, thresholding, and Hough transforms. The lane track markings are tracked using pattern matching and a Kalman filter. The detected lane track is marked in a distinct color and shown on the smartphone display. The proposed work is tested and compared with existing lane crossing algorithms.
Keywords: Lane · Hough transform · Median filter · Android
1 Introduction
In India, careless and rash driving can put other drivers and passengers in risky situations. On highways, one of the main causes of accidents is careless lane crossing. Extensive research is ongoing to improve safety in the road transport system. Lanes with separations regulate the traffic and help prevent accidents, and a lane crossing system assists the driver while crossing the lane: the driver may change lanes intentionally or through carelessness. These systems are designed to minimize the accidents caused by distraction, collision and drowsiness. Lane warning/keeping systems are based on video sensors, laser sensors and infrared sensors. The lane departure warning system detects lane lines from real-time camera images fed from the front-end camera of the automobile by employing the principle of the Hough transform. Advanced driver-assistance systems (ADAS) help the driver in the driving process through a safe human-machine interface. ADAS depends on inputs from multiple data sources, including automotive imaging, image processing, LiDAR, computer vision and radar; additional inputs come from the vehicular ad-hoc network.
2 Related Work
Lane detection has been performed with the Edge Drawing (ED) algorithm, which works faster within a particular region of interest. The algorithm was deployed on an Intel 3.3 GHz processor.
The final result analysis shows that the processing of each image takes 13 ms [1]. A robust vanishing point detection for lanes was proposed by Youjin et al. [2]: first the vertical lines in the frames are detected, and a priority-based directional approach is used to remove the noise present in the frames. A computer-vision-based lane finding approach is proposed by Yurtsever et al. [3]. In this approach, two closely related frames are selected, the ROIs of the corresponding binary images are extracted, an AND operation is imposed on the binary images, and a bird's-eye view is extracted from the resultant image. The nature and causes of the human distractions that lead to accidents are discussed in [4], where the authors propose a statute-based approach to determine the causes. A method to forecast a vehicle's path and to sense the lane changes of adjacent vehicles is presented in [5]; the technique has two parts, driving-intention estimation and vehicle path forecasting. The use of smartphones to sense the driving condition of vehicles on highways using lane-level localization, exploiting the phone hardware to capture lane change behaviors, is proposed in [6]; the built-in sensors available in the mobile phone provide the required data. The authors of [7] used an adaptive traffic management system, Hough transforms, maximum-likelihood-based angles, dynamic pole identification, and a region of interest; they further suggest improving the system using geo-based information and Google Maps. A histogram of oriented gradients (HOG), region of interest, and Haar-feature-based target detection system was proposed for a two-way lane system, with classification performed using a support vector machine [8]. An accurate lane detection method was proposed using constrained vanishing points; the algorithm performs accurately in a variety of road conditions, such as misty conditions, based on stereo vision [9]. An effective method senses and tracks the lanes in spatiotemporal frames constructed from consecutive scan lines, providing consistent parallel vertical lines [10]. A distance-based lane departure identification was proposed in which a warning signal is issued when the vehicle exceeds threshold values; the distance measure is based on the Euclidean distance transform [11]. The top view of the forward lane frame was computed using a random sample consensus approach, and the lanes are further detected within a region of interest using a curvature lane model [12]. A monocular camera is used for detecting lanes, the ROI is processed using computer vision technologies, and the adjacent lanes are also estimated [13]. A Fourier-based line detector is used for detecting the lanes on the roads; it uses the Hough transform to detect lanes and finds the orientation of the potential lines [14]. A region-of-interest approach was used for detecting curved lanes, with edges identified using a Canny edge detector and the Hough transform; RANSAC and Kalman approaches are further applied to detect the lanes effectively [15]. A B-Snake model was used for lane identification and tracking, with the CHEVP algorithm used to find the left and right ends of the road [16].
3 Proposed Work
Our proposed system detects the lane when the vehicle approaches a departure unintentionally. The proposed system captures video using a smartphone, efficiently detects the lane crossing, and also provides a voice-based announcement.

3.1 Block Diagram
In this section we split the road and lane detection task into different blocks and enumerate the possible approaches for the implementation of each block. The block diagram of the model consists of the input video, the lane finding model, lane tracking, the departure warning, and the output. The input to the lane detection is live streaming video, which is fragmented into frames and processed to remove noise. Lane identification is required for the lane crossing announcement system; the lane identification system detects lane points in a complicated atmosphere. The lane detection model consists of median filter and auto-thresholding blocks to detect the lane boundaries in the current video frame. The lane points form straight lines, which are detected with the Hough transform based on parameterization, i.e., finding a parametric equation for the line. The Hough transform uses a 2-D array called the accumulator array to detect the existence of a line through the equation ρ = x cos θ + y sin θ. For each pixel and its neighborhood, the (ρ, θ) values are determined and stored in bins; comparing the accumulated (ρ, θ) values determines the straight lines. This system uses a local-maxima block to find the polar coordinates of the lane points (Fig. 1).
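As an illustration of the accumulator voting described above, a small sketch (our own, with an illustrative set of edge points) follows:

```python
import numpy as np

def hough_accumulator(edge_points, height, width, n_theta=180):
    """Vote in a 2-D (rho, theta) accumulator: rho = x*cos(t) + y*sin(t)."""
    diag = int(np.hypot(height, width))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)  # rho in [-diag, diag)
    for x, y in edge_points:
        rhos = (x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1               # one vote per theta
    return acc, thetas, diag

# The strongest bin gives the polar coordinates of the dominant line.
points = [(10, 10), (20, 20), (30, 30)]                  # collinear points
acc, thetas, diag = hough_accumulator(points, 100, 100)
rho_i, theta_i = np.unravel_index(acc.argmax(), acc.shape)
print("rho =", rho_i - diag, "theta (deg) =", np.rad2deg(thetas[theta_i]))
```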
Fig. 1. Block diagram of the proposed system: Android camera → lane detection → lane tracking → departure warning → Android display/speaker.
The lane tracking model consists of a repository in which the identified lanes are stored based on the number of epochs for which each lane is detected. The current lane is then matched against the data stored in the repository: if the current lane matches the database, the lane is identified; otherwise, a new entry is added to the database. Furthermore, a Kalman filter is used to track the lane accurately. The departure notice system uses the Hough positions and converts the polar quantities to Cartesian coordinates. The departure warning system uses the Cartesian coordinate model to compute the distance from the lane markings to the camera center. If this distance is smaller than a particular threshold value, a warning signal is initiated. The warning is then provided through the smartphone display and an external speaker or the smartphone speaker.
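Putting the pieces together, a hedged OpenCV sketch of the detection-plus-warning chain is shown below. Canny edge detection stands in for the auto-thresholding block, and the kernel size, Hough parameters and pixel threshold are illustrative assumptions, not values from the paper:

```python
import cv2
import numpy as np

def lane_departure(frame, threshold_px=60):
    """Median filter -> edges -> probabilistic Hough -> distance check."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.medianBlur(gray, 5)            # noise removal (Fig. 1)
    edges = cv2.Canny(smoothed, 50, 150)          # stand-in for thresholding
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)
    if lines is None:
        return False                              # no markings detected
    centre_x = frame.shape[1] / 2                 # camera-centric reference
    # Distance from each detected marking to the image centre line
    dists = [abs((x1 + x2) / 2 - centre_x) for x1, y1, x2, y2 in lines[:, 0]]
    return min(dists) < threshold_px              # True -> issue the warning
```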
4 Results
The output is obtained from the blocks created in the Simulink model with video as the input to the lane departure warning system. The images illustrate the output of detecting the lane on the road. The original image is shown in Fig. 2a and the lane-detected image in Fig. 2b, with lane markings on both sides of the lane. Figure 2c shows the model detecting the middle lane in the video stream. The Simulink model shows a drawback when detecting a vehicle at the middle of the two-way lane, as indicated in Fig. 3a and b, and does not indicate the departure at specific locations. Figure 3c shows the model failing to detect the barriers and sign boards that are placed temporarily on Indian highways. Further, the lane detection model is implemented as an Android application to detect the lanes on the roads.
Fig. 2. Demonstration of the algorithm [7]: (a) original image; (b) lane-detected image; (c) the middle lane detected.
Fig. 3. Drawbacks of the existing algorithm [6]: (a) departure not identified; (b) middle lane not detected; (c) barriers not identified.
Fig. 4. Lane change of vehicle
Figure 4 shows the fall of the signal for a lane change of a vehicle from the middle lane to the left lane and back to the middle lane; the stable signal indicates the vehicle traveling in a particular lane with no departure. The system is implemented as an Android app with the Android camera as the input device and the Android display as the output device. The warning is given on the smartphone display, through external speakers, or through the Android speaker.
5 Conclusion
Based on the camera-centric viewpoint, the lane crossing is identified. Lane tracking is carried out using pattern matching and a Kalman filter, and the departure warning is indicated on the Android display with colored markings on the lane; an audio output is additionally provided through the Android speaker. The implementation was tested on 4-way, 6-way, and 8-way lanes.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data.
References
1. Nguyen, V., Kim, H.: A study on real-time detection of lane and vehicle for lane change assistance system using vision system on highway. 21(5) (2018). 978-1-5090
2. Youjin, T., Wei, C.: A robust lane detection method based on vanishing point estimation. Procedia Comput. Sci. 131, 354–360 (2018)
3. Yurtsever, E., Yamazaki, S.: Integrating driving behaviour and traffic context through signal symbolization for data reduction and risky lane change detection. IEEE Trans. Intell. Veh. 3(3), 242–253 (2018)
4. Yeniaydin, Y., Schmidt, K.W.: A lane detection algorithm based on reliable lane markings. In: Signal Processing and Communications Applications Conference (SIU). IEEE, May 2018. 978-1-5386
5. Woo, H., Ji, Y., Kono, H.: Lane change detection based on vehicle-trajectory prediction. IEEE Robot. Autom. 2(2), 1109–1116 (2017)
6. Xu, X., Yu, J., Zhu, Y.: Leveraging smartphones for vehicle lane-level localization on highways. IEEE Trans. Mob. Comput. 17(8), 1894–1907 (2018)
7. Song, W., Yang, Y., Ful, M.: Lane detection and classification for forward collision warning system based on stereo vision. IEEE Sens. J. 18(12), 5151–5163 (2018)
8. Wei, Y., Tian, Q.: Multi-vehicle detection algorithm through combined Harr and HOG features. Math. Comput. Simul. 155, 130–145 (2017)
9. Su, Y., Zhang, Y., Lu, T., Yang, J., Kong, H.: Vanishing point constrained lane detection with a stereo camera. IEEE Trans. Intell. Transp. Syst. 19(8), 2739–2744 (2018)
10. Jung, S., Youn, J.: Efficient lane detection based on spatiotemporal images. IEEE Trans. Intell. Transp. Syst. 17(1), 289–295 (2016)
11. Gaikwad, V., Lokhande, S.: Lane departure identification for advanced driver assistance. IEEE Trans. Intell. Transp. Syst. 16(2), 910–918 (2015)
12. Yi, S.-C., Chen, Y.-C., Chang, C.-H.: A lane detection approach based on intelligent vision. Comput. Electr. Eng. 42, 23–29 (2015)
13. Shin, J., Lee, E., Kwon, K., Lee, S.: Lane detection algorithm based on top-view image using random sample consensus algorithm and curve road model. In: Sixth International Conference on Ubiquitous and Future Networks. IEEE (2014). 978-1-4799
14. Kim, H., Kwon, O., Song, B., Lee, H., Jang, H.: Lane confidence assessment and lane change decision for lane-level localization. In: 14th International Conference on Control, Automation and Systems (ICCAS), October 2014. 978-89-93215
15. Rahmdel, P.S., Shi, D., Comley, R.: Lane detection using Fourier-based line detector. In: IEEE 56th International Midwest Symposium on Circuits and Systems, August 2013. 978-1-4799
16. Wang, Y., Teoh, E.K., Shen, D.: Lane detection and tracking using B-Snake. Image Vis. Comput. 22(4), 269–280 (2004)
Stratified Meta Structure Based Similarity Measure in Heterogeneous Information Networks for Medical Diagnosis
Ganga Gireesan(&) and Linda Sara Mathew
Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam 686666, Kerala, India
[email protected], [email protected]
Abstract. Electronic Health Records (EHR) offers point by point documentation on various clinical occasions that crop up amid a patient’s stay in the emergency clinic. Clinical occasions can be spoken to as Heterogeneous Information Networks (HIN) which comprises of multi-composed and interconnected articles. A focal issue in HINs is that of estimating the likenesses between articles by means of basic and semantic information. The proposed strategy utilizes a Stratified Meta Structure-centred Resemblance measure named SMSS in heterogeneous data systems. The stratified meta-structure can be made consequently and seizure opulent semantics. At that point, the com-quieting framework of the stratified meta-edifice is characterized by the prudence of the transforming matrix of meta-ways and meta-structures. The primary point is to decipher EHR information and its rich connections into a heterogeneous data for robust therapeutic analysis. This demonstrating approach takes into account the direct handling of missing qualities and heterogeneity of information. Keywords: Heterogeneous Information Networks Electronic Health Records Meta path Meta structure Stratified Meta Structure
1 Introduction
Many real-world systems, such as biological systems and social media, can be modeled using networks; network analysis has therefore become a hot research topic in the field of data mining. Most real systems generally consist of a large number of interacting, multi-typed components, as in human social activities, communication and computer systems, and biological systems. In such systems, the interacting components form networks that can be called information networks without loss of generality [1]. Real information networks, however, mostly involve interconnected and multi-typed components; this kind of network is generally called a Heterogeneous Information Network (HIN) [2]. Compared to the commonly used homogeneous information networks, heterogeneous information networks can effectively incorporate more information and contain rich semantics in nodes and links, and they therefore mark a new development in data mining. Determining the similarity between objects plays an essential and fundamental role in heterogeneous information network mining tasks.
In the medical field, Electronic Health Records (EHR) provide detailed documented information on the clinical events that occur throughout a patient's stay in the hospital [3]. Laboratory tests, medications, nurse notes, and diagnoses are instances of the diverse kinds of clinical events. The main aim of many studies is to inform clinical decision-making and disease diagnosis. Clinical events can be described as Heterogeneous Information Networks. Likewise, the co-occurrence of many illnesses can complicate the observations and their relations, so building a computer-aided diagnosis system is of great importance for reducing error and improving healthcare. We then generate a Stratified Meta-Structure (SMS) to bring composite relation semantics into the network and capture those that are useful for the problem-solving purpose [4]. This structure need not be specified in advance, and it combines many meta-paths and meta-structures. This construction ensures that: (1) users do not need to worry about how to choose the meta-path or meta-structure; (2) rich semantics can be captured. Accordingly, the SMS is essentially a directed acyclic graph of object types with different stratum labels. The SMS can be constructed automatically by repeatedly scanning the object types in the network schema. The SMS, as a complex structure, is therefore a composite relation; this is why the SMS can capture rich semantics. The semantics contained in meta-paths are typically characterized by their commuting matrices. Generally, meta-structures have the same nature as meta-paths, since they all have hierarchical structures. The commuting matrices of meta-structures are therefore defined by virtue of the Cartesian product, and the commuting matrix of the SMS is defined by naturally merging the infinite set of commuting matrices of meta-structures. SMSS is defined from the commuting matrix of the SMS. Using a HIN for modeling EHR and predicting diagnoses outperforms state-of-the-art models at two levels: general disease grouping and specific diagnosis prediction.
2 Related Works
Similarity search is a classical task in databases and search engines, and it is important to study similarity search in large-scale heterogeneous information networks such as bibliographic networks and social media networks [2]. Intuitively, two objects are similar if they are linked by several paths in the network. Nevertheless, most existing similarity measures are defined for homogeneous networks; the different semantic meanings behind paths are not taken into consideration, so they cannot be directly applied to heterogeneous networks. By taking different linkage paths in a network into account, one obtains different similarity semantics. This motivated the notion of meta-path-based similarity, where a meta-path is a path composed of a sequence of relations defined between different object types. Most of the existing metrics rely on user-specified meta-paths (MP) or meta-structures (MS). For illustration, PathSim and the Biased Path Constrained Random Walk (BPCRW) [2] take an MP specified by users as input, and Biased Structure Constrained Subgraph Expansion (BSCSE) takes an MS specified by users as input. These metrics are sensitive to the pre-specified meta-paths or meta-structures, and it is quite challenging for users to specify MPs or MSs; a biological information network, for example, may involve several different types of objects. Moreover, meta-paths can only capture biased and relatively simple semantics, so the meta-structure was introduced in order to capture more complex semantics; in fact, the meta-structure can only capture biased semantics too. MPs and MSs are essentially two kinds of schematic structures. A heterogeneous information network (HIN) is a graph model in which entities and relations are distinguished by types. Massive and complex databases, such as YAGO and DBLP, can be represented as HINs. An essential problem in HINs is the computation of the proximity, or relevance, between two HIN objects [5]. Relevance measures can be reused in several applications such as entity resolution, recommendation, and information retrieval.
3 Schematic Structures
An information network is a directed graph G = (V, E, A, R), where V represents the set of objects and E the set of links. A and R respectively denote the set of object types and the set of link types. G is called a heterogeneous information network (HIN) if |A| > 1 or |R| > 1; otherwise, it is called a homogeneous information network. Heterogeneous information networks contain multi-typed objects and their interconnected relations. Every object v ∈ V belongs to an object type φ(v) ∈ A, and every link e ∈ E belongs to a link type ψ(e) ∈ R. Essentially, ψ(e) describes a relation from its source object type to its target object type [1]. If two links belong to the same link type, they share the same source object type as well as the same target object type [6].
Fig. 1. A toy bibliographic information network with papers, authors, terms and venues. The triangles, circles, squares, and pentagons respectively denote authors, papers, terms and venues.
Figure 1 exhibits a toy bibliographic information network with four object types: Author (A), Paper (P), Venue (V) and Term (T). The type Author contains four instances: Yizhou Sun, Jiawei Han, Philip S. Yu, and Jie Tang. The type Venue contains four instances: VLDB, AAAI, KDD and TKDE. The type Paper contains six instances: PathSim, GenClus, RAIN, TPFG, SpiderMine and HeteSim. The type Term includes seven instances: Pattern, Information, Mining, Social, Clustering, Similarity, and Network. Each paper published at a venue has its authors and its associated terms. Hence, there are three types of relations: P→A, P→V and P→T.
3.1 Network Schema
The network schema offers a meta-level description of a heterogeneous information network. Specifically, the network schema TG = (A, R) of G is a directed graph consisting of the object types in A and the relation types in R [7].
Fig. 2. (a) Bibliographic network schema. (b) Biological network schema.
Figure 2 illustrates the network schema (NS) of the heterogeneous information network in Fig. 1. A biological network is an alternative example of a heterogeneous information network. A biological information network involves six object types, namely Gene (G), Tissue (T), GeneOntology (GO), ChemicalCompound (CC), Substructure (Sub) and SideEffect (Si), and five relation types, namely GO→G, T→G, G→CC, CC→Si and CC→Sub. Its network schema appears in Fig. 2(b).
3.2 Metapaths
There are two classes of schematic structures (MPs and MSs) for the schematic description of a HIN, and both convey semantics [1]. These semantics essentially have to be specified by users in advance when using these structures. A heterogeneous information network G contains rich semantics, which can be captured by meta-paths, meta-structures or even more complex schematic structures in a network schema.
A meta-path is essentially an alternating sequence of object types and relation types, that is,

P = O₁ →(R₁) O₂ →(R₂) ⋯ →(R_{l−1}) O_l

where R_j ∈ R, j = 1, …, l − 1, and R_j is a relation type from O_j to O_{j+1}. Fundamentally, the MP carries a composite meaning because it represents a composite relation. The MP P can be compactly denoted as (O₁, O₂, …, O_{l−1}, O_l). There are several important concepts related to a meta-path: the length of P, path instances following P, the reverse MP of P, symmetric meta-paths, and the commuting matrix of P.
Fig. 3. Some meta-paths and meta-structures.
For instance, Fig. 3(a) to (c) shows meta-paths over the network schema of Fig. 2(a). These meta-paths can be compactly denoted as (A, P, A), (A, P, V, P, A) and (A, P, T, P, A), and they describe distinct semantics: (A, P, A) expresses "two authors collaborate on a paper"; (A, P, V, P, A) expresses "two authors publish two papers at the same venue"; (A, P, T, P, A) expresses "two authors publish two papers containing the same terms".
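To make the commuting-matrix idea concrete, here is a small NumPy sketch for the symmetric meta-path (A, P, A), together with the PathSim measure of [2]. The incidence matrix is toy data, not taken from Fig. 1.

```python
import numpy as np

# Toy author-paper incidence matrix W_AP (rows: authors, columns: papers).
W_AP = np.array([[1, 1, 0],
                 [0, 1, 1],
                 [0, 0, 1]])

# Commuting matrix of the symmetric meta-path (A, P, A):
# M[i, j] counts the path instances linking author i and author j.
M = W_AP @ W_AP.T

def pathsim(M, i, j):
    """PathSim of [2]: 2*M[i, j] / (M[i, i] + M[j, j])."""
    return 2.0 * M[i, j] / (M[i, i] + M[j, j])

print(pathsim(M, 0, 1))   # similarity of authors 0 and 1 under (A, P, A)
```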
3.3 Meta Structures
The meta-structure S = (VS, ES, Ts, Tt) is essentially a directed acyclic graph with a single source object type Ts and a single target object type Tt. VS denotes a set of object types, and ES denotes a set of relation types. Figure 3(d)–(e) shows different meta-structures for the network schema shown in Fig. 2(a). These MSs can be compactly denoted as (VP(AT)PV) and (AP(VT)PA). Figure 3(f) represents a meta-structure for the network schema shown in Fig. 2(b); it can be compactly denoted as (G(GO T)G). The MS (VP(AT)PV) expresses the composite meaning "two venues accept two papers both containing the same terms and written by the same authors".
The meta-structure (AP(VT)PA) expresses the composite meaning "two authors publish two papers both containing the same terms and published at the same venue".
3.4 Stratified Meta Structure
Meta-paths and meta-structures can only capture relatively simple semantics, and their meanings can be expressed as composite relations [9]. The more complex relations take the form of a hierarchical structure obtained by repeating object types. A novel schematic structure called the Stratified Meta-Structure (SMS) can be used as a similarity measure in HINs. The SMS captures composite meanings because it is essentially composed of meta-structures of various sizes and/or meta-paths of various lengths. Here, every object type is repeated an unbounded number of times, since capturing rich semantics requires combining MPs as well as MSs of various extents [10]. An SMS is essentially a directed acyclic graph consisting of object types with different stratum labels. The notable advantage of the SMS is that it can be constructed mechanically by repeatedly visiting object types while traversing the network schema. Given a heterogeneous information network G, first extract its network schema TG, then pick a source object type and a target object type, and consider first the situation where the source type is the same as the target type. Even when the source object type is not equal to the target type, the construction rule of the SMS can still be applied to build a schematic structure.
Fig. 4. Construction of the SMS of the toy bibliographic information network.
The construction rule of the SMS DG of a heterogeneous information network G is illustrated in Fig. 4 and includes the following procedure [1]. The source object type is placed in stratum 0. The object types in stratum l = 1, 2, …, ∞ are composed of the neighbours (in TG) of the object types in stratum l − 1. Adjacent object types are connected by an arrow pointing from the (l − 1)-th stratum down to the l-th stratum. After reaching the target object type in a stratum l ≥ 1, its outgoing links are removed. After obtaining the SMS, the next step is to find the similarity between objects. Stratified meta-structure based similarity can be defined based on the commuting matrices of MPs and MSs [11]. The commuting matrix of an
SMS is in general defined as the sum of the commuting matrices of its component MPs and MSs. Using SMS-based similarity, it is possible to capture the complex relationships between different types of objects. In this way, the SMS can be used to measure the similarity between different objects in Electronic Health Records, and such a method can be applied in practice to make the medical diagnosis process easier.
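Since the SMS combines an unbounded number of component meta-paths and meta-structures, a practical computation has to truncate the sum. The sketch below makes that explicit; the decay weight beta (down-weighting longer components) and the truncation to a finite list are assumptions introduced here to keep the computation finite, not the paper's exact combination rule.

```python
import numpy as np

def sms_commuting_matrix(component_matrices, beta=0.5):
    # Weighted sum of the commuting matrices of the SMS's component
    # meta-paths/meta-structures; component k is down-weighted by beta**k.
    total = np.zeros_like(component_matrices[0], dtype=float)
    for k, M in enumerate(component_matrices):
        total += (beta ** k) * M
    return total

M_apa = np.array([[2, 1], [1, 2]])      # commuting matrix of (A, P, A)
M_apvpa = np.array([[3, 2], [2, 3]])    # commuting matrix of (A, P, V, P, A)
M_sms = sms_commuting_matrix([M_apa, M_apvpa])
```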
4 Methodology
Disease diagnosis has become an important process in healthcare. Relations among clinical events carry different semantics and contribute differently to disease diagnosis. To address these issues, this work translates high-dimensional EHR data and its rich relations into a HIN for medical diagnosis [12]. This approach allows the direct treatment of missing values and of the heterogeneity of the data. For the diagnosis purpose, it exploits the SMS to capture higher-level and semantically important relations related to disease diagnosis. The diagnostic process includes careful consideration of clinical observations (symptoms and diagnostic tests), mining of relevant information, and, more importantly, attention to their relations [11]. A clinical observation is often non-specific to a single disease; rather, its relation or co-occurrence with other observations can be indicative of a disease. In addition, the co-occurrence of multiple diseases can cause complications in the observations as well as in their relations. These impediments, together with the huge amount of information to be analysed by clinicians, make their decisions vulnerable to human error and often unreliable, which can be very costly and sometimes fatal. To design such a framework, an organized and informative model of the EHR data is needed. The EHR can be transformed into a HIN, applying node mining techniques to the various sets of data in the EHR [12]. The SMS is then used as a similarity measure to capture rich semantics, which enables the model to learn the similarity of clinical events and patients. While this diagnosis prediction model uses only diagnostic data for identifying the disease, exploiting the treatment information through unsupervised training could improve the learned embeddings and reveal the similarity of clinical events in terms of outcome.
5 Conclusion
Similarity measurement, as a fundamental task in heterogeneous information network analysis, has been useful in many fields, for instance item recommendation, clustering, and Web search. Many of the prevailing measures depend on the MP or MS specified by users in advance; these measures are therefore fragile with respect to the pre-specified MP or MS. The SMS-centred similarity SMSS can be used within heterogeneous information networks. The SMS, as a composite
schematic structure, can be automatically assembled by repeatedly traversing the network schema. This means that users need not worry about how to choose a proper MP or MS. First, the commuting matrix of the Stratified Meta-Structure is derived by combining the commuting matrices of the relevant meta-paths and meta-structures; this commuting matrix is then used to induce the similarity measure SMSS. Stratified meta-structure based similarity on the whole outperforms the baselines in terms of clustering and ranking. SMSS can be applied to EHR datasets for medical diagnosis purposes. When using a HIN for medical diagnosis, it is capable of capturing informative relations for the diagnosis goal and uses a suitable link sampling strategy when learning clinical event representations. It likewise allows the easy treatment of missing values and the learning of embeddings tailored to the disease prediction goal using a joint embedding framework.

Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Zhou, Y., Huang, J., Sun, H., Sun, Y.: Recurrent meta-structure for robust similarity measure in heterogeneous information networks. arXiv:1712.09008v2 [cs.DB], 23 May 2018 2. Sun, Y., Han, J., Yan, X., Yu, P.S., Wu, T.: PathSim: meta path-based top-k similarity search in heterogeneous information networks. Proc. VLDB Endow. 4(11), 992–1003 (2011) 3. Zhou, Y., Huang, J., Li, H., Sun, H., Peng, Y., Xu, Y.: A semantic-rich similarity measure in heterogeneous information networks. Knowl.-Based Syst. 154, 32–42 (2018) 4. Huang, Z., Zheng, Y., Cheng, R., Zhou, Y., Mamoulis, N., Li, X.: Meta structure: computing relevance in large heterogeneous information networks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1595–1604. ACM, San Francisco (2016) 5. Sun, Y., Barber, R., Gupta, M., Aggarwal, C.C., Han, J.: Co-author relationship prediction in heterogeneous bibliographic networks. In: 2011 International Conference on Advances in Social Networks Analysis and Mining (2011) 6. Shi, C., Li, Y., Zhang, J., Sun, Y., Yu, P.S.: A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng. 29(1), 17–37 (2017) 7. Sun, Y., Han, J.: Mining heterogeneous information networks: a structural analysis approach. SIGKDD Explor. 14(2), 20–28 (2012). https://doi.org/10.1145/2481244.2481248 8. Shi, C., Wang, R., Li, Y., Yu, P.S., Wu, B.: Ranking-based clustering on general heterogeneous information networks by network projection. In: Proceedings of the ACM CIKM International Conference on Information and Knowledge Management, pp. 699–708. ACM, Shanghai (2014) 9. Gupta, M., Kumar, P., Bhasker, B.: HeteClass: a meta-path based framework for transductive classification of objects in heterogeneous information networks. Knowl.-Based Syst. 68(1), 106–122 (2017)
10. Shi, C., Kong, X., Huang, Y., Yu, P.S., Wu, B.: HeteSim: a general framework for relevance measure in heterogeneous networks. IEEE Trans. Knowl. Data Eng. 26(10), 2479–2492 (2014) 11. Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543. ACM, Edmonton (2002) 12. Hosseini, A., Chen, T., Wu, W., Sun, Y., Sarrafzadeh, M.: HeteroMed: heterogeneous information network for medical diagnosis. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy (2018)
An Effective Imputation Model for Vehicle Traffic Data Using Stacked Denoise Autoencoder
S. Narmadha and V. Vijayakumar
Sri Ramakrishna College of Arts and Science (Autonomous), Coimbatore, India [email protected], [email protected]
Abstract. Modern transportation systems depend heavily on complete, high-quality data sources for traffic state identification, prediction and forecasting. Due to device (sensor, camera, detector) failures and communication problems, some sources inevitably miss data, which degrades traffic data quality. Data preprocessing is therefore important for transport-related applications. Imputation is the process of finding missing data and making the data available as a complete set. Both spatial and temporal information have a high impact on imputing traffic data. In this paper a deep learning based stacked denoise autoencoder (trained one autoencoder at a time) is proposed to impute traffic data with low computational complexity and high performance. Experimental results demonstrate that this autoencoder performs well under random corruption with less complexity. Keywords: Denoise autoencoder · Traffic flow · Imputation · Train one autoencoder
1 Introduction
Smart cities have emerged all over the world thanks to urban growth, better access to facilities, safety, economic growth, and so on. The transportation system is a main component of a crowded city, which has limited road space and resources. It is not always possible to build a new road or extend the road structure for the flexible movement of vehicles and people, because of economic and environmental constraints [13]. In recent years, with the use of surveillance systems, fast-growing technology, and the low cost of storage and computing, a vast amount of data has been collected at all spatial locations for every time series. However, the missing traffic data problem still exists in many traffic information systems, making the data unsuitable for further utilization [9]. Many transport information systems from different countries suffer from missing traffic data [6]. For example, PeMS (Performance Measurement System), a California-based database system maintaining real-time traffic data and incident records that is used by researchers throughout the transportation field, has more than 10% missing data [16]. The missing ratio [6, 9] of the daily traffic flow in Beijing, China is around 10%.
The consequences of missing data for traffic prediction and estimation are twofold: (i) data loss at a certain location and for particular time periods may be important for solving the transportation issue; (ii) loss of statistical information.
2 Related Work
An Auto Regressive Integrated Moving Average (ARIMA) model [12] investigated the application of Box–Jenkins analysis techniques to freeway traffic prediction and imputation; it was accurate in terms of representing time series data. A least squares support vector machine (LS-SVM) technique was proposed to predict missing traffic flow data on arterial roads using spatio-temporal information [17]. A cokriging methodology [3] was used to impute multisource data (radar detector data, probe vehicle data). Matrix- and tensor-based methods have been used for traffic data imputation [4, 5, 8], and hybrid models have imputed data effectively. Fuzzy c-means (FCM) [11] was applied to estimate missing values in a multi-attribute data set, with parameters optimized by a genetic algorithm (GA). An iterative approach based on integration and time series algorithms was proposed [7] to fill small and large gaps. Most imputation methods fail on real-time and historical data, which may contain unavailable or inappropriate neighbouring data. Deep learning is used to extract high-level, complex representations of data. An imputation approach using a deep stacked denoise autoencoder [10, 14, 15] was proposed based on statistical learning. It imputes data from simple structures (single location, single period) to complex structures (single period multiple locations, multiple periods multiple locations, etc.). Stacked autoencoders and their variants have been used in various noise-removal and prediction applications, but they take a long time to run on larger datasets. This motivates attention to a stacked autoencoder run one layer at a time, to reduce computational complexity and save running time.
3 Methodology
In this work, a stacked denoise autoencoder (trained one autoencoder at a time) is used to impute the traffic flow data.
3.1 Autoencoder (AE)
Fig. 1. Simple denoise autoencoder
An autoencoder is a neural network trained with backpropagation. It works by encoding and decoding, as shown in Fig. 1. The encoder [10] takes a corrupted input vector ẋ and maps it into a hidden representation g through a mapping function

g = f_θ(ẋ) = s(wẋ + b), with θ = {w, b},

where w is a weight matrix and b is a bias vector. The algorithm is summarized in Table 1. The resulting latent representation g is mapped back to a reconstructed vector

y = f_θ′(g) = s(w′g + b′).

The weight matrix w′ and bias b′ may be the reverse mapping of w and b (tied weights). The model optimizes the parameters over the given number of iterations to minimize the error between x (input) and y (output); the final θ and θ′ are the updated parameters of the model. Table 1. Algorithm 1 - Autoencoder
Input: data X = {x1…xn}, iterations I = {1, 2…n}
1. Initialize the weights and bias randomly (w, b).
2. Pass X into the hidden layer: g = s(wX + b).
3. Reconstruct the output from the hidden layer through the transpose: y = s(wᵀg + b′).
4. Repeat for I iterations, updating (w, b) to minimize the reconstruction error.
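As a concrete illustration of the mapping above, here is a minimal, self-contained sketch of one denoising autoencoder pass in NumPy. The tied decoder weights (w′ = wᵀ), the 30% corruption rate and the toy dimensions are assumptions consistent with the text, not values taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hid = 8, 4                        # toy dimensions
w = rng.normal(0.0, 0.1, (n_hid, n_in))   # encoder weights
b = np.zeros(n_hid)                       # encoder bias
b_prime = np.zeros(n_in)                  # decoder bias; w' is taken as w.T

x = rng.random(n_in)                      # clean input vector
x_tilde = x * (rng.random(n_in) > 0.3)    # corrupt: zero ~30% of entries

g = sigmoid(w @ x_tilde + b)              # g = f_theta(x~) = s(w x~ + b)
y = sigmoid(w.T @ g + b_prime)            # y = f_theta'(g) = s(w' g + b')
loss = np.mean((x - y) ** 2)              # error between x and y to minimize
print(loss)
```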
3.2 Stacked Denoise Autoencoder (SDAE) (One Autoencoder at a Time)
An SDAE is a stack of multiple AEs trained to reconstruct the clean output y from the noisy or corrupted version ẋ of the input. The sequence of processing is as follows: corrupt the input x into ẋ by means of a stochastic mapping ẋ ~ M(ẋ|x) [15]. The corrupted input is passed into the first autoencoder, and then through the successive autoencoders, which remove the noise and learn internal representations of the data from the input layer. Once all autoencoders are trained, their weights and biases are copied and used to build the SAE. Training one autoencoder at a time in an SDAE (Fig. 2) runs faster than training the whole SAE at once [2]. The training algorithm is summarized in Table 2. During the first phase of training, the first AE learns to reconstruct the inputs. Hidden layer 1 is frozen (its weights are not updated) while processing phase 2, so its output is the same for any training instance; this avoids recomputing the output of hidden layer 1 at every epoch. The weights are transposed for hidden layers 3 and 4. At the end of this phase, the weights of all hidden layers are stacked from top to bottom. This deep autoencoder greatly reduces the computational complexity and helps to increase performance.
Fig. 2. Stacked denoise autoencoder (OAE)
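The one-autoencoder-at-a-time idea of Fig. 2 can be sketched with Keras as below. The hidden sizes 144 and 72 follow Sect. 4.1, while the optimizer, epoch count and the random stand-in data are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras

def train_one_ae(x, n_hidden, noise=0.3, epochs=20):
    """Train a single denoising AE on x; return the encoder and its output."""
    n_in = x.shape[1]
    x_noisy = x * (np.random.rand(*x.shape) > noise)   # random corruption
    inp = keras.Input(shape=(n_in,))
    hid = keras.layers.Dense(n_hidden, activation="sigmoid")(inp)
    out = keras.layers.Dense(n_in, activation="elu")(hid)
    ae = keras.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(x_noisy, x, epochs=epochs, verbose=0)       # earlier AEs stay frozen
    encoder = keras.Model(inp, hid)
    return encoder, encoder.predict(x, verbose=0)

# Greedy phase: train AEs one at a time on successively encoded data.
x = np.random.rand(256, 288).astype("float32")         # e.g. days x 5-min slots
enc1, h1 = train_one_ae(x, 144)
enc2, h2 = train_one_ae(h1, 72)
# Fine-tuning phase (Step 2 of Table 2): stack the trained layers, mirror them
# with transposed shapes, and train the whole network end to end.
```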
4 Result Analysis
4.1 Data Description
Traffic flow data is collected from the California-based PeMS (Performance Measurement System) [18], which contains traffic-related data for all freeways in the state. PeMS has 39,000 individual detectors spread over all the metropolitan areas; data are collected every 30 s and aggregated into 5-min intervals. The data used here were collected for the year 2017 from District 5, which has a total of 243 vehicle detector stations (VDS). Of the 365 days, February 1st has some improper data; excluding it leaves 364 days of data, and each day is Table 2. Algorithm 2 - Stacked denoise autoencoder (one autoencoder (AE) at a time)
Input: traffic flow data X = {x1, x2…xn} with missing values Ø and noise ¥
Output: reconstructed data without Ø and ¥ values
Initialization: number of hidden layers H, hidden nodes h = {1, 2…H}, iterations I, epochs Ω
Step 1: Map the input data into its noisy version ẋ. Scale the values and pass them into hidden layer 1 (AE1). Freeze the weights and bias of AE1. Extract the hidden layer 1 weights (w ∈ h1) and bias (b ∈ h1) and pass them into hidden layer 2 (AE2). Transpose the weights and bias for hidden layer 3 (AE3), hᵢ = s(hᵢ₋₁wᵀ + bᵀ), and layer 4 (AE4), Oᵢ = s(hᵢ₋₂wᵀ + bᵀ). Use an optimizer to reduce the error rate e.
Step 2: Fine-tune the model. Initialize the weights and biases of all pre-trained autoencoders from top to bottom (in reverse order); perform forward propagation to compute y; perform backward propagation on x − y; use an optimizer to reduce the error rate.
represented as a vector K. Each vector has 288 dimensions L, since the data are at 5-min resolution. Weekdays, non-weekdays (weekends and holidays), and both together are taken as temporal factors; 2017 had 115 non-weekdays and 250 weekdays. The train:test ratio is 80:20 for all experiments. To avoid overfitting, early stopping is used. A missing rate between 20% and 40% is considered [1], so a 30% missing rate is fixed for all experiments. The random corruption ratio is

RC = (Σᵢ₌₁ᴷ Σⱼ₌₁ᴸ Oᵢⱼ / KL) × 100%, where O is the corruption indicator (Oᵢⱼ ∈ {0, 1}).

A Tesla K80 GPU machine with 12 GB of RAM is used to train and test the model. Sigmoid and ELU activation functions are used in the input and output layers respectively, and there are 3 hidden layers of sizes 144, 72 and 144. The performance measures [15] are

RMSE = √( Σᵢ₌₁ᵏ Σⱼ₌₁ˡ Oᵢⱼ (xᵢⱼ − yᵢⱼ)² / Σᵢ₌₁ᵏ Σⱼ₌₁ˡ Oᵢⱼ )

MAE = Σᵢ₌₁ᵏ Σⱼ₌₁ˡ Oᵢⱼ |xᵢⱼ − yᵢⱼ| / Σᵢ₌₁ᵏ Σⱼ₌₁ˡ Oᵢⱼ

MRE = Σᵢ₌₁ᵏ Σⱼ₌₁ˡ Oᵢⱼ (|xᵢⱼ − yᵢⱼ| / xᵢⱼ) / Σᵢ₌₁ᵏ Σⱼ₌₁ˡ Oᵢⱼ
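A minimal sketch of how the three masked measures above can be computed; the indicator convention (Oᵢⱼ = 1 at evaluated positions) follows the formulas, and nonzero ground-truth values are assumed for MRE.

```python
import numpy as np

def masked_metrics(x, y, o):
    """RMSE, MAE and MRE over the positions where o[i, j] = 1 only.

    x: ground-truth matrix, y: reconstructed matrix, o: 0/1 indicator.
    MRE assumes the ground truth is nonzero at evaluated positions.
    """
    d = (x - y)[o == 1]
    n = o.sum()
    rmse = np.sqrt((d ** 2).sum() / n)
    mae = np.abs(d).sum() / n
    mre = (np.abs(d) / x[o == 1]).sum() / n
    return rmse, mae, mre
```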
4.2 Results
Spatial and temporal correlations are crucial for the imputation and prediction of traffic data. In this imputation process, the current VDS (single), the current VDS with upstream and downstream (augmented), and all VDS are considered as spatial factors to impute a single station. Weekday (WK), non-weekday (N-WK), and combined temporal data are evaluated under the random corruption scenario.

Single VDS Imputation
VDS 500010092 is taken as the current station for imputation and analysis. Missing values of the single station are found based on the same station; a single location and multiple periods of data are imputed and the result is evaluated (Table 3). Using both weekdays and non-weekdays gives the best imputation result.

Table 3. Comparison based on single VDS

Temporal type    RMSE   MAE    MRE
Weekdays         12.51  10.68  0.45
Non-weekdays     23.3   14.6   0.47
Both             12.3   10.2   0.41
Impact of Upstream (US) and Downstream (DS) on Single VDS Imputation
Both US and DS locations are highly correlated with the current station, so the current station data is augmented with upstream and/or downstream data. From the analysis, the single VDS with downstream gives a better imputation result than the others (Table 4).

Table 4. Single VDS with upstream (US) and downstream (DS)

Spatial configuration     RMSE   MAE   MRE
Single VDS                10.41  9.03  0.47
Single VDS with US        10.62  9.56  0.46
Single VDS with DS        9.9    8.91  0.46
Single VDS with US & DS   10.3   9.38  0.49
Impact of All VDS on a Single VDS
One autoencoder trains well on all stations' data and fills the missing values very effectively. Here the result is compared with a linear stacked autoencoder: training one autoencoder at a time reconstructs the observed values with a minimal error rate and high performance (Table 5).

Table 5. All VDS on single station

                  RMSE              MAE               MRE
Temporal type     SDAE  SDAE (OAE)  SDAE  SDAE (OAE)  SDAE  SDAE (OAE)
Weekdays (WD)     17.1  14.03       14.3  8.22        0.47  0.47
Non-weekdays      23.3  12.01       14.6  10.56       0.48  0.47
Both              12.3  11.4        10.2  9.93        0.46  0.46
From all the observations, using both weekdays and non-weekdays gives more accurate values than considering a single VDS alone, or the downstream and upstream augmentations.

Realization of Processing Time
The running time for the all-VDS case is evaluated and shown in Fig. 3. SDAE (OAE) runs faster, taking only 6 s versus 33 s for SDAE. Training one autoencoder at a time not only improves performance, it also reduces processing time in both training and testing. Across all timelines, SDAE (OAE) performs well on both small and large datasets.
Fig. 3. Comparison of running time (y-axis: seconds; x-axis: WK, N-WK, Both; bars: SDAE vs. SDAE (OAE)).
5 Conclusion
The stacked denoise autoencoder run one autoencoder at a time performs well on large datasets in terms of capturing hidden-layer parameters without updating them at each iteration. Fine-tuning the trained model again in top-down (reverse) order greatly improves the accuracy of the reconstructed values and reduces processing time. Simple and complex structures of data (both spatial and temporal) are evaluated under the random corruption strategy. In future, the model may be evaluated under continuous corruption with various missing rates.

Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Costa, A.F., Santos, M.S., Soares, J.P.: Missing data imputation via denoising autoencoders: the untold story. In: IDA 2018, pp. 87–98. Springer (2018) 2. Geron, A.: Hands on Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media, Sebastopol (2017) 3. Bae, B., Kim, H., Lim, H., Liu, Y., Han, L.D., Freeze, P.B.: Missing data imputation for traffic flow speed using spatiotemporal cokriging. Transp. Res. Part C 88, 124–139 (2018) 4. Ran, B., Tan, H., Feng, J., Liu, Y., Wang, W.: Traffic speed data imputation method based on tensor completion. Comput. Intell. Neurosci. 2015, 9 pages (2015). Article ID 364089 5. Acar, E., Dunlavy, D.M., Kolda, T.G., Mørup, M.: Scalable tensor factorizations for incomplete data. Chemometr. Intell. Lab. Syst. 106(1), 41–56 (2011). arXiv:1005.2197v1 6. Chang, G., Ge, T.: Comparison of missing data imputation methods for traffic flow. In: Proceedings 2011 International Conference on Transportation, Mechanical, and Electrical Engineering (TMEE), Changchun, China, 16–18 December. IEEE (2011)
7. Abdelgawad, H., Abdulazim, T., Abdulhai, B., Hadayeghi, A., Harrett, W.: Data imputation and nested seasonality time series modelling for permanent data collection stations: methodology and application to Ontario. Can. J. Civ. Eng. 42, 287–302 (2015) 8. Yang, H., Yang, J., Han, L.D., Liu, X., Pu, L., Chin, S., Hwang, H.: A Kriging based spatiotemporal approach for traffic volume data imputation. PLoS ONE 13(4), e0195957 (2018) 9. Tan, H., Feng, G., Feng, J., Wang, W., Zhang, Y.J., Li, F.: A tensor-based method for missing traffic data completion. Transp. Res. Part C Emerg. Technol. 28, 15–27 (2013) 10. Liang, J., Liu, R.: Stacked denoise autoencoder and dropout together to prevent over fitting in deep neural network. In: 2015 8th International Congress on Image and Signal Processing (CISP 2015), pp. 697–701. IEEE (2015) 11. Tang, J., Wang, Y., Zhang, S., Wang, H., Liu, F., Yu, S.: On missing traffic data imputation based on fuzzy C-means method by considering spatial–temporal correlation. Transp. Res. Rec. J. Transp. Res. Board 2528, 86–95 (2015) 12. Ahmed, M.S., Cook, A.R.: Analysis of freeway traffic time-series data by using Box-Jenkins techniques. Transp. Res. Rec. 722, 116 (1979) 13. Shang, Q., Yang, Z., Gao, S., Tan, D.: An imputation method for missing traffic data based on FCM optimized by PSO-SVR. J. Adv. Transp. 2018, 21 pages (2018). Article ID 2935248 14. Duan, Y., Lv, Y., Kang, W., Zhao, Y.: A deep learning based approach for traffic data imputation. In: 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 912–917. IEEE (2014) 15. Duan, Y., Lv, Y., Liu, Y.L., Wang, F.Y.: An efficient realization of deep learning for traffic data imputation. Transp. Res. Part C 72, 168–181 (2016) 16. Li, Y., Li, Z., Li, L.: Missing traffic data: comparison of imputation methods. IET Intell. Transp. Syst. 8(1), 51–57 (2014) 17. Yang, Z.: Missing traffic flow data prediction using least squares support vector machines in urban arterial streets. In: 2009 IEEE Symposium on Computational Intelligence and Data Mining. IEEE Xplore (2009) 18. http://pems.dot.ca.gov/
Fruit Classification Using Traditional Machine Learning and Deep Learning Approach N. Saranya1(B) , K. Srinivasan2 , S. K. Pravin Kumar3 , V. Rukkumani2 , and R. Ramya2 1 Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore,
India [email protected] 2 Department of Electronics and Instrumentation Engineering, Sri Ramakrishna Engineering College, Coimbatore, India {hod-eie,rukkumani.v,ramya.r}@srec.ac.in 3 Department of Electronics and Communication Engineering, United Institute of Technology, Coimbatore, India [email protected]
Abstract. Advancements in image processing techniques and automation in the industrial sector encourage their usage in almost all fields. Fruit classification and grading from images still remains a challenging task. Fruit classification can be used to perform the sorting and grading process automatically. The traditional method of fruit classification is manual sorting, which is time consuming and always requires human presence. An automated sorting process can be used to implement a Smart Fresh Park. In this paper, various methods used for fruit classification are experimented with. The fruits considered for classification are five categories of apple, plus banana, orange and pomegranate. Results on the fruit-360 dataset were compared between typical machine learning and deep learning algorithms. To apply the machine learning algorithms, basic features of the fruit such as color (RGB color space), size, height and width were extracted from its image, and the traditional machine learning algorithms KNN and SVM were applied over these extracted features. The results show that using a Convolutional Neural Network (CNN) gives more promising results than the traditional machine learning algorithms. Keywords: Fruit classification · Machine learning · CNN
1 Introduction
The growing population all over the world increases the need for a huge amount of food products. Fruits are sources of many essential nutrients like potassium, dietary fiber, vitamin C, and folic acid. A computer vision based system can be designed to replace the manual sorting process and the usage of the barcode system. Due to the Big Data era and advancements in Graphics Processing Units (GPU), deep learning has gained popularity and acceleration. The application of image processing and deep learning in the field of agriculture and food factories helps to implement smart farms using machine vision based systems. Deep learning techniques have been applied in various agricultural and
food production based industries [1]. These techniques have led to the creation of more UAV (Unmanned Aerial Vehicle) designs. In this paper, automatic sorting of fruits is done after capturing images of the fruits. Features of the fruits are extracted from the image and the classification process is done using a classifier model built from images of the fruit-360 dataset. Sorted fruits are then collected into different buckets and the weight is automatically calculated for billing. This kind of smart fresh park based system automates the process and requires only little human effort. In this paper, we compare the process involved in building a classifier model using traditional classification algorithms with deep learning techniques.
2 Literature Survey
In the traditional machine learning approach, image classification is done after extracting the features of the image. It involves various processing steps like image preprocessing, feature extraction, building a classifier model and validation. Several works related to fruit classification have been proposed. A Feed Forward Neural Network with Fitness Scaled Chaotic Artificial Bee Colony (FFNN-FSCABC) based system was proposed [2] to recognize multi-class fruit. The image preprocessing techniques used were image segmentation and a split-and-merge procedure for background removal [3]. The features considered for the classification process were color, texture and shape; the accuracy obtained by the FFNN-FSCABC system was 89.1%. Another classifier system was proposed based on a multiclass SVM with a Gaussian radial basis kernel, with 88.2% accuracy [4]. A shape-based fruit recognition and classification system was proposed [8], in which the authors used morphological operations for image preprocessing [9]; several shape descriptors were considered, built from the area, perimeter, major axis length and minor axis length, and the classifier model was built using the Naïve Bayes, KNN and neural network algorithms. A fruit defect detection system was proposed [10] to detect defects from sample apple images: the YCbCr color space and K-Means clustering were used to distinguish the calyx region from the other defect regions, and classifiers like SVM, multi-layer perceptron and KNN were implemented. The authors of [11] discussed various image preprocessing and feature extraction techniques used in machine vision based systems. Deep learning is an advanced form of machine learning which aims to achieve high accuracy in image recognition and classification, and it can be used to predict the type of fruit by building a convolutional neural network. A classifier was built using VGGNet [12] with an accuracy of 92.5%, where image saliency is extracted to reduce the interference and noise due to a complex background. In another work [13] on fruit recognition, a CNN was constructed with 11 layers (3 convolutional, 3 ReLU, 3 pooling and 2 fully connected layers) and gave an accuracy of 94.8%. A control system for fruit classification with a CNN [14] was developed with only 8 different layers, using convolution, max pooling and a fully connected layer with a softmax activation function. A CNN based system [15] was used to identify surface defects of mangosteen, using convolution, sub-sampling, fully connected and softmax layers; the accuracy of the system was 97.5% on average. An approach [16] was proposed to detect fungus in fruits from images, in which features extracted from a CNN were combined with a selective search algorithm; the accuracy of the system was 94.8%.
3 Dataset
In this paper, fruit classification is done on sample images of the fruit-360 dataset. Initially, four fruit categories were considered for classification: apple, orange, banana and pomegranate. Later, eight categories of fruits were considered: five different varieties of red apple, plus orange, banana and pomegranate. Sample images are shown in Fig. 1. Images in the fruit-360 dataset are of size 100 × 100.
Fig. 1. Sample images from the fruit-360 dataset, of size 100 × 100 [3, 5–8]: red apple categories 1–5, banana, orange and pomegranate.
4 Traditional Machine Learning Approach
Methodology Used
The general methodology used in a machine vision based system to classify different fruits involves the steps given in Fig. 2. Various features of the fruit, like color, size, height and width, are extracted from the preprocessed image to classify the fruit. A suitable machine learning algorithm is used to build a classifier model, and the test dataset is used to check the accuracy of the pre-trained classifier.
Fig. 2. Classification methodology used in a machine vision based system: image acquisition, image preprocessing (morphological operations) and feature extraction, followed by building the classifier model from the training dataset and evaluating its performance on the validation dataset.
Preprocessing
This is the initial step involved in preparing a dataset for the classification process. It aims to eliminate noise and to standardize and normalize the values used throughout the dataset using various filters. The images are resized to a standard size of 200 × 200 by padding with the background pixel at the image border (Fig. 3).
Fig. 3. (a) Original image from the fruit-360 dataset [3] and (b) resized and padded image.
Then the image is converted into grayscale to differentiate the fruit from its background (Fig. 4):

Grayscale = 0.299 × R + 0.587 × G + 0.114 × B    (1)
The grayscale image is thresholded with Otsu's method. Morphological operations, i.e. erosion followed by dilation, are applied over the image to smoothen its boundaries after removing noise. The erosion of image A by structuring element B is

A ⊖ B = {z ∈ E | B_z ⊆ A}    (2)

Dilation is the process used to enlarge the boundary of the foreground. The dilation of image A by structuring element B is

A ⊕ B = ⋃_{b∈B} A_b    (3)
Fig. 4. Sample grayscale images after preprocessing
Feature Extraction
Feature selection is an important phase in a fruit classification and grading system, since the classification of fruits mainly depends on key features. At the same time, including too many features, or considering only a few features, may degrade system performance. The features considered in our work for fruit classification are the color, size, height and width of the fruit. The contour of the preprocessed image is drawn for extracting useful features. The color of the image is extracted by calculating the average RGB value per row, and the size of the fruit is calculated from the number of pixels it covers. A rectangle is bounded over the image based on the contour area; it is used to find the height and width of the fruit, as shown in Fig. 5.
Fig. 5. Contour detection and bounding rectangle
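Continuing from the preprocessing sketch above (it reuses that sketch's img and mask), the contour-based features can be extracted roughly as follows. Taking the largest contour as the fruit and averaging the colour over the fruit mask (rather than per row) are simplifying assumptions.

```python
import cv2

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
fruit = max(contours, key=cv2.contourArea)     # assume largest contour = fruit
x, y, w, h = cv2.boundingRect(fruit)           # width and height features
size = int(cv2.contourArea(fruit))             # size as covered pixel area
b, g, r = (float(img[..., c][mask > 0].mean()) for c in range(3))
features = [r, g, b, size, h, w]               # one feature vector per image
```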
Result of Traditional Machine Learning Algorithms
Part I
In this work, images from the fruit-360 dataset were taken. Initially, only four categories of different fruits were considered: apple, banana, orange and pomegranate, with 492 sample images of red apple, 490 of banana, 479 of orange and 246 of pomegranate taken for training. The extracted features of these images were fed to the KNN and SVM algorithms. KNN is a lazy-learning algorithm which can be used when there is no prior knowledge about the distribution of the data; it is mainly based on feature similarity. SVM is the most commonly used classifier in high-dimensional spaces. 70% of the dataset was used as training data and the remaining 30% for testing. The accuracy of the classifier model was 93.85% using KNN and 100% using SVM.
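A minimal scikit-learn sketch of this Part I experiment follows. The hyper-parameters (k, SVM kernel) are not stated in the text, so the library defaults here are assumptions, and the random arrays stand in for the six extracted features and the four class labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X = np.random.rand(1707, 6)            # stand-in for the 6 features per image
y = np.random.randint(0, 4, 1707)      # 4 fruit classes (1707 = total images)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for clf in (KNeighborsClassifier(), SVC()):
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(clf).__name__, acc)
```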
Part II
Later, a classifier was built to recognize different varieties of the same fruit, adding five different categories of red apples. The data consist of 494 Apple1, 492 Apple2, 919 Apple3, 490 Apple4, 1411 Apple5, 490 Banana, 479 Orange and 246 Pomegranate images: 5,021 images in total. Classifiers were modeled using KNN and SVM, but the models failed badly at distinguishing the different apple categories. The overall accuracy of the classifier was 48.63% using KNN and 60.65% using SVM. Accuracy was calculated using the values obtained from the confusion matrix: True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values are used to calculate the performance measures. TP represents correctly identified predictions, TN correctly rejected predictions, FP incorrectly identified predictions and FN incorrectly rejected predictions. The results of applying the KNN and SVM algorithms on the later dataset are analyzed using the confusion matrices given in Tables 1 and 2, which contain the classification results on the validation dataset.

Table 1. Confusion matrix of KNN (rows: actual class, columns: predicted class)

Actual        AppleC1  AppleC2  AppleC3  AppleC4  AppleC5  Banana  Orange  Pomegranate
AppleC1       109      11       20       1        15       0       0       1
AppleC2       12       99       20       0        18       0       4       0
AppleC3       8        19       60       74       112      0       1       0
AppleC4       4        1        89       23       29       0       0       0
AppleC5       18       32       169      37       117      11      14      0
Banana        0        6        0        0        14       122     0       0
Orange        0        3        0        0        5        0       153     0
Pomegranate   17       2        0        1        6        0       0       50
Table 2. Confusion matrix of SVM (rows: actual class, columns: predicted class)

Actual        AppleC1  AppleC2  AppleC3  AppleC4  AppleC5  Banana  Orange  Pomegranate
AppleC1       28       0        0        0        129      0       0       0
AppleC2       0        102      0        0        51       0       0       0
AppleC3       2        0        119      34       119      0       0       0
AppleC4       0        0        92       39       15       0       0       0
AppleC5       2        3        116      30       247      0       0       0
Banana        0        0        0        0        0        142     0       0
Orange        0        0        0        0        0        0       161     0
Pomegranate   0        0        0        0        0        0       0       76
Table 3. Parameters for classifier performance evaluation

Fruits        # samples  # training  # test  KNN TP  KNN FP  KNN FN  SVM TP  SVM FP  SVM FN
AppleC1       494        337         157     109     59      48      28      4       129
AppleC2       492        339         153     99      74      54      102     3       51
AppleC3       919        645         274     60      298     214     119     208     155
AppleC4       490        344         146     23      113     123     39      64      107
AppleC5       1411       1013        398     117     199     281     247     314     151
Banana        490        348         142     122     11      20      142     0       0
Orange        479        318         161     153     19      8       161     0       0
Pomegranate   246        170         76      50      1       26      76      0       0
The accuracy of the algorithms has been calculated using the values obtained from the confusion matrix. Since the number of samples is not evenly distributed, the additional performance measures precision, recall and F1 score were used for comparison. The TP, FP and FN values in Table 3 were used to calculate the performance measures; the precision, recall and F1 score values of the classifiers are given in Table 4.

Accuracy = (number of samples predicted correctly) / (total number of samples)    (4)

Precision = TP / (TP + FP)    (5)

Recall = TP / (TP + FN)    (6)
Table 4. Classifier comparison using different performance measures

              KNN                           SVM
Fruit         Precision  Recall  F1 Score   Precision  Recall  F1 Score
AppleC1       0.65       0.69    0.67       0.88       0.18    0.30
AppleC2       0.57       0.65    0.61       0.97       0.67    0.79
AppleC3       0.17       0.22    0.19       0.36       0.43    0.40
AppleC4       0.17       0.16    0.16       0.38       0.27    0.31
AppleC5       0.39       0.29    0.33       0.44       0.62    0.52
Banana        0.92       0.86    0.89       1          1       1
Orange        0.89       0.95    0.92       1          1       1
Pomegranate   0.98       0.66    0.79       1          1       1
Avg           0.49       0.49    0.50       0.66       0.61    0.60
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)    (7)
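Equations (5)–(7) can be computed directly from a confusion matrix laid out as in Tables 1 and 2 (rows: actual class, columns: predicted class). A small NumPy sketch:

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall and F1 per class from a confusion matrix whose
    rows are actual classes and whose columns are predicted classes."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp    # predicted as the class but actually not
    fn = cm.sum(axis=1) - tp    # actually the class but predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```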
The results show that the model accuracy and F1 score are very low for both classifiers. Even though SVM is able to classify banana, orange and pomegranate correctly, it fails to predict the different varieties of apple. The reason for the heavy misclassification is that there is only a slight difference between the RGB values of the different apple varieties. The performance measures show that the KNN and SVM classifiers are good only at classifying fruits with distinguishing features. To improve the accuracy of the model, each pixel of the fruit can be considered.
5 Deep Learning Approach
A simple CNN was constructed with two convolution layers and a fully connected layer. The input images are split into training and validation datasets. Each image is given as input to the CNN as a 3D matrix along with its class label value; the class label for each image is assigned using the one-hot encoding technique. The image enters the convolution layer represented as a 3D matrix with dimensions for width, height and depth. The hyper-parameter values used in the convolution layers are a kernel (filter) of size 5 × 5 and a stride of 2. Convolution is performed on the input data using the kernel and produces a feature map; different kernels are applied to produce different feature maps. ReLU is the activation function used in each convolution layer to convert the data into a nonlinear representation. The fully connected layer is the output layer, where the flattened data is given as input to fit the data into the given classes; the softmax function is used as the activation function in the fully connected layer. After training for 5 epochs the model accuracy
Fig. 6. Accuracy curve with overfitting
Table 5. Model summary

Layer (type)          Output shape          Param #
conv2d_16 (Conv2D)    (None, 30, 30, 32)    2432
conv2d_17 (Conv2D)    (None, 13, 13, 16)    12816
flatten_8 (Flatten)   (None, 2704)          0
dense_15 (Dense)      (None, 32)            86560
dropout_8 (Dropout)   (None, 32)            0
dense_16 (Dense)      (None, 8)             264

Total params: 102,072. Trainable params: 102,072.
Fig. 7. Accuracy curve after 20 epochs
was 90.35%, but the accuracy curve in Fig. 6 shows that there is overfitting, i.e. the model predicts the training data well compared to the validation data. The model was redesigned with a dropout layer, which helps to overcome overfitting; it acts as a network regularizer and is added in the fully connected part. The model summary and the accuracy curve after 20 epochs are shown in Table 5 and Fig. 7 respectively. The accuracy of the CNN classifier on the validation set after 20 epochs is 96.49%, and Fig. 7 shows that the accuracy curve of the model on the validation dataset exhibits less overfitting. The total time taken to train the network was 1783 s, i.e. about 30 min, on an Intel Core i3 processor with 4 GB RAM. This time includes feature extraction, feature selection and building the classifier.
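For reference, a Keras sketch that reproduces the layer shapes and parameter counts of Table 5 follows. The 64 × 64 × 3 input size is inferred from the conv output shapes (it is not stated in the text), and the dense-layer activation and dropout rate are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 5, strides=2, activation="relu",
                  input_shape=(64, 64, 3)),              # -> (30, 30, 32)
    layers.Conv2D(16, 5, strides=2, activation="relu"),  # -> (13, 13, 16)
    layers.Flatten(),                                    # -> 2704
    layers.Dense(32, activation="relu"),                 # 86,560 parameters
    layers.Dropout(0.5),                                 # rate assumed
    layers.Dense(8, activation="softmax"),               # 8 fruit classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()        # 102,072 trainable parameters, matching Table 5
```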
6 Conclusion
This paper presented a fruit classification system using KNN, SVM and CNN for a machine vision based system. The experiment was carried out with five different categories of apple, plus banana, orange and pomegranate. Classifier models were constructed using the traditional machine learning algorithms KNN and SVM, and a deep learning approach was also applied. Experimental results of fruit classification on the validation dataset show that the CNN model outperforms the two traditional methods: deep learning accuracy is considerably higher than that of the other machine learning algorithms. The classification accuracies of KNN and SVM were 48.63% and 60.65% respectively; the accuracy of these algorithms depends on the number of training samples, the quality of the images and the selection of distinguishing features. The accuracy of the fruit classification system using CNN was 96.49% after 20 epochs of training, because each and every pixel of the image is considered for feature extraction. Even though the time taken to build the classifier model is high, it eliminates the time spent on feature extraction and feature selection in the traditional machine learning approach. This model can be deployed in a machine vision based system to automate the fresh park. Further, the model can be re-trained to identify defects in the fruit. Acknowledgements. This research work was supported and carried out at the Department of Information Technology, Sri Ramakrishna Engineering College, Coimbatore. We would like to thank our Management, Director (Academics), Principal and Head of the Department for supporting us with the infrastructure and learning resources to carry out the research work.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Kamilaris, A., Prenafeta-boldú, F.X.: Deep learning in agriculture: a survey. Comput. Electron. Agric. 147, 70–90 (2018) 2. Dimatira, J.B.U., Dadios, E.P., Culibrina, F., et al.: Application of fuzzy logic in recognition of tomato fruit maturity in smart farming. In: IEEE Region 10 Annual International Conference, Proceedings/TENCON, pp. 2031–2035 (2017) 3. Zhang, Y., Wang, S., Ji, G., Phillips, P.: Fruit classification using computer vision and feedforward neural network. J. Food Eng. 143, 167–177 (2014) 4. Zhang, Y., Wu, L.: Classification of fruits using computer vision and a multiclass support vector machine. Sensors 12, 12489–12505 (2012) 5. Srinivasan, K., Porkumaran, K., Sainarayanan, G.: A new approach for human activity analysis through identification of body parts using skin colour segmentation. Int. J. Signal Imaging Syst. Eng. 3(2), 93–104 (2010) 6. Srinivasan, K., Porkumaran, K., Sainarayanan, G.: Background subtraction techniques for human body segmentation in indoor video surveillance. J. Sci. Ind. Res. 73, 342–345 (2014)
7. Srinivasan, K., Porkumaran, K., Sainarayanan, G.: Enhanced background subtraction techniques for monocular video applications. Int. J. Image Process. Appl. 1, 87–93 (2010) 8. Jana, S., Parekh, R.: Shape-based fruit recognition and classification, pp. 184–196. Springer (2017) 9. Karis, M.S., Hidayat, W., Saad, M., et al.: Fruit sorting based on machine vision technique. J. Telecommun. Electron. Comput. Eng. 8(4), 31–35 (2016) 10. Moallem, P., Serajoddin, A., Pourghassem, H.: Computer vision-based apple grading for golden delicious apples based on surface features. Inf. Process. Agric. 4(1), 33–40 (2017) 11. Mahendran, R., Gc, J., Alagusundaram, K.: Application of computer vision technique on sorting and grading of fruits and vegetables. J. Food Process. Technol. 10, 2157–7110 (2012) 12. Zeng, G.: Fruit and vegetables classification system using image saliency and convolutional neural network. In: IEEE Conference, pp. 613–617 (2017) 13. Hou, L., Wu, Q.: Fruit recognition based on convolution neural network. In: International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, pp. 18–22 (2016) 14. Khaing, Z.M., Naung, Y., Htut, P.H.: Development of control system for fruit classification based on convolutional neural network. In: IEEE Conference, pp. 1805–1807 (2018) 15. Ma, L.: Deep learning implementation using convolutional neural network in mangosteen surface defect detection. In: IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2017), Penang, Malaysia, pp. 24–26 (2017) 16. Tahir, M.W., Zaidi, N.A., Rao, A.A., Blank, R., Vellekoop, M.J., Lang, W.: A fungus spores dataset and a convolutional neural networks based approach for fungus detection. IEEE Trans. Nanobiosci. 17, 281–290 (2018)
Supervised and Unsupervised Learning Applied to Crowdfunding
Oscar Iván Torralba Quitian1, Jenny Paola Lis-Gutiérrez2,3, and Amelec Viloria4
1 Universidad Central, Bogotá, Colombia [email protected]
2 Universidad Nacional de Colombia, Bogotá, Colombia [email protected]
3 Fundación Universitaria Konrad Lorenz, Bogotá, Colombia
4 Universidad de la Costa (CUC), Calle 58 # 55-66, Atlantico Baranquilla, Colombia [email protected]
Abstract. This paper aims to establish the participation behavior of residents of the city of Bogotá between 25 and 44 years of age who finance, or seek funding for, entrepreneurial projects through crowdfunding. To meet the proposed objective, the focus of this research is quantitative, non-experimental and transactional (2017). Through data collection and analysis, we seek patterns of behavior of the target population. Two machine learning techniques are used for the analysis: supervised learning (using the decision tree learning algorithm) and unsupervised learning (clustering). Among the main findings are that (i) most of the people who would participate as entrepreneurs, or as donors and entrepreneurs simultaneously, belong to stratum 3; and (ii) crowdfunding projects based on donations do not hold high interest for Bogotá residents, unlike projects in which they expect to recover their investment. Keywords: Supervised learning · Unsupervised learning · Bogotá · Decision tree · Clustering · Machine learning
Crowdfunding
1 Introduction Recently the implementation of collaborative economy scenarios has taken effect, in which as financing options are the loans between peers (P2P lending) and the Crowdfunding [1–3]. Peer-to-peer loans allow people who need access to capital to apply directly to another individual online [4]. Crowdfunding, on the other hand, is the process by which capital is raised for an initiative, project or enterprise through numerous and relatively small financial contributions or investments made by individuals through the Internet [5, 6]. Unlike peer loans, crowdfunding has no obligation to return the money collected for the development of projects and initiatives. Its actions are related to the motivations, incentives and consumption patterns that people must make their contribution. For this reason, this alternative financing mechanism could serve people and organizations that © Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 90–97, 2020. https://doi.org/10.1007/978-3-030-37218-7_11
wish to obtain financing but do not comply with the requirements demanded by traditional sources of financing. It should also be considered that people who engage in digital consumption have the potential to participate in crowdfunding, because they use the Internet to make transactions and can access the platforms for this type of funding, learn about projects and decide whether they want to participate, either as donors or as project proponents. In Colombia, this population is between 25 and 44 years old [7]. In this context we propose the following research question: what is the participation behavior of residents of the city of Bogotá between 25 and 44 years of age who finance or seek funding for entrepreneurial projects through crowdfunding? To meet the proposed objective, the focus of this research is quantitative, non-experimental and transactional (2017). Through data collection and analysis, we intend to establish patterns of behavior of the target population [2]. Two machine learning techniques are used for the analysis, as sketched below: supervised learning (using the decision tree learning algorithm) and unsupervised learning (clustering).
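A hedged scikit-learn sketch of the two techniques just named, applied to survey-like data. The feature columns (stratum, age, use of online transactions), the three participation labels and k = 3 are illustrative assumptions, not the paper's final specification.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(1, 7, 394),     # socioeconomic stratum
                     rng.integers(25, 45, 394),   # age
                     rng.integers(0, 2, 394)])    # makes online transactions
y = rng.integers(0, 3, 394)   # 0 = donor, 1 = entrepreneur, 2 = both

tree = DecisionTreeClassifier(max_depth=4).fit(X, y)        # supervised
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)   # unsupervised
```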
2 Literature Review
According to [8], the collaborative economy refers to the efficient use of goods and services. In this scheme, in order to make use of a good, the consumer does not have to purchase it; rather, ownership of the good is temporarily transferred to the consumer for the time he makes use of it. Thanks to the collaborative economy, extra costs are avoided, since the consumer does not make the complete purchase. In this scheme, the use of the good supplies a need with the following characteristics: (i) no sale of a good (either first or second hand) is executed; (ii) it occurs between consumers (C2C); (iii) it is not considered a lease or rent. For their part, [9] defined the collaborative economy as a new business model that coordinates exchange between individuals in a flexible way, separate from traditional markets, breaking down industry categories and maximizing the use of limited resources. The collaborative economy arose from a number of technological developments which simplified the sharing of tangible and intangible goods and services through the availability of various information systems on the Internet [9], where users can participate by making contributions or providing ideas and suggestions through crowdfunding web platforms. These platforms aim to increase the number of successful projects [10] by attracting investors and people who want to finance their projects. According to the work done by [11], the platforms are divided into two classes: the first based on investment and the second on donations and rewards; in the latter, backers offer their support in exchange for a product, or offer their support for a cause, or a combination of the two. [12] stated that the collaborative economy refers to businesses seeking financing, and to platforms and consumers seeking to share resources and services, all within the framework of online, application-based business models. One of the forms of the collaborative economy is crowdfunding, which is reviewed below. Under the term collaborative economy, according to [3], fall the following products and services: crowdfunding, loans between individuals (P2P), file sharing, free software,
distributed computing and some social media, among others. One of the most widely used is crowdfunding, a term synonymous with micropatronage, collaborative financing, collective financing, massive financing or crowdinvesting [13]. In 1989, Luís Von Fanta and Robe Iniesta, members of the Spanish rock band Extremoduro, raised funds to record their album Rock Transgresivo. Their initiative consisted of selling ballots at 1,000 pesetas; people who contributed received a ballot in exchange. The total collection was 250,000 pesetas, and with the amount collected from the public the group recorded its musical work in 1989, which was later edited and remastered in 1994 [7]. Later, in 1997, the British group Marillion carried out a process similar to Extremoduro's: they announced to their fans that they wished to tour the United States and that, to carry out the tour, they would have to collect USD $60,000. Their followers made the necessary monetary contributions to complete the proposed amount, and the band thus managed to make the planned tour. It should be noted that the initiatives of Extremoduro and Marillion were pioneering, but at the time they were not known as crowdfunding, because the term was not coined until 2006, by Michael Sullivan, founder of Fundavlog, a website that contained a gallery of videos related to projects and events and sought donations to support bloggers [7]. Since then this practice has been known as crowdfunding. [14] identified three main actors interacting in crowdfunding: (i) organizations or people with entrepreneurial ideas or projects; (ii) intermediaries who are responsible for the promotion and publicity of entrepreneurial projects or ideas, in order to collect the resources needed by the proponents of the projects and initiatives to materialize them; (iii) contributors or financiers. Crowdfunding is a phenomenon that involves entrepreneurs seeking capital or funds through collective cooperation, usually through web platforms [14]. Entrepreneurs can choose crowdfunding as a means of financing their projects. In this process, besides the entrepreneurs, the following are involved: the crowdfunding organization or platform, in charge of promoting the entrepreneurial projects, and the people who access these platforms and make contributions in order to finance the projects proposed by the entrepreneurs [15]. In brief, crowdfunding is a financing process in which three main actors interact: (i) the end user who needs resources to finance an idea or project, (ii) the people who want to finance the end user's initiatives, and (iii) an organization that, through the promotion and publication of projects, enables the financing of the end user's initiatives. In this way, we identify that crowdfunding establishes a connection between investors and entrepreneurs, with the latter obtaining resources from people who wish to contribute to their projects through specialized platforms.
3 Method
3.1 The Data
Data were collected through the development and dissemination of a survey, with respondents selected at random. This instrument was constructed based on previous
research on the specific motivations of crowdfunding business models. The instrument was designed to obtain information relevant to the research, such as: residence in the city of Bogotá, socioeconomic stratum, age and intention to participate. Subsequently, the analysis and interpretation of the data were prepared in order to determine the motivations for investment and financing by means of crowdfunding. The projected population of the city of Bogotá for the year 2017 was 8,080,734, composed of 3,912,910 men and 4,167,824 women [16]; 2,505,793 people were within the defined age range, equivalent to 31% of the city's population. Having established the size of the city's population within the age range, the sample size was calculated with an error rate of 5% and a confidence level of 95%, in order to obtain an optimal sample quality. The required sample was 385 individuals; however, the sample reached was 394, implying an error of 4.94% and a confidence level of 95.28%.
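As a quick check of these figures, the required sample size for a finite population N with margin of error e and z-score z follows the standard formula n = N·z²·p(1−p) / (e²(N−1) + z²·p(1−p)); a minimal Python sketch (assuming p = 0.5, the maximum-variance case) reproduces the 385 reported above:

```python
import math

def sample_size(N, e=0.05, z=1.96, p=0.5):
    """Required sample for a finite population N, margin of error e,
    z-score for the confidence level, and assumed proportion p."""
    num = N * z**2 * p * (1 - p)
    den = e**2 * (N - 1) + z**2 * p * (1 - p)
    return math.ceil(num / den)

# Population of Bogotá aged 25-44 reported above
print(sample_size(2_505_793))  # -> 385
```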
3.2 Instrument
The instrument consists of 10 questions: 8 compulsory questions and 2 with Likert scales. The instrument was built with Google Forms to facilitate the dissemination, collection and storage of data. By creating the survey with this application, real-time information on the number of completed surveys can be obtained and the data can be exported to office packages such as Microsoft Excel. This avoids manual data entry and allows dissemination of the instrument through various media such as email, social networks and instant messaging applications such as WhatsApp, Facebook Messenger, Line and Snapchat, among others. The instrument was sent to two researchers classified in Colciencias; these experts evaluated it and made suggestions, which were incorporated before dissemination began. For the pilot test, the instrument was sent to a group of 20 people. The initial means of dissemination was e-mail, and from this group only 13 responses were initially received. Subsequently, the Cronbach alpha coefficient was calculated to determine the reliability of the instrument before it was definitively sent. The coefficient was first calculated from the variances of the item responses and subsequently from the variance of the total score. The value obtained for the pilot data by the item-variance method was α = 0.995176. According to [17], the minimum expected value of the coefficient is 0.7; because the value obtained was greater than this threshold, the instrument is reliable. Likewise, the calculation was made using the correlation matrix of the items, with a result of α = 0.96386, confirming the reliability obtained by the first method, given that the two results differ by only 0.031316.
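For reference, Cronbach's alpha from item variances is α = k/(k−1) · (1 − Σσᵢ²/σ_total²). A minimal Python (NumPy) sketch, assuming the pilot responses are arranged as a respondents × items matrix (the toy data below are illustrative only):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: 2-D array, rows = respondents, columns = items."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy example: 5 respondents, 4 Likert items
pilot = np.array([[4, 5, 4, 5],
                  [3, 3, 4, 3],
                  [5, 5, 5, 5],
                  [2, 3, 2, 3],
                  [4, 4, 4, 4]])
print(round(cronbach_alpha(pilot), 3))
```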
4 Results The decision tree algorithm was applied using the Orange software [18]. This algorithm makes it possible to identify patterns in the responses obtained from the surveys: it analyzes the data for patterns and uses them to define the sequence and conditions for the creation of the classification model. Figure 1 presents the result. According to Fig. 1, a first grouping corresponds to the Bogotans who would use crowdfunding if they did not meet the conditions of the banks; from the investor's point of view, most of them would participate in order to recover their investment. Another class is associated with those who do not meet the criteria to obtain resources from non-traditional institutions outside the financial sector; in this case, the interest of donors would be related to contributing directly to the project [19].
Fig. 1. Application of the decision tree algorithm to the database
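The paper builds the classifier in Orange's visual workflow; an equivalent minimal sketch with scikit-learn, assuming the survey answers were exported from Google Forms to a CSV with a hypothetical would_participate column as the target, is:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical export of the survey; file and column names are illustrative
df = pd.read_csv("survey.csv")
X = pd.get_dummies(df.drop(columns="would_participate"))  # one-hot encode answers
y = df["would_participate"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

print(tree.score(X_test, y_test))                         # held-out accuracy
print(export_text(tree, feature_names=list(X.columns)))   # rules, as in Fig. 1
```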
Now, classifying each individual who answered the survey, we can see that, at a 95% level of explanation of the grouping and taking as the variable to explain whether or not a person would be interested in participating in crowdfunding activities, three groups are identified. The blue group corresponds to the entrepreneurs, while the red and green groups mix those who would participate only as donors with those who would participate as donors and entrepreneurs simultaneously (Fig. 2).
Fig. 2. Application of the decision tree algorithm to the database
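The clustering step is not spelled out in the paper; a plausible sketch of agglomerative (hierarchical ascending) clustering with SciPy, cutting the dendrogram into the three groups reported above (the respondent matrix here is randomly generated for illustration), is:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# X: one-hot encoded survey answers, one row per respondent (illustrative)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(394, 10)).astype(float)

Z = linkage(X, method="ward")                     # agglomerative clustering
labels = fcluster(Z, t=3, criterion="maxclust")   # cut into 3 groups
print(np.bincount(labels)[1:])                    # size of each cluster
```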
Figure 3 shows that most of the people who would participate as entrepreneurs, or as donors and entrepreneurs simultaneously, belong to stratum 3, although due to dispersion the differences were not significant (Fig. 4).
Fig. 3. Clustering visualization
Fig. 4. Stratum vs participation in crowdfunding
5 Conclusions From the application of supervised and unsupervised learning methods, we found that donation-based crowdfunding projects do not attract high interest from Bogotans; that is, this type of project would not have the support from residents of Bogotá needed to materialize through crowdfunding. The roles of investor and entrepreneur have greater acceptance among the population of Bogotá: acquiring resources through an alternative means offers access to credit to people who cannot meet the requirements demanded by traditional and alternative funding entities. In future work, these results could be compared with other departmental capital cities or other countries. The instrument, having already been validated, could be used in other research. Finally, the data collected could be used to estimate multivariate probit or logit models in order to make the findings more statistically robust. Similarly, other supervised learning algorithms could be used to predict participation in crowdfunding mechanisms in Colombia or other countries. Compliance with Ethical Standards. • All authors declare that there is no conflict of interest • No humans/animals involved in this research work. • We have used our own data.
References 1. Yang, Y., Bi, G., Liu, L.: Profit allocation in investment-based crowdfunding with investors of dynamic entry times. Eur. J. Oper. Res. 280(1), 323–337 (2019) 2. Bagheri, A., Chitsazan, H., Ebrahimi, A.: Crowdfunding motivations: a focus on donors’ perspectives. Technol. Forecast. Soc. Chang. 146, 218–232 (2019) 3. Frenken, K., Schor, J.: Putting the sharing economy into perspective. Environ. Innov. Soc. Trans. 23, 3–10 (2017). https://doi.org/10.1016/j.eist.2017.01.003 4. Zhang, H., Zhao, H., Liu, Q., Xu, T., Chen, E., Huang, X.: Finding potential lenders in P2P lending: a hybrid random walk approach. Inf. Sci. 432, 376–391 (2017) 5. Petruzzelli, A.M., Natalicchio, A., Panniello, U., Roma, P.: Understanding the crowdfunding phenomenon and its implications for sustainability. Technol. Forecast. Soc. Chang. 141, 138–148 (2019) 6. Bento, N., Gianfrate, G., Thoni, M.H.: Crowdfunding for sustainability ventures. J. Clean. Prod. 237(10) (2019) 7. Torralba Quitian, O.: Motivaciones de inversión y financiamiento colaborativo en Bogotá mediante el crowdfunding. Master dissertation, Universidad Central, Bogotá (2017) 8. Wang, W., Mahmood, A., Sismeiro, C., Vulkan, N.: The evolution of equity crowdfunding: insights from co-investments of angels and the crowd. Res. Policy 48(8), 103727 (2019). https://doi.org/10.1016/j.respol.2019.01.003 9. Hamari, J., Sjöklint, M., Ukkonen, A.: The sharing economy: why people participate in collaborative consumption. J. Assoc. Inf. Sci. Technol. 67(9), 2047–2059 (2016) 10. Gera, J., Kaur, H.: A novel framework to improve the performance of crowdfunding platforms. ICT Express 4(2), 55–62 (2018)
11. Belleflamme, P., Lambert, T., Schwienbacher, A.: Crowdfunding: tapping the right crowd. Core 32(2011), 1–37 (2011) 12. Son, S.: Financial innovation - crowdfunding: friend or foe? Procedia Soc. Behav. Sci. 195, 353–362 (2015) 13. Ordanini, A., Miceli, L., Pizzetti, M., Parasuraman, A.: Crowd-funding: transforming customers into investors through innovative service platforms. J. Serv. Manage. 22(4), 443–470 (2011) 14. Martínez, R., Palomino, P., Del Pozo, R.: Crowdfunding and social networks in the music industry: implications. Int. Bus. Econ. Res. J. 11(13), 1471–1477 (2012) 15. Wang, X., Wang, L.: What makes charitable crowdfunding projects successful: a research based on data mining and social capital theory. In: International Conference on Parallel and Distributed Computing: Applications and Technologies, pp. 250–260. Springer, Singapore, August 2018 16. DANE: Proyecciones de Población [database]. DANE, Bogotá (2019) 17. Oviedo, C., Campo, A.: Aproximación al uso del coeficiente alfa de Cronbach. Revista Colombiana de Psiquiatría 34(4), 572–580 (2005) 18. Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., Zupan, B.: Orange: data mining toolbox in Python. J. Mach. Learn. Res. 14, 2349–2353 (2013) 19. Lis-Gutiérrez, J.P., Reyna-Niño, H.E., Gaitán-Angulo, M., Viloria, A., Abril, J.E.S.: Hierarchical ascending classification: an application to contraband apprehensions in Colombia (2015–2016). In: International Conference on Data Mining and Big Data, pp. 168–178. Springer, Cham, June 2018
Contextual Multi-scale Region Convolutional 3D Network for Anomalous Activity Detection in Videos M. Santhi(&) and Leya Elizabeth Sunny Department of Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam, Kerala, India [email protected], [email protected]
Abstract. The importance of surveillance cameras has greatly increased as a result of rising crime rates, and monitoring the recorded videos is in turn challenging. There is therefore a strong demand for intelligent surveillance systems for security purposes. The proposed work aims at anomalous activity detection in video using a contextual multi-scale region convolutional 3D network (CMSRC3D). To deal with activity instances of different temporal scale ranges, a temporal feature pyramid is used to represent normal and abnormal activities of different temporal scales. The temporal pyramid has three levels, namely short, medium and long, for video processing. For each level of the temporal feature pyramid, an activity proposal detector and an activity classifier are learned to detect normal and abnormal activities of specific temporal scales. Most importantly, the entire model, at all levels, can be trained end-to-end. Detected activities are classified as normal or abnormal by the activity classifier. In CMSRC3D, each scale model uses convolutional feature maps of a specific temporal resolution to represent activities within specific temporal scale ranges. More importantly, the whole detector can identify anomalies within all temporal ranges in a single shot, which makes it computationally efficient. In addition, if any anomalous activity is detected in a video, an alert is issued to avoid future mishaps. Keywords: CMSRC3D · TFP · APN · ACN
1 Introduction Over the last few years there has been fast development and impressive improvement in real-time video analysis. The aim of video analytics is to find potentially threatening activities with little or no human interference, since the crime rate has increased. Surveillance cameras are therefore placed in malls, educational institutes, railway stations, airports, etc., where the task is to detect anomalous activities that can affect normal human life. Another important task, detecting activities in untrimmed video, is a challenging problem, since instances of an activity can happen at arbitrary times in arbitrarily long videos. Several techniques have been established to address this issue and much progress has been made. In any
case, how to precisely identify temporal activity boundaries in untrimmed videos is still an open question. Motivated by the tremendous success of R-CNN and Faster R-CNN for object detection, most existing state-of-the-art activity detection methods treat the problem as detection by classification. In most cases, a subset of proposals is generated via sliding windows or a proposal method, and a classifier is used to classify these temporal segments into specific activity categories. Most approaches suffer from the following major drawbacks. (1) When identifying activity instances of numerous temporal scales, they depend on either handcrafted feature maps or deep convolutional feature maps such as VGG [2], ResNet [3] or C3D [4] with a fixed resolution. Using a fixed resolution to represent activities in video frames creates a mismatch between the inherent variability in the temporal length of activities and the fixed temporal resolution of the features, leading to lower performance: activity instances of the same category have different durations, so a fixed resolution degrades activity detection and also increases inference time. (2) Moreover, with a fixed resolution of the convolutional feature maps, contextual information, which has been demonstrated to be very effective in describing activities and boosting activity classification performance, is not fully exploited. In our work, we propose CMSRC3D [1] for anomalous activity detection. This method helps detect both normal and abnormal activities in videos. To deal with normal and abnormal activities of various temporal scales, we create a multi-scale temporal feature pyramid to represent video frames of different durations. One model is trained on each level of the temporal feature pyramid to detect normal and abnormal activity instances within specific duration ranges. Each model has its own convolutional feature maps of a specific temporal resolution, which can handle activities of different temporal scales better than a single-scale approach. The key roles of the project are: an innovative CMSRC3D architecture for detecting anomalous as well as normal activities, where each scale model is trained to handle activities of different scale ranges; more importantly, the architecture uses a single-pass network. A further contribution is that contextual information is fused to recognize activities, which improves detection performance, and an alert is raised when an anomalous activity is detected to avoid future mishaps. The three main contributions of this work are: (1) A new multi-scale region convolutional 3D network architecture for anomalous activity detection is proposed, where each scale model is trained to handle activities of a specific temporal scale range. More importantly, the detector can detect activities in videos with a variety of temporal scales in a single pass. (2) Contextual information is fused to identify activities, which increases detection performance. (3) The proposed CMS-RC3D detector achieves state-of-the-art results on our own datasets. The overview of this project is to detect normal and abnormal activities using the contextual multi-scale region convolutional 3D network. First, we extract the frames from the videos. The video frames are then given as input and feature extraction is performed using an optical flow algorithm.
Based on the extracted features, we create a three-level pyramid. We then detect activity proposals for the pyramid. Finally, activity classification uses a deep convolutional neural network: a testing video is given to the network, which outputs the feature map and region-of-
interest coordinates. The output is then fed into RoI pooling, which outputs a fixed-size feature map for each proposal. Normal and abnormal activities are then classified by testing the feature map of the testing video against the trained model.
2 Related Work In this section, ongoing work on anomalous and normal activity detection is discussed. A variety of architectures have been developed for anomalous activity detection, and different researchers have used distinct systems to identify anomalous activity in videos. Buch et al. [5] proposed an approach for temporal detection of human activities in long untrimmed video sequences. It presents Single-Stream Temporal Action Proposals (SST), an effective and efficient deep architecture for the generation of temporal action proposals. The system is able to run continuously in a single stream over very long input video sequences; it does not need to split the input into short overlapping clips or temporal windows for batch processing. The technical approach of this method is to produce temporal action proposals in long untrimmed videos: (i) the design examines the video input exactly once at multiple timescales without overlapping sliding windows, which results in fast runtime during inference; (ii) it considers and evaluates a substantial number of action proposals over densely sampled timescales and locations, which results in the system creating proposals with high temporal overlap with the ground-truth activity intervals. Input: the model takes as input a long untrimmed video sequence of frames. Visual encoding: the objective of this module is to compute a feature representation that summarizes the visual content of the input video. Sequence encoding: the aim of this module is to accumulate evidence across time as the video sequence progresses; the idea is that, in order to produce good proposals, the model should aggregate information until it is confident that an action is taking place in the video, while ignoring irrelevant background. Output: the aim of this module is to deliver confidence scores for numerous proposals at each time step; it takes as input the hidden representation determined by the sequence encoding layer at each time step. The disadvantage of this paper is that it does not provide a consistent picture of activity detection. Girshick et al. [6] approach the task of recognizing objects and localizing them in images, one of the most fundamental and challenging problems in computer vision. There has been important progress on this problem over the last decade, due to a great extent to the use of low-level image features, such as SIFT and HOG, in sophisticated machine learning frameworks. SIFT and HOG are semi-local orientation histograms, a representation roughly comparable to complex cells in the first cortical area of the primate visual pathway. However, it is also known that recognition occurs many stages downstream, which suggests that there might be hierarchical, multi-stage processes for computing features that are even more useful for visual recognition. This paper describes an object detection and segmentation system that uses multi-layer convolutional networks to
compute highly discriminative, yet invariant, features. It uses these features to classify image regions, which can then be output as detected bounding boxes or pixel-level segmentation masks. Unlike image classification, detection requires localizing (possibly many) objects within an image. One approach frames detection as a regression problem; this design can localize a single object, but detecting many objects requires complex workarounds or ad hoc assumptions about the number of objects per image. The disadvantage of this paper is that the method only detects objects in a single image, i.e., it takes an image as input. Biswas and Babu [7] approach the problem using cues from the motion vectors in the H.264/AVC compressed domain. The work is primarily inspired by the observation that motion vectors (MVs) exhibit distinctive characteristics during an anomaly. It is observed that H.264 motion vector magnitude holds relevant information, which can be used to model the usual behavior (UB) efficiently. As in any video compression standard, the greater part of compression is achieved by eliminating temporal redundancy between neighboring frames using motion vectors. MVs capture the displacement of macroblocks between candidate and reference frames; only the motion-compensated errors along with the MVs are coded, thereby reducing the number of bits to be coded. In a crude sense, MVs can be viewed as a rough approximation of optical flow; half-pixel and quarter-pixel motion estimation are used to improve the precision of block motion, resulting in nearly optical-flow-like characteristics. MVs are often used for content-based video analysis tasks, including video object segmentation, action recognition and so on. In this work, the MVs are used for identifying abnormal events. An anomaly is characterized by departure from normal features; an abnormal event is then defined as one whose probability of occurrence is less than a certain threshold. The problem is reduced to extracting relevant features that can distinguish between normal and abnormal behaviors.
3 Proposed Work In the proposed work, we use multi-scale regions for detecting abnormal activities in videos. The system also detects normal activities and provides a good picture of the activities. All activities are detected in a single shot, and the system also identifies which activity occurs most frequently in the video frames. Changing the architecture by adding more layers would produce a more powerful architecture. In this project every process is done in a single-pass network. The proposed framework consists of five phases: gathering the inputs, extracting features, forming a pyramid from the temporal features, proposing activities, and classifying activities.
3.1 Network Input
The input to the network is an original video of arbitrary length, restricted only by GPU memory. The video consists of a set of frames. We use a loop to
extract key frames from the video. The dataset contains categories of normal and abnormal activities, and we predict all the categories of activities in a video.
3.2 Feature Extraction Network
For feature extraction, a shared feature extractor extracts spatio-temporal features from the input videos; it can be any typical architecture. In our project, we use a deep CNN for feature extraction during training, and we track objects in the video frames using an optical flow algorithm. Optical flow characterizes the apparent motion of each pixel between two successive frames of a video [11], and can be used to obtain the motion and velocity in videos. The optical flow is determined between two consecutive frames S1 and S2 and is denoted by v = (u, v), where u and v are the horizontal and vertical components of the optical flow. Convolutional neural network: the images fed as input are passed in one direction through the CNN to the output. The CNN [12] is composed of convolutional layers, followed by pooling and fully connected layers. The first convolutional layer behaves as a feature extractor, learning the features of the input. All neurons in a feature map share weights that are constrained to be equal; however, different feature maps may have different weights even within the same convolutional layer, so that multiple features can be extracted from each image.
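The paper does not fix a particular optical flow method; a minimal sketch using OpenCV's dense Farnebäck optical flow between two consecutive frames (the input file name is a hypothetical placeholder) is:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input_video.mp4")      # hypothetical input file
ok1, frame1 = cap.read()
ok2, frame2 = cap.read()

s1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)  # frames S1 and S2
s2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Dense optical flow; result has shape (H, W, 2) holding (u, v) per pixel
flow = cv2.calcOpticalFlowFarneback(s1, s2, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
u, v = flow[..., 0], flow[..., 1]
magnitude = np.sqrt(u**2 + v**2)   # per-pixel motion speed
print(magnitude.mean())            # average motion between S1 and S2
cap.release()
```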
3.3 Feature Pyramid Network
In unconstrained settings, activities in videos have varying temporal scales. To address this, many previous works (e.g. [10]) create different temporal scales of the input and perform several forward passes during inference. Although this strategy makes it possible to detect activities with different temporal scales, it inevitably increases inference time. We therefore create a temporal feature pyramid (TFP) with different temporal strides to detect normal and abnormal activities of different temporal scales. More specifically, a three-level pyramid is created, designed to detect short-, medium- and long-duration normal and abnormal activities simultaneously. With this feature pyramid, the system can efficiently detect normal and abnormal activities of all temporal scale ranges in a single pass. The number of levels in the feature pyramid is K = 3. As in [10], to create the temporal feature pyramid, down-sampling is applied on the convolutional map to produce two extra levels of features. Down-sampling reduces the temporal size of the feature maps, dividing the video representation into two or more levels. For simplicity of implementation, max pooling and 3-D convolution with stride 2 are used for down-sampling. The strides of the levels are 8, 16 and 32, respectively. The anchor setting for stride 8 is {1:7} and for stride 16 is {4:10}, covering the temporal scale ranges {8–56} and {64–160} frames; similarly, stride 32 has anchor setting {6:16} and covers the temporal scale range {192–512} frames. Compared to [10], each scale model in our architecture only needs to predict activities whose temporal scales lie in a limited range, rather than a single wide range.
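A minimal PyTorch sketch of this three-level pyramid, assuming a backbone feature map of shape (batch, channels, T, H, W) at temporal stride 8, with max pooling and a stride-2 3-D convolution producing the two coarser levels as described above:

```python
import torch
import torch.nn as nn

class TemporalFeaturePyramid(nn.Module):
    """Builds K = 3 temporal levels (strides 8, 16, 32) from one feature map."""
    def __init__(self, channels: int = 512):
        super().__init__()
        # Level 2: halve the temporal axis with max pooling
        self.pool = nn.MaxPool3d(kernel_size=(2, 1, 1), stride=(2, 1, 1))
        # Level 3: halve again with a temporally strided 3-D convolution
        self.conv = nn.Conv3d(channels, channels, kernel_size=(3, 3, 3),
                              stride=(2, 1, 1), padding=(1, 1, 1))

    def forward(self, feat):                 # feat: (B, C, T, H, W), stride 8
        level1 = feat                        # short activities  (stride 8)
        level2 = self.pool(level1)           # medium activities (stride 16)
        level3 = self.conv(level2)           # long activities   (stride 32)
        return level1, level2, level3

tfp = TemporalFeaturePyramid(channels=512)
x = torch.randn(1, 512, 96, 7, 7)            # 768 input frames / stride 8
print([f.shape[2] for f in tfp(x)])          # temporal lengths: [96, 48, 24]
```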
We assign anchors of specific temporal scales to each level according to training statistics. With this temporal feature pyramid, our detector can efficiently detect activities of all temporal scales in a single forward pass. One motivation for this design: from only a 10 s clip of a running video, the activity may be wrongly detected as walking, but given 30 s or 50 s of the same video we can clearly detect that it is running. For this reason, we create the temporal feature pyramid to represent activities. Using this temporal feature pyramid saves a substantial amount of computational cost compared to generating multi-scale video segments, which would have to pass through the backbone network several times.
3.4 Activity Proposal Network (APN)
The APN finds candidate activity segments for the final classification network. To detect activity proposals of different scales more efficiently, the single temporal-scale feature map is replaced by the multi-scale TFP. We learn an APN on each level of the TFP: an additional 3D convolutional filter with kernel size 3 × 3 × 3 operates on each level of the TFP to generate feature maps for predicting temporal proposals. In order to detect anomalous and normal activity proposals of different durations, anchors of multiple scales are pre-defined, where an anchor denotes a candidate time window associated with a scale and serves as a reference segment for activity proposals at each temporal location. Compared to [9], each scale model in our architecture only needs to predict activities whose temporal scales lie in a limited range, rather than a single wide range. We assign anchors of specific temporal scales to each level according to training statistics. As mentioned previously, there are K = 3 levels of the feature pyramid. The strides of these levels are {8, 16, 32}, respectively. We use {7, 7, 11} anchors of different temporal scales for the corresponding levels of the temporal feature pyramid, and the temporal scale ranges of each level are 8–56, 64–160 and 192–512 frames, respectively. Therefore, our proposal detector can deal with a very wide range of temporal scales in an input video in a single shot. To identify the activities in the video frames, we assign positive or negative labels to the anchor segments, using the anchor settings defined in the feature pyramid section. We assign a positive label if the anchor segment overlaps some ground-truth activity with intersection over union (IoU) higher than 0.7, or if it has the highest IoU overlap with some ground-truth activity. If an anchor segment has IoU overlap lower than 0.3 with all ground-truth activities, we assign it a negative label.
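A minimal sketch of this labeling rule: temporal IoU between 1-D segments and the positive/negative assignment with the 0.7/0.3 thresholds quoted above (plain Python; the segment endpoints in frames are illustrative):

```python
import numpy as np

def temporal_iou(a, b):
    """IoU of two 1-D segments a = (start, end), b = (start, end), in frames."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, ground_truths, hi=0.7, lo=0.3):
    """+1 (activity), -1 (background) or 0 (ignored during training)."""
    ious = [temporal_iou(anchor, gt) for gt in ground_truths]
    if max(ious) > hi:
        return 1
    if max(ious) < lo:
        return -1
    return 0  # neither clearly positive nor negative

gts = [(120, 180), (400, 520)]
print(label_anchor((118, 176), gts))   # -> 1  (high overlap)
print(label_anchor((10, 60), gts))     # -> -1 (no overlap)
```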
3.5 Activity Classification Network (ACN)
The ACN is a segment-based activity detector. Due to the arbitrary length of activity proposals, 3D Region of Interest (3D-RoI) pooling is used to extract fixed-size features for the selected proposal segments. The pooled features are then fed into the fc6 and fc7 layers of the C3D network, followed by two sibling networks, which output activity categories and refine activity boundaries through classification and regression. If an abnormal activity is found, an alarm is generated to avoid future mishaps [8]. In the existing method [9], the ACN operates only on a single-scale feature map. In our work, we use the temporal feature pyramid and assign activity proposals of different
scale ranges to the levels of the TFP, performing 3D-RoI pooling in the activity classification network. 3D-RoI pooling is a neural network layer used for object detection; it achieves a significant speedup of both training and testing while maintaining high detection accuracy. The 3D-RoI stage has a region proposal step and a final classification step. In the region proposal step, given an input frame, all possible places where objects can be located are found; the output of this stage is a list of bounding boxes called regions of interest. In the final classification step, for every region proposal from the previous stage, the network decides whether it belongs to one of the target classes or to the background; here we use a deep convolutional network for final classification. This layer takes two inputs: (1) a fixed-size feature map obtained from the deep convolutional network with several convolution and max pooling layers, and (2) a matrix representing the list of regions of interest. The result of 3D-RoI pooling on the list of rectangles of different sizes is a list of corresponding feature maps of fixed size. These features are fed into the C3D network, which outputs the predicted activities, classified into normal and abnormal classes.
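A minimal sketch of the fixed-size pooling idea for variable-length proposals, using adaptive max pooling over the temporal axis in PyTorch (the level's feature map and the proposal boundaries are illustrative):

```python
import torch
import torch.nn.functional as F

def temporal_roi_pool(feat, proposals, out_len=4):
    """feat: (C, T, H, W) feature map of one pyramid level.
    proposals: list of (t_start, t_end) in feature-map coordinates.
    Returns one (C, out_len, H, W) tensor per proposal."""
    pooled = []
    for t0, t1 in proposals:
        seg = feat[:, t0:t1]                       # variable temporal length
        # Adaptive pooling maps any length onto out_len temporal bins
        pooled.append(F.adaptive_max_pool3d(
            seg, (out_len, seg.shape[2], seg.shape[3])))
    return pooled

feat = torch.randn(512, 96, 7, 7)
rois = [(2, 9), (30, 80)]                           # short and long proposals
print([p.shape for p in temporal_roi_pool(feat, rois)])
# -> [torch.Size([512, 4, 7, 7]), torch.Size([512, 4, 7, 7])]
```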
Fig. 1. System architecture of CMSRC3D model [4]
In Fig. 1: (A) Input videos can contain sets of activities of different temporal scale ranges; we use a loop to extract the frames from the video, which are then given as input to the network. In the figure, the input video is represented as (3, L, H, W), where L is the number of video frames, H the frame height and W the frame width. (B) The shared feature extractor extracts spatio-temporal features from the input videos and can be any typical architecture; in this project we use a deep convolutional neural architecture. For feature extraction from the video frames, we use an optical flow algorithm to find the apparent motion of objects, and then generate a feature map from the optical flow. The feature map is the mapping of features at different locations of the frame. (C) To deal with normal and abnormal activities of different temporal scales, the temporal feature pyramid is created, on which each
level of a different temporal resolution is used to represent activities of short, medium and long temporal scales simultaneously. The strides of the feature pyramid are 8, 16 and 32, and anchors are assigned to the pyramid. (D) On each level of the temporal feature pyramid, an activity proposal detector is learned to detect candidate activity segments within specific temporal scale ranges; positive and negative labels are assigned to the anchor settings for detecting activity proposals. (E) Finally, the activity classification network uses 3D-RoI pooling, which extracts contextual and non-contextual features for every selected activity proposal from each level of the temporal feature pyramid. The contextual and non-contextual features are then concatenated and fed into the kth scale-specific activity classifier, which outputs activity categories and refines activity segment boundaries.
4 Implementation
4.1 Data Sets
We created a dataset by downloading videos from YouTube and taking some videos from the ActivityNet dataset to cover normal activities. We then created a dataset for abnormal activities with videos downloaded from YouTube and from UCSD. The dataset consists of different normal and abnormal activity videos: normal activities such as walking, jumping and handclapping, and abnormal activities such as attacking a person, theft, etc. The dataset is split and used separately for the training, validation and testing phases.
4.2 Preprocessing
Preprocessing is an important step when working with video; it is used to convert all videos into one format by means of normalization. In this project, we first extract frames from the video and normalize the video frames; normalization converts all video types into a common format. We use a loop to extract the key frames from the videos.
4.3 Training
When training the CMS-RC3D detector, each mini-batch is created from one video, selected uniformly at random from the training dataset. Each frame of a video is resized to 172 × 128 (width × height) pixels, and we randomly crop 112 × 112 regions from each frame. To fit each batch into GPU memory, we generate a video buffer of 768 frames. The buffer is generated by sliding from the beginning of the video to the end and from the end back to the beginning, to increase the amount of training data, which can be seen as a data augmentation strategy. We also apply data augmentation by horizontally flipping all frames in a batch with a probability of 0.5. The training time depends on the architecture and the amount of data: if the amount of data is huge, more data must be extracted and
the training time automatically increases. In this project we use the ActivityNet and UCSD datasets for predicting normal and abnormal activities in video. First we preprocess the video and extract the video frames. In the feature extraction phase, we load each video from the dataset and read it frame by frame, converting each frame to a gray image for the optical flow calculation. In this phase, dilation and hole filling are used: we create a new image filled with 0 (black) and append the computed values to the original image. We then create a background image and calculate the optical flow values for the video frames. That is, we first create a black frame of the same size as the current video frame, take the frame transpose, and convert the image from gray to BGR, with points and arrows drawn inside the original frame. As an object moves, the arrows move in the same direction; the points at which changes occur are the optical flow of the video. The optical flow calculation is based on two video frames, comparing the current with the previous frame. A threshold value is then applied to find the object in the video frame and the region of motion. The temporal feature pyramid is then created based on the features of the video frames. For example, from only 30 s of a video we may not be able to predict the activity, which sometimes leads to wrong predictions; the TFP is therefore used to represent different scale ranges of the video and is useful for predicting the activity. Three pyramid levels are created for predicting normal and abnormal activities. We assign anchors of specific temporal scales to each level according to training statistics; with this temporal feature pyramid, our detector can efficiently detect activities of all temporal scales in a single forward pass. In the implementation, for simplicity, max pooling and 3-D convolution with stride 2 are used for down-sampling. In order to detect activity proposals of different durations, anchors of multiple temporal scales are pre-defined, where an anchor denotes a candidate time window associated with a scale and serves as a reference segment for activity proposals at each temporal location. Once we have the optical flow values and the temporal feature pyramid, the optical flow values of the video are stored in one list and the labels in another; the labels are 0, 1, 2, etc., and activity-name labels are assigned in the activity proposal network. We then extract the features from the video using the optical flow algorithm, which gives the optical flow values of the video, and create the temporal feature pyramid for the extracted features at different levels. In the activity proposal detector, we create proposals for the different pyramid levels. Once we have the feature values, we perform the mapping between the data and the labels; the mapping is a manual process in which the data are stored in one list and the corresponding labels in another, accessed by index. We then create the trained model.
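A minimal sketch of the batch construction described above, using a 768-frame buffer, a 172 × 128 resize, a random 112 × 112 crop and a horizontal flip with probability 0.5 (NumPy/OpenCV; the frame source is illustrative):

```python
import cv2
import numpy as np

BUF, W, H, CROP = 768, 172, 128, 112

def make_batch(frames, rng=np.random.default_rng()):
    """frames: list of BGR frames forming one 768-frame buffer."""
    x0 = rng.integers(0, W - CROP + 1)       # one crop offset per batch
    y0 = rng.integers(0, H - CROP + 1)
    flip = rng.random() < 0.5                # flip whole batch with p = 0.5
    batch = []
    for f in frames[:BUF]:
        f = cv2.resize(f, (W, H), interpolation=cv2.INTER_LINEAR)
        f = f[y0:y0 + CROP, x0:x0 + CROP]
        if flip:
            f = f[:, ::-1]                   # horizontal flip
        batch.append(f)
    return np.stack(batch)                   # (768, 112, 112, 3)

dummy = [np.zeros((240, 320, 3), np.uint8)] * BUF
print(make_batch(dummy).shape)
```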
4.4 Testing
In the testing phase, test videos are evaluated with the trained model. Given an input video, the system extracts the frames, and the trained model classifies each frame as normal or abnormal (or by type of activity).
5 Conclusion In this project, we proposed a contextual multi-scale region convolutional 3D network for anomalous activity detection in video. In our CMS-RC3D, each scale model uses convolutional feature maps of a specific temporal resolution to represent activities within specific temporal scale ranges. More importantly, the whole detector can detect activities within all temporal ranges in a single shot, which makes it computationally efficient. It uses convolutional neural networks to extract features, detects multi-scale temporal activities, and uses a single-pass network. The method detects both normal and abnormal activities and, finally, gives an alert to avoid mishaps when an abnormal activity is detected. Compliance with Ethical Standards. • All authors declare that there is no conflict of interest • No humans/animals involved in this research work. • We have used our own data.
References 1. Bai, Y., Xu, H., Saenko, K., Ghanem, B.: Contextual multi-scale region convolutional 3D network for activity detection. arXiv:1801.09184v1 [cs.CV], 28 January 2018 2. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014) 3. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) 4. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015) 5. Buch, S., Escorcia, V., Shen, C., Ghanem, B., Niebles, J.C.: SST: single-stream temporal action proposals. In: CVPR (2017) 6. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1), 142–158 (2016) 7. Biswas, S., Babu, R.V.: Real time anomaly detection in H.264 compressed videos. In: Proceedings of the 4th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), pp. 1–4, December 2013 8. Chaudharya, S., Khana, M.A., Bhatnagara, C.: Multiple anomalous activity detection in videos. In: 6th International Conference on Smart Computing and Communications, ICSCC 2017, Kurukshetra, India, 7–8 December 2017 9. Xu, H., Das, A., Saenko, K.: R-C3D: region convolutional 3D network for temporal activity detection. CoRR, abs/1703.07814 (2017) 10. Shou, Z., Wang, D., Chang, S.-F.: Temporal action localization in untrimmed videos via multi-stage CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1049–1058 (2016)
11. Parvathy, R., Thilakan, S., Joy, M., Sameera, K.M.: Anomaly detection using motion patterns computed from optical flow. In: 2013 Third International Conference on Advances in Computing and Communications (2013) 12. Kurková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.): Artificial Neural Networks and Machine Learning ICANN 2018. Springer, Cham (2018)
Noise Removal in Breast Cancer Using Hybrid De-noising Filter for Mammogram Images D. Devakumari1(&) and V. Punithavathi2
1 Department of Computer Science, Government Arts College, Coimbatore, Tamil Nadu, India [email protected] 2 Government Arts College, Coimbatore, Tamil Nadu, India [email protected]
Abstract. Breast cancer is one of the major diseases affecting women in today's world. The aim of this paper is to develop robust image pre-processing methods to work with mammogram image features of different dimensions. To categorize an image, it must be described or represented by particular features. In this paper, we propose a mammogram image pre-processing pipeline as a basis for image classification. The proposed method has two phases: (1) image re-sizing, in which the original MIAS mammogram database images are resized to predefined sizes; and (2) image pre-processing, carried out with the proposed Hybrid De-noising Filter (HDF), obtained by integrating the Median Filter and the Applied Median Filter, to discard noise. Different image de-noising algorithms are discussed in the literature review. The proposed Hybrid De-noising Filter algorithm plays an important role in image feature selection, segmentation, classification and analysis. According to the experimental results, the Hybrid De-noising Filter algorithm mainly focuses on discarding irrelevant noise and on execution time, using the MATLAB R2013a tool. The proposed Hybrid De-noising Filter has lower Root Mean Square Error (RMSE) variation and higher Peak Signal-to-Noise Ratio (PSNR) when compared with the Gaussian, Wiener, Median and Applied Median de-noising filters. Keywords: Breast Cancer · Image pre-processing · Hybrid De-noising Filter · MIAS
1 Introduction Image processing is a technique for carrying out operations on an image in order to obtain an enhanced image or to mine precious information from it. It is a type of signal processing in which the input is an image and the result can be an image or characteristics and attributes linked with that image. Digital Image Processing (DIP) is an ever-growing area with a variety of applications, and forms a core research area within the engineering and computer science disciplines as well. Digital image processing deals with developing digital systems that perform operations on digital images. A cancer is a type of disease originating in the abnormal growth of cells. Breast cancer is a disease whose starting point is the breast tissue, and it can
extend over a large area of the ducts or lobes of the breast. When the division of breast cells is not properly limited, the cells divide and appear in the form of lumps or tumors, called breast tumors or cancer. Breast cancer is the second most common disease leading to death. Mammography is the screening program for exposing and identifying breast cancer, and this process is utilized for its diagnosis. A mammogram is an X-ray exam used to expose and identify early breast cancer; it can expose breast cancer when the cancer is too small to notice otherwise. A good-quality mammogram can expose 80–90% of breast cancers, although no screening tool is 100% successful. The most efficient method for identifying breast tumors is mammography. The small contrast of tiny tumors against the background, which is occasionally close to the noise level, means that small breast cancer lesions can hardly be distinguished in the mammography [1]. In this sense, image pre-processing that decreases the noise level of the image while preserving the mammographic structures is important for improving the detection of mammographic attributes or features. De-noising techniques have been based on applying linear filters, such as the Wiener filter, to the image, but linear methods tend to blur the edge structure of the image. Several de-noising techniques based on nonlinear filters have been established to avoid this problem [2–4]. In this work, a novel Hybrid De-noising Filter (HDF) algorithm for mammographic images is offered. The proposed filter is compared with various image de-noising filters, namely the Wiener, Gaussian, Median and Applied Median filters. The objective of this work is to develop a novel image de-noising algorithm for mammogram images which efficiently removes irrelevant noise. De-noising of the MIAS image database (mammogram images) is presented using a Point Spread Function (PSF), which gives a good de-noising result with a high Peak Signal-to-Noise Ratio (PSNR).
2 Related Work Ponraj et al. [5] reviewed previous approaches to preprocessing mammographic images. The objective of preprocessing is to improve the quality of the image and make it ready for further processing by discarding irrelevant noise and unnecessary parts in the background of the mammogram. Different preprocessing techniques for a mammogram image (Adaptive Median Filter, de-noising filters, Mean Filter, Adaptive Mean Filter, Histogram Equalization, and so on) are discussed with their advantages and disadvantages. Angayarkanni and Kumar [6] discussed breast cancer as a serious and frequent disease that affects thousands of women every year. Early detection is necessary and critical for effective treatment and patient recovery. The authors presented a clear approach to extracting features from the mammogram image to discover the cancer-affected region, an important step in breast cancer detection and verification. Several segmentation algorithms were used to segment the mammogram image, but without the segmentation procedure they could not obtain a clear idea of where the cancer cells of high or low density mass are located.
Monica Jenefer and Cyrilraj designed "An Efficient Image Processing Methods for Mammogram Breast Cancer Detection", in which mammogram enhancement is achieved by eliminating noise and improving image quality using speckle noise elimination and the Expectation Maximization (EM) algorithm, respectively. An unavoidable characteristic of Magnetic Resonance Images (MRI) is the presence of speckle noise, which is both random and deterministic in an image. Speckle has a negative impact on ultrasound imaging: the resulting reduction in contrast resolution may be responsible for the poorer effective resolution of ultrasound compared with MRI. In the medical literature, speckle noise is also known as texture [7].
Ognjen Magud, Eva Tuba and Nebojsa Bacanin has designed that “Medical Ultrasound Image Speckle Noise Reduction by Adaptive Median Filter algorithm was tested on different ultrasound images and different evaluation metrics counting mean square error, peak signal to ratio, normalized cross correlation, average difference, structural content, maximum difference, normalized absolute error and image enhancement factor were used as measure of the quality of noise removal [11]”.
Sulong et al. [12] presented "Preprocessing digital breast mammograms using adaptive weighted frost filter". Preprocessing is an essential element of any imaging modality, whose foremost aim is to perform processes that bring the image to a quality suitable for further analysis and extraction of significant data. The authors note that pre-processing has great significance in mammographic image analysis due to the poor quality of mammograms, since they are captured at a low dose of radiation, while a high amount of radiation may endanger the patient's health. They concluded that the Adaptive Weighted Frost filter is the best choice for eliminating noise from mammographic images and performs comparatively better. The Frost filter attained better PSNR values; for example, on mdb001, the PSNR value produced by the modified Frost filter is 34.6 dB.
3 Proposed Work In the proposed methodology, all experiments were conducted using the Mammographic Image Analysis Society (MIAS) benchmark dataset [12]. The Hybrid De-noising Filter algorithm successfully eliminates the noise from mammogram images; the resultant mammogram is enhanced and noise-free and can be used for further processing such as segmentation and classification. The overall HDF flow is described in Fig. 1.
[Flow diagram: MIAS image database → convert gray image to double precision → resize to 256 × 256 using 'bicubic' → initialize Point Spread Function (PSF) value → Hybrid De-noising Filter sorting process → extract estimated noise portion in mammogram images → overlap noise image and adjust image to remove noise → de-noised result]
Fig. 1. Proposed Hybrid Denoising Filter flow diagram
In Fig. 1, MIAS image datasets are taken as inputs to the MATLAB simulation of the Hybrid De-noising Filter algorithm. The first step converts each image from the MIAS database into double precision and resizes it to 256 × 256 using the bicubic method, which resizes the image without any loss of information from the original. The next step initializes the Point Spread Function (an optimization element), a kernel of size 3 × 3, for the de-noising process. The third step is the proposed Hybrid De-noising Filter (HDF) method, which is applied to extract and remove the noisy portions of a mammogram image. Finally, an image adjustment procedure adjusts the image intensity of the final de-noised (HDF) result.
Image Preprocessing
Image preprocessing converts raw image data into a usable format. The preprocessing method is separated into two stages: (i) image re-sizing conversion and (ii) the Hybrid De-noising Filter. This phase improves the mammogram image, which may contain unnecessary distortions, and enhances attributes that are significant for further processing. Here, each image in the original MIAS mammogram database, of size 1024 × 1024 (uint8), is converted to a predefined 256 × 256 size without loss of pixel information using 'bicubic' interpolation. After this step, all MIAS database images are of equal dimension and are treated as indexed images on a common gray color map.
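For concreteness, a minimal sketch of this re-sizing stage follows (the paper works in MATLAB; Python with OpenCV is used here purely for illustration, and the uniform 3 × 3 kernel as the initial PSF is an assumption):

```python
# Illustrative sketch of the pre-processing stage; names are not from the paper.
import cv2
import numpy as np

def preprocess(path):
    # Read the MIAS image as grayscale and convert to double precision in [0, 1]
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = img.astype(np.float64) / 255.0
    # Resize the 1024x1024 uint8 input to 256x256 with bicubic interpolation
    img = cv2.resize(img, (256, 256), interpolation=cv2.INTER_CUBIC)
    # Initialize a 3x3 Point Spread Function (assumed uniform kernel)
    psf = np.ones((3, 3)) / 9.0
    return img, psf
```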
3.2 Hybrid De-noising Filter (HDF)
The proposed HDF algorithm eliminates the noise from mammograms; the resultant mammograms are enhanced, noise-free images that can be used for further processing such as segmentation and classification. In this method an image is taken from the MIAS database as input and its dimensions are taken for processing. The HDF algorithm removes the noise with the help of several attributes: the PSF value, sorting, the pixel portion and image adjustment. The formula used is given below:

HDF = \sum_{i=1}^{row} \sum_{j=1}^{column} \{(img_{i-1}, img_{j-1}),\ (img_{i}, img_{j-1}),\ (img_{i-1}, img_{j})\}   (1)
Algorithm 1: Hybrid De-noising Filter (HDF)
Step 1: Initialize the Point Spread Function matrix to 3 × 3.
Step 2: At each pixel, sort image pixels by the average-filter differences A_k(i, j) between image pairs in the horizontal and vertical directions.
Step 3: Search the sorted image locations according to Eq. (1).
Step 4: Calculate the image adjustment over the whole image.
Step 5: Output the de-noised result image.
The proposed algorithm is compared against several image de-noising filters: Median, Gaussian, Wiener and Applied Median. The comparison results are shown in Fig. 2. By integrating the Median and Applied Median filters, the proposed Hybrid De-noising Filter is obtained, as shown in Fig. 3.
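The paper publishes no reference implementation, so the following is only one plausible reading of Algorithm 1; the 95th-percentile cut-off for the "noise portion" and the median-based replacement are assumptions:

```python
# One plausible reading of Algorithm 1 (no reference code is published).
import numpy as np
from scipy.ndimage import median_filter

def hdf_denoise(img):
    # Step 2: average-filter differences A_k(i, j) between horizontal and
    # vertical neighbour pairs
    dh = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    dv = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))
    noise_score = (dh + dv) / 2.0
    # Step 3: treat the highest-scoring locations as the estimated noise
    # portion (the 95th-percentile cut-off is an assumption)
    noisy = noise_score > np.percentile(noise_score, 95)
    # Hybrid step: replace flagged pixels with the local 3x3 median
    out = np.where(noisy, median_filter(img, size=3), img)
    # Step 4: image adjustment - stretch intensities back to [0, 1]
    return (out - out.min()) / (out.max() - out.min() + 1e-12)
```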
Fig. 2. Comparison of various image denoising filters [5]
By comparing the original image with the de-noised image, the following metrics are calculated. Performance parameters are the most significant measures for justifying the simulation results. Peak Signal to Noise Ratio (PSNR) and Root Mean Square Error (RMSE) are the parameters considered for measuring image quality [6]. PSNR is measured in decibels (dB).
Fig. 3. Integration of Median, Applied Median and proposed Hybrid De-noising Filter
Root Mean Square Error (RMSE): RMSE is the root mean square error between the original and de-noised images of equal dimensions (row × column). It is the cumulative square error between the de-noised and the original images, averaged over all pixels; the average RMSE over the 322 digitized images is evaluated using

RMSE = \sqrt{\mathrm{mean}\big((input - denoiseImage)^2\big)}   (2)
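Eq. (2) transcribed directly into code (the function name is illustrative):

```python
# Direct transcription of Eq. (2); both images are float arrays of equal shape.
import numpy as np

def rmse(original, denoised):
    return np.sqrt(np.mean((original - denoised) ** 2))
```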
Table 1 shows the comparison of RMSE values for fifteen sample images from the MIAS database.
Table 1. Comparison of RMSE values with fifteen sample images from MIAS database

Images      Wiener  Gaussian  Applied Median  Median   Proposed HDF
mdb001.png  6.0333  6.0338    4.2627          0.18952  0.18516
mdb002.png  7.0109  7.0106    4.9552          0.26002  0.22827
mdb003.png  7.1967  7.1991    5.0905          0.36252  0.33516
mdb004.png  7.6966  7.6995    5.441           0.1204   0.10291
mdb005.png  7.1716  7.1918    5.0831          0.23076  0.2155
mdb013.png  3.7058  7.1527    5.0599          0.33455  0.34706
mdb018.png  3.7491  6.4879    4.5876          0.25822  0.23479
mdb031.png  4.8945  7.3685    5.2107          0.3925   0.40241
mdb033.png  4.7416  6.7688    4.7844          0.2986   0.28849
mdb035.png  3.7941  6.2327    4.4043          0.26819  0.25116
mdb037.png  4.0218  6.0643    4.2854          0.25533  0.22824
mdb044.png  9.161   5.9009    4.1725          0.26537  0.21767
mdb050.png  5.0736  6.9322    4.9009          0.28034  0.26031
mdb052.png  6.1594  6.9221    4.8934          0.28246  0.26386
mdb059.png  4.8439  6.6159    4.6771          0.26271  0.24755
Fig. 4. Comparison of RMSE values with fifteen sample images from MIAS database
The comparison chart of RMSE values for the existing filters (Wiener, Gaussian, Applied Median and Median) against the proposed Hybrid De-noising Filter is shown in Fig. 4. Table 2 shows the average RMSE for the various filters against the proposed Hybrid De-noising Filter, and the comparison of the average RMSE ratio for all 322 images from the MIAS database across the existing and proposed filters is shown in Fig. 5.
Table 2. Average RMSE ratio for overall 322 images in MIAS database with various filters

Measure  Wiener  Gaussian  Applied Median  Median  Proposed Hybrid De-noising Filter
RMSE     7.2371  7.2396    5.1187          0.277   0.2569
Fig. 5. Average RMSE ratio for overall 322 images from MIAS database with existing and proposed filter
Table 3 shows the comparison of PSNR values for fifteen sample images from the MIAS database, and Fig. 6 shows the comparison chart of PSNR values for the existing filters (Wiener, Gaussian, Applied Median and Median) against the proposed Hybrid De-noising Filter.

Table 3. Comparison of PSNR values with fifteen MIAS sample images

Images      Wiener (dB)  Gaussian (dB)  Applied Median (dB)  Median (dB)  Proposed HDF (dB)
mdb001.png  30.1713      30.1663        30.1713              47.583       48.9996
mdb002.png  29.5596      29.5558        42.983               42.4871      42.9914
mdb003.png  29.1662      29.1616        43.625               42.9168      43.5942
mdb004.png  29.0208      29.0157        43.2533              42.9262      43.8612
mdb005.png  29.5125      29.5312        44.0972              43.8703      44.7114
mdb013.png  29.4193      29.4127        44.198               43.833       44.6078
mdb018.png  29.7784      29.7719        45.5654              44.9026      45.9955
mdb031.png  29.6591      29.6531        43.9061              43.386       43.9247
mdb033.png  29.9694      29.9632        45.683               44.9637      45.7363
mdb035.png  30.1075      30.1016        46.3653              45.4951      46.6076
mdb037.png  30.1055      30.1           46.2509              45.3857      46.4589
mdb044.png  30.0515      30.0448        45.4693              45.2444      47.2456
mdb050.png  29.7054      29.7003        45.5095              45.0399      46.1409
mdb052.png  29.7287      29.7239        45.2483              45.8145      45.8853
mdb059.png  29.8369      29.8325        45.7528              45.1627      46.155
Fig. 6. Comparison of PSNR values with fifteen MIAS sample images
Table 4 shows the average PSNR for mammogram image de-noising on the MIAS image database, and the overall comparison of PSNR values for the 322 MIAS images is shown in Fig. 7. PSNR is measured using the formula

PSNR = 20 \log_{10}\!\left(\frac{MAX_f}{\sqrt{MSE}}\right)   (3)
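Likewise, Eq. (3) in code form, where MAX_f is the maximum representable pixel value (passed as an argument here, since it depends on the image type):

```python
# Direct transcription of Eq. (3); max_f is 1.0 for double-precision
# images in [0, 1], or 255 for uint8 images.
import numpy as np

def psnr(original, denoised, max_f=1.0):
    mse = np.mean((original - denoised) ** 2)
    return 20 * np.log10(max_f / np.sqrt(mse))
```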
The overall comparison of PSNR values for the 322 MIAS images was also executed in order to measure the duration of the noise-removal process, on a Windows machine with an Intel Core i5 processor running at 3.20 GHz and 8 GB of RAM. The elapsed times estimated for the Applied Median filter, the Median filter and the proposed HDF are 293.820402 s, 115.016032 s and 114.290015 s, respectively. The experimental results therefore show that HDF is the best method, removing noise in less time than the existing filter methods.

Table 4. Average PSNR ratio comparison of overall 322 images in MIAS database with various filtering methods

Measure  Wiener (dB)  Gaussian (dB)  Applied Median (dB)  Median (dB)  Proposed HDF (dB)
PSNR     45.633       29.702         45.237               44.637       53.846
Fig. 7. Overall comparison of PSNR with various existing and proposed filter method for 322 images in MIAS database
4 Conclusion
This paper analyzed the advancement of information and communication technology in the field of mammogram image preprocessing. The proposed method works in two phases, namely image re-sizing and image de-noising. The original MIAS mammogram database images are resized to a predefined size, and de-noising is carried out using the Hybrid De-noising Filter to remove the noise in mammogram images. The proposed Hybrid De-noising Filter algorithm is the most eminent of the compared methods for removing noise from mammogram images. The HDF is obtained by integrating the Median and Applied Median filters. Comparing PSNR and RMSE across the de-noising filters (Wiener, Gaussian, Median and Applied Median) and the proposed HDF method, the overall mean RMSE for the 322 images is 0.2569 and the overall mean PSNR is 53.846 dB, both obtained with the proposed Hybrid De-noising Filter. This demonstrates that the Hybrid De-noising Filter has the lowest RMSE and the highest PSNR, the best result among the compared de-noising methods. The experiments also show that the Hybrid De-noising Filter algorithm takes 114.290015 s to process all 322 images. Therefore HDF has the shortest processing time and is the best of the compared methods for noise removal in mammogram images.
Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Dengler, J., Behrens, S., Desaga, J.F.: Segmentation of microcalcifications in mammograms. IEEE Trans. Med. Imaging 12, 634–642 (1993)
2. Donoho, D., Johnstone, I., Kerkyacharian, G., Picard, D.: Wavelet shrinkage: asymptopia? J. R. Stat. Soc. B 57, 301–369 (1995)
3. Catté, F., Lions, P., Morel, J., Coll, T.: Image selective smoothing and edge detection by nonlinear diffusion. SIAM Numer. Anal. 29(1), 182–193 (1992)
4. Hyvärinen, A.: Sparse code shrinkage: denoising of nongaussian data by maximum likelihood estimation. Neural Comput. 11, 1739–1768 (1999)
5. Ponraj, N., et al.: A survey on the preprocessing techniques of mammogram for the detection of breast cancer. J. Emerg. Trends Comput. Inf. Sci. 2(12), 2079–8407 (2011)
6. Angayarkanni, N., Kumar, D.: Mammogram breast cancer detection using pre-processing steps of binarization and thinning technique. Pensee J. 76(5), 32–39 (2014)
7. Monica Jenefer, B., Cyrilraj, V.: An efficient image processing methods for mammogram breast cancer detection. J. Theor. Appl. Inf. Technol. 69(1), 32–39 (2014)
8. Angayarkanni, N., Kumar, D., Arunachalam, G.: The application of image processing techniques for detection and classification of cancerous tissue in digital mammograms. J. Pharm. Sci. Res. 8(10), 1179–1183 (2016)
9. Ravishankar, A., Anusha, S., Akshatha, H.K., Raj, A., Jahnavi, S., Madhura, J.: A survey on noise reduction techniques in medical images. In: International Conference on Electronics, Communication and Aerospace Technology (ICECA) (2017)
10. Pareek, A., Arora, S.M.: Breast cancer detection techniques using medical image processing. Int. J. Multi. Educ. Res. 2(3), 79–82 (2017)
11. Magud, O., Tuba, E., Bacanin, N.: Medical ultrasound image speckle noise reduction by adaptive median filter. WSEAS Trans. Biol. Biomed. 14, 2224–2902 (2017)
12. Talha, M., Sulong, G.B., Jaffar, A.: Preprocessing digital breast mammograms using adaptive weighted frost filter. Biomed. Res. 27(4), 1407–1412 (2018)
Personality Trait with E-Graphologist
Pranoti S. Shete1 and Anita Thengade2
1 MIT World Peace University, Pune, India
[email protected]
2 MIT College of Engineering, Pune, India
[email protected]
Abstract. Signature analysis helps in analyzing and understanding an individual's personality. Graphology is the scientific technique that helps predict a writer's personality: different types of strokes and patterns in a writer's signature are considered for predicting their personality traits. Social skills, achievements, work habits, temperament, etc. can be predicted from the writer's signature, which helps us understand the person better. Since a signature is directly related to, and develops a positive impact on, one's social life, personal life and career, it is essential to practice a correct signature for good results. The main objective here is to predict the author's personality traits based on features such as skewness, pen pressure, aspect ratio, margin, and the difference between the first and last letters of the signature. As the signature has a direct impact on one's assets and career, the proposed system also provides suggestions for improving the signature if needed. This paper proposes an offline signature analysis. We created our own dataset for the analysis, and also administered a questionnaire to check the accuracy of the proposed system.
Keywords: Signature analysis · Personality prediction · Graphology · Artificial neural network · Human personality · Signature · Margin · Aspect ratio · Difference between the first and last letter of the signature · Pen pressure
1 Introduction
E-graphology is emerging as a highly demanded profession in this modern world of technology. Graphology also guides people toward proper solutions for overcoming their problems. A manual graphology test is a time-consuming process: it involves a lot of human intervention, remains expensive, and the accuracy of any analysis depends on the skills of the graphologist. The proposed system uses image processing techniques to extract the features of a signature. These features describe the writer's personality, and the system provides suggestions for improvement if needed. The characteristics used to predict the personality of a writer are skewness, pen pressure, aspect ratio, margin, and the difference between the first and last letters of the signature. These features are selected based on the five important traits used for predicting personality, known and memorized as the O-C-E-A-N traits.
The following table shows the personality assigned to each trait and the feature from which the personality is detected (Table 1).

Table 1. Personality traits and features.

5 important traits  Low                                                                        High
Openness            Prefer to be alone (Skewness)                                              Creative, love to be surrounded by people (Skewness)
Conscientious       Quick reaction and impatience (Aspect Ratio)                               Self disciplined, Understanding (Difference between first and last letter of the signature)
Extroversion        Pessimistic (Skewness)                                                     Optimistic (Skewness)
Agreeable           Capricious (Difference between first and last letter of the signature)     Reliable, Assertive (Aspect Ratio)
Neuroticism         Easy going, not over emotional, Forward looking person (Pen-pressure and Margin)  Over emotional, Tendency to cling to the past (Pen-pressure and Margin)
• Openness: People who score high in openness love learning new things, getting new experiences, and being surrounded by people. People who score low in openness like to be alone and not surrounded by people all the time.
• Conscientiousness: Conscientious people are organized with a strong sense of duty. They are disciplined, down to earth, understanding and goal oriented; these people roam the world not just with a backpack but with plans. People low in conscientiousness tend to be more careless, impatient and aggressive.
• Extraversion: An extroverted person is also called a social butterfly. Extraverts are sociable, chatty, assertive and cheerful. Introverts, on the other hand, need plenty of alone time. People high in extroversion can be classified as optimistic, whereas people low in extroversion (introverts) can be classified as pessimistic.
• Agreeableness: This measures the warmth and kindness of a person. The more agreeable people are, the more likely they are to be trusting, helpful and compassionate. Disagreeable people, on the other hand, are suspicious of others, cold and not very cooperative.
• Neuroticism: This is one of the Big Five personality traits in the study of psychology. High scorers in neuroticism are moody; they dwell on past mistakes and have feelings of worry, anger, frustration, fear, anxiety, etc.
2 Related Work
In [1], BPN and a structured identification algorithm are used to detect ascending underlines and dot structures with accuracies of 100% and 95%, respectively. In [2], personality is predicted using both handwriting and signature: eight features with multi-structure algorithms and six features with an ANN were classified, with accuracy rates of 87–100% and 52–100%, respectively. In [3, 5–7], handwriting and signature features such as baseline, pen pressure and slant were detected to predict the writer's personality; [3, 5, 7] used an ANN for feature extraction, whereas [6] used an SVM for the same task and obtained an accuracy of 93.3%. In [3], handwriting features such as the 't-bar' and the loop of 'y' were extracted using a template matching algorithm. Shell, dot structure, underline, streak and shape were identified in [5] using an ANN and BPN. The main aim of [4] was to present a complete methodology for developing a writer personality prediction system, with features such as slant, zones, pen pressure, connections, spacing, margin, letter size, speed and clarity, to encourage further effort in this area. In [8], two areas were considered for predicting personality. The first identifies the styles and strokes of the capital letters 'O', 'T' and 'A' using Learning Vector Quantization (LVQ). The second is signature analysis, in which features such as curved start, end stroke, shell, middle streaks and underline were identified using an ANN, while extreme margin, disconnected streaks and dot structure were identified with a multi-structure algorithm; LVQ was also used for training the data in this second area. For 100 test samples the system accuracy was only 43%, because most of the data was not identified during training. In [9], signature analysis was carried out using a WACOM pen tablet (online signature analysis) to identify features such as the pen position along the x- and y-axes, pen angle and pen pressure; methodologies such as template matching and the Hidden Markov Model (HMM) were used to train offline signature data. In [10], analysis is performed on features such as the t-bar, margin, slant and baseline, where polygonalization and template matching are used to identify the slant of the baseline, and the input is compared with a model signature to identify which class the signature belongs to. In [11, 12, 15, 16], features such as baseline, pen pressure and handwriting slant were detected using an ANN. In [14], the loops of 'i' and 'f' were also considered, whereas [17] detected a new feature, the "unique stroke" of the letters 'a', 'd', 'i', 'm' and 't', using structured analysis. In [13], features such as page margin, line spacing, slant, zones and direction are extracted; Lex and Yacc were used for the analysis along with a context-free grammar of nearly 50 rules for accurate prediction. In [14], a handwritten Indian script (Devanagari) numeral database of 22,556 samples was created, and a multistage method was implemented for recognizing the handwritten data, separately as well as in mixed script, with high accuracy. In [18], baseline and pen pressure are detected using orthogonal projection and gray level, respectively; these two features, separately as well as combined, are tested on more than 550 text images.
3 System Architecture
• An image is taken from the database for pre-processing. The input image is scanned at 300 dpi.
• The signature is then converted to gray scale to identify the pen pressure of the writer.
• In the noise-removal step, isolated black pixels that are not part of the signature but appear on the white background, and isolated white pixels that are not part of the signature but appear on the black background, are removed for accurate prediction.
• The features extracted in this paper are skewness, pen pressure, aspect ratio, margin, and the difference between the first and last letters of the signature; predictions are made based on these features.
• A Support Vector Machine (SVM) classifier is used with the extracted features, and the personality is predicted based on the strokes of the individual's signature (Fig. 1).
Fig. 1. General architecture of personality prediction system.
4 Paper Work
This paper proposes an offline signature analysis. We created our own dataset for the analysis and also administered a questionnaire to check the accuracy of our system. Image pre-processing is used to remove unwanted noise from the images so that the subsequent feature extraction and personality prediction steps achieve higher accuracy.
4.1 Scanning the Signature
All the signatures to be processed need to be scanned properly. The English and Marathi signature dataset was collected manually and scanned using a 350 dpi resolution scanner. The scanned images are then preprocessed.
4.2 Pre-processing
Pre-processing consists of three steps.
4.2.1 Gray Scale
The image is converted to gray scale. A gray image consists of values from 0–255, where 0 is black and 255 is white.
4.2.2 Binary Image
Thresholding is an easy method of performing image segmentation. Using the thresholding technique, the gray image is converted to a binary image: each pixel is replaced with a black pixel if its intensity is less than a constant, or with a white pixel if its intensity is greater than that constant.
4.2.3 Noise Removal
Digital images contain various types of noise that cause errors and lead to wrong output, so the next pre-processing step is noise removal. Three main techniques for noise removal are linear filtering, adaptive filtering and median filtering. In Fig. 2, the first image contains some unwanted black dots, which are removed in this step using median filtering.
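A minimal sketch of these three steps using OpenCV; the fixed threshold value of 127 is illustrative, since the paper does not state the constant used:

```python
# Illustrative sketch of the three pre-processing steps.
import cv2

def preprocess_signature(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)          # 4.2.1 grayscale, 0-255
    _, binary = cv2.threshold(gray, 127, 255,
                              cv2.THRESH_BINARY)           # 4.2.2 fixed-threshold binarization
    clean = cv2.medianBlur(binary, 3)                      # 4.2.3 median-filter noise removal
    return clean
```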
Fig. 2. Output of noise removal
5 Features
5.1 Skewness
Skewness determines the direction, upward or downward, in which the writer has signed. In this paper the regionprops function is used to find the skewness. It is obtained by computing normalized central moments; the inertia matrix is
obtained from the normalized moments of second order. With the help of the eigenvectors and eigenvalues, the direction of the principal eigenvector can be found and converted to an angle. If the angle is greater than 0, the signature is classified as upward oriented; if the angle is less than 0, the signature is downward oriented.
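A short sketch of this test using scikit-image, whose regionprops "orientation" property is likewise derived from the central moments and the inertia tensor; taking the largest connected component as the signature is an assumption:

```python
# Skewness via scikit-image's regionprops; names are illustrative.
from skimage.measure import label, regionprops

def skewness_direction(binary_ink):
    # binary_ink: boolean array, True where signature ink is present
    regions = regionprops(label(binary_ink.astype(int)))
    sig = max(regions, key=lambda r: r.area)   # assumed: largest region = signature
    # orientation is the major-axis angle in radians
    return "upward" if sig.orientation > 0 else "downward"
```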
5.2 Pen-Pressure
Pen pressure is one of the important features in personality prediction: it helps in identifying a person's emotional capability and decisiveness. In this paper the pen-pressure threshold is set to 0.8. If the pen pressure is more than 0.8, it is classified as heavy pen pressure, indicating an emotional person; if it is less than 0.8, the person is not very emotional.
5.3 Aspect Ratio
The aspect ratio is the relationship between the height and the width of the signature. Using the aspect ratio of the writer's signature, we can predict whether the writer is a calm and easy-going person or short tempered, etc.
5.4 Difference Between First and Last Letter of the Signature
The difference between the first and last letters indicates whether the signature is narrow or broad, and based on that the writer is classified into the respective class.
5.5 Margin
Graphologists say the left margin left by the writer should be at least 30% of the length of the signature; if not, the person has a tendency to cling to the past, which may lead to bad outcomes in their current and future life. We calculate the margin as follows: first all the horizontal pixels are counted, and then, to calculate the length of the signature, all the pixels from row 0 to n − 1 along with all the pixels from column 0 to n − 1 are counted.
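A sketch of how the aspect ratio and the 30% left-margin rule could be computed from the ink bounding box (variable names and the page-relative margin definition are illustrative):

```python
# Aspect ratio (Sect. 5.3) and the 30% margin guideline (Sect. 5.5)
# from the ink bounding box.
import numpy as np

def aspect_and_margin(binary_ink):
    rows, cols = np.nonzero(binary_ink)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    aspect_ratio = width / height
    left_margin = cols.min()                   # pixels left of the signature
    margin_ok = left_margin >= 0.3 * width     # the 30% guideline
    return aspect_ratio, margin_ok
```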
6 Features and Their Meaning
The five most important traits for predicting personality are the O-C-E-A-N traits, which are accepted worldwide. Considering these traits, we extract the following features.
6.1 Skewness
An upward-oriented signature indicates that the writer is optimistic, active and ambitious, while a signature with downward orientation indicates that the writer's nature is pessimistic.
6.2 Pen-Pressure
Heavy pen pressure means the writer is very emotional, while light pen pressure indicates that the person is less emotional.
6.3 Aspect Ratio
People with long signatures are reliable and assertive, whereas people with short signatures tend to be impatient and to react quickly.
6.4 Difference Between First and Last Letter in the Signature
The difference between the first and last letters of the signature reveals the writer's flexibility: if a large difference is identified, the writer is capricious and flexible enough to accept others' working methods, and if the difference is small, the writer is classified as self-disciplined and flexible.
6.5 Margin
The margin we leave while signing tells a lot about our personality. A signature placed to the very left indicates that the person has a tendency to cling to the past, whereas a signature placed more to the right indicates that the writer is a forward-looking person.
7 System Algorithm
1. if (orientation = upwards) then
2.   write "Optimistic"
3. else (orientation = downwards) then
4.   write "Pessimistic"
5. end if.
6. if (pressure = heavy) then
7.   write "Too emotional"
8. else (pressure = light) then
9.   write "Not too emotional"
10. end if.
11. if (length = long) then
12.   write "Understanding, Reliable and Assertive"
13. else (length = short) then
14.   write "Quick reaction and Impatience"
15. end if.
16. if (difference = huge difference between first and last letter) then
17.   write "Capricious"
18. else (difference = less difference between first and last letter) then
19.   write "Self-disciplined and flexible."
20. end if.
21. if (margin = less) then
22.   write "Tendency to cling to the past"
23. else (margin = more) then
24.   write "Forward looking person"
25. end if.
26. Stop.
8 System Input/Output
Input
Fig. 3. System input image.
Figure 3 shows the graphical user interface of the E-Graphologist system. The user browses for an image from the dataset, and clicking Predict predicts the writer's personality. There are two sections on the right-hand side: the predicted personality and suggestions.
Output
Fig. 4. Output image.
In Fig. 4 we can see that the scanned signature has been preprocessed and, based on the features in the signature, the personality has been predicted. In the suggestions section, suggestions for improving the signature appear if needed.
9 Conclusion and Future Scope
We evaluate the performance of our system by comparing the answers provided by the writer with the personality the system predicts, counting true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).
• TP: the writer has some positive point and the system correctly detects it as positive.
• FP: the writer has some negative point but the system detects it as positive.
• FN: the writer has some positive point but the system detects it as negative.
• TN: the writer has some negative point and the system correctly detects it as negative.
The metrics used to evaluate performance are the standard ones: accuracy, precision, recall and F-score (Fig. 5).
Fig. 5. Equations to evaluate performance metrics.
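The extracted text does not preserve Fig. 5, but these four metrics have standard definitions, reproduced here for reference:

```latex
\mathrm{Accuracy}  = \frac{TP + TN}{TP + TN + FP + FN} \qquad
\mathrm{Precision} = \frac{TP}{TP + FP} \qquad
\mathrm{Recall}    = \frac{TP}{TP + FN} \qquad
F\text{-score}     = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```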
A future target is to develop a tool for predicting personality using a larger number of features. Additional features such as slant, underline, and character recognition can be taken into consideration.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Lokhande, V.R., Gawali, B.W.: Analysis of signature for the prediction of personality traits. In: 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM), 5–6 October 2017. IEEE (2017)
2. Djamal, E.C., Darmawati, R., Ramdlan, S.N.: Application image processing to predict personality based on structure of handwriting and signature. In: International Conference on Computer, Control, Informatics and Its Applications (2013)
3. Champa, H.N., AnandaKumar, K.R.: Automated human behavior prediction through handwriting analysis. In: First International Conference on Integrated Intelligent Computing (2010)
4. Varshney, A., Puri, S.: A survey on human personality identification on the basis of handwriting using ANN. In: International Conference on Inventive Systems and Control (ICISC-2017) (2017)
5. Sharma, V., Depti, Er.: Human behavior prediction through handwriting using BPN. Int. J. Adv. Res. Electron. Commun. Eng. (IJARECE) 6(2), 57–62 (2017)
6. Prasad, S., Singh, V.K., Sapre, A.: Handwriting analysis based on segmentation method for prediction of human personality using support vector machine. Int. J. Comput. Appl. 8(12), 25–29 (2010)
7. Champa, H.N., AnandaKumar, K.R.: Artificial neural network for human behavior prediction through handwriting analysis. Int. J. Comput. Appl. 2(2), 36–41 (2010)
8. Djamal, E.C., Ramdlan, S.N., Saputra, J.: Recognition of handwriting based on signature and digit of character using multiple of artificial neural networks in personality identification. In: Information Systems International Conference (ISICO), 2–4 December 2013
9. Faundez-Zanuy, M.: Signature recognition state-of-the-art. IEEE Aerosp. Electron. Syst. Mag. 20(7), 28–32 (2005)
10. Joshi, P., Agarwal, A., Dhavale, A., Suryavanshi, R., Kodolikar, S.: Handwriting analysis for detection of personality traits using machine learning approach. Int. J. Comput. Appl. 130(15), 40–45 (2015)
11. Kedar, S., Nair, V., Kulkarni, S.: Personality identification through handwriting analysis: a review. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 5(1), 548–556 (2015)
12. Grewal, P.K., Prashar, D.: Behavior prediction through handwriting analysis. Int. J. Comput. Sci. Technol. 3(2), 520–523 (2012)
13. Sheikholeslami, G., Srihari, S.N., Govindaraju, V.: Computer aided graphology. ResearchGate, 28 February 2013
14. Bhattacharya, U., Chaudhuri, B.B.: Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals. IEEE Trans. Pattern Anal. Mach. Intell. 31(3), 444–457 (2009)
15. Djamal, E.C., Febriyanti: Identification of speed and unique letter of handwriting using wavelet and neural networks. In: Proceedings of the International Conference on Electrical Engineering, Computer Science and Informatics (EECSI 2015), Palembang, Indonesia, 19–20 August 2015
16. Bobade, A.M., Khalsa, N.N., Deshmukh, S.M.: Prediction of human character through automated script analysis. Int. J. Sci. Eng. Res. 5(10), 1157–1161 (2014)
17. https://www.futurepointindia.com/article/en/your-nature-is-revealed-in-your-signature-8792
18. Bala, A., Sahaa, R.: An improved method for handwritten document analysis using segmentation, baseline recognition and writing pressure detection. In: 6th International Conference on Advances in Computing & Communications, ICACC 2016, Cochin, India, 6–8 September 2016
WOAMSA: Whale Optimization Algorithm for Multiple Sequence Alignment of Protein Sequence
Manish Kumar, Ranjeet Kumar, and R. Nidhya
Department of CSE, MITS, Madanapalle, India
[email protected]
Abstract. In the past few years, Multiple Sequence Alignment (MSA) problems have gained wide attention from scientists and biologists, as MSA is one of the major tools for finding the structural and behavioral nature of biomolecules. Furthermore, MSA can be utilized for gene regulation networks, protein structure prediction, homology searches, genomic annotation and functional genomics. In this paper, we propose a new nature- and bio-inspired algorithm, the Whale Optimization Algorithm for MSA (WOAMSA). The algorithm works on the principle of the bubble-net hunting behavior of the whale and, with the help of an objective function, we attempt to solve the MSA problem for protein sequences. To demonstrate the effectiveness of the presented approach, we used the BALiBASE benchmark dataset. Finally, we compared the results obtained by WOAMSA with other standard methods mentioned in the literature and concluded that the presented approach is better (in terms of the obtained scores) for most of the considered datasets.
Keywords: Proteins · Bioinformatics · Whale Optimization Algorithm · Multiple Sequence Alignment
1 Introduction
Multiple Sequence Alignment (MSA) [1] is the alignment of three or more amino-acid/nucleotide sequences. Generally, MSA is used to determine homology among a given set of sequences; sequence similarity is one of the most studied branches of bioinformatics. The currently available molecular data is capable of teaching us about the evolution, structure and functions of molecules and biomolecules in related domains. The motive for research on MSA is to obtain better-aligned sequences capable of revealing the biological relationship between related sequences. But developing an optimal MSA is not at all easy. Consider, for example, a large number of sequences (N) as input for MSA with a defined scoring meter for creating an optimal alignment. Although only a very simple requirement is mentioned here, MSA may also require criteria for input selection along with a comparison model to evaluate the obtained alignment.
There are various points indicated in the literature [2–10] regarding alignment problems. The first is the structure of protein sequences, which is complex. The second is that newly determined protein or DNA sequences chosen by automatic methods contain a considerable number of errors. Various approaches have been suggested in the literature for the MSA problem, such as classical, iterative and progressive algorithms. All these algorithms are either local or global in nature. A global algorithm is based on the principle of end-to-end alignment of sequences, whereas a local approach first looks for a substring within the available strings and then tries to align it with the reference string [11]. When dealing with local alignment approaches, we often encounter a problem in identifying the region of similarity within the given sequences; therefore, efforts should be made to resolve the issues related to local alignment. Dynamic programming (DP) [11], which can use both techniques (local and global), underlies the Smith–Waterman algorithm [12] and the Needleman–Wunsch algorithm [13]. The DP-based approach can only be used for aligning a small number of sequences; MSA is a combinatorial problem, and as the number of sequences increases, the computational effort becomes prohibitive. In order to capture the relationship between sequences, Feng and Doolittle [14] constructed an evolutionary tree based on the Needleman–Wunsch method. The progressive alignment method often faces the problem of getting trapped in local optima; to deal with such problems, the literature [1–14] suggests using either iterative or stochastic methods [15]. Furthermore, from several literature studies we conclude that none of the existing algorithms is capable of giving optimum output on all the standard datasets. As a result, drawing on bio-inspired and evolutionary algorithms, we have developed an iterative method using the whale optimization algorithm that can work on standard datasets from the literature. The methods listed in the literature [1–15] have shown their effectiveness in aligning distantly related sequences for almost all the standard datasets, but some accuracy is lost when aligning sequences that are far apart. The above paragraphs clearly indicate that none of the available algorithms can give accurate and meaningful alignments under all possible conditions. Above all, the progressive alignment method is fast and creative, but if an error occurs in the very first alignment and propagates to the related alignments, it is very hard to correct; iterative methods do not suffer from this problem. The iterative method is much slower than the progressive method and can be used where the quality of the alignment matters more than the computational cost [16]. Algorithms such as the genetic algorithm (GA) [8], which follows the natural selection process, are often applied and implemented in iterative methods. This type of algorithm can fit any type of fitness function and is therefore most often considered for iterative problems. Furthermore, algorithms like GA can help achieve different objective functions.
Also, evolutionary algorithms like GA and related algorithms can exploit multi-core processors and low-cost clusters, because they can be parallelized to meet current computing trends.
In this study, we have considered a bio-inspired algorithm, the Whale Optimization Algorithm (WOA) [17], for our experimental analysis. The proposed Whale Optimization Algorithm for Multiple Sequence Alignment (WOAMSA) follows a multi-objective principle in which a nature-inspired methodology is adopted. Here, we generate the initial population randomly and then, with the help of an objective function, calculate the fitness score for each column of the protein sequences from the standard BALiBASE dataset. The advantage of bio- and nature-inspired algorithms (such as Biogeography-Based Optimization [18], the Gravitational Search Algorithm [19], Ant Colony Optimization [20], etc.) is that they only require a feasible objective function for evaluating the proposed scheme. Since the above-mentioned algorithms are highly implicitly parallel techniques, they can be utilized for real-time, large-scale problems in analyzing biological sequences for alignment. Further, this paper proceeds as follows. The next section describes the motivation behind this research work, including the preliminaries required to understand it. After that, we discuss the proposed scheme, followed by the experimental setup, comparative results and conclusion.
2 Motivation
Sequences of small length can be aligned manually, but sequences of greater length require an algorithm (such as BBO, ACO, WOA, etc.) for successful alignment. Dynamic programming, which falls under progressive techniques, suffers from the problem of early convergence and hence cannot be used for handling MSA-related problems; such problems require iterative algorithms such as the genetic algorithm. From the literature we learned of the importance of protein structure alignment in the near future and therefore considered proteins for our experimental analysis. Until now, the sequence similarity method has mainly been used for predicting the functions and evolutionary history of biomolecules. But literature studies [1–20] show remarkable improvement in tools for handling issues related to multiple sequence alignment. Studies also state that we need to combine sequence alignment with some known protein structures in order to take a closer look at the alignment of protein sequences, and that a better alignment can be expected if the phylogenetic relationships among sequences are utilized. From numerous literature studies [1–21], we learned that there are still huge challenges in aligning protein sequences properly. The first and foremost is the poorly aligned, locally conserved regions within the sequence; this problem affects the alignment of protein sequences irrespective of their length. The second factor is the mismatch of motifs, which may introduce a disordered region within the alignment. The third and most important problem is dataset and alignment errors, which are encountered every time the protein datasets available around the globe are accessed. Taking this survey into consideration, and in order to assess the effectiveness of the presented (WOAMSA) approach, a comparative study is made in
Table 1 with some of the other methods, such as BBOGSA [22], a GA-based approach [23] and QBBOMSA [24]. The GA-based method presented in [23] shows the role of genetic operators (crossover and mutation) in aligning protein sequences in terms of scores: if the score is good (approaching 1), the alignment can be considered optimal. The BBOGSA approach [22] employed two bio-inspired algorithms to create optimal alignments for MSA of protein sequences; in this scheme, the authors merged two nature-inspired algorithms and compared their proposal with standard methods in the same field. In the proposed scheme, we use the BALiBASE protein-sequence datasets and, with the help of the Whale Optimization Algorithm, align unstructured protein sequences. We first initialize the population and then calculate the fitness score for each column of the protein sequence using a fitness function. Based on this score, and with the help of WOA, we find the best alignment for the protein sequences and compare it against some of the available MSA methods.
3 Proposed Approach
In this section, we describe our proposed approach for aligning multiple protein sequences of variable length using the Whale Optimization Algorithm (WOA). The inputs to WOAMSA are protein sequences taken from the standard BALiBASE dataset. Our experimental analysis follows the whale optimization technique, in which we first randomly initialize the population of search agents. The fitness score is then calculated for every test case in the dataset by the defined objective function. In the WOA concept, we initialize the current best agent and modify the positions of the other agents so that they move towards the current best search agent; this process continues until the optimal solution is achieved. We now demonstrate the steps for aligning protein sequences using WOA and frame our alignment objective through it.
Step 1: The population is initialized randomly and, as said earlier, the best agent is searched for randomly. The initial population considered for WOAMSA is t_i (i = 1, 2, …, k), and the best search agent is represented as t*.
Step 2: Fitness calculation: To calculate the fitness function we consider the length of the sequences, where N represents the columns and M the rows. The fitness of a chromosome is the sum of the sub-scores obtained from each column. For each column we consider the chromosomes in N × M fashion and compute the score; each match, mismatch and gap is assigned a score of 2, 1 and −1, respectively. Furthermore, we use the PAM 250 scoring matrix to evaluate the overall score for each column of the protein sequence.
Step 3: Encircling prey: In this phase, the position of the prey is predicted by the humpback whale beforehand, and then the whale tries to catch the prey by
encircling it. The whale assumes that the current solution (prey) is the best available, and the search agents update their positions based on the position of the current best agent. The encircling behavior used in this study is expressed as:

\vec{R} = |\vec{S} \cdot \vec{t}^{*}(v) - \vec{t}(v)|   (1)

where v represents the current iteration, \vec{t} is the position vector, \vec{t}^{*} is the position vector of the best solution, and \vec{S} is a coefficient vector. Equation (1) represents the distance to the current best agent, and Eq. (2) gives the new, updated position:

\vec{t}(v + 1) = \vec{t}^{*}(v) - \vec{M} \cdot \vec{R}   (2)

where \vec{M} is a coefficient vector, as in Eq. (1). \vec{S} and \vec{M} are defined as:

\vec{M} = 2\vec{n} \cdot \vec{r} - \vec{n}
\vec{S} = 2 \cdot \vec{r}

Here, the value of \vec{n} falls linearly from 2 to 0 and \vec{r} is a random vector in [0, 1]. The values of \vec{M} and \vec{S} are updated in the search for the best agent in the surrounding space.
Step 4: Exploitation phase: This step consists of two mechanisms: shrinking encircling and spiral position updating.
In the shrinking encircling phase, \vec{M} is restricted to [−1, 1], so that the new position of an agent lies between its initial position and the position of the current best agent. For the spiral phase, the spiral updating equation is:

\vec{t}(v + 1) = \vec{R}' \cdot e^{cq} \cdot \cos(2\pi q) + \vec{t}^{*}(v)   (3)

where c is a constant, q ranges over [−1, 1], and (·) denotes element-wise multiplication. \vec{R}' is evaluated as:

\vec{R}' = |\vec{t}^{*}(v) - \vec{t}(v)|   (4)

where \vec{t} is the position vector and \vec{t}^{*} the position vector of the best solution. The agent's position is then updated by choosing between the encircling mechanism and the spiral update:

\vec{t}(v + 1) = \begin{cases} \vec{t}^{*}(v) - \vec{M} \cdot \vec{R} & \text{if } p < 0.5 \\ \vec{R}' \cdot e^{cq} \cdot \cos(2\pi q) + \vec{t}^{*}(v) & \text{if } p \geq 0.5 \end{cases}   (5)

Here, p is a random number in [0, 1].
Step 5: Exploration phase: In this phase, a randomly chosen agent is used to update the position of the search agent:

\vec{R} = |\vec{S} \cdot \vec{t}_{rand} - \vec{t}|
\vec{t}(v + 1) = \vec{t}_{rand} - \vec{M} \cdot \vec{R}

where \vec{t}_{rand} represents a random position vector.
Step 6: The positions of the agents are updated until the best solution is found.
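A compact sketch of one WOAMSA iteration over Eqs. (1)–(5) follows. How an alignment is encoded as a real-valued position vector is abstracted away here; the spiral constant c = 1 and the greedy best-update are assumptions, and fitness stands for the column-wise PAM 250 scoring of Step 2:

```python
# One WOA iteration over Eqs. (1)-(5); assumptions noted in the lead-in.
import numpy as np

def woa_step(agents, best, n, fitness):
    # agents: (k, d) positions; best: (d,) current best; n falls from 2 to 0
    new_agents = np.empty_like(agents)
    for i, t in enumerate(agents):
        r = np.random.rand(t.size)
        M = 2 * n * r - n                       # coefficient vector M
        S = 2 * np.random.rand(t.size)          # coefficient vector S
        p = np.random.rand()                    # mechanism selector, Eq. (5)
        q = np.random.uniform(-1, 1)            # spiral parameter
        if p < 0.5:
            if np.all(np.abs(M) < 1):           # shrinking encircling, Eqs. (1)-(2)
                R = np.abs(S * best - t)
                new_agents[i] = best - M * R
            else:                               # exploration, Step 5 (random agent)
                t_rand = agents[np.random.randint(len(agents))]
                R = np.abs(S * t_rand - t)
                new_agents[i] = t_rand - M * R
        else:                                   # spiral update, Eqs. (3)-(4), c = 1
            R = np.abs(best - t)
            new_agents[i] = R * np.exp(q) * np.cos(2 * np.pi * q) + best
    scores = [fitness(a) for a in new_agents]
    cand = new_agents[int(np.argmax(scores))]
    if fitness(cand) > fitness(best):           # Step 6: keep the best so far
        best = cand
    return new_agents, best
```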
4 Results
The aim of this section is a feasibility study of the behavior of a new nature-inspired algorithm, WOA, for solving problems related to the alignment of multiple protein sequences. The quality of the alignments is evaluated by comparing their obtained scores with the standard scores provided with the BALiBASE datasets. For the experimental evaluation and comparison of the developed method, we compared our results with other methods that use similar nature-inspired algorithms, considering the standard BALiBASE dataset with references 1, 2 and 3 as reference sets. For a few datasets our newly proposed method did not give the optimal result, but for most cases the whale optimization concept obtained optimal results. In the current scenario, most nature-inspired algorithms, such as ACO, GSA, GA, PSO and BBO, are very useful for the successful alignment of protein sequences and for handling structure-related problems of protein sequences. With the given table, one can identify the reference alignments for which our proposed method shows better results, as well as the test cases where our method needs improvement. A score of 1 means the protein sequence is successfully aligned, and 0 means our alignment does not match the standard reference alignment available in the BALiBASE dataset at all. For most cases we have successfully aligned the sequences, and where we are lacking we will work to improve our alignment score. Bold-faced scores indicate the superiority of a result compared with the other methods, and unavailability of a result in the literature is represented by n/a; in Table 1 there are many datasets for which there are no results to compare. From Table 1 it can also be concluded that the results we obtained using the whale optimization technique are better than those of similar algorithms for protein sequence alignment. The average score is also calculated for each method over all the datasets mentioned in Table 1. When we calculate the overall average score and compare
it with the methods considered in this paper, the overall average score of the proposed method is lower than that of the other methods; however, the individual scores of WOAMSA are better compared to the other methods for most test cases. The best score in each row is highlighted in bold.

Table 1. Obtained results on various references sets (1 to 5)
Reference no.  SN  ASL  DATASETS  GA based method  QBBOMSA  GA-BBO  WOAMSA
Ref. 1         3   374  1ped      n/a              0.758    0.793   0.879
               4   220  1uky      n/a              0.574    0.878   0.954
               4   474  2myr      0.621            0.427    0.746   0.547
               5   276  kinase    0.981            0.703    0.894   0.846
Ref. 2         23  473  1lvl      0.812            n/a      0.850   0.877
               18  511  1pamA     n/a              0.824    0.856   0.741
               15  60   1ubi      n/a              n/a      0.988   1.000
               20  106  1wit      n/a              n/a      0.788   0.784
Ref. 3         16  294  2pia      n/a              0.897    0.912   0.981
               15  237  3grs      0.793            n/a      0.877   0.684
               23  287  kinase    0.847            0.795    0.929   0.962
               19  427  4enl      0.845            n/a      0.924   0.956
               28  396  1ajsA     0.249            n/a      0.655   0.357
               22  97   1ubi      0.576            n/a      0.768   0.944
               19  511  1pamA     0.894            0.856    0.820   0.922
               24  220  1uky      0.452            n/a      0.944   0.571
Ref. 4         18  468  Kinase2   n/a              0.629    0.943   0.963
               6   848  1dynA     n/a              0.237    0.765   0.247
Ref. 5         8   328  2cba      n/a              0.871    0.859   0.874
               15  301  S51       n/a              0.869    0.983   0.674
Average score                     0.707            0.703    0.858   0.698

SN: Sequence Number; ASL: Average Sequence Length
5 Conclusion
As MSA has become an important research area for understanding the structural and behavioral properties of biomolecules, this study presents a whale-optimization-based method for the problem of aligning protein sequences. Here,
standard reference sets from BALiBASE are considered and, with the help of a new nature-inspired algorithm, the Whale Optimization Algorithm, we align protein sequences. The results obtained lead to the conclusion that our proposed method is efficient and feasible for the alignment of protein sequences compared with other nature-inspired algorithms in the same field. Thus, we can conclude that our proposed method successfully aligned unstructured protein sequences from the standard dataset and obtained better scores than the other methods discussed in this paper for most of the test cases.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Hamidi, S., Naghibzadeh, M., Sadri, J.: Protein multiple sequence alignment based on secondary structure similarity. In: International Conference on Advances in Computing, Communications and Informatics, pp. 1224–1229 (2013)
2. Eddy, S.R.: Multiple alignment using hidden Markov models. In: Proceedings of the International Conference on Intelligent Systems for Molecular Biology, vol. 3, pp. 114–120 (1995)
3. Peng, Y., Dong, C., Zheng, H.: Research on genetic algorithm based on pyramid model. In: 2nd International Symposium on Intelligence Information Processing and Trusted Computing, pp. 83–86 (2011)
4. Zhu, H., He, Z., Jia, Y.: A novel approach to multiple sequence alignment using multiobjective evolutionary algorithm based on decomposition. IEEE J. Biomed. Health Inform. 20(2), 717–727 (2016)
5. Wong, W.C., Maurer-Stroh, S., Eisenhaber, F.: More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology. PLoS Comput. Biol. 6(7), e1000867 (2010)
6. Besharati, A., Jalali, M.: Multiple sequence alignment using biological features classification. In: International Congress on Technology, Communication and Knowledge (ICTCK), Mashhad, pp. 1–5 (2014)
7. Razmara, J., Deris, S.B., Parvizpour, S.: Text-based protein structure modeling for structure comparison. In: International Conference of Soft Computing and Pattern Recognition, pp. 490–496 (2009)
8. Naznin, F., Sarker, R., Essam, D.: Progressive alignment method using genetic algorithm for multiple sequence alignment. IEEE Trans. Evol. Comput. 16, 615–631 (2012)
9. Aniba, M.R., Poch, O., Thompson, J.D.: Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res. 38, 7353–7363 (2010)
10. Katoh, K., Kuma, K., Toh, H., Miyata, T.: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518 (2005)
11. Hong, C., Tewfik, A.H.: Heuristic reusable dynamic programming: efficient updates of local sequence alignment. IEEE/ACM Trans. Comput. Biol. Bioinform. 6, 570–582 (2009)
12. Fu, H., Xue, D., Zhang, X., Jia, C.: Conserved secondary structure prediction for similar highly group of related RNA sequences. In: Control and Decision Conference, pp. 5158–5163 (2009)
13. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
14. Feng, D.F., Doolittle, R.F.: Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351–360 (1987)
15. Mohsen, B., Balaji, P., Devavrat, S., Mayank, S.: Iterative scheduling algorithms. In: IEEE INFOCOM Proceedings (2007)
16. Pop, M., Salzberg, S.L.: Bioinformatics challenges of new sequencing technology. Trends Genet. 24, 142–149 (2008)
17. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016)
18. Cheng, G., Lv, C., Yan, S., Xu, L.: A novel hybrid optimization algorithm combined with BBO and PSO. In: 2016 Chinese Control and Decision Conference (CCDC), Yinchuan, pp. 1198–1202 (2016)
19. Ganesan, T., Vasant, P., Elamvazuthi, I., Shaari, K.Z.K.: Multiobjective optimization of green sand mould system using DE and GSA. In: 12th International Conference on Intelligent Systems Design and Applications (ISDA), Kochi, pp. 1012–1016 (2012)
20. Ping, G., Chunbo, X., Yi, C., Jing, L., Yanqing, L.: Adaptive ant colony optimization algorithm. In: 2014 International Conference on Mechatronics and Control (ICMC), Jinzhou, pp. 95–98 (2014)
21. Neshich, G., Togawa, R., Vilella, W., Honig, B.: STING (Sequence to and with In graphics). PDB viewer. Protein Data Bank Quart. Newslett. 85, 6–7 (1998)
22. Kumar, M., Om, H.: A hybrid bio-inspired algorithm for protein domain problems. In: Shandilya, S., Shandilya, S., Nagar, A. (eds.) Advances in Nature-Inspired Computing and Applications, Chap. 13, pp. 291–311. Springer, Cham (2018)
23. Kumar, M.: An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm. EXCLI J. 14, 1232–1255 (2015)
24. Zemali, E.A., Boukra, A.: A new hybrid bio-inspired approach to resolve the multiple sequence alignment problem. In: 2016 International Conference on Control, Decision and Information Technologies (CoDIT), St. Julian's, pp. 108–113 (2016)
Website Information Architecture of Latin American Universities in the Rankings
Carmen Luisa Vásquez1, Marisabel Luna-Cardozo1, Maritza Torres-Samuel2, Nunziatina Bucci1, and Amelec Viloria Silva3
1 Universidad Nacional Experimental Politécnica Antonio José de Sucre, Barquisimeto, Venezuela
{cvasquez,mluna,nbucci}@unexpo.edu.ve
2 Universidad Centroccidental Lisandro Alvarado, Barquisimeto, Venezuela
[email protected]
3 Universidad de La Costa (CUC), Calle 58 # 55-66, Atlantico, Barranquilla, Colombia
[email protected]
Abstract. In the 2019 editions of the SIR, QS, ARWU and Webometrics university rankings, after the process of evaluation and positioning through the academic and research quality criteria established in their methodologies, 22 Latin American university institutions located in Brazil, Mexico, Chile, Argentina, and Colombia coincide in their positions across all the rankings considered for the study. This paper characterizes the website main menu design of these universities to describe and list the information available to users, as an alternative way of making their academic and scientific processes visible. Results highlight that 18% of the options of the main menu refer to library, academic life, institutional mail services, language translator, search engine, and information for visitors.

Keywords: University rankings · QS · Webometrics · Latin America · ARWU · SIR SCImago
1 Introduction

Increasing the visibility of academic and scientific processes is a great challenge faced by universities today [1–5], particularly in relation to knowledge generation and the achievements of their researchers [6, 7]. In this sense, university website design, with menus, navigation bars, and frames, offers a gateway to information on academic processes, teaching activity, research, innovation and extension products, transparency, and services such as digital libraries, email and social networks, available to teachers, students, graduates, administrative and support staff, and visitors. This study characterizes the website main menus of the Latin American universities positioned in the SIR, QS, ARWU and Webometrics rankings, 2019 editions, to describe and list the information available to users. Users include scholars and university students, but also visitors in general. For this purpose, the positioning of these
universities on the respective websites of the following four world rankings was reviewed:

– Academic Ranking of World Universities (ARWU), 2019 edition [8].
– Scimago Institutions Rankings (SIR) Latin America, 2019 edition [9].
– The QS World University Rankings, QS Latin America University Rankings, 2019 edition [10].
– Ranking Web of Latin American Universities (Webometrics), July 2019 edition [11].
2 Visibility and Internationalization of Universities

Internationalization is a key factor for enhancing the quality, efficiency, excellence, and competitiveness of universities in a global environment. It is based on the dissemination and communication of the knowledge generated by researchers; hence the institutional strategy is linked to policies of excellence and reputation in the global market of knowledge generation, dissemination, and transfer [12]. In order to promote and position themselves in the national and international market, and to guarantee their economic viability, university institutions need to develop communication and information skills to attract teachers, students, and economic resources [13]. Accordingly, those responsible for institutional and external relations at Spanish universities assign 82% importance to the quality and impact of promotion and web marketing instruments, considering them a quality criterion for university internationalization [13]. Similarly, UNESCO (2009) states that the results of scientific research should be more widely disseminated through ICTs, supported by free access to scientific documentation [14]. Thus, in an environment of globalized higher education, universities face challenges regarding the visibility of their academic and scientific processes through the integration of ICT [15], and governing authorities are aware of the importance of increasing their international profile, strategies, and internationalization activities. One of these challenges is the assessment and positioning of the academic and scientific community through international rankings with different classification criteria according to quality, which confirms the importance of scientific and academic web visibility for higher education institutions.
3 Web Indicators and the Presence of Universities on the Internet

Cybermetrics is an emerging discipline that applies quantitative methods to describe the processes of scientific communication on the Internet: the contents of a site, their interrelationships, the consumption of that information by users, the structure and use of search tools, the invisible Internet, and the particularities of email-based services [15]. The fundamental measurement tool for this purpose is the use of so-called indicators, which can be used in combination with bibliometric equivalents for describing
different aspects of the academic and scientific communication processes. There are three main groups of web indicators for cybermetric analysis: descriptive measures that count the number of objects found in each website (pages, rich files, link density); visibility and impact measures that count the number and origin of external incoming links, such as the well-known Google PageRank algorithm; and popularity measures that take into account the number and characteristics of visits to websites.

3.1 Content Indicators
The main indicators are those that describe the volume of content published on the site. They measure the number and size of the computational objects found on each site, although size is the less informative of the two, since it depends on factors linked to the format rather than the content.

3.2 Visibility and Impact Indicators
Citation analysis techniques can be applied to describe the global scenario. Visibility is measured by the number of external links (from third parties) received by a domain.

3.3 Popularity Indicators
Information consumption can be measured by counting the number and describing the characteristics of users and visitors to a site. Relative values could be provided by a popularity ranking such as Alexa [16]. This information, in addition to the global position, can be used in comparative studies.
4 Information Architecture (IA)

Information architecture (IA) is the result of organizing, classifying, and structuring the contents of a website so that users can meet their goals with the least possible effort. Its main purpose is to facilitate, as much as possible, the understanding and assimilation of information, as well as the tasks performed by users in a specific information space [17, 18]. A fundamental component of a website's IA is the home page: the showcase or front door of the institution. Although it must share the same style as the interior pages, it must show users where they are and how the website is organized (sections, navigation, layout of menus). It is therefore the starting point of the site's navigation scheme, so it must be organized clearly and unequivocally. The home page is so important that the user must be able to reach it with a single click from any interior page. Providing a search option is recommended. The functions of the home page are as follows:

(a) Show the user's location.
(b) Show news.
(c) Provide access to the information required by the user.

Another key element of IA is navigation, so all interfaces must provide the user with answers to three questions: Where am I? Where do I come from? Where can I go? [17].
5 Information Architecture (IA) of Latin American Universities Positioned in the SIR SCImago, QS, ARWU and Webometrics Rankings

Among the world rankings that evaluate universities, especially for their scientific and academic quality, are [19]:

a. The Academic Ranking of World Universities (ARWU) of Shanghai Jiao Tong University [8]. This ranking conceives higher education as equivalent to scientific research, valuing, among other factors, prestige, the opinion of peers, research, and the obtaining of Nobel prizes by professors and researchers. It ranks only the top 800 universities recognized by this ranking.
b. The Webometrics Ranking of World Universities of the Cybermetrics Laboratory of the CSIC in Spain [11]. Published semi-annually, it considers the productivity and impact of university academic products published on the Internet.
c. The SIR ranking (SCImago Institutions Rankings). Published annually by the SCImago group [20], it takes into account variables such as production and research, innovation and social impact, and aspects associated with the institution's position in the global, Ibero-American, Latin American and national contexts, the percentage of the institution's production published in collaboration with foreign institutions, and the impact of research, among others.

In total, 22 universities that coincide across the Latin American rankings were identified. As shown in Fig. 1, approximately 55% of them are Brazilian universities, with USP [21] ranking first in three of the four rankings. Universities in Chile and Argentina each account for 13% of the ranked universities, and Colombia and Mexico for 10% each. In order to categorize the information in the main menu options, the information was grouped into 16 items, whose nomenclature is shown in Table 1. Figures 2 and 3 show the percentage distribution of the information obtained from the main menus and its representation as a Pareto diagram, respectively. The highest percentage corresponds to library and services (18%), followed by teaching and by extension and community with 15% each, and 13% for institutional information.
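As a purely illustrative aside (not part of the original study), a tally of this kind can be reproduced with a few lines of Python: menu options collected from each university home page are mapped to the category codes of Table 1 and the share of each category is computed. The category codes follow Table 1; the sample data below are invented.

from collections import Counter

# Hypothetical sample: categorized main-menu options of three universities
menus = [
    ["BIB", "TEA", "E&C", "INS", "BIB", "RES"],
    ["TEA", "E&C", "INS", "BIB", "STU"],
    ["BIB", "TEA", "RES", "E&C", "INS", "MAN"],
]

counts = Counter(opt for menu in menus for opt in menu)
total = sum(counts.values())
for category, n in counts.most_common():
    print(f"{category}: {100 * n / total:.1f}%")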
Fig. 1. University positions in the SIR SCImago, QS, ARWU and Webometrics rankings, 2019 editions
Table 1. Indicators and nomenclature of main menu options of universities positioned in the SIR SCImago, QS, ARWU and Webometrics rankings

No | NOM | ITEM | CONTAINS
1 | BEG | Home | Home option
2 | BIB | Library and other services | Library, academic life, email service, language translator, search engine and information to visitors
3 | COM | Communication | Communication media, TV channels, radio, press, and ICT
4 | E&C | Extension and Community | Engaged organizations, culture, sport, relationship with community and others
5 | STU | Students | Student life
6 | STI | Student admission | Student admission
7 | GRA | Graduates (egresados) | Information and services to graduates
8 | INS | Institutional | Institutional information, directors, faculties and others
9 | INT | International Cooperation | International cooperation, foreign visitors and others
10 | RES | Research and Innovation | Research, innovation and related information
11 | MAN | Management | Strategic planning, biddings and others
12 | NUM | University figures | Figures on students, teachers and others
13 | NOP | Number of options | Number of options in the main menu
14 | TEA | Teaching | Undergraduate study offers, postgraduate, careers and others
15 | TES | Teachers | Teachers information
16 | TRA | Transparency | Transparency
Fig. 2. Average distribution by category of the number of options in the main menus of the Latin American universities in the SIR SCImago, QS, ARWU and Webometrics rankings
Fig. 3. Pareto diagram of the number of options in the main menus of the Latin American universities in the SIR SCImago, QS, ARWU and Webometrics rankings
6 Conclusions

One alternative for universities to show their processes and results in teaching, research, extension, transparency, services (such as libraries, email, and social networks), and internationalization is to design their websites with menus, navigation bars, and frames that give teachers, students, graduates, administrative staff, and visitors in general access to institutional information. This study categorized the information in the main menus of the websites of 22 Latin American university institutions located in Brazil, Mexico, Chile, Argentina, and Colombia, which are positioned in the SIR, QS, ARWU and Webometrics university rankings, 2019 editions. Among the results, about 60% of the main menu options are distributed as follows: 18% contain information on library, academic life, institutional mail services, language translator, search engine, and information for visitors; 15% refer to teaching; another 15% to extension and community; and 13% highlight institutional information.
References

1. Aguillo, I., Uribe, A., López, W.: Visibilidad de los investigadores colombianos según sus indicadores en Google Scholar y ResearchGate: diferencias y similitudes con la clasificación oficial del sistema nacional de ciencia - COLCIENCIAS. Rev. Interam. Bibliot. 40(3), 221–230 (2017)
2. Gómez, R., Gerena, A.: Análisis de 5 universidades colombianas con mayor porcentaje de investigaciones publicadas en revistas de primer cuartil según el SIR Iber 2014. Bibliotecas 35(3), 1–31 (2015)
3. Mejía, L., Vargas, N., Moreno, F.: Visibilidad de la investigación científica en las universidades pertenecientes a la AUSJAL: caracterización de los sitios web. Salutem Scientia Spiritus 2(1), 10–20 (2016)
4. Torres-Samuel, M., Vásquez, C., Viloria, A., Lis-Gutiérrez, J., Crissien, T., Valera, N.: Web visibility profiles of Top100 Latin American universities. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018)
5. Vásquez, C., Torres-Samuel, M., Viloria, A., Crissien, T., Valera, N., Gaitán-Angulo, M., Lis-Gutiérrez, J.: Visibility of research in universities: the triad product-researcher-institution. Case: Latin American countries. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018)
6. Vílchez-Román, C., Espíritu-Barrón, E.: Artículos científicos y visibilidad académica: combinación impostergable y oportunidad que debe aprovecharse. Biblios, p. 9 (2009)
7. Henao-Rodríguez, L.G.J., Gaitán-Angulo, M., Vásquez, C., Torres-Samuel, M., Viloria, A.: Determinants of ResearchGate (RG) score for the Top100 of Latin American universities at Webometrics. Communications in Computer and Information Science (2019)
8. ARWU: Shanghai Jiao Tong University Ranking (2019). http://www.shanghairanking.com/es/. Accessed 01 Sept 2019
9. S.R. Group: SIR IBER 2019. SCImago Lab, Barcelona, España (2019)
10. QSTOPUNIVERSITIES: QS World University Rankings. QS Quacquarelli (2019). https://www.topuniversities.com/qs-world-university-rankings. Accessed 01 Sept 2019
11. Webometrics: Web-Webometrics Ranking (2019). http://www.webometrics.info/es/Latin_America_es. Accessed 10 Aug 2019
12. de Wit, H., Gacel-Ávila, J., Knobel, M.: Estado del arte de la internacionalización de la educación superior en América Latina. Revista de Educación Superior en América Latina ESAL 2, 1–4 (2017)
13. Haug, G., Vilalta, J.: La internacionalización de las universidades, una estrategia necesaria: una reflexión sobre la vigencia de modelos académicos, económicos y culturales en la gestión de la internacionalización universitaria. Fundación Europea Sociedad y Educación, Madrid (2011)
14. UNESCO: Conferencia mundial sobre educación superior 2009: nuevas dinámicas de la educación superior y de la investigación para el cambio social y el desarrollo. UNESCO, París (2009)
15. Aguado-López, E., Rogel-Salazar, R., Becerril-García, A., Baca-Zapata, G.: Presencia de universidades en la Red: la brecha digital entre Estados Unidos y el resto del mundo. Revista de Universidad y Sociedad del Conocimiento 6(1), 4 (2009)
16. Amazon: Alexa (2019). www.alexa.com. Accessed 01 Feb 2019
17. Romero, B.: Usabilidad y arquitectura de la información. Universitat Oberta de Catalunya, Barcelona, España (2010)
18. Morville, P., Rosenfeld, L.: Information Architecture for the World Wide Web: Designing Large-Scale Web Sites. O'Reilly, Sebastopol (2006)
19. Torres-Samuel, M., Vásquez, C., Viloria, A., Hernández-Fernández, L., Portillo-Medina, R.: Analysis of patterns in the university world rankings Webometrics, Shanghai, QS and SIR-SCImago: case Latin America. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018)
20. S.I. Rankings: SCImago Institutions Rankings. SCImago Lab & Scopus (2019). https://www.scimagoir.com/. Accessed 08 Aug 2019
21. USP: Universidade de São Paulo (2019). https://www5.usp.br/. Accessed 25 Aug 2019
Classification of Hybrid Multiscaled Remote Sensing Scene Using Pretrained Convolutional Neural Networks

Sujata Alegavi and Raghvendra Sedamkar

Thakur College of Engineering and Technology, Thakur Village, Kandivali (East), Mumbai 400101, Maharashtra, India {sujata.dubal,rr.sedamkar}@thakureducation.org
Abstract. A novel hybrid multiscaled remote sensed (RS) image classification method based on spatial-spectral feature extraction using a pretrained neural network approach is proposed in this paper. Spectral and spatial features of RS images, such as colour, texture and edges, are extracted by nonlinear spectral unmixing and then scaled using bilinear interpolation. In a parallel path, the same RS image is first scaled using bilinear interpolation and then spectrally unmixed. The two paths are fused using spatial-spectral fusion to give a multiscaled RS image, which is then given to a pretrained network for feature extraction. For validation and discrimination purposes, the proposed approach is evaluated through experiments on five challenging high-resolution remote sensing data sets and two widely used pretrained networks (Alexnet/Caffenet). The experimental results show a classification accuracy of about 98% at the multiscale level, compared to 83% at the single-scale level, using pretrained convolutional networks.

Keywords: Multiscale remote sensed images · Convolutional neural networks · Spatial-spectral analysis · Pretrained networks
1 Introduction

The recent major boom in earth observation programs has made it much easier to collect high-resolution imagery for various applications. Most research is devoted to developing a robust feature representation map in different domains. Scene classification at the low or mid level works well on some remote sensing scenes, but for complex spatial and spectral layouts these techniques fail to achieve good classification accuracy. Previously, each CNN was fully trained for a specific problem [4, 6, 7]. To overcome these problems, existing CNN models trained on large datasets are transferred to other recognition problems [7, 8]. Training a completely new CNN has not yielded better classification accuracy than using a pretrained network for remote sensing scene classification, and with only a few hundred training samples, training a fully new CNN becomes much more difficult. Transfer learning helps in forming a feature representation. The novelty of this paper is:
(1) We present multiscale feature extraction of an RS image in the spatial and spectral domains.
(2) We propose the use of a pretrained network for forming the feature discrimination map.
2 Related Works

A complete CNN consists of many computational blocks, followed by fully connected layers and a final classifier layer. For input data X and a CNN with n computational blocks, the output from the last block is given by [7]:

f_L(X) = Σ_i X_i * W_i + b_i    (1)
where X_i, W_i and b_i represent the input, weights, and bias respectively, and * is the convolution operation. In remote sensing, many applications have been developed by exploiting the spatial-spectral information of high-resolution RS imagery. As RS images are diverse in nature and carry discriminative features at different layers, mixed-pixel problems are very common in a single RS layer [1]. Spectral-spatial information of RS images has been fed to CNNs to extract structural information [10–13]. Well-known pretrained networks that have achieved state-of-the-art performance can be fine-tuned for a given task without redesigning the network, as with the fine-tuned networks in [7, 8].
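To make Eq. (1) concrete, the following minimal NumPy/SciPy sketch (ours, not the paper's code) computes the output of one convolutional block as the sum of per-channel 2-D convolutions plus a bias:

import numpy as np
from scipy.signal import convolve2d

# Toy input with 3 channels and one 3x3 filter per channel (values invented)
X = np.random.rand(3, 8, 8)   # X_i: input channels
W = np.random.rand(3, 3, 3)   # W_i: one kernel per channel
b = 0.1                       # b_i: bias

# f_L(X) = sum_i X_i * W_i + b, with '*' denoting 2-D convolution
out = sum(convolve2d(X[i], W[i], mode="same") for i in range(3)) + b
print(out.shape)              # (8, 8)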
3 Proposed Work

Figure 1 shows a diagrammatic representation of the hybrid classification of multiscale SAR/hyperspectral images. Because remotely sensed images are diverse in nature, many aspects need to be considered for precise classification. Currently, many features of RS images are used, such as edge, colour, and texture, but they are all considered at a single scale [5]. This is adequate for ordinary images, whereas RS images yield much more information when features are extracted at a multiscale level, which helps in precise classification. Mapping of similar regions with higher accuracy becomes possible when features are extracted at multiple scales. The proposed approach classifies RS images at a multiscale level using several features: texture (contrast, correlation, energy and homogeneity), edge (edge histogram) and wavelets (Daubechies 1). The multiscale analysis is performed in two different ways to study the changes in the classification results, and a comprehensive comparison is made, showing multiscale analysis to be better than single-scale analysis.
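As a hedged illustration of the texture features just listed (the paper's own implementation is in MATLAB), the sketch below computes contrast, correlation, energy and homogeneity from a grey-level co-occurrence matrix using scikit-image; note that the functions are spelled greycomatrix/greycoprops in scikit-image releases before 0.19.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Toy 8-bit grey-level patch (values invented)
patch = (np.random.rand(64, 64) * 255).astype(np.uint8)

# Co-occurrence matrix at distance 1, angle 0
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)

for prop in ("contrast", "correlation", "energy", "homogeneity"):
    print(prop, graycoprops(glcm, prop)[0, 0])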
Fig. 1. Block diagram for the hybrid classification of multiscale RS images using a pretrained convolutional neural network model
1. In the first phase, an RS image is fed directly to the pretrained network for classification and the classes are determined.
2. In the second phase, a low-resolution RS image is taken and processed along two parallel paths.
a. In the first path, spectral features of the RS image are derived by a nonlinear spectral unmixing technique, assuming that the endmember components are randomly distributed throughout the instantaneous field of view (IFOV), causing a multiple scattering effect:

f(x, y) = F[M, a(x, y)] + n(x, y)    (2)
where M is the number of endmembers and a(x, y) is the spectral coefficient. Bilinear interpolation is then applied to the spectrally unmixed images to achieve scaling at multiple levels:

f(x, y) = Σ_i Σ_j a_ij x^i y^j    (3)
where a_ij are the interpolation coefficients. Texture (contrast, correlation, energy and homogeneity), edge (edge histogram) and wavelet (Daubechies 1) features are used to extract information from these multiscaled images.
b. In the second parallel path, bilinear interpolation is first applied to the RS image to scale it at multiple levels using Eq. (3), and spectral features of the scaled images are then derived by the nonlinear spectral unmixing technique of Eq. (2). The same texture, edge and wavelet features are extracted from these multiscaled images (a numerical sketch of this interpolation-and-fusion step is given after Eq. (8)).
3. These spectral and spatial features are then fused together with a weight parameter using integration.
4. The fused image is given to the pretrained network (Alexnet/Caffenet) for further feature extraction.
5. Finally, class allocation is done by the softmax classifier at the end of the fully connected layers of the pretrained network.

To obtain the spatial values, the RS images are first unmixed in the spectral domain to form fractional images; bilinear interpolation is then applied to these fractional images to obtain their spatial characteristics:

X = a_1 s_1 + a_2 s_2 + … + a_M s_M + n    (4)
  = Σ_i a_i s_i + n = S a + n    (5)
where M is the number of endmembers, S is the matrix of endmembers, and n is an error term for additive noise. The predicted values in the spatial domain are given by [15]:

Pv_spa(kj),  k = 1, 2, …, K;  j = 1, 2, …, M    (6)
where Pv_spa(kj) are the predicted values in the spatial domain in the x and y directions; derived from spectral unmixing followed by bilinear interpolation, they are selected as the spatial terms. The spectral terms are derived by first producing high-resolution RS imagery from the low-resolution imagery using bilinear interpolation: the RS images are first scaled with bilinear scaling, and fractional images are then generated by spectral unmixing. The predicted values in the spectral domain are given by [15]:

Pv_spe(kj),  k = 1, 2, …, K;  j = 1, 2, …, M    (7)
The two sets of predicted values, Pv_spa(kj) and Pv_spe(kj), are then integrated using a weight parameter w to obtain the finer multiscale values Pv(kj) [15]:

Pv(kj) = w · Pv_spe(kj) + (1 − w) · Pv_spa(kj)    (8)
A linear optimization technique is employed to transform the finer multiscale values Pv(kj) into a hard-classified land cover map at the multiscale level. Once a single RS image has been converted to a multiscale image, the multiscaled image is applied to a pretrained network for further classification.
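A toy NumPy sketch combining the bilinear upscaling of Eq. (3) (via scipy.ndimage.zoom with a first-order spline) with the fusion of Eq. (8) is given below. It is illustrative only: the study itself was implemented in MATLAB, and the fraction maps here are invented per-class abundance maps of identical shape.

import numpy as np
from scipy.ndimage import zoom

def fuse_and_classify(frac_spa, frac_spe, w=0.5, factor=2):
    """Upscale per-class fraction maps bilinearly, fuse as in Eq. (8),
    then allocate one hard class per pixel."""
    up = lambda f: np.stack([zoom(c, factor, order=1) for c in f])
    pv_spa, pv_spe = up(frac_spa), up(frac_spe)
    pv = w * pv_spe + (1 - w) * pv_spa   # Eq. (8)
    return pv.argmax(axis=0)             # hard land cover map

# Invented fraction maps for K = 3 classes on a 4 x 4 grid
frac_spa = np.random.rand(3, 4, 4)
frac_spe = np.random.rand(3, 4, 4)
print(fuse_and_classify(frac_spa, frac_spe, w=0.6).shape)   # (8, 8)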
4 Result Analysis and Discussions

The experiments are implemented using MATLAB 2017a on an x64-based PC with an Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz (4 cores), 8 GB RAM, an NVIDIA Titan XP GPU, and the Windows 10 Pro operating system. The database consists of scenes from Indian_pines_corrected, JasperRidge2_F198, JasperRidge2_F224, SalinasA_corrected and Urban_F210. Original and multiscale RS images are applied to the pretrained convolutional neural networks; the Alexnet and Caffenet pretrained networks are used. Alexnet is the winner of the ImageNet Large Scale Visual Recognition Challenge 2012. The architectures of Alexnet [14] and Caffenet [16] are similar to each other, the only difference being that Alexnet was trained on multiple GPUs while Caffenet uses a single GPU. These pretrained networks have eight layers (five convolutional layers and three fully connected layers), and the last layer is the softmax classifier. Different tests are carried out by varying parameters on the pretrained CNNs. The computed results show that extracting image features at a multiscale level yields a much better recognition rate than single-scale features, and that the pretrained CNN enhances the overall classification results compared to other classification techniques. With convolutional neural networks, the classification accuracy improves for multiscale RS images compared to the original RS images. For training and testing, 200 images were used for class 1, 205 for class 2, 198 for class 3, 197 for class 4, and 194 for class 5.
4.1 Results of Training and Testing of Original and Multiscale RS Images on Alexnet Pretrained Network
Phase I applies the original and multiscale RS images to the Alexnet network. The following tables show the confusion matrices for all five classes.

Table 1. Confusion matrix of original RS image applied to Alexnet pretrained network
Table 1 shows the classification accuracy for each individual class. For class 1, class 4 and class 5 the classification accuracy is 100%. The overall correct classification is 83.21%, and the incorrect classification is 16.79%. The total training time per epoch is 312 s. The confusion matrix shows that classes 1, 4 and 5 are correctly classified, while classes 2 and 3 are partly misclassified by the Alexnet network.

Table 2. Confusion matrix of hybrid multiscaled RS image applied to Alexnet pretrained network
Table 2 shows the classification accuracy for each individual class. For class 2, class 4 and class 5 the classification accuracy is 100%. The overall correct classification is 98.32%, and the incorrect classification is 1.68%. The total training time per epoch is 312 s. The confusion matrix shows that classes 2, 4 and 5 are correctly classified, while classes 1 and 3 are partly misclassified by the Alexnet network.
4.2 Results of Training and Testing of Single Scale and Multiscale Models of RS Images on Caffenet Pretrained Network
Phase II applies the original and multiscale RS images to the Caffenet network. The following tables show the confusion matrices for all five classes. Table 3 shows the classification accuracy for each individual class. For class 1, class 2, class 4 and class 5 the classification accuracy is 100%. The overall correct classification is 78.65%, and the incorrect classification is 21.34%. The total training time per epoch is 315 s. The confusion matrix shows that classes 1, 2, 4 and 5 are correctly classified, while class 3 is misclassified by the Caffenet network. Table 4 shows the classification accuracy for each individual class. For class 2, class 4 and class 5 the classification accuracy is 100%. The overall correct classification is 98.32%, and the incorrect classification is 1.68%. The total training time per epoch is 315 s.
Table 3. Confusion matrix of original RS image applied to Caffenet pretrained network
Table 4. Confusion matrix of hybrid multiscaled RS image applied to Caffenet pretrained network
The confusion matrix shows that classes 2, 4 and 5 are correctly classified, while classes 1 and 3 are partly misclassified by the Caffenet network.
Table 5. Comparison of classification accuracy (%) of original and multiscaled RS images using Alexnet and Caffenet pretrained networks

Network | Original image | Multiscaled image
Alexnet | 83.213 | 98.321
Caffenet | 78.657 | 98.321
It can be clearly seen from the two confusion matrices above that the classification accuracy increases by about 15 percentage points for the hybrid multiscaled RS image compared to the original image on the Alexnet pretrained network, and by about 20 percentage points on the Caffenet pretrained network (Fig. 2).
Fig. 2. Comparison of classification accuracy of original and multiscaled RS images using Alexnet and Caffenet pretrained networks
The figure shows a comparative analysis of RS images whose features are extracted at different scales. In the proposed approach, these multiscaled images are classified using fine-tuned pretrained CNN models. We also evaluate the performance of the original and multiscaled images on both of the widely used pretrained networks. The feature vectors are fed directly into the softmax classifier without dimensionality reduction. Table 5 presents the experimental results obtained by the proposed approach on the SAR/hyperspectral datasets.
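The classification itself was carried out in MATLAB; purely to illustrate the transfer-learning idea (a pretrained network used as feature extractor with a softmax head), a hedged PyTorch sketch follows. The model and layer names come from torchvision and are not the paper's code; the 5-class head matches the five scene classes used above.

import torch
import torchvision.models as models

# Illustrative only: pretrained Alexnet with a replaced 5-class head
net = models.alexnet(pretrained=True)
net.classifier[6] = torch.nn.Linear(4096, 5)   # 5 scene classes
net.eval()

x = torch.randn(1, 3, 224, 224)                # one toy multiscaled input
with torch.no_grad():
    probs = torch.softmax(net(x), dim=1)       # softmax class probabilities
print(probs.sum().item())                      # approximately 1.0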
Table 6. Comparison between training and testing time for classification of original and multiscaled RS image using Alexnet and Caffenet pretrained networks

Time taken (s) | Original RS image | Multiscaled RS image
Training time per epoch | 312 | 315
Testing time | 156 | 158
It can be clearly seen from Table 6 that the time required for training and testing of the original and multiscaled images is nearly the same, with increased
classification accuracy in the case of the multiscaled RS image compared to the original RS image (Fig. 3).
Fig. 3. Training and testing time for classification of original and multiscaled RS images using Alexnet and Caffenet pretrained networks
Thus, we can conclude that classifying an RS image at different scales (multiscale) gives better classification accuracy than a single scale, with nearly the same training and testing time as for the original RS image. From the performances of the Alexnet and Caffenet pretrained CNN models, we can conclude that images taken at different scales perform well compared to images taken at a single scale; multiscaled images outperform all the single-scale variants for both pretrained networks. Overall, the proposed method achieves competitive accuracies (more than 97%) for the Alexnet and Caffenet pretrained models (Table 7).

Table 7. Performance comparison of the state-of-the-art methods with our proposed method

Method | Classification accuracy %
Spatial BOW [19] | 81.19
SIFT + BOW [8] | 75.11
SC + Pooling [20] | 81.67 ± 1.23
SPM [21] | 86.8
PSR [21] | 89.1
Caffenet [22] | 95.02 ± 0.81
Googlenet [22] | 94.31 ± 0.89
GBRCN [23] | 94.53
Caffenet + fine tuning [23] | 95.48
Googlenet + fine tuning [23] | 97.10
Multiscale + Alexnet (our proposed method) | 98.32
Multiscale + Caffenet (our proposed method) | 98.32
All CNN-based methods achieve higher classification accuracy than the traditional classification methods. The CNN models were trained with 60% of the data, and the remaining 40% was used for testing. Pretrained networks were used for feature extraction instead of designing a whole new CNN, to save on computational complexity while maintaining good results. Transfer learning by fine-tuning the pretrained Caffenet and Googlenet networks was reported in [23]. Our proposed method uses transfer learning by fine-tuning the Alexnet and Caffenet networks with multiscale features as input, which yields better classification results than the existing state-of-the-art methods (Fig. 4).
Fig. 4. Comparison of the proposed method with the state-of-the-art methods
A comparison was made between the performances when using different feature vectors at the multiscale level for the pretrained networks. The networks were fine-tuned to understand the impact of the original and multiscaled data on the pretrained networks (Alexnet and Caffenet). The results were compared after pretraining, and the classification accuracy was calculated for both networks. The results suggest that multiscaling RS images increases classification accuracy compared to using the original RS image. Moreover, using a pretrained CNN reduces the classification error and in
turn improves the recognition rate when there is only a small amount of training data. This is consistent across all data sets. Larger datasets can also be handled successfully using CNNs, and more feature vectors can be trained at different scales. The results are encouraging because, when more training data is available for the target data set, more of the variability in the data is captured and features can be trained from scratch to be robust to that variability.
5 Conclusion

This paper focuses on the use of pretrained networks for feature extraction from remotely sensed images and presents a multiscale model for classifying RS images. The proposed multiscale model extracts deep spatial and spectral features of an RS image before giving it to the pretrained network for building a feature representation map. In comparison with existing scene classification techniques, which use the original RS image at a single scale, the proposed model uses multiscaled layers of an RS image to build the feature map for further classification. Experimental results on five test datasets show better classification accuracy than the traditional methods. In addition, they demonstrate that classification accuracy actually improves when pretrained neural networks are used with a small amount of training data, instead of training a CNN from scratch. Multiscaled images given to a pretrained network thus outperform all the other methods in classification accuracy, while the time required for training and testing is nearly the same for both approaches. The proposed model can be combined with post-classification processing to enhance mapping performance.

Acknowledgements. This work is supported in part by the NVIDIA GPU grant program. We thank NVIDIA for granting us a Titan XP GPU to carry out our work in deep learning. We also thank the anonymous reviewers for their insightful comments.

Compliance with Ethical Standards.
✓ All authors declare that there is no conflict of interest
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Alegavi, S., Sedamkar, R.R.: Improving classification error for mixed pixels in satellite images using soft thresholding technique. In: Proceedings of the IEEE International Conference on Intelligent Computing and Control Systems (2018)
2. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
3. Zhang, L., Zhang, L., Du, B.: Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 4(2), 22–40 (2016)
4. Vakalopoulou, M., Karantzalos, K., Komodakis, N., Paragios, N.: Building detection in very high resolution multispectral data with deep learning features. In: Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, July 2015, pp. 1873–1876 (2015)
5. Jiao, L., Tang, X., Hou, B., Wang, S.: SAR images retrieval based on semantic classification and region-based similarity measure for earth observation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8(8), 3876–3891 (2015)
6. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 2015, pp. 1–9 (2015)
7. Nogueira, K., Penatti, O.A.B., dos Santos, J.A.: Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognit. 61, 539–556 (2017)
8. Hu, F., Xia, G.-S., Hu, J., Zhang, L.: Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens. 7(11), 14680–14707 (2015)
9. Penatti, O.A.B., Nogueira, K., dos Santos, J.A.: Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA, June 2015, pp. 44–51 (2015)
10. Yue, J., Zhao, W., Mao, S., Liu, H.: Spectral–spatial classification of hyperspectral images using deep convolutional neural networks. Remote Sens. Lett. 6(6), 468–477 (2015)
11. Zhao, W., Du, S.: Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS J. Photogramm. Remote Sens. 113, 155–165 (2016)
12. Makantasis, K., Karantzalos, K., Doulamis, A., Doulamis, N.: Deep supervised learning for hyperspectral data classification through convolutional neural networks. In: Proceedings of International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, July 2015, pp. 4959–4962 (2015)
13. Zhao, W., Du, S.: Spectral–spatial feature extraction for hyperspectral image classification: a dimension reduction and deep learning approach. IEEE Trans. Geosci. Remote Sens. 54(8), 4544–4554 (2016)
14. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of 26th Annual Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA, 2012, pp. 1097–1105 (2012)
15. Wang, P., Wang, L., Chanussot, J.: Soft-then-hard subpixel land cover mapping based on spatial-spectral interpolation. IEEE Geosci. Remote Sens. Lett. 13(12), 1851–1854 (2016)
16. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of ACM Conference on Multimedia (MM), Orlando, FL, USA, November 2014, pp. 675–678 (2014)
17. Li, E., Xia, J., Du, P., Lin, C., Samat, A.: Integrating multilayer features of convolutional neural networks for remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 55(10), 5653–5665 (2017)
18. Windrim, L., Melkumyan, A., Murphy, R.J., Chlingaryan, A., Ramakrishnan, R.: Pretraining for hyperspectral convolutional neural network classification. IEEE Trans. Geosci. Remote Sens. 56(5), 2798–2810 (2018)
19. Yang, Y., Newsam, S.: Bag-of-visual-words and spatial extensions for land-use classification. In: Proceedings of 18th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS), San Jose, CA, USA, November 2010, pp. 270–279 (2010)
20. Cheriyadat, A.M.: Unsupervised feature learning for aerial scene classification. IEEE Trans. Geosci. Remote Sens. 52(1), 439–451 (2014)
21. Chen, S., Tian, Y.: Pyramid of spatial relations for scene-level land use classification. IEEE Trans. Geosci. Remote Sens. 53(4), 1947–1957 (2015)
22. Xia, G.-S., et al.: AID: a benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 55(7), 3965–3981 (2017)
23. Zhang, F., Du, B., Zhang, L.: Scene classification via a gradient boosting random convolutional network framework. IEEE Trans. Geosci. Remote Sens. 54(3), 1793–1802 (2016)
Object Detection and Classification Using GPU Acceleration

Shreyank Prabhu, Vishal Khopkar, Swapnil Nivendkar, Omkar Satpute, and Varshapriya Jyotinagar

Veermata Jijabai Technological Institute, Mumbai 400019, India {svprabhu_b14,vgkhopkar_b14,ssnivendkar_b14,ohsatpute_b14,varshapriyajn}@ce.vjti.ac.in
Abstract. In order to speed up image processing for self-driving cars, we propose a solution for fast vehicle classification using GPU computation. Our solution uses Histogram of Oriented Gradients (HOG) for feature extraction and Support Vector Machines (SVM) for classification. Our algorithm achieves a higher processing rate in frames per second (FPS) by using multi-core GPUs without compromising on accuracy. Our GPU code is implemented in OpenCL, a platform-independent library. We used a dataset of images of cars and other non-car objects on the road to feed to the classifier.

Keywords: Graphics processor unit · GPU · Object detection · Image processing · HOG · OpenCL · Self-driving cars · SVM
1 Introduction

Self-driving cars are the buzzword in today's world. A self-driving car uses multiple cameras to record images in all directions and needs to process them in real time, keeping in mind the speed of the car, which can be as high as 100 km/h, especially on access-controlled roads. To achieve this, we need an algorithm that can process the captured images in real time at as high a frame rate as possible with maximum accuracy. The motivation behind optimizing an object detection algorithm is the massive number of use cases that favour local processing of data: the ongoing trend of cloud computing carries disadvantages such as latency, cost of services, and unavailability or failure of servers. Efficient processing of the images captured by the vehicle's cameras (front, left, right and back) is therefore needed, and several algorithms exist for this, such as:

• Histogram of Oriented Gradients
• YOLO (You Only Look Once)
• Fast YOLO

Apart from these, algorithms such as histograms of local binary patterns are also used. However, in order to achieve a good speed-accuracy trade-off, we parallelised the histogram of oriented gradients (HOG) [1] algorithm on the GPU using OpenCL, an open
source platform meant for heterogeneous computing, i.e. the same algorithm will work on GPUs from different vendors such as AMD, Nvidia and Intel, as well as on CPUs. Our code optimizes the object detection algorithm by parallelising it on the GPU in order to decrease processing time and increase the number of frames per second. Our solution increases the current speed by 4–10 times depending on the hardware, without compromising on accuracy. Parallelising the HOG algorithm has given efficient results for pedestrian detection using CUDA on NVIDIA platforms [2]. Additionally, OpenCL and GPU computing have also been used for face detection and accelerated data mining [3–6]. The following sections give a brief overview of GPU computing, OpenCL and HOG before explaining the approach adopted. Finally, we show a comparative study of different algorithms against ours as our result.
1.1 GPU Computing
The time-consuming and compute-intensive parts of the code are offloaded to the GPU to accelerate applications running on the CPU. The remaining, sequential parts of the application are processed on the CPU; the application runs faster because it uses the GPU cores to boost performance. This is known as hybrid computing.
1.2 OpenCL
OpenCL [8] is a framework for writing parallel programs on heterogeneous, vendor-independent platforms such as CPUs and GPUs. OpenCL improves the speed and efficiency of a wide range of applications in numerous markets such as gaming, scientific and medical software, and vision processing. OpenCL supports both task-based and data-based programming models, and is well suited to play an increasingly significant role in emerging graphics applications that use parallel compute algorithms. Our main reason for using OpenCL is its heterogeneous nature, which lets our algorithm run on different kinds of systems, making it platform friendly.
1.3 Histogram of Oriented Gradients (HOG)
The steps of the algorithm are as follows:

1. Extracting histograms from the image's gradient and magnitude vectors
2. Normalising the consolidated histogram into a full feature vector

The algorithm is fully explained alongside our implementation in Sect. 2.
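For reference only (the paper's own implementation uses a custom OpenCL kernel, described in Sect. 2), the same descriptor is available off the shelf in Python via scikit-image; with 9 orientation bins, 8 × 8-pixel cells and 2 × 2-cell (16 × 16 pixel) blocks on a 64 × 64 image it yields the 1764-element vector derived in Sect. 2:

import numpy as np
from skimage.feature import hog

image = np.random.rand(64, 64)           # toy grey-level image
features = hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2")
print(features.shape)                    # (1764,)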
2 Implementation

Our entire code was written in two parts:

1. An OpenCL kernel
2. A Python driver for OpenCL
The OpenCL kernel code is written in C, since C has a set of functions that translate easily into GPU-understandable code. The Python code is the main program: it acts as a driver and handles the creation of input arrays for, and retrieval of result arrays from, the kernel code. It uses PyOpenCL, a computation API [9]; a minimal driver of this kind is sketched below.
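As an illustration of this kernel-plus-driver split (this snippet is ours, not the paper's code), a minimal PyOpenCL driver that builds a trivial OpenCL C kernel, runs one work-item per array element, and reads the result back could look like this:

import numpy as np
import pyopencl as cl

src = """
__kernel void square(__global const float *in, __global float *out) {
    int i = get_global_id(0);      // one work-item per array element
    out[i] = in[i] * in[i];
}
"""

a = np.arange(8, dtype=np.float32)
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, src).build()
prg.square(queue, a.shape, None, a_buf, out_buf)   # launch 8 work-items

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
print(result)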
2.1 Parallelizing HOG in OpenCL [10]
The histogram of oriented gradients algorithm comprises several steps. The algorithm offers scope for parallelisation, since the operations it involves are performed on two-dimensional arrays (matrices). In traditional sequential processing, each element of the array is processed one by one; this, however, is time-consuming, as images are large matrices. For the purpose of this project, our images are 64 × 64 pixels in size. Our kernel code is implemented using three functions:

1. The first function extracts the magnitude and orientation, and then produces the histogram.
2. The second function creates the array to be normalized from the histogram obtained in step 1 and passes it to the third function as a vector for the entire image.
3. The third function normalizes the array obtained from step 2. This finally forms our feature vector for the given image. We adopted the block normalization approach.

All three functions are explained in detail below.

2.1.1 Step 1

First, the 2D image matrix is recovered from the one-dimensional array passed in. The derivatives of the image, dx and dy, are found using the classic gradient operators, i.e.

dx = ∂f(x, y)/∂x = f(x + 1, y) − f(x − 1, y)    (1)

dy = ∂f(x, y)/∂y = f(x, y + 1) − f(x, y − 1)    (2)
The next step is to create a 9-bin histogram for every 8 × 8 sub-image. This is also done in parallel: for every pixel, we determine which element of the histogram array the magnitude should be added to [11]. Algorithm 1 lists the computing steps.
1. Input image (as a pointer to array)
2. Get 2D matrix: mat[y][x] = img[w*y + x]
3. Find derivatives dx and dy
4. magnitude_mat[y][x] = sqrt(dx[y][x]^2 + dy[y][x]^2)
5. angle_mat[y][x] = atan(dy[y][x]/dx[y][x])
6. There will be (w/8)*(h/8) histograms. Determine which histogram is responsible for the current pixel.
7. Populate the appropriate histogram element with the current pixel data:
   curr_ind = angle_mat[y][x]/20
   rem = angle_mat[y][x] % 20
8. At the end, you get a histogram array of (w/8)*(h/8)*9 elements

Algorithm 1. Step 1: Finding the gradients and histogram of the image in HOG
w
ð3Þ
¼ 7 7 36 ¼ 1764 This is far larger than the size of the total size of the histogram array for the image. This is because during this process, the 16 16 blocks overlap with each other and the elements of the histogram array are present in the final array multiple times. Those on the corner blocks are repeated only once; those on the edges (but not corners) are repeated twice; while those in the interior are repeated four times in the final image. Figure 1 gives an illustration and algorithm 2 shows the steps.
Object Detection and Classification Using GPU Acceleration
165
Fig. 1. 16 16 blocks for histogram calculation
1. 2. 3. 4. 5. 6.
7.
Input consolidated vector from step 2 (tb_norm[(w/8 - 1)(h/8 -1)*36]) Let squared_sum[36] be an array (initialised with 0s). normal[i] = tb_norm[i]*tb_norm[i] sub_number = i/36 squared_sum[i/36] += normal[i] if(sub_number == 0) normal[i] = 0 else normal[i] = tb_norm[i]/(sqrt(squared_sum[sub_number])) normal array contains the normalised vector for the entire image
Algorithm 2. Finding the normalised feature vector of the image in HOG
2.1.3 Step 3 This function takes the input of the consolidated vector obtained from step 2 and normalises it. The detailed steps are shown in Fig. 4. First the square of 36 contiguous elements is calculated and summed up. There will be (w/8−1) (h/8−1) such squares, which in our case is 49. Every element is divided by the square root of the corresponding sum of squares to get the normalised vector of the entire image. This vector also measures (w/8−1) (h/8−1) 36 elements, which in our case is 1764. 2.2
Training the Model
The implementation of the histogram of oriented gradients algorithm stated above gives us the feature vector for the image. In order to classify the image as a vehicle or a nonvehicle, it is essential to apply a classifying algorithm. In this case, we used a
166
S. Prabhu et al.
supervised learning algorithm, linear support vector machine. The pseudocode of this implementation is as shown in the algorithm 3.
for all images ‘image’ in ‘cars’: features = extractFeatures(image) carFeatures.append(features) for all images ‘image’ in ‘not cars’: features = extractFeatures(image) notCarFeatures.append(features) train SVM on carFeatures and notCarFeatures Algorithm 3. Finding the normalised feature vector of the image in HOG The SVM training algorithm was not parallelized on OpenCL kernel because it is executed only once during training. Once the SVM is created, while predicting we need to only plot the feature set of the image that we want to predict as an object. This is not time consuming and therefore there is no need to parallelize SVM as well. The model thus learnt was stored in a file and used to classify images using SVM prediction algorithm [12]. 2.3
Datasets
The dataset used in this project was a set of images of car and non-car data. The total size of the dataset is 250. Each of the vehicles and non-vehicle images were split into groups of four according to their position in the image. The four groups were namely far, left, right and middle. The samples of each one is shown in Fig. 2.
Far
Left
Middle
Right
Fig. 2. Samples of the dataset: The images are 64 64 in size and hence appear unclear [7].
On the other hand, the non-vehicle images were not distributed in any groups since they had no criteria such as far, left, etc. under which we could categorize each image. With this dataset and by randomly splitting the images into test and train data, we observed an accuracy of 95.65% with 111 frames per second on Nvidia GTX 1060 6 GB. The original HOG algorithm run entirely on CPU gives us an accuracy of 99.95% but could only process a complete image in a third or a fourth of a second, making it impossible for real time detection which requires an FPS of at least 27. However, it must also be noted that our algorithm did not detect false negatives. Some non-cars were indeed classified as cars, but no car was classified as non-car. This problem could be attributed to the fact that there can be a lot of images that classify as
Object Detection and Classification Using GPU Acceleration
167
“not car” therefore making the system learn something as not car was difficult as not all such images could be covered in our training. Therefore, if the system came across any not car image that it has not seen before but has a silhouette of a car then it can confuse it for a car. Currently, the YOLO (You Look Only Once) algorithm, which gives an accuracy of 58% with 45 frames per second and its enhanced version, Fast YOLO gives an accuracy of 70.7% with 155 frames per second. The Fig. 3 compares the accuracy and FPS of different algorithms.
Fig. 3. Accuracy and frames per second for different algorithms: The image provides a comparison between the current fast processing algorithms, the traditional object detection in images and the algorithm mentioned above denoted by ‘GPU-HOG’.
3 Comparison on Different Platforms We ran the program on different platforms, CPUs, integrated GPUs as well as dedicated GPUs. Figures 4 and 5 show the details of a comparison between different GPUs and CPUs. As one can see, all GPUs beat CPUs in processing frames per second because of their higher number of threads for parallel processing. However, some less powerful CPUs and GPUs beat their more powerful counterparts; which could be due to noise influencing our results. Figure 5 shows the comparison of the algorithm on dedicated and integrated GPUs. As one can see, dedicated GPUs process faster than integrated ones. This is because of several better specifications of the former ones. As an example, Intel HD 620 GPU has 24 pipelines with DDR4 memory. On the other hand, Radeon R7 M445 has 320 pipelines, with GDDR5 memory.
168
S. Prabhu et al.
Fig. 4. Comparison of the algorithm on different GPUs and CPUs
Fig. 5. Comparison of integrated and dedicated GPUs
4 Conclusion
To summarise, the GPU version of the HOG algorithm retained most of the accuracy of its CPU counterpart while being much faster, giving close to 96% accuracy and making the algorithm far more efficient and effective in real-world applications. Some issues remain: for instance, the data was transferred between the CPU and GPU a total of six times, which could be reduced further. Our testing showed that running the HOG algorithm on the GPU preserves the high accuracy of HOG while delivering an FPS of over 60. This allows it to outperform both YOLO and Fast YOLO, which are popular algorithms currently in use. Other advantages are that the algorithm is written in OpenCL and therefore works on both Nvidia and AMD GPUs; that the hardware used, such as the Nvidia GTX 1060, is inexpensive commodity hardware; and that the CPU remains free for other instructions while the GPU is busy with the computations. Currently, our GPU algorithm is capable of extracting features from an image. As future work, we intend to parallelise an algorithm for detecting objects against a background in an image and for detecting them in real time in a video.
Acknowledgments. This project was undertaken for the final year course in B. Tech. Computer Engineering at Veermata Jijabai Technological Institute, Mumbai for the academic year 2017–2018. It was made possible with the guidance of Mrs. Varshapriya J N, Assistant Professor, Dept. of Computer Engineering.
Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: International Conference on Computer Vision & Pattern Recognition (CVPR 2005), vol. 1, pp. 886–893. IEEE Computer Society, June 2005
2. Campmany, V., Silva, S., Espinosa, A., Moure, J.C., Vázquez, D., López, A.M.: GPU-based pedestrian detection for autonomous driving. Procedia Comput. Sci. 80, 2377–2381 (2016)
3. Naik, N., Rathna, G.N.: Robust real time face recognition and tracking on GPU using fusion of RGB and depth image. arXiv preprint arXiv:1504.01883 (2015)
4. Azzopardi, B.: Real time object recognition in videos with a parallel algorithm (Doctoral dissertation, University of Malta) (2016)
5. Cheng, K.M., Lin, C.Y., Chen, Y.C., Su, T.F., Lai, S.H., Lee, J.K.: Design of vehicle detection methods with OpenCL programming on multi-core systems. In: The 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, pp. 88–95. IEEE, October 2013
6. Gurjar, P., Varshapriya, J.N.: Survey for GPU accelerated data mining. Int. J. Eng. Sci. Math. 2(2), 10 (2013)
7. What Is GPU Computing? https://www.boston.co.uk/info/nvidia-kepler/what-is-gpu-computing.aspx. Accessed 15 June 2019
8. OpenCL Overview - The Khronos Group Inc. https://www.khronos.org/opencl/. Accessed 15 June 2019
9. PyOpenCL. https://mathema.tician.de/software/pyopencl/. Accessed 15 June 2019
10. Histogram of Oriented Gradients | Learn OpenCV. https://www.learnopencv.com/histogram-of-oriented-gradients/. Accessed 15 June 2019
11. Kaeli, D., Mistry, P., Schaa, D., Zhang, D.P.: Examples. In: Heterogeneous Computing with OpenCL 2.0, 3rd edn., pp. 75–79. Elsevier, Waltham (2015)
12. Understanding Support Vector Machine algorithm from examples (along with code). https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/. Accessed 15 June 2019
Cross Modal Retrieval for Different Modalities in Multimedia T. J. Osheen(&) and Linda Sara Mathew Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam, Ernakulam 686666, Kerala, India [email protected], [email protected]
Abstract. Multimedia data such as text, images and video are widely used, along with the development of social media. Obtaining accurate multimedia information rapidly and effectively from a huge number of sources remains a challenging task. Cross-modal retrieval tries to break through the modality barrier between different media objects and can be regarded as a unified multimedia retrieval approach. For many real-world applications, cross-modal retrieval is becoming essential, from using an image to retrieve related text documents to using text to select the most relevant images. Video retrieval depends on semantics, which include characteristics such as graphical and concept-based video features. Because of the combined exploitation of all these methodologies, the cross-modal content structure of multimedia data is effectively preserved when the data is mapped into the combined subspace. The aim is to group together the text, image and video components of a multimedia document that show similar features, and to retrieve the most accurate image, text or video according to the given query, based on the semantics.
Keywords: Cross-modal retrieval · Feature selection · Subspace learning
1 Introduction
The fast inclination towards urbanisation in rising markets across countries worldwide creates a high demand for elegant surveillance solutions that use visual analytic techniques to augment current digital life scenarios. Many methods have been invented to sharply detect and analyse visual objects in an image or video, and this poses a significant challenge for researchers. Thus, cross-modal retrieval is becoming essential for many applications in which the end user perceives multimedia as a single entity, such as using an image to search for relevant text documents or using text to search for relevant images. Because of this combined exploitation, the fundamental cross-modal content structure of multimedia data is properly preserved when the data is mapped into the joint subspace. Multimedia technology is developing constantly with the exponential increase in Internet applications, producing terrific growth in multimedia. Currently, social media is adopted all over the world, making this development mandatory: the processing speed must be made faster while keeping a high accuracy level, since the main features of any querying and retrieval process are its speed and accuracy. The data
in this case arrives as multi-modal pairs, so fetching data across heterogeneous types of data, simply known as cross-modal retrieval (CMR), is a challenging issue. Nowadays, many statistical and machine learning techniques are applied to CMR. To benefit fully from huge sets of multimedia content, and to make good use of rapidly deployed multimedia techniques without external intervention, it is necessary to establish the relational connections between the multimedia components. Past research on cross-modal knowledge accumulation has not solved the problem of measuring the complementary relationships between modalities. Data mining is one area where these methodologies are mainly used. Learning the characteristics of heterogeneous cross-modal data reduces the semantic gap between high-level and low-level features. Researchers have introduced different kinds of cross-modal understanding technologies to search across the possible combinations of media content, increasing the performance of cross-modal knowledge accumulation. Video retrieval is essential in many settings, including real-time video on required concepts, content identification, tracking, sporting events, criminal investigation frameworks and numerous others. The ultimate objective is to enable clients to retrieve desired content from huge amounts of video data in an efficient and semantically meaningful way. There are essentially three levels of video content: raw video data, low-level features and semantic content. Low-level features are characterised by audio, text and visual features such as texture, colour distribution, shape and motion. Semantic content comprises high-level concepts such as objects and events. The first two levels, on which content modelling and extraction approaches are based, use automatically extracted data representing the low-level content of a video, but they barely provide the semantics that are much more fitting for users. However, it is extremely difficult to extract semantic content directly from raw video data, because a video is a temporal sequence of frames without a direct connection to its semantic content.
2 Background and Motivation
A powerful large-scale multi-modal clustering method was proposed by Wang et al. [1] to manage the heterogeneous representation of large-scale data. Cross-modal clustering uses the similarity of samples across modalities, and clustering over the different modalities is used to uncover the potential structure shared among them. Recently, relevant researchers have advanced several effective cross-modal clustering strategies. Using non-negative matrix factorisation, [2] searches for a good clustering across modalities. A cross-modal clustering method based on CCA has been proposed to project data onto a low-dimensional kernel. The model proposed in [3] updates the clustering model by using transfer propagation constraints among modalities. To build a classifier based on the shared features of the objects found in the different modalities, cross-modal
classification [4, 5] methods are used to classify the multimedia. The cross-modal retrieval technique proposed by Rasiwasia et al. [6] builds on canonical correlation analysis (CCA) to acquire a shared representation between modalities: it learns a maximally correlated subspace with CCA and then learns classifiers in the semantic space for cross-modal retrieval. A cross-media feature extraction technique based on CCA has also been proposed to achieve cross-media classification. A 3D object class model is presented in [7], based on the estimation of pose and the recognition of unseen views. An active transfer learning strategy has been proposed to associate the features of different scenes [8]: based on boosting, a weak classifier is learned in each modality and the combined classifier is produced by weighted combination. These features are large, heterogeneous and of high dimension. Multi-level features inevitably increase the difficulty of retrieval, and the gap between the bottom and top levels is called the semantic gap. Conventional cross-media learning research cannot solve the cross-media relationship metric problem. Recently, researchers have proposed a variety of cross-modal knowledge accumulation strategies to explore the ability to connect media data, enhancing the productivity of cross-modal studies. Researchers have made great efforts to develop methods ranging from supervised to unsupervised and from linear to non-linear for cross-modal retrieval. Canonical Correlation Analysis is the workhorse of CMR [9], maximising the correlation between heterogeneous features. Based on CCA, several variants have been considered to model the mixed-modal correlation; the approach was later developed with the substitution of KCCA. Furthermore, a general strategy for using CCA to relate the semantic representations of web images and closely connected texts was introduced. Recently, some progress has also been made in the field of cross-media classification.
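As a concrete illustration of the CCA baseline described above, the following is a minimal sketch using scikit-learn; the feature matrices X_img and X_txt are hypothetical stand-ins for paired image and text features, not data from this paper.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X_img = rng.standard_normal((200, 64))   # 200 paired samples, 64-d image features
X_txt = rng.standard_normal((200, 32))   # 32-d text features for the same samples

cca = CCA(n_components=10)
cca.fit(X_img, X_txt)
Z_img, Z_txt = cca.transform(X_img, X_txt)  # projections into the shared subspace

# cross-modal retrieval: rank all texts against an image query by cosine similarity
q = Z_img[0]
scores = (Z_txt @ q) / (np.linalg.norm(Z_txt, axis=1) * np.linalg.norm(q))
ranking = np.argsort(-scores)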
3 Proposed Work
In this section, the notation and problem definition are presented first; then the overall objective function is formulated; finally, an effective optimisation algorithm is introduced to solve the formulated problem.
A. Notations and Problem Definition
In this paper, we mainly examine the common cross-modal retrieval tasks between image, text and video: Image retrieves Text (I2T) and Text retrieves Image (T2I). Let P1 = [p1_1, ..., p1_n] ∈ R^(d1×n) and P2 = [p2_1, ..., p2_n] ∈ R^(d2×n) denote the feature matrices of the image and text modalities, respectively. Q = [q_1, ..., q_c] ∈ R^(n×c) is the label matrix, whose ith row is the label vector corresponding to p1_i and p2_i. Specifically, if p1_i and p2_i belong to the jth class, q_ij is set to 1; otherwise it is set to 0. This is illustrated in Fig. 1 below.
Fig. 1. Modalities mapped to different subspaces
B. Objective Formulation
The aim of this work is to learn pairs of projection matrices Am ∈ R^(d1×c) and Bm ∈ R^(d2×c) (m = 1, 2, corresponding to the two retrieval tasks) for the image and text modalities. With the learnt projection matrices, the multi-modal data can be mapped into a shared subspace for similarity estimation. The overall learning framework is formulated as:

min f(Am, Bm) = L(Am, Bm) + C(Am, Bm) + S(Am, Bm) + Ω(Am, Bm)    (1)

The objective function contains four parts: a linear regression term L(A, B), which directly relates the modality-specific subspace to explicit high-level semantics; a correlation analysis term C(A, B), which preserves the pairwise semantic consistency of the two modalities in the semantic space; a feature selection term S(A, B), which avoids over-fitting and selects discriminative features to improve the modality correlation; and a graph regularisation term Ω(A, B), which exploits the hidden manifold structure to preserve inter-modality and intra-modality semantic consistency.
1. Derivation of I2T: For the I2T task, two optimal projection matrices are computed for the image and text modalities: A1 ∈ R^(d1×c) and B1 ∈ R^(d2×c), respectively. The objective function of I2T is:

min f(A1, B1) = ||P1ᵀA1 − Q||²_F + ||P1ᵀA1 − P2ᵀB1||²_F + λ1||A1||_2,1 + λ2||B1||_2,1 + λ3 Ω(A1, B1)    (2)

min f(A2, B2) = ||P2ᵀB2 − Q||²_F + ||P1ᵀA2 − P2ᵀB2||²_F + λ1||A2||_2,1 + λ2||B2||_2,1 + λ3 Ω(A2, B2)    (3)

Here the λi (i = 1, 2, 3) are balancing parameters.
C. Feature Extraction
Over the past few years, deep CNNs have demonstrated a strong capacity for image classification on freely available datasets such as COCO and
Flickr. Some recent articles demonstrated that CNN models pretrained on large data sets with high data diversity, e.g. COCO, can be directly transferred to extract CNN visual features for other visual recognition tasks such as image classification and object detection. To adapt the parameters pretrained on COCO to the target dataset, images from the target dataset are used to fine-tune the CNN. The relationship between the two modalities is built through their shared ground-truth label(s).
i. Extracting Visual Features From a Pretrained CNN Model
Inspired by Yunchao et al. [8] and Wang et al. [4], who exhibited the remarkable performance of CNN visual features on various recognition tasks, the pretrained CNN model is commonly used to extract CNN visual features for CMR. Specifically, each image is first resized to 256 × 256 and fed into the CNN; only the centre patch of the image is used to produce the CNN visual features. fc6 and fc7 denote the 4096-dimensional features of the first two fully connected layers after the rectified linear units (ReLU) [2].
ii. Extracting Visual Features From a Fine-Tuned CNN Model
Since the categories (and the number of classes) of COCO and the target data set are usually different, directly using the pretrained CNN model to extract visual features may not be the best approach. Each image from the target data set is resized to 256 × 256 without cropping. The output of the last fully connected layer is then fed into a c-way softmax which produces a probability distribution over c classes. The squared loss function can achieve comparable or even better classification accuracy when the number of classes in the target dataset is small; however, as the number of classes grows, the cross-entropy loss achieves better classification results.
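To make the pretrained-feature step concrete, the following is a minimal sketch of extracting 4096-dimensional fc7 features with a pretrained VGG-16 from torchvision; this is a stand-in under assumed tooling, not the authors' exact COCO-pretrained model, and the 224 × 224 centre crop is the torchvision default rather than the 256 × 256 patch described above.

import torch
from torchvision import models, transforms
from PIL import Image

model = models.vgg16(pretrained=True)
model.classifier = model.classifier[:5]  # truncate after fc7 + ReLU -> 4096-d output
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def fc7_features(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(x).squeeze(0).numpy()  # 4096-dimensional visual feature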
D. Cross-Modal Mapping in Subspaces
Deep learning has been credited with staggering advances in computer vision, natural language processing and general pattern understanding. Recent discoveries have enabled efficient vector representations of both visual and textual stimuli. Translating robustly between visual and textual representations remains a challenge whose solution could benefit retrieval applications. Vector representations of each modality are mapped to a Common Vector Space (CVS) using individual embedding networks. These cross-modal subspace learning techniques share the common workflow of learning a projection matrix for each modality, in the training stage, to project the data from the different modalities into a common comparable subspace.
E. Training of TensorBox
The TensorBox project takes an ".idl" file for training, which must contain a line for each image with the relative bounding boxes of the objects in the image. Some extrapolation from TensorBoard of the training stage of the model over 2M iterations showed that each of the loss curves becomes flat but yields sparse results and plateaus; this is due to the generalisation of the model, which was not learning different masks for each object class but only a single one for all of them. This lowers the overall precision and means the model was no longer learning. TensorBox gives a clearer view of the accuracy of the model and also makes it possible to observe the test stage on the images.
F. Dataset Preparation
The datasets for cross-modal retrieval are built from three datasets representing different modalities. Annotations are made which map the different modalities in feature space. The datasets are:
a. COCO dataset: The COCO dataset is an extensive visual database designed for use in visual object recognition, object detection, segmentation and captioning research. More than 14 million image URLs have been hand-annotated to indicate which objects are pictured; in at least one million of the images, bounding boxes are also provided. The database of annotations of third-party image URLs is freely accessible directly. COCO has several highlights:
i. 91 stuff categories
ii. Super-pixel stuff segmentation
iii. 1.5 million object instances
iv. 80 object classes
v. 250,000 people with keypoints
b. Flickr dataset: It comprises 25,000 images downloaded from the social website Flickr through its public API, paired with full manual annotations, created specifically for the research community dedicated to improving image retrieval. It collects the user-supplied Flickr image tags as well as the EXIF metadata and makes them available in easy-to-access text files. In addition, it provides manual image annotations for the whole collection, suitable for a variety of benchmarks.
c. TRECVID dataset: The primary objective of the TREC Video Retrieval Evaluation (TRECVID) is to advance content-based analysis of, and retrieval from, digital video through open, metrics-based evaluation. TRECVID is a laboratory-style evaluation that attempts to model real-world situations or significant component tasks involved in such situations. TRECVID confronted known-item search and semantic indexing systems with a new collection of Internet videos characterised by a high degree of diversity in creator, content, style, production quality, original capture device/encoding, language and so on, as is typical of much "Web video". The collection also has associated keywords and descriptions provided by the video contributors.
4 Conclusion
With the development of the Internet, multimedia data such as images and video are increasingly convenient and fast to access. Consequently, how to locate the required multimedia data among a large number of sources quickly and accurately has become a research focus in the field of information retrieval. A real-time web CMR method based on deep learning is proposed here. The method can achieve high accuracy in image CMR while using less retrieval time. At the same time, for a large database, the method can be used to identify the error rate of image recognition, which is useful for machine learning. The framework combines the training of linear projections with the training of non-linear hidden layers to guarantee that good features can be learned. In the field of cross-modal retrieval, retrieval accuracy and retrieval time consumption are critical indicators for evaluating the quality of an algorithm.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Wang, L., Sun, W., Zhao, Z., Su, F.: Modeling intra- and inter-pair correlation via heterogeneous high-order preserving for cross-modal retrieval. Sig. Process. 131, 249–260 (2017)
2. Bai, X., Yan, C., Yang, H., Bai, L., Zhou, J., Hancock, E.R.: Adaptive hash retrieval with kernel based similarity (2017)
3. Wang, K., He, R., Wang, W., Wang, L., Tan, T.: Learning coupled feature spaces for cross-modal matching. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2088–2095 (2013)
4. Wang, K., He, R., Wang, L., Wang, W., Tan, T.: Joint feature selection and subspace learning for cross-modal retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2010–2023 (2016)
5. Wang, J., He, Y., Kang, C., Xiang, S., Pan, C.: Image-text cross-modal retrieval via modality-specific feature learning. In: Proceedings of the 5th ACM International Conference on Multimedia Retrieval, pp. 347–354 (2015)
6. Rasiwasia, N., Costa Pereira, J., Coviello, E., Doyle, G., Lanckriet, G.R.G., Levy, R., Vasconcelos, N.: A new approach to cross-modal multimedia retrieval. In: Proceedings of the 18th ACM Multimedia Conference, pp. 251–260 (2010)
7. Jiang, B., Yang, J., Lv, Z., Tian, K., Meng, Q., Yan, Y.: Internet cross-media retrieval based on deep learning. J. Vis. Commun. Image Represent. 48, 356–366 (2017)
8. Wei, Y., Zhao, Y., Lu, C., Wei, S., Liu, L., Zhu, Z., Yan, S.: Cross-modal retrieval with CNN visual features: a new baseline. IEEE Trans. Cybern. 47, 251–260 (2016)
9. Pereira, J.C., Vasconcelos, N.: Cross-modal domain adaptation for text-based regularization of image semantics in image retrieval systems. Comput. Vis. Image Underst. 124, 123–135 (2014)
10. He, J., Ma, B., Wang, S., Liu, Y.: Multi-label double-layer learning for cross-modal retrieval. Neurocomputing 123–135 (2017)
11. Hu, X., Yu, Z., Zhou, H., Lv, H., Jiang, Z., Zhou, X.: An adaptive solution for large-scale, cross-video, and real-time visual analytics. In: IEEE International Conference on Multimedia Big Data, pp. 251–260 (2015)
12. Lavrenko, V., Manmatha, R., Jeon, J.: A model for learning the semantics of pictures. In: NIPS (2003)
Smart Wearable Speaking Aid for Aphonic Personnel
S. Aswin1(&), Ayush Ranjan1, and K. V. Mahendra Prashanth2
1 Department of Electronics, SJB Institute of Technology, Bangalore, India [email protected], [email protected]
2 Department of Electronics and Communication, SJB Institute of Technology, Bangalore, India [email protected]
Abstract. Gesturing is an instinctive form of non-verbal or non-vocal communication in which perceptible bodily actions are used to communicate definite messages, in conjunction with or without speech. Aphonic personnel, however, rely solely on this way of communicating, which inhibits them from interacting freely with other people. This paper presents a Smart Wearable Speaking Aid which translates Indian Sign Language into text and voice in real time based on an SVM (Support Vector Machine) model, using a Raspberry Pi, flex sensors and an accelerometer. Additionally, an LCD is used to display the text corresponding to the gestures, and the equivalent voice output is sent to a Bluetooth speaker. The prototype was designed to identify five gestures. During testing of the device, the accuracy of the prototype was observed to be 90%.
Keywords: Indian Sign Language · SVM model · Raspberry Pi · Speaking aid · Wearable aid
1 Introduction
Communication plays a crucial role in conveying information from one person to another and can take a verbal or non-verbal form. Verbal communication relies on conveying the message through language, whereas non-verbal communication (usually termed gesturing) is an instinctive method of conveying a message either in conjunction with or without speech [1]. People with speech or hearing impairments, however, have gesturing alone as their principal means of communication, which can create a communication barrier between aphonic personnel and other people. As per one survey report, there are 466 million people around the world with hearing or speech impairments, of which 34 million are children [2]. From the census of 2011, it is observed that there are around 30 million people in India who are aphonic [3]. Around the world, each nation has developed its own variety of sign language; there are around 300 distinct sign languages in the world [4]. Indian Sign Language is commonly used among aphonic people in India, and different parts of India adopt different signs.
However, the grammar is common and the same throughout the country. Sign languages are not associated with spoken language and have their own grammar structure; hence it is difficult to standardise a sign language regionally or globally. Research into the interpretation of sign language gestures in real time has progressed considerably in recent decades. There are multiple approaches to sign language recognition, namely vision-based and non-vision-based. In the vision-based approach, a camera is used to capture the gestures, and the images are processed to identify the sign. The vision-based method can be further classified into direct and indirect: the direct method processes the captured raw image directly to identify the sign, whereas in the indirect method the sign is identified from an RGB colour space segmented into different colours for each finger using a data glove. In the non-vision-based approach, sensors are used to analyse the flexing of the fingers and identify the sign [5–10]. In this paper the non-vision-based approach to sign language recognition is adopted. A simple device is designed using a Raspberry Pi, flex sensors and an accelerometer to function as a smart wearable aid. The flex sensors are used for detecting the hand gesture and the accelerometer for distinguishing the orientation of the palm. The data from the sensors are processed on the Raspberry Pi; the resulting text is displayed on an LCD screen and the voice output is given through a Bluetooth speaker. A battery with a voltage regulator and a current booster is used to power the Raspberry Pi to ensure complete portability.
2 Methodology
2.1 Prototype Design
The proposed prototype consists of Raspberry Pi 3 (RPi), Flex sensors, MCP3008, Accelerometer, LCD Display, I2C Converter, Bluetooth speaker, and Power Supply (Fig. 1).
Fig. 1. Block diagram of designed prototype
Smart Wearable Speaking Aid for Aphonic Personnel
181
RPi needs 5 V, 2 A to function normally. To make the prototype portable, a 6 V battery is used with IC7805 and a current booster circuit to meet this requirement [11] (Fig. 2). GPIO pins are used to interface the sensors and to power them. Bluetooth 4.1 is used to connect to the Bluetooth Speaker for generating voice output.
Fig. 2. Power supply
Flex sensors work as variable resistors whose resistance depends on the amount of flexing. Five flex sensors are used in the prototype. For interfacing the RPi with these sensors, a voltage divider circuit is used (see Fig. 3); the voltage at the junction increases as the sensor is flexed. Since the RPi does not support analog inputs, the MCP3008 IC is used to interface the sensors to the RPi. The MCP3008 is a 10-bit ADC with 8 channels, capable of conversion rates of up to 200 ksps [12]. It is interfaced to the RPi over SPI (Serial Peripheral Interface), as sketched after Fig. 3.
Fig. 3. Flex sensor interfaced with voltage divider
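As an illustration of this interface, the following is a minimal Python sketch of reading the five flex-sensor channels through the MCP3008 over SPI; it assumes the spidev library and channels 0–4, which are not specified in the paper.

import spidev

spi = spidev.SpiDev()
spi.open(0, 0)               # SPI bus 0, chip-select 0
spi.max_speed_hz = 1350000

def read_adc(channel):
    # MCP3008 protocol: start bit, then single-ended mode + channel number
    reply = spi.xfer2([1, (8 + channel) << 4, 0])
    return ((reply[1] & 3) << 8) | reply[2]   # 10-bit result (0-1023)

flex_values = [read_adc(ch) for ch in range(5)]  # one reading per flex sensor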
The ADXL345 accelerometer is used to identify the orientation of the palm. It is an ultra-low-power, 3-axis accelerometer with 13-bit resolution measuring up to ±16 g. The digital output is formatted as 16-bit two's complement data, and the device can communicate over an SPI or I2C interface [13]. In the prototype, the I2C digital interface is used to connect the accelerometer. The LCD is used to display the text output and is interfaced to the RPi using I2C.
2.2 Software Implementation
The input data from the sensors are collected for a set of signs. The data are standardised by comparing each value with the corresponding mean value and converting it to binary [14] (Fig. 4).
Fig. 4. Flowchart for training the SVM model
Fig. 5. Flowchart depicting working of Raspberry Pi module
The model obtained through the processes described above is then used for real-time detection of gestures. The program for this is coded in Python and is set to run on boot-up. After booting up, the RPi automatically connects to the Bluetooth speaker. Once the speaker is connected, a series of readings from the sensors is collected to obtain the mean values, which are then used to convert subsequent sensor readings to binary. The SVM model is applied to the binary sensor values to predict the gestures. The corresponding text is displayed on the LCD and the pre-recorded mp3 file of the respective gesture is played through the Bluetooth speaker (Fig. 5).
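A minimal sketch of this train-then-predict flow with scikit-learn is shown below; the arrays X and y (and the .npy files) are hypothetical stand-ins for the recorded sensor readings and gesture labels, and the per-sensor mean binarisation follows the description above.

import numpy as np
from sklearn.svm import SVC

# X: one row per sample [flex1..flex5, ax, ay, az]; y: gesture labels 0-4
X = np.load("gesture_readings.npy")   # hypothetical recorded training data
y = np.load("gesture_labels.npy")

means = X.mean(axis=0)
X_bin = (X > means).astype(int)       # binarise against the per-sensor mean

clf = SVC(kernel="linear").fit(X_bin, y)

def predict_gesture(reading):
    bits = (np.asarray(reading) > means).astype(int).reshape(1, -1)
    return int(clf.predict(bits)[0])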
3 Results
The Python code for real-time translation using the pre-trained model is set to run at startup. First, the Bluetooth speaker is connected to the RPi; after a small delay, the "SIGN LANGUAGE TRANSLATOR" message is displayed on the screen. A message is then displayed on the LCD asking the user to make the gestures of an open palm and a fist, during which a series of sensor readings is collected to determine the mean and threshold values used to convert subsequent readings into binary. After this, the device can be used for real-time translation of gestures (Figs. 6, 7, 8 and 9).
Fig. 6. Snapshot of designed prototype
Fig. 7. A snapshot illustrating start-up
Fig. 8. Displaying instructions to user
Fig. 9. Translator displaying respective text
4 Conclusion and Future Scope
A simple, cost-effective prototype was designed using a Raspberry Pi, flex sensors and an accelerometer to function as a smart wearable aid. The flex sensors detect the hand gesture and the accelerometer distinguishes the orientation of the palm. The data from the sensors are processed on the Raspberry Pi; the resulting text is displayed on an LCD screen and the voice output is given through a Bluetooth speaker. A battery with a voltage regulator and a current booster powers the Raspberry Pi to ensure complete portability. From the results obtained, the accuracy of the prototype is observed to be 90%. This could be further improved by training with a larger, raw data set rather than converting the readings to binary. Multiple classification models were compared and tested to select the one applicable to the given application; the real-time implementation could not achieve appreciable accuracy with the other classification models, and within the set of models considered, SVM obtained the best accuracy. It is suggested that a better real-time classification model be identified and selected for this application. The effectiveness of the device could also be improved by connecting the sensors and peripherals more effectively, and the product could be brought to market by devising a single powerful chip to perform all the required functions.
Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Gesture: en.wikipedia.org. Wikipedia. https://en.wikipedia.org/wiki/Gesture
2. Deafness and Hearing Loss: World Health Organization. http://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss
3. Census 2011 India. http://www.censusindia.gov.in/DigitalLibrary/data/Census_2011/Presentation/India/Disability.pptx
4. List of Sign Languages: en.wikipedia.org. Wikipedia. https://en.wikipedia.org/wiki/List_of_sign_languages
5. Mulder, A.: Hand gestures for HCI. Technical Report 96-1, Simon Fraser University (1996)
6. Sriram, N., Nithiyanandham, M.: A hand gesture recognition based communication system for silent speakers. In: International Conference on Human Computer Interaction, Chennai, India, August 2013
7. Lee, B.G., Lee, S.M.: Smart wearable hand device for sign language interpretation system with sensors fusion. IEEE Sens. J. 18(3), 1–2 (2018)
8. Quiapo, C.E.A., Ramos, K.N.M.: Development of a sign language translator using simplified tilt, flex and contact sensor modules. In: 2016 IEEE Region 10 Conference (TENCON), p. 1 (2016)
9. Ekbote, J., Joshi, M.: Indian sign language recognition using ANN and SVM classifiers. In: 2017 International Conference on Innovations in Information Embedded and Communication Systems (ICIIECS), p. 1 (2017)
10. Pawar, S.N., Shinde, D.R., Alai, G.S., Bodke, S.S.: Hand glove to translate sign language. IJSTE – Int. J. Sci. Technol. Eng. 2(9), 1 (2016)
11. Raspberry Pi 3 Model B. http://www.raspberrypi.org/products/raspberry-pi-3-model-b/
12. MCP3008, 10-bit ADC. http://www.farnell.com/datasheets/1599363.pdf
13. ADXL345. http://www.analog.com/en/products/sensors-mems/accelerometers/adxl345.html#product-overview
14. Introduction to Support Vector Machine. https://docs.opencv.org/3.0-beta/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
Data Processing for Direct Marketing Through Big Data
Amelec Viloria1(&), Noel Varela1, Doyreg Maldonado Pérez1, and Omar Bonerge Pineda Lezama2
1 Universidad de la Costa (CUC), Calle 58 # 55-66, Barranquilla, Atlántico, Colombia {aviloria7,nvarela1,dmaldona}@cuc.edu.co
2 Universidad Tecnológica Centroamericana (UNITEC), San Pedro Sula, Honduras [email protected]
Abstract. Traditional marketing promotes through various channels such as newspapers, radio, etc., but those promotions are aimed at everyone, whether or not they are interested in the product or service being promoted. This method usually leads to high expenses and a low response rate from potential customers. In today's very competitive market, mass marketing is therefore not viable, and specialists are focusing their efforts on direct marketing. This method studies customers' characteristics and needs and selects a group of customers as the target for the promotion. Direct marketing uses predictive modelling of customer data with the aim of selecting those most likely to respond to promotions. This research proposes a platform for processing data flows for target customer selection and for building the required predictive response models.
Keywords: Data stream · WEKA · MOA · SAMOA · Big Data · Direct marketing
1 Introduction
Direct marketing is a data-oriented process for direct communication with target customers or prospects [1, 2]. At the same time, marketers face changing environments: current data arrive as continuous streams, making the environment dynamic and subject to restrictions such as limited storage capacity and the need for real-time processing. This means that Big Data processing approaches [3, 4] are required. Hence, to establish and maintain the relationship with clients, specialists have anticipated the need to replace intuitive group selection with more scientific approaches aimed at processing large volumes of data [5], obtaining rapid responses that allow them to select the customers who will respond to a new offer of products or services, under a distributed data flow approach [6]. Several studies address the computational and theoretical aspects of direct marketing, but few efforts have focused on the technological aspects needed to apply data mining to the direct marketing process [7–11]. The situation gains in complexity
when distributed data flow environments exist. Researchers must devote effort and time to implementing simulation environments for both demographic and historical purchase data streams, because no platform exists that handles such tasks while being scalable and allowing the incorporation of new algorithm variants for evaluation. Faced with this problem, and given its importance as a basis for future research in the area of data flow mining for direct marketing, this research proposes a highly scalable processing platform based on Free Software technologies.
2 Direct Marketing
Direct marketing is a set of techniques for creating personal communication with each potential buyer, especially segmented (socially, economically, geographically, professionally, etc.), in order to promote a product, service or idea and to maintain that communication over time, using direct contact media or systems (emailing, telemarketing, couponing, mailboxing, the new technologies that offer us "virtual markets", multimedia systems and all the new media that online technological advances facilitate). Basically, it consists of sending individualised communications addressed to a previously selected audience, based on certain variables, with which a continuous relationship is sought. The elaboration of the message follows, with adaptations, the same process as mass advertising, that is, creativity, production and diffusion, placed within the corresponding planning. It is basically realised in two ways:
• personalised mail or mailing: personalised deliveries to the home or workplace, which may include response forms;
• mailbox drops and brochures: distributed to homes and workplaces without a recipient's address, mainly according to geographic criteria.
Direct marketing is considered essentially part of relational marketing, whose strategic objective is to convert any sale or contact with customers into a lasting relationship based on the satisfaction of their needs and preferences. To achieve this, it applies the techniques of direct marketing and another of its main tools, telemarketing. In this case, contact is established through the telephone, which, compared with mail, presents its own advantages and disadvantages. Among the advantages, worth noting are the speed of communication, greater certainty of contact with the target audience, a lower degree of reticence, and the possibility of counter-argument and even of offering alternatives adapted to each person.
3 Methodology
Large volumes of data are generated every day in many organisations; these can be used to establish and maintain a direct relationship with customers and to identify which groups of customers would respond to specific offers [14].
There are companies, such as Nielsen (Nielsen Datasets Transforming Academic Research in Marketing, 2015), that specialise in collecting and maintaining sales data for marketing companies [15]. A special category is databases with demographic information, which provide additional information at the group level and are useful for determining target groups with similar properties. In contrast to external databases, a company's internal databases provide more relevant and reliable information regarding customer behaviour. The recency, frequency and monetary (RFM) attributes are identified as the three most representative variables for marketing modelling; when these variables are used in combination, very accurate predictive results are achieved, and they are also very useful for supporting customer relationship management (CRM) [17]. RFM attributes tend to number 10 or fewer in a dataset; hence, attribute reduction is not a critical step when using these variables (a worked example of deriving RFM follows Table 1 below). In the Bank Marketing Data Set, the data relate to an advertising campaign of a Portuguese banking institution aimed at several customers. There are other datasets, such as the one proposed by Nielsen Datasets (2015) [18], that record people's consumption habits in the United States and are available to academic researchers. Table 1 characterises the datasets for direct marketing.

Table 1. Characterization of datasets for direct marketing

Data set | Features | Instances | Attributes
Bank marketing data set | Multivariate | 45478 | 12
Insurance company benchmark (COIL 2000) data set | Multivariate | 9964 | 87
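As an illustration of the RFM attributes just described, the following is a minimal pandas sketch that derives them from a raw purchase log; the transaction table is entirely hypothetical and stands in for a company's internal purchase history.

import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime(["2019-01-05", "2019-03-02", "2019-02-11",
                            "2019-01-20", "2019-02-28", "2019-03-15"]),
    "amount": [120.0, 80.0, 45.0, 60.0, 30.0, 90.0],
})

now = tx["date"].max()
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (now - d.max()).days),  # days since last purchase
    frequency=("date", "count"),                       # number of purchases
    monetary=("amount", "sum"),                        # total spend
)
print(rfm)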
4 Results
4.1 Technology Selection Criteria
There are currently two approaches to Big Data processing: one uses incremental algorithms, and the other is based on distributed computing. Since the data sources generated by customers' consumption habits are ubiquitous by nature, this article proposes distributed data flow processing capable of solving the tasks of each stage of the CRISP-DM methodology. For this there are several frameworks and technologies that are very useful, and in some cases indispensable, for the evaluation and development of algorithms capable of processing data both in stationary environments and in data flow environments.
4.2 Principles for Technology Selection
As mentioned above, there are several technologies for data mining. For the design of the proposed platform, the following selection principles were followed [19]:
1. Suitability: the tools used in the design perform the required function at each stage.
2. Interoperability: easy to integrate, connecting the output of certain processes as the input of others.
3. Homogeneous technology: all the tools used in the design are developed under the same programming paradigms and technologies.
4. Scalability: new algorithms can be incorporated for each of the data processing stages of the CRISP-DM methodology, and existing ones can be modified.
5. Free Software: the selected tools comply with the four freedoms of Free Software, thus guaranteeing both the deployment of the proposed architecture and the reproducibility of the experiments.
4.3 Selected Technologies
– WEKA (Waikato Environment for Knowledge Analysis): a tool that supports experimentation in data analysis through the application, analysis and evaluation of the most relevant data analysis techniques, mainly those from machine learning, on any set of user data. WEKA also provides access to databases via SQL through a Java Database Connectivity (JDBC) connection and can process the result returned by a query to the database. It cannot perform multi-relational data mining, but there are applications that can convert a collection of related tables in a database into a single table that WEKA can then process.
– MOA (Massive Online Analysis) [21]: a framework for implementing and evaluating real-time learning algorithms on data flows. MOA includes a collection of algorithms for classification and clustering tasks. The framework follows the principles of green computing, using processing resources efficiently, a required feature in the data stream model, where data arrive at high speed and algorithms must process them under time and space constraints. The framework integrates with WEKA.
– SAMOA (Scalable Advanced Massive Online Analysis): a platform for mining distributed data flows [22]. Like most data flow processing frameworks, it is written in Java. It can be used as a development framework or as a library in other software projects. As a framework, it lets developers abstract away from runtime-related tasks, which makes it easier to use code with different stream processing engines, making it possible to implement an architecture on engines such as Apache Storm, Apache S4 and Samza. SAMOA achieves seamless integration with MOA, so MOA classification and processing algorithms can be used in distributed flow environments. It is a very useful platform both for researchers and for deploying solutions in real-world environments.
– Apache Spark [23, 24]: an open source framework built for fast Big Data processing, easy use and sophisticated analysis. Spark has several advantages compared with other Big Data processing frameworks based on MapReduce, such as Hadoop and Storm. It offers unified functionality for managing Big Data requirements, including data sets varied in both nature and source. Spark beats the processing speed of other MapReduce-based frameworks with capabilities such as in-memory data storage and real-time processing: it keeps intermediate results in memory instead of writing them to disk.
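MOA and SAMOA are Java frameworks; as a language-neutral illustration of the incremental, test-then-train pattern they implement for stream classification, the following is a minimal Python sketch using scikit-learn's partial_fit. The stream_batches generator is a hypothetical stand-in for an arriving flow of customer feature vectors and response labels.

import numpy as np
from sklearn.linear_model import SGDClassifier

def stream_batches(n_batches=5, batch=100, d=8, seed=0):
    # hypothetical generator simulating mini-batches arriving from a stream
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.standard_normal((batch, d))
        y = (X[:, 0] + 0.1 * rng.standard_normal(batch) > 0).astype(int)
        yield X, y

clf = SGDClassifier(loss="log_loss")   # incrementally trained logistic regression
classes = np.array([0, 1])             # will not respond / will respond

for X_batch, y_batch in stream_batches():
    if hasattr(clf, "coef_"):          # test-then-train: score before updating
        print("batch accuracy:", clf.score(X_batch, y_batch))
    clf.partial_fit(X_batch, y_batch, classes=classes)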
4.4 Proposal
A distributed processing platform for demographic and historical purchase data streams from various sources is proposed, based on the interoperability of the technologies described above. To this end, a processing cluster was deployed on a set of mid-range desktop computers running SAMOA configured with multiple stream processing engines (SPEs), among them Storm and S4. These take care of tasks such as data serialisation, while the algorithms are evaluated in Scalable Advanced Massive Online Analysis (SAMOA) [25] using Massive Online Analysis (MOA) algorithms [26] and WEKA [27] for the preprocessing and data modelling stages [28]. Several of the algorithms available in the aforementioned frameworks were evaluated on the Bank Marketing Data Set, identifying the target groups and yielding a real-time predictive response model.
5 Conclusions
With the development of computer and telecommunications technologies, companies are adopting direct marketing strategies that allow them to identify potential customers for their products and/or services, thus avoiding large expenditures of resources and effort on mass advertising campaigns. This article described the nature of the data used by some algorithms for the selection of target customer groups, highlighting up-to-date datasets. In addition, a platform was proposed for processing customer data from different sources. For this purpose, technologies for distributed data flow environments were evaluated, making real-time analysis possible. Several selection criteria were applied, guaranteeing interoperability and the freedoms offered by Free Software products, and allowing new data flow processing algorithms to be incorporated. This facilitates tasks such as simulating distributed environments and serialising data, among other functional requirements for research in this field.
References
1. Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J., Ghalsasi, A.: Cloud computing—the business perspective. Decis. Support Syst. 51(1), 176–189 (2011)
2. Bifet, A., De Francisci Morales, G.: Big data stream learning with SAMOA (2014). Retrieved from https://www.researchgate.net/publication/282303881_Big_data_stream_learning_with_SAMOA
3. Mell, P., Grance, T.: The NIST definition of cloud computing. NIST Special Publication 800–145 (2011)
4. Zeithaml, V.A., Parasuraman, A., Berry, L.L.: Total Quality Management Services. Díaz de Santos, Bogotá (1993)
5. Sitto, K., Presser, M.: Field Guide to Hadoop, pp. 31–33. O'Reilly, California (2015)
6. Sosinsky, B.: Cloud Computing Bible, 3 p. Wiley Publishing Inc., Indiana (2011)
7. Casabayó, M., Agell, N., Sánchez-Hernández, G.: Improved market segmentation by fuzzifying crisp clusters: a case study of the energy market in Spain. Expert Syst. Appl. 41(3), 1637–1643 (2015)
8. Izquierdo, N.V., Lezama, O.B.P., Dorta, R.G., Viloria, A., Deras, I., Hernández-Fernández, L.: Fuzzy logic applied to the performance evaluation. Honduran coffee sector case. In: Tan, Y., Shi, Y., Tang, Q. (eds.) Advances in Swarm Intelligence, ICSI 2018. Lecture Notes in Computer Science, vol. 10942. Springer, Cham (2018)
9. Pineda Lezama, O., Gómez Dorta, R.: Techniques of multivariate statistical analysis: an application for the Honduran banking sector. Innovare: J. Sci. Technol. 5(2), 61–75 (2017)
10. Viloria, A., Lis-Gutiérrez, J.P., Gaitán-Angulo, M., Godoy, A.R.M., Moreno, G.C., Kamatkar, S.J.: Methodology for the design of a student pattern recognition tool to facilitate the teaching - learning process through knowledge data discovery (Big Data). In: Tan, Y., Shi, Y., Tang, Q. (eds.) Data Mining and Big Data, DMBD 2018. Lecture Notes in Computer Science, vol. 10943. Springer, Cham (2018)
11. Zhu, F., et al.: IBM cloud computing powering a smarter planet. In: Libro Cloud Computing, vol. 5931, pp. 621–625 (2009)
12. Fundación Telefónica: La transformación digital de la industria española. Informe preliminar. Ind. Conectada 4.0, p. 120 (2015)
13. Thames, L., Schaefer, D.: Software defined cloud manufacturing for industry 4.0. Procedia CIRP 52, 12–17 (2016)
14. Gilchrist, A.: Industry 4.0: The Industrial Internet of Things. Bangken, Nonthaburi, Thailand (2016)
15. Schweidel, D.A., Knox, G.: Incorporating direct marketing activity into latent attrition models. Mark. Sci. 31(3), 471–487 (2013)
16. Setnes, M., Kaymak, U.: Fuzzy modeling of client preference from large data sets: an application to target selection in direct marketing. IEEE Trans. Fuzzy Syst. 9(1), 153–163 (2001)
17. Thompson, E.B., Heley, F., Oster-Aaland, L., Stastny, S.N., Crawford, E.C.: The impact of a student-driven social marketing campaign on college student alcohol-related beliefs and behaviors. Soc. Mark. Q. 1514500411471668 (2013)
18. Thorson, E., Moore, J.: Integrated Communication: Synergy of Persuasive Voices. Psychology Press/Lawrence Erlbaum Associates, Inc., Publishers, New Jersey (2013)
19. Lars Adolph, G.: Bundesanstalt: DIN/DKE-Roadmap German Standardization
20. Morales, P.G., España, J.A.A., Zárate, J.E.G., González, C.C.O., Frías, T.E.R.: La Nube Al Servicio De Las Pymes En Dirección a La Industria 4.0. Pist. Educ. 39(126), 85–98 (2017)
21. Wu, Q., Yan, H.S., Yang, H.B.: A forecasting model based on support vector machine and particle swarm optimization. In: 2008 Workshop on Power Electronics and Intelligent Transportation System, pp. 218–222 (2008)
22. Ellingwood, J.: Apache vs Nginx: Practical Considerations (2015). Retrieved from https://www.digitalocean.com/community/tutorials/apache-vs-nginx-practical-considerations
23. Gilbert, S., Lynch, N.: Perspectives on the CAP theorem. Computer 45(2), 30–36 (2012)
24. Gouda, K., Patro, A., Dwivedi, D., Bhat, N.: Virtualization approaches in cloud computing. Int. J. Comput. Trends Technol. (IJCTT) 12(4), 161–166 (2014)
25. Hollensen, S.: Marketing Management: A Relationship Approach. Pearson Education, México (2015)
26. James, G., Witten, D., Hastie, T., Tibshirani, R., Hastie, M.T., Suggests, M.: Package 'ISLR' (2013)
27. Kacen, J.J., Hess, J.D., Chiang, W.K.: Bricks or clicks? Consumer attitudes toward traditional stores and online stores. Glob. Econ. Manag. Rev. 18(1), 11 (2013)
28. Karim, M., Rahman, R.M.: Decision tree and Naïve Bayes algorithm for classification and generation of actionable knowledge for direct marketing. J. Softw. Eng. Appl. 6, 196–206 (2013). Retrieved from http://file.scirp.org/pdf/JSEA_2013042913162682.pdf
Brain Computer Interface Based Epilepsy Alert System Using Neural Networks Sayali M. Dhongade(&), Nilima R. Kolhare, and Rajendra P. Chaudhari Government Engineering College, Aurangabad, India [email protected], [email protected], [email protected]
Abstract. At present, epilepsy detection is a very time-consuming process that requires constant clinical attention. To overcome this challenge, we have designed an epilepsy alert system which takes the raw EEG signal, monitors blood pressure, and uses an accelerometer application for human fall detection. The raw EEG signal is captured using a Brain Computer Interface module. The design is implemented as a MATLAB GUI. The raw EEG signal is classified using neural networks, with data from a total of 500 epilepsy patients fed in to train the network. Accuracy varies with the number of epochs: as the number of epochs increases, the accuracy also increases. A fuzzy system is designed to detect epilepsy from the three main parameters.
Keywords: Electroencephalograph (EEG) · Epilepsy · Seizure · Brain Computer Interface (BCI) · Windowing · Radial basis network · Fuzzy logic · Epochs
1 Introduction
Epilepsy is one of the most common neurological disorders; it has affected almost 50 million people all over the world, and the count is still increasing [6]. Epilepsy is a disorder of the central nervous system in which brain activity increases abruptly. This abnormal condition produces many unusual sensations [4]. The patients mainly diagnosed with epilepsy are children and the elderly population in the age range of 65–70 years.
1.1 Epilepsy
EEG is an electrophysiological method of recording the electrical activity that occurs on the surface of the brain. EEG signals taken from such recordings are used to examine the epileptic activity inside the brain [7]. It is commonly a technique with the electrodes placed on the surface of the scalp, as shown in Fig. 1, measuring the voltage variations that result from the ionic currents present inside the neurons of the brain. A seizure can be described as an abrupt [2] and uncontrollable burst of electrical signals in the brain.
Fig. 1. Electrode placement [3]
The paper is organised as follows. Section 2 discusses the work related to BCI, the basic blocks of BCI, and the frequency bands of the raw EEG. Based on the related work, the proposed work is presented in Sect. 3. The materials required and the methodology are discussed in Sects. 4 and 5, and the simulation and results in Sect. 6. Lastly, we summarise the whole project in the Conclusion and Future Scope.
2 Related Work
The brain is one of the most important organs of our body. Brain computer interfaces have been designed to allow users to control external devices or computers according to their intentions, which are decoded in binary form from brain activity [9]. BCI applications have developed to the point where EEG signals of epilepsy patients can be used for seizure control. The basic blocks of a brain computer interface are pre-processing, feature extraction and feature classification.
1. Preprocessing: this amplifies the EEG signals without disturbing the significant information within them. Signal conditioning ensures signal quality by improving the SNR [11]. EEG signals are sampled at 256 Hz, which satisfies the Nyquist sampling theorem. To obtain a high SNR, band-pass filtering is used to remove the DC bias and high-frequency noise (a sketch of this filter follows Fig. 2).
2. Feature extraction: before the classification test and the actual BCI [4], the classifier is first trained and then supervised using a classification method [7] that classifies feature vectors into "target" and "non-target". Different methods are used for feature extraction, such as the discrete wavelet transform [2], independent component analysis and principal component analysis.
3. Feature classification: for classification, various methods are implemented, such as linear discriminant analysis, stepwise linear discriminant analysis and support vector machines (Fig. 2).
Fig. 2. Accuracy (%) comparison between different algorithms (DWT, BLDA, LDA, SVM, ML)
Methods:
1. Discrete wavelet transform (DWT): The [2] performance is 7.7 characters per minute with an accuracy of around 80%; accuracy is greater than 90% at 2.2 characters per minute.
2. Bayesian linear discriminant analysis (BLDA): 95% false positive classification.
3. Linear discriminant analysis (LDA): Accuracy is close to 100% [12]; the best classification accuracy for disabled subjects was on average 100%, varying with the number of electrodes.
4. Support vector machine (SVM): 96.5% accuracy.
5. Maximum likelihood (ML): Accuracy is 90% with a communication rate of 4.2 symbols/min.
The electroencephalogram (EEG) is a multi-channel time series which captures the electrical activity recorded from the scalp [8] at specific locations (electrodes). After spatial analysis [1], the EEG waveform is analyzed in the frequency domain. Frequency bands are important because, during some particular tasks, it is not the amplitude of the whole signal that changes but rather the amplitude of a particular frequency band. Using the wavelet transform, the EEG signal [7] can be decomposed into the bands shown in Fig. 3; the frequencies and periods at which they occur are listed in Table 1.
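To make the decomposition of Table 1 concrete, the sketch below splits a raw EEG trace into these bands with zero-phase Butterworth band-pass filters in Python. It is an illustrative alternative to the wavelet decomposition mentioned in the text, not the authors' implementation; the 0.5 Hz lower edge for delta and the 45 Hz upper edge for gamma are practical assumptions, since a band-pass filter needs finite, non-zero cut-offs below the Nyquist frequency.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256  # sampling rate (Hz), as stated in the text
# band edges follow Table 1; delta's lower edge and gamma's upper edge
# are practical assumptions (band-pass filters need non-zero, sub-Nyquist cut-offs)
BANDS = {"delta": (0.5, 3.5), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (30, 45)}

def band_filter(x, lo, hi, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter for one EEG band."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

eeg = np.random.randn(10 * FS)  # stand-in for 10 s of raw EEG
bands = {name: band_filter(eeg, lo, hi) for name, (lo, hi) in BANDS.items()}
```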
Fig. 3. Time domain analysis of raw EEG signal

Table 1. Decomposition into various frequencies of raw EEG signal

Band    Frequency range (Hz)   Period of occurrence
Delta   0–3.5                  In profound sleep at any age
Theta   4–7                    In deep relaxed sleep
Alpha   8–13                   Awake and rest mode
Beta    14–30                  Awake and mental activity
Gamma   >30                    Highly focused mode during information processing

3 Proposed Work
The proposed work develops an epilepsy alert system using the EEG signal, an accelerometer and a blood pressure sensor. A GUI is designed in MATLAB. The EEG signal is obtained from the Brainsense headband, which provides the raw EEG signal. The [1] raw EEG signal is then sent to the neural network for classification. The neural network detects whether a seizure has occurred; if so, the system checks the readings of the blood pressure sensor and the accelerometer, which detects human falls. If no seizure has occurred, it keeps checking the raw EEG signal until it encounters a seizure. After values from all three sensors are obtained, they are sent to a fuzzy system, where rules are defined to check whether epilepsy is detected. The reason for considering all three measures (the EEG signal, the blood pressure values and the accelerometer) is that excess stress or hypertension can sometimes make the EEG signal abrupt, which could falsely indicate epilepsy; using all the sensors therefore yields reliable results. If all the rules in the fuzzy system are satisfied, epilepsy is detected and an SMS is generated on the cellphone (Fig. 4).
4 Materials Required
The hardware required in the epilepsy alert system comprises the blood pressure sensor, an Arduino UNO, the Brainsense band and a smartphone. The software used is MATLAB (Fig. 5).
Fig. 4. Proposed system (flow: Brainsense headband → raw EEG signal → neural network seizure detection → blood pressure monitor and accelerometer-based human fall detection → fuzzy system → SMS alert)
• Brainsense: This [10] module consists of the TGAM1 module, a dry electrode and an ear-clip electrode. It supports automatic wireless pairing over Bluetooth 2.1 Class 2, which has a 10 m range, and also offers Android support.
• Blood pressure sensor: It reads blood pressure and heart rate at a 9600 baud rate and displays the output on the screen, showing systolic, diastolic and pulse measurements. It is clinically accurate. The working voltage is +5 V at 200 mA, and it stores 60 measurements in group memory.
• Accelerometer: We use the accelerometer and GPS location from the mobile phone. Human falls are detected using the traditional human fall detection algorithm [10]. The software also sends the GPS coordinates from the mobile phone to alert the care tracker.
Fig. 5. Setup of the epilepsy alert system
5 Methodology
The GUI in MATLAB is used to develop an interface in which all the sensors display their results after checking against the threshold values, and the final result, whether epilepsy is detected or not, is displayed.
5.1 Sampling and Windowing
After being captured through the Brainsense headband, the EEG signal is sampled at 256 Hz and transformed into discrete values. Because of this transformation the continuous Fourier transform cannot be applied directly, which is why we take the FFT [3]. The frequency domain reveals whether a signal that looks clean in the time domain contains noise, jitter or crosstalk [4]. Since the topologies of the time and frequency domains are similar, the end points of the time waveform are considered. The FFT works well when the signal is periodic and an integer number of periods fills the time interval. But when there is a non-integer number of periods, we may get a truncated waveform, different from the continuous-time signal, with discontinuous end points [11]. It then appears as if one frequency's energy leaks into another; this is called spectral leakage. Windowing is a technique which reduces the discontinuities at the end of the finite sequence. It involves multiplying the time signal by a finite-length window, which in turn reduces the sharp discontinuities. The Hamming window gives a wide peak but low side lobes; it does a great job at reducing the nearest side lobes but is poorer at the others [5] (Fig. 6).

Fig. 6. Windowing techniques (Hamming and Hanning) give rise to a high peak but nice side lobes
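The following minimal sketch, assuming a 256 Hz frame of samples, illustrates the windowing step described above: a Hamming window is applied before the FFT to reduce spectral leakage. The random array is only a stand-in for a real EEG frame.

```python
import numpy as np

FS = 256                         # sampling rate (Hz)
frame = np.random.randn(FS)      # stand-in for one second of sampled EEG

window = np.hamming(len(frame))  # Hamming window, as chosen in the text
spectrum = np.fft.rfft(frame * window)
freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
magnitude = np.abs(spectrum)     # leakage-reduced magnitude spectrum
```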
5.2 Radial Basis Function Neural Network
Radial Basis Function (RBF) networks are used for feature classification and for approximating functions. Neurons are continuously added to the hidden layer of the network until it meets the specified mean squared error goal. Data from 500 epileptic patients are used to train the network, where GOAL is the mean squared error goal (0.0 by default) and SPREAD, denoted by σ, is the extent of stretch of the RBF curve (1.0 by default). Here, for epilepsy detection, both the ranges and the abruptness are trained using the radial basis function. There are in total 48 neurons in the hidden layer; the more hidden neurons present in the network, the higher the accuracy [1]. The two main advantages of the RBF network are that, first, it keeps the mathematics simple (it uses linear algebra) and, second, it keeps computation cheap (it uses a generalized gradient descent algorithm). The approximating function is

y(x) = Σ_{i=1}^{N} w_i φ(‖x − x_i‖)

where y(x) is the approximation function, x is the input, the x_i are the centers and the w_i are the weights (Fig. 7).
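A minimal numerical sketch of this approximating function is given below: Gaussian basis functions are centered on a subset of training points and the weights w_i are obtained by least squares. It illustrates the equation itself rather than the MATLAB training routine used by the authors; the choice of 48 centers simply mirrors the 48 hidden neurons mentioned above, and the data are synthetic.

```python
import numpy as np

def rbf_design(X, centers, spread=1.0):
    # phi(||x - c||) with a Gaussian kernel; SPREAD plays the role of sigma
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d / spread) ** 2)

X = np.random.randn(200, 4)                  # stand-in feature vectors
y = np.sin(X).sum(axis=1)                    # stand-in targets
idx = np.random.choice(len(X), 48, replace=False)
centers = X[idx]                             # 48 hidden units
Phi = rbf_design(X, centers)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # solve for the weights w_i
y_hat = Phi @ w                              # network output y(x)
```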
Fig. 7. Radial basis function network
5.3 Fuzzy Logic
In this project, complex system behavior is modeled using simple logic rules in the fuzzy designer. We have used a Mamdani-based fuzzy system, and the fuzzy rules are designed in the system as shown in Fig. 8, where α is the seizure, µ is the blood pressure, € is the human fall and δ is the output.
Fig. 8. Fuzzy inference system designed.
There are in total 4 fuzzy rules designed in the system. These are given in Table 2.

Table 2. Rules in the implemented fuzzy system

1. If (α is True) and (µ is High) and (€ is False) then (δ is False)
2. If (α is True) and (µ is Low) and (€ is False) then (δ is False)
3. If (α is True) and (µ is Low) and (€ is True) then (δ is False)
4. If (α is True) and (µ is High) and (€ is True) then (δ is High)
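For illustration, the four rules of Table 2 can be transcribed directly as crisp logic, as in the sketch below. The authors' actual system is a Mamdani fuzzy inference system designed in MATLAB, so this is only a simplified stand-in that reproduces the outputs of the rule table.

```python
def epilepsy_decision(seizure: bool, bp: str, fall: bool) -> str:
    """seizure = α, bp = µ ('High' or 'Low'), fall = €; returns the output δ."""
    if seizure and bp == "High" and fall:  # rule 4
        return "High"
    return "False"                         # rules 1-3

assert epilepsy_decision(True, "High", True) == "High"
assert epilepsy_decision(True, "Low", True) == "False"
```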
6 Simulation Results and Discussions
Sensitivity and specificity are classification performance measures: the sensitivity (true positive rate) and the specificity (true negative rate) are defined as functions of the actual positives and actual negatives. Accuracy is the closeness of the measured results to the true value; precision is the degree to which repeated measurements give the same outcome under unchanged conditions.

Accuracy = (TP + TN) / (TP + TN + FP + FN),  Precision = TP / (TP + FP)
The F-score measures the accuracy of a test. It takes into account the precision and recall values; the F1-score has its best value at 1 and worst at 0. Precision is the fraction of retrieved instances that are relevant, and recall is the fraction of all relevant instances that have been retrieved.
F = 2 · (precision · recall) / (precision + recall)
where TP = true positive, TN = true negative, FP = false positive and FN = false negative. During neural network training, the radial basis function is trained for a total of 300 epochs. An epoch is one complete pass of the training data through the learning algorithm; the more epochs, the more accurate the results obtained, as shown in Fig. 9 below.
Fig. 9. Graph of performance vs. epochs
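A small helper that computes these measures from confusion-matrix counts, following the formulas above, is sketched below; the counts in the example call are placeholders, not the paper's data.

```python
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(metrics(tp=46, tn=46, fp=4, fn=4))  # example counts only
```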
Parameter analysis at different epochs (Table 3):
Table 3. Results of various parameters for given epochs

Sr. No.   Epochs   Accuracy   F1-score   Precision
1         100      0.85       0.857143   0.9
2         150      0.85       0.842105   0.83
3         200      0.9        0.9        0.9
4         300      0.92       0.92       0.95
The GUI in MATLAB is designed as shown in Fig. 10.
Fig. 10. GUI on MATLAB for the proposed system
7 Conclusion and Future Scope
The motive behind using all three parameters is to avoid false epilepsy detections. We have therefore used three different types of sensors: the EEG sensor, the accelerometer and the blood pressure sensor. On average, we have used data from almost 500 patients to train our neural network. The algorithm used is the radial basis function (RBF) network. On the basis of all three sensor readings, the fuzzy rules finally decide whether epilepsy is detected or not. This system uses a single EEG electrode; in the future we will use 8- and 16-channel electrodes to measure the different epilepsy levels.
Acknowledgments. This work was supported by the "JJ Plus Hospitals Pvt. Ltd. & NEURON International", Aurangabad, for recording EEG and providing data of epileptic patients.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Mamun Rashid, Md., Ahmad, M.: Epileptic seizure classification using statistical features of EEG signal. In: International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’s Bazar, Bangladesh, 16–18 February 2017 (2017) 2. Al-Omar, S., Kamali, W.: Classification of EEG signals to detect epilepsy problems. In: 2nd International Conference on Biomedical Engineering (2013)
3. Manjusha, M., Harikumar, R.: Performance analysis of KNN classifier and K-means clustering for robust classification of epilepsy from EEG signals. In: IEEE WiSPNET 2016 Conference (2016) 4. Choubey, H., Pandey, A.: Classification and detection of epilepsy using reduced set of extracted features. In: 5th International Conference on Signal Processing and Integrated Networks (SPIN) (2018) 5. AlSharabi, K., Ibrahim, S., Djemal, R.: A DWT-entropy-ANN based architecture for epilepsy diagnosis using EEG signals. In: 2nd International Conference on Advanced Technologies for Signal and Image Processing - ATSIP, Monastir, Tunisia, 21–24 March 2016 (2016) 6. da Silva, F.H.L.: The impact of EEG/MEG signal processing and modeling in the diagnostic and management of epilepsy. IEEE Rev. Biomed. Eng. 1, 143–156 (2008) 7. Saker, M., Rihana, S.: Platform for EEG signal processing for motor imagery - application brain computer interface. In: 2nd International Conference on Advances in Biomedical Engineering (2013) 8. Ilyas, M.Z., Saad, P., Ahmad, M.I., Ghani, A.R.I.: Classification of EEG signals for brain-computer interface applications: performance comparison (2016) 9. Hsieh, Z.-H., Chen, C.-K.: A brain-computer interface with real-time independent component analysis for biomedical applications (2012) 10. Lasefr, Z., Ayyalasomayajula, S.S.V.N.R., Elleithy, K.: Epilepsy seizure detection using EEG signals (2017) 11. Wang, L., Xue, W.: Automatic epileptic seizure detection in EEG signals using multidomain feature extraction and nonlinear analysis. Entropy 19, 222 (2017) 12. Gajic, D., Djurovic, Z., Di Gennaro, S., Gustafsson, F.: Classification of EEG signals for detection of epileptic seizures based on wavelets and statistical pattern recognition. Biomed. Eng. Appl. Basis Commun. 26(2), 1450021 (2014)
Face Detection Based Automation of Electrical Appliances
S. V. Lokesh1(&), B. Karthikeyan1, R. Kumar2, and S. Suresh3
1 School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
[email protected], [email protected]
2 Electronics and Instrumentation Engineering, NIT Nagaland, Dimapur, India
[email protected]
3 Electronics and Communication Engineering, Sri Indu Institute of Engineering and Technology Sheriguda, Hyderabad, India
[email protected]
Abstract. This paper addresses the problem of automating electrical appliances with the use of sensors. The proposed method uses the Viola-Jones method, a real-time face detection algorithm, for controlling the electrical appliances that are part of our day-to-day life. A detector based on the Viola-Jones face detection algorithm is initialized automatically, without the need for any manual intervention. After initializing the detector, the position where the face is detected is computed. The region captured by the camera is split into four different regions, and based on the position of the detected face the electrical appliances assigned to that particular region can be controlled.
Keywords: Face detection · Viola-Jones · Split to four regions virtually
1 Introduction
With the growth in the field of electronics, constant efforts have been made to automate various day-to-day tasks. One of these tasks is the automation of electrical appliances. This can be accomplished using GSM, which requires users to check the status of the appliances through mobile phones when they step out, or it can also be done through voice commands [12]. However, this is still subject to errors, as it requires human effort and is not completely automated. Another way of controlling electrical devices is through the use of sensors [13], which provides complete automation; however, the poor responsiveness of the sensors and the complexity of network troubleshooting bring down the efficiency of the entire system. Due to the drawbacks posed by the existing methods, this paper proposes automation of electrical appliances using face detection. The simplicity of the circuit involved, the wide coverage range and a fully automated setup ensure power-efficient control of appliances. Face detection is used to compute the size and identify the position of the face in a given input image. The background information is ignored and only the facial features in the image are detected. From the detected face we are able to predict the age of the person, gender, facial expression, etc. In general terms, face
detection means localizing the face in the given input image. Localization helps to know the number of faces and where they are detected in the image.
2 Related Work
This section addresses a few techniques used for computer-vision face detection. Detection based on skin colour segmentation analyses the visuals in different colour spaces and identifies the face with certain threshold values [14]. The colour spaces that have been widely used are YCbCr, HSI and binary, and some works have proposed combining different spaces to threshold the facial features. However, this method can be efficient for stationary images but not for real-time video streams. The second approach involves feature extraction for object detection using features from the SMQT. To speed up the SNOW classifier, it is divided into multiple stages on which single classifiers are trained to form weak classifiers, and these weak classifiers are combined in cascade to form a strong classifier which helps in face detection [1]. A neural network consists of perceptrons, artificial neurons arranged in multiple layers and connected to each other. Perceptron networks fall under unsupervised learning and no kind of programming is required [2]. A perceptron network is composed of a threshold or summation function. Face detection using a neural network is performed by checking, at a small window level, whether the detected region is a face or not. It is not necessary to train with non-face images, and thus the computation is also reduced [3]. Face detection using a neural network consists of two steps. In the first step, a part of the image of size 20 × 20 is fed to the neural network as input; the output of the filter determines whether the detected region is a face or not. In the second step, the efficiency of the overall network is increased by fusing all the single neural networks to find overlapping detections, which reduces false detections. A maximum-margin hyperplane, or set of hyperplanes in a high- or infinite-dimensional space, is constructed using a supervised learning model which groups the information into two classes; the separation of the two categories is through the hyperplane [4]. The area between the hyperplane and the closest data points is called the margin, and the data points that lie on the margin boundary of the hyperplane are called support vectors. When this method is applied to object detection, the object class is learned from positive samples and the non-object class from negative samples. The kernels can be polynomial or radial kernels, and the main difficulty lies in finding the appropriate kernel for the given problem: the selected kernel is used for classification, and the results it produces may not be good compared with the results produced from the sample set. Such methods are used in applications like pedestrian detection. Haar wavelets are applied on positive as well as negative samples to extract features, and they are helpful in discriminating the classes [5].
3 Proposed Method
3.1 Face Detection
Face detection is carried out through the Viola-Jones algorithm, proposed by Paul Viola and Michael Jones in 2001 [6, 7]. Since then, many works have used this algorithm [8–11]. It is mainly used for detecting faces in real time rather than for general object detection. The algorithm is very robust and its computation time is low. The main goal of face detection is to distinguish faces from non-faces; it is the first step in face recognition. The algorithm comprises four steps:
(1) Haar Feature Selection: Haar-like features are used in object recognition. Computing the image intensity directly (i.e., calculating the RGB values of every pixel) makes the computation very slow and expensive. In a human face, the nose bridge region is lighter than the eye region, and so on. Using the Haar basis functions, the properties of the face are compared using Haar features. These are rectangular regions of dark and white pixels, and the value of each Haar feature is calculated by subtracting the black pixel count from the white pixel count (Fig. 1).
Fig. 1. Haar features
(2) Creating an Integral Image: For a faster and more effective way of calculating pixel sums we use the integral image. To use the integral image, the original image is first converted into a grayscale image. Computing cumulative pixel values with respect to the pixel at (x, y) forms the integral image, which helps in speeding up the computation. The integral image is the summation of the values of the
pixels above and to the left of (x, y), together with the pixel value at (x, y) itself. This makes the Haar features efficient to compute (Fig. 2).
Fig. 2. Integral Image
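A short sketch of the integral image idea follows: after two cumulative sums, the sum over any rectangle, and hence any rectangular Haar feature, can be evaluated from at most four array look-ups. This is a generic illustration, not code from the paper.

```python
import numpy as np

def integral_image(gray):
    """Each entry holds the sum of all pixels above and to the left, inclusive."""
    return gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of gray[r0:r1+1, c0:c1+1] from the integral image ii."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total
```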
(3) Adaboost Training: To separate relevant and irrelevant Haar-like features we use Adaboost training. Irrelevant features do not give any information about whether the detected image is a face; relevant features are those which provide information about the face. Relevant features are assigned weights to form weak classifiers. Adaboost training yields a strong classifier which is able to say whether the detected image is a face or not. This strong classifier is nothing but a linear combination of all the weighted weak classifiers:

f(x) = α_1 f_1(x) + α_2 f_2(x) + … + α_n f_n(x)   (1)
(4) Cascading Classifiers: Adaboost provides the classifiers that can say whether the detected image is a face or not. Cascade classifiers integrate many classifiers and are used to quickly reject background windows, so that more operations are performed on regions where a face is present. Each stage of the cascade consists of a distinct set of face features, which is checked step by step on the input video to detect the presence of a face. The stronger classifiers are kept in set 1, so when the detected image is not a face there is no need to check all the classifiers: from the first set of features itself we can predict whether the detected image is a face or not (Fig. 3).
Fig. 3. Cascade classifiers (each feature set either finds the face in the image and passes the window on, or rejects it as containing no face)
3.2 Automation of Appliances
The Raspberry Pi is a small single-board computer that runs the Raspbian OS. It supports cameras as peripherals and hence is a good choice for this application. Up to three cameras can be connected to a Raspberry Pi board at a time (Fig. 4).
Fig. 4. Raspberry Pi model 3B
The appliances are automated by controlling them with a programmed Raspberry Pi, to which a web camera is connected to record the real-time video (Figs. 5, 6 and 7).
Fig. 5. Block scheme of hardware
Fig. 6. Proposed system hardware
Fig. 7. Design flow of hardware
4 Results and Discussion
OpenCV is an open source computer vision library; here it is used through its Python bindings under the Raspbian operating system. The proposed method with Viola-Jones detection was implemented in Python IDLE using the OpenCV Python library. The experiments were run on a Raspberry Pi 3 Model B, which has 1 GB of RAM, and the system was tested on real-time videos captured in the laboratory. Video is captured at a resolution of 640 × 480 using a web camera. The video is split virtually into four regions; each region has a specific appliance which is automated only if a face is detected in that region (Fig. 8).
Fig. 8. Region split of a video
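A hedged sketch of the region-split control loop is shown below: faces are detected with the stock OpenCV frontal-face Haar cascade, each detection is mapped to one of the four quadrants of the 640 × 480 frame, and the matching appliance is switched. The switch_appliance helper is a hypothetical stand-in for the Raspberry Pi GPIO control in the prototype.

```python
import cv2

def switch_appliance(region, on):
    # hypothetical stand-in: on the Raspberry Pi this would drive a GPIO pin
    print(region, "ON" if on else "OFF")

def quadrant(x, y, w, h, fw=640, fh=480):
    """Map a detection's centre to one of the four virtual regions."""
    cx, cy = x + w // 2, y + h // 2
    return ("top" if cy < fh // 2 else "bottom") + \
           ("_left" if cx < fw // 2 else "_right")

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    active = {quadrant(*f) for f in cascade.detectMultiScale(gray, 1.3, 5)}
    for region in ("top_left", "top_right", "bottom_left", "bottom_right"):
        switch_appliance(region, on=region in active)
cap.release()
```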
Based on the region where the face is detected, we can say whether the person is very close to or far from the camera. With the camera placed at a certain height from the ground, if the face is detected in the top region the person is far from the camera, since the scale factor of the image will be small. If the face is detected in the bottom region, the person is closer to the camera, since the scale factor will be large when a face is detected in this region (Figs. 9, 10, 11, 12 and 13).
Fig. 9. Face detected in the top-left region; the corresponding LED is turned on and the rest are off.
Fig. 10. Face detected in the bottom-left region; the corresponding LED is turned on and the rest are off.
Fig. 11. Face detected in the bottom-right region; the corresponding LED is turned on and the rest are off.
Fig. 12. Face detected in the top-right region; the corresponding LED is turned on and the rest are off.
Fig. 13. All LEDs turned off when no face is detected.
The proposed method helps in the efficient use of power. The system as a whole overcomes drawbacks such as human fatigue in conventional manual control of appliances and the small coverage area of sensors. The system is at a disadvantage when there is not sufficient light to detect a face. The idea discussed above was prototyped using LEDs. The work will be extended to controlling devices based on face recognition and to integrating the appliances into the system.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Claesson, I.: Face detection using local SMQT features and split up snow classifier. In: 2007 IEEE International Conference on Acoustics Speech and Signal Processing - ICASSP 2007 (2007) 2. Rowley, H.A., Baluja, S., Kanade, T.: Neural networks based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 22–38 (1998) 3. Sung, K.K., Poggio, T.: Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 39–51 (1998) 4. Boser, B., Guyon, I.M., Vapnik, V.: A training algorithm for optimal margin classifiers. In: ACM Workshop on Conference on Computational Learning Theory (COLT), pp. 142–152 (1992) 5. Yilmaz, A., Javed, O., Shah, M.: Object tracking: a survey. ACM Comput. Surv. 38(4), 45 (2006) 6. Viola, P., Jones, M.J.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 511–518 (2001) 7. Allen, J.G., Xu, R.Y.D., Jin, J.S.: Object tracking using CamShift algorithm and multiple quantized feature spaces. In: Proceedings of the Pan-Sydney Area Workshop on Visual Information Processing, ACM International Conference Proceeding Series, vol. 100, pp. 3–7. Australian Computer Society, Inc., Darlinghurst (2004) 8. Jain, V., Patel, D.: A GPU based implementation of robust face detection system. Procedia Comput. Sci. 87, 156–163 (2016) 9. Chatrath, J., Gupta, P., Ahuja, P., Goel, A., Arora, S.M.: Real time human face detection and tracking. In: International Conference on Signal Processing and Integrated Networks (SPIN), pp. 705–710 (2014) 10. Tao, Q.-Q., Zhan, S., Li, X.-H., Kurihara, T.: Robust face detection using local CNN and SVM based on kernel combination. Neurocomputing 211, 98–105 (2016) 11. Da'san, M., Alqudah, A., Debeir, O.: Face detection using Viola and Jones method and neural networks. In: International Conference on Information and Communication Technology Research, pp. 40–43 (2015) 12. Singh, P., Chotalia, K., Pingale, S., Kadam, S.: Smart GSM based home automation system. Int. Res. J. Eng. Technol. (IRJET) 03(04), 19838–19843 (2016) 13. D'mello, A., Deshmukh, G., Murudkar, M., Tripathi, G.: Home automation using Raspberry Pi 2. Int. J. Curr. Eng. Technol. 6(3), 750–754 (2016) 14. Kang, S., Choi, B., Jo, D.: Faces detection method based on skin color modeling. J. Syst. Archit. 64, 100–109 (2016)
Facial Recognition with Open Cv
Amit Thakur, Abhishek Prakash(&), Akshay Kumar Mishra, Adarsh Goldar, and Ayush Sonkar
Bhilai Institute of Technology, Raipur, Chhattisgarh, India
[email protected]
Abstract. In this digital era, the wide-ranging implementation of digital authentication techniques has impacted not only home security systems but also different facets of industry. The ever-growing facial recognition technology now dominates digital authentication, with a steady increase in the sophistication of real-time image processing operations, making it more accurate and reliable. A system is made more secure by implementing facial recognition as a primary security feature, observing a person from a 3D perspective. In recent times, digital authentication systems aim at recognizing individuals, payment confirmation, criminal identification, etc. In this research work, we have built a facial recognition system based on OpenCV, a cross-platform library originally built by Intel in 1999, primarily for real-time image processing, which includes state-of-the-art computer vision algorithms. The system we have built can achieve an accuracy of about 85–90% depending on the lighting conditions and camera resolution. The system takes still images of the subject for the database, trains a model using deep neural networks, and then, for detection, matches the subject in real time and gives results based on the match. The system is quite simple, robust and easy to implement, and can be used in many places for a vast range of applications.
Keywords: Digital authentication · Facial recognition · Open-CV · Real time image processing · Image processing
1 Introduction
The facial recognition system considered in this paper is a revolutionary technique that has proven useful in many places, from authentication systems to mass crowd monitoring and much more. With the development of artificial intelligence and machine learning, computers are getting far better than humans in many areas, and in many automation fields humans are being replaced by machines. In the field of facial recognition the story is somewhat different: as per the latest results [1], humans are still very accurate at facial recognition, but with the evolution of AI, the help of machine learning and advances in algorithms, this gap has become very small, with DeepFace now achieving an accuracy of up to 97.35%. Accordingly, we have taken an approach based on OpenCV, a library developed by Intel in 1999 [2] for real-time image processing; here it is used for facial recognition and is implemented with Python. The implementation of a facial recognition system comes with its own problems, namely aging, pose, illumination and emotions, and there are physical factors as well, like facial hair, face makeup and face wearables, as shown in Fig. 1. The other kind of problem concerns the hardware, mainly the image capturing device: the better the quality of the images the camera captures, the better the results, while low quality hampers the image processing and the detection of facial features, which may affect the accuracy of the system.
Fig. 1. Problems in face recognition
2 History
Facial recognition research started in the 1960s with Woodrow Wilson Bledsoe, known as the father of facial recognition. At that time the basis of the system was the measurement of certain facial features such as the nose, eyes, hairline and mouth. These were stored in a database (on a graphics tablet or RAND tablet), and when a photograph came in it was matched against the stored features and results were given; this provided the basic mechanism for such systems. In 1973 the Japanese researcher Kanade [3] was able to build the first automated system able to recognize faces. The system was built on the principles developed by Bell Labs: first the image is digitized, the points needed for recognition are identified, and then measurements are taken and stored in the database; 21 points are taken into consideration. The evolution of social media created a huge database of images; these sites use recognition features to identify individuals for tagging, a lot of data mining is done by these sites regarding faces, and advanced ML algorithms are used, which raised the accuracy of such systems. A major breakthrough came in 2017 with Apple's Face ID feature, a major attention-getter with 3D scanning of the face, used as the primary authentication system in the phone; there is a constant increase in the use of such systems, as shown in Fig. 2.
Fig. 2. Trends in authentication (users in millions, 2015–2024, for facial recognition, iris scan, fingerprint, voice recognition and hand geometry)
3 The Nodal Points
The pillars of a facial recognition system are the facial features that help distinguish the face of an individual: the nodal points of the face that define its significant face print. These landmarks differ between people and include the separation between the two eyes, the length of the nose, the depth of the eye sockets, the shape of the cheek bones and the length of the jaw line, as shown in Fig. 3 [3]. Together, the measurements of these nodal points make up the face print, which is converted into a numerical code to be stored in the database. When a face image is seen, the nodal points are measured and compared with the stored face data, and a match produces the result. The most significant property of the nodal points is that they do not tend to change with aging and are not affected by fat deposition on the face, which validates the authenticity of the face: the chosen nodal points remain essentially constant throughout a person's life.
Fig. 3. Nodal points in image [3]
Approximately 80 nodal points in total have so far been discovered and marked on the face.
4 Our Approach
The approach used in the current system is based on stages that combine to produce the final result; each stage plays a significant role in facial recognition. The stages are:
• The Detection Phase
• The Capturing Phase and Database Creation
• Creation of a Classifier and Adding Data
• The Recognition Phase (Fig. 4)
4.1 Detection, Capturing and Creating the Dataset
The capturing phase takes still images of the subject once the detection phase tells the camera about the presence of a face; detection is done by the Haar cascade implemented in OpenCV, which has predefined frontal and side-oriented features of the face such as the nose, eyes, mouth and eye sockets. The camera then takes 20–30 still images of the subject, depending on the camera time allotted in the program; the images are cropped to contain only the face, the background is trimmed out, and grayscale conversion is done. The implementation of the system is done in Python 3.0 with the support of the OpenCV library, which provides both the LBPH classifier [5, 8, 9] and the recognition of the face. The algorithm has these parameters:
• Neighbors: the number of sample points built around the local central pixel.
Fig. 4. Phases of working (detection phase → capturing phase and database creation → creation of classifier and adding data → recognition)
During training, a certain ID is assigned to each image, which serves as its label for recognition in both input and output. The first step is followed by creating a sliding window on the image.
• The values of the neighbors are set to 0 or 1 depending on whether they are lower or higher than the central pixel (Fig. 5). A minimal training sketch follows.
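The sketch below shows the classifier-creation step with OpenCV's LBPH recognizer (available in the opencv-contrib-python package). The dataset folder and the user.<id>.<n>.jpg naming convention are illustrative assumptions, not the authors' exact layout.

```python
import os
import cv2
import numpy as np

recognizer = cv2.face.LBPHFaceRecognizer_create(radius=1, neighbors=8,
                                                grid_x=8, grid_y=8)
faces, ids = [], []
for fname in os.listdir("dataset"):               # e.g. user.<id>.<n>.jpg
    img = cv2.imread(os.path.join("dataset", fname), cv2.IMREAD_GRAYSCALE)
    if img is None:
        continue                                  # skip non-image files
    faces.append(img)
    ids.append(int(fname.split(".")[1]))          # assumed naming convention
recognizer.train(faces, np.array(ids))
recognizer.write("trainer.yml")                   # persist the trained model
```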
Fig. 5. Matrix representation of image [4]
4.2 Recognition
The recognition part is done in two stages:
• The first stage is the detection of the face in the environment.
• The second stage is the histogram conversion of the image and matching against the existing dataset.
The first stage uses the OpenCV Haar cascade for detecting a face in the camera feed, processed in real time with OpenCV. Once face detection is complete, the second stage is performed: the real-time image is converted to a histogram, the histogram is compared with those present in the dataset using the Euclidean distance, and the algorithm proposes the image ID with the highest match.
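A companion sketch for this recognition stage is given below: the trained LBPH model compares each detected face against the stored dataset and returns the best-matching ID together with a distance, where lower means closer. The acceptance threshold of 70 is an assumption for illustration only.

```python
import cv2

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("trainer.yml")                    # model from the training sketch
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("test.jpg")                    # stand-in for a camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
    label, distance = recognizer.predict(gray[y:y + h, x:x + w])
    matched = distance < 70                       # assumed acceptance threshold
    print(f"id={label} distance={distance:.1f} matched={matched}")
```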
5 Conclusion
With the implementation of the OpenCV real-time face recognition system we obtain an accuracy of up to 90–95%, varying with many conditions such as lighting, camera resolution, the training of the dataset and the number of images taken for training. We varied these conditions, as shown in the accompanying figure: the number of dataset images was varied from 10 to 50, as were the lighting conditions. The camera resolution also plays a major role in the accuracy, with a better-resolution camera showing a higher accuracy rate. Under optimal conditions we obtained an accuracy of 95.34%.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. https://research.fb.com/publications/deepface-closing-the-gap-to-human-level-performancein-face-verification/. Accessed 8 July 2019 2. https://en.wikipedia.org/wiki/OpenCV 3. The Impact of Facial Recognition Technology on Society, Derek Benson COMP 116: Information Security, 13 December 2017 4. www.howstuffworks.com. Accessed 10 July 2019 5. Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006) 6. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002) 7. Ahonen, T., Hadid, A., Pietikäinen, M.: Face recognition with local binary patterns. In: Computer Vision, ECCV 2004, pp. 469–481 (2004) 8. Ahonen, T., Hadid, A., Pietikäinen, M.: Face recognition with local binary patterns. Machine Vision Group, Infotech Oulu, University of Oulu, Finland. http://www.ee.oulu.fi/mvg/ 9. LBPH OpenCV. https://docs.opencv.org/2.4/modules/contrib/doc/facerec/facerec_tutorial. html#local-binary-patterns-histograms. Accessed 12 July 2019
Dynamic Link Prediction in Biomedical Domain
M. V. Anitha(&) and Linda Sara Mathew
Computer Science and Engineering, Mar Athanasius College of Engineering, Kothamangalam 686666, Kerala, India
[email protected], [email protected]
Abstract. Unknown adverse responses to medicines available on the market present a major health threat and limit accurate judgment of the cost/benefit trade-off for drugs. Link prediction is an important task for analyzing networks and also has applications in other domains; compared with predicting the existence of a link, determining its direction is more difficult. Adverse Drug Reactions (ADRs) place a critical burden on the health of patients and on the health care system. This paper studies the network problem, focusing on the evolution of links in a dynamic network setting and on predicting adverse drug reactions. Four types of nodes (drugs, adverse reactions, indications and protein targets) are structured as a knowledge graph. Using this graph, different dynamic network embedding methods and algorithms were developed. The technique performs remarkably well at classifying known causes of adverse reactions.
Keywords: Link prediction · Dynamic network · Nonnegative matrix factorization
1 Introduction
An adverse drug reaction (ADR) can be defined as 'an appreciably harmful or unpleasant reaction resulting from an intervention related to the use of a medicinal product; adverse effects usually predict hazard from future administration and warrant prevention, or specific treatment, or alteration of the dosage regimen, or withdrawal of the product'. Since 2012, the definition has included reactions occurring as a result of error, misuse, and suspected reactions to drugs that are unlicensed or being used off-label, in addition to the authorized use of a medicinal product in normal dosages. Although this change possibly alters the reporting and monitoring carried out by manufacturers and medicines regulators, in clinical practice it should not affect our approach to handling ADRs. Pivotal research conducted in the late 20th and early 21st century in the USA and the UK established that ADRs are a common occurrence in clinical practice, including as a cause of unscheduled hospital admissions, occurring during hospital admission and appearing after discharge. Medications that have been especially implicated in ADR-related hospital admissions include antiplatelets, anticoagulants, cytotoxics, immunosuppressants,
diuretics, antidiabetics and antimicrobials. Fatal ADRs, when they occur, are often due to haemorrhage, the most commonly suspected cause being an antithrombotic/anticoagulant co-administered with a non-steroidal anti-inflammatory drug (NSAID). In recent times, various studies have examined topological structures and revealed network roles for a comprehensive understanding of the essential characteristics of complex networks. Link prediction has been put forward to resolve the associated difficulties in several investigations and has attracted increasing attention. Link prediction concerns how to exploit edge information and network structure to predict the connection probability of each node pair. It can be applied in many areas, for example discovering drug-to-target links, learning the probable mechanism which drives coauthorship evolution, rebuilding airline networks, recommending friends, supporting purchasing decisions, and more [1]. The similarity-based procedure for link prediction is defined on the network structure and is more appropriate than other procedures; a similarity index is usually introduced to describe the probability of finding the missing and upcoming links. A drawback is that large costs are incurred by mining attributes from edges in link prediction. Considering structural similarity based on network evolution, conventional link prediction methods can be divided into three categories [2]. With the broad spread of the Internet, social networks have become widespread and people can use them to develop wider influence. Besides social networks, network structures are persistently observed in real-world systems, such as road traffic networks and neural networks. When building a network model to topologically approximate a real complex system, missing or redundant links may inevitably arise due to the time and cost limitations of the experiments conducted to construct the networks. Furthermore, since network links may change dynamically in nature, some potential links may appear in the future. Hence, it is necessary to detect such unseen links, or future links, from the topological structure of the existing network. This is the task of link prediction in complex networks. Link prediction plays a key part in solving many real-world problems. For instance, in biological networks like disease-gene networks, protein-protein interaction networks and metabolic networks, links indicate the interaction relations between the organisms and the diseases represented by the nodes they connect. Complex networks are highly dynamic objects; they grow very fast over time with the addition of new edges and nodes, making their study very hard. A more addressable issue is to understand the relationship between two specific nodes. How are new interactions formed? What factors are responsible for the formation of new associations? More specifically, the problem addressed here is the prediction of the likelihood of an association between two nodes in the near future. This is the link-prediction problem.
2 Related Works
Two authors proposed one of the most basic link prediction models that works directly on a social network [1–3]. Each vertex in the graph represents a person, and an edge between two vertices indicates a relationship between those people [2, 4, 5]. A variety of associations can be modeled explicitly by allowing parallel edges or by adopting an appropriate weighting scheme for the edges [6–9]. With the widespread success of the Internet [10, 11], social networks have become prevalent and people can use them to build wider associations. Besides social networks, network structures are constantly observed in real-world systems, such as road traffic networks and neural networks. When building a network model to topologically approximate a complex system, missing or redundant links may inevitably occur owing to time and cost constraints in the experiments conducted for building the networks. This is the task of link prediction in complex networks. Link prediction plays a key role in solving many real problems [7] (Fig. 1).
Fig. 1. An example of a complete social network, and how it might look to an observer
For example, in biological networks like disease-gene networks, protein-protein interaction networks and metabolic networks, links indicate the interaction relations between the organisms and the diseases represented by the nodes they connect. Complex networks are highly dynamic objects; they grow rapidly over time with the addition of new edges and nodes, making their investigation difficult. A more addressable issue is to understand the relationship between two specific nodes. How are new connections formed? What are the factors responsible for the formation of new associations? More explicitly, the problem addressed here is the prediction of the probability of a relationship between two nodes in the near future: the link prediction problem.
2.1 Problem Definition
Given the problem of recognizing associated ADRs from drug interactions, we predict at the same time whether two drugs interact with one another and what ADRs will be caused. In a network G = ⟨V, E⟩, each edge e = [u, v] ∈ E represents an interaction between u and v that existed at a particular time t(e). Multiple interactions between u and v are considered as multiple edges, with possibly different time stamps. For times t < t′, let G[t, t′] denote the subgraph of G consisting of all edges with a time stamp between t and t′. Evidently, the interval [t0, t′0] is the training interval and [t1, t′1] is the test interval [1]. Now, the concrete definition of the link prediction problem is: given four different times t0, t′0, t1, t′1 and given G[t0, t′0] as input to our link prediction algorithm, the output is a list of all plausible future interactions (edges) that are not present in G[t0, t′0] but appear in G[t1, t′1]. Furthermore, social networks grow not only through new associations but also through the inclusion of new people or elements. As predictions are not feasible for nodes absent from the training interval, future edges are predicted only for nodes present in the training interval. In summary, this work aims to:
• predict unknown adverse reactions to drugs
• develop a Non-negative Matrix Factorization (NMF)-based technique
• compare with a graph embedding based strategy and a novel multilayer model
• solve the link prediction problem in a dynamic graph
Predict obscure unfriendly responses to drugs A Non-negative Matrix Factorization (NMF)-based technique Comparing with Graph embedding based strategy and novel multilayer model Solution of the link prediction issue in dynamic graph
To take care of the link prediction issue, it needs to decide the development or disintegration potential outcomes of links between all node sets. Typically, such potential outcomes can be estimated by likenesses or relative positions between node sets. For an underlying network, there are two different ways to predict the link development: similitude based methodologies and learning-based methodologies. Here take anticipating the new/missing/in secret links. 2.2
Predicting Unknown ADR
The workflow of the prediction algorithm assumes that drugs which cause a given ADR should share certain features in common that are related to the mechanism(s) by which they cause the ADR, for instance protein targets, transporters or chemical features. One way to identify those features is by a simple statistical test: the enrichment test acts as a simplistic way to emulate a human reasoning process. For example, if part of the profile of the ADR of interest is nausea, the known causes are probably also going to cause nausea; yet we would not call every drug that can cause nausea a cause of the ADR of interest, since we know many drugs can cause nausea. In other words, nausea is certainly not a specific feature of the known causes of the ADR, and we can disregard it [1]. This outcome is achieved in practice by testing for the enrichment of every feature (for example, also causing nausea, or acting on a particular target) among all the known causes of the ADR of interest versus all
other drugs. Features that are found to be significantly enriched among the known causes of an ADR are used in the predictive model; other features are excluded. The resulting set of features can be thought of as a "meta-drug" carrying only the enriched features of the known causes of the ADR. Any drug can now be scored for its similarity to this profile, and the known causes of the ADR are expected to rank relatively highly. Any drug that is not currently known to cause the ADR but has high similarity to the enriched "meta-drug" profile is predicted to be a promising new cause. This procedure is repeated for each ADR to produce new predictions. The features derived from the network (together with the enrichment test) can be used for classification by any standard algorithm. The performance of the prediction algorithm was assessed in three ways:
• ability to correctly classify the known causes of various ADRs
• robustness in recovering (replacing) edges deleted from the graph
• ability to infer ADRs not present in the graph but observed in EHRs (Fig. 2).
Fig. 2. Overview of the NMF algorithm. It comprises three components: (A) the objective function construction method, which effectively incorporates the temporal networks as a regularizer, (B) the optimization procedure, which computes the update rules for the NMF algorithm, and (C) the link prediction method, which obtains the temporal links at time T+1 based on the feature matrix F. (Source: Physica A 496 (2018), p. 125)
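A generic sketch in the spirit of Fig. 2 is given below: the nonnegative adjacency matrix A is factorized as A ≈ FFᵀ with multiplicative updates, and unobserved pairs are then ranked by the reconstruction. The temporal regularizer of [13] is deliberately omitted, so this is only a minimal stand-in for the full method, with a toy matrix in place of real data.

```python
import numpy as np

def nmf_link_scores(A, k=4, iters=200, eps=1e-9):
    """Symmetric NMF A ~ F F^T; returns predicted link strengths."""
    n = A.shape[0]
    rng = np.random.default_rng(0)
    F = rng.random((n, k))
    for _ in range(iters):                   # multiplicative update rule
        F *= (A @ F) / (F @ (F.T @ F) + eps)
    return F @ F.T

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)    # toy adjacency matrix
scores = nmf_link_scores(A)
print(scores[0, 3])                          # score for the absent pair (0, 3)
```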
3 Methodology
The temporal and structural information enable the technique to achieve prediction results that are more precise [13]. The gold standard in our experiment is constructed from the information available in the external database DrugBank. The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e., chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e., sequence, structure and pathway) information. We search all 23 drugs to identify whether a drug is reported to interact with any other drugs using the Interax Interaction Search engine provided by DrugBank.
Before prediction, a heterogeneous graph (network) is constructed, containing the different node types and their relationships. Much of the existing work focuses on single-drug reactions, but here we concentrate on two or more drugs.
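A minimal sketch of such a heterogeneous graph follows, with the four node types named in the abstract (drug, ADR, indication, protein target) connected by typed relations; the specific entities are illustrative examples.

```python
import networkx as nx

kg = nx.Graph()
kg.add_node("warfarin", kind="drug")
kg.add_node("bleeding", kind="adr")
kg.add_node("atrial fibrillation", kind="indication")
kg.add_node("VKORC1", kind="target")

kg.add_edge("warfarin", "bleeding", relation="causes")
kg.add_edge("warfarin", "atrial fibrillation", relation="indicated_for")
kg.add_edge("warfarin", "VKORC1", relation="targets")

# neighbours of a drug, with their node types and relation labels
for nbr in kg.neighbors("warfarin"):
    print(kg.nodes[nbr]["kind"], nbr, kg.edges["warfarin", nbr]["relation"])
```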
4 Conclusion
To solve the link prediction problem, one needs to determine the formation or dissolution possibilities of links between all node pairs; such possibilities are usually estimated by the similarities or relative positions of node pairs. For an initial network, there are two ways to predict link formation: similarity-based approaches and learning-based approaches, and here we predict the new/missing/unobserved links. A critical distinction between current approaches to ADR prediction is whether the predictions are produced for drug-like molecules still in development, or for drugs that have been through clinical trials. From a modeling perspective, the key difference between these two situations is the availability of a side effect profile for the drug (albeit a probably incomplete one). Intrinsic structural properties of the drug-ADR network alone can achieve surprisingly good performance in predicting additional ADRs, yet the incorporation of additional data improves performance further. ADR predictions for lead molecules have tended to focus on chemical features, possibly also including gene expression profiles or drug targets. This study is focused on predicting additional ADRs for drugs in the post-marketing stage. In general, the method presented could be used to make predictions for lead molecules, but the drug knowledge graph used herein is limited to targets and indications, including ADRs. The absence of chemical features means there is very little information remaining in the present knowledge graph for a lead molecule.
References 1. Bean, D., Wu, H., Dzahini, O., Broadbent, M., Stewart, R.J., Dobson, R.J.B.: www.nature.com/scientificreports 2. Liben-Nowell, D., Kleinberg, J.: The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol. 58(7), 1019–1031 (2007) 3. Leskovec, J., Huttenlocher, D., Kleinberg, J.: Predicting positive and negative links in online social networks. In: International Conference Worldwide Web, pp. 641–650 (2010) 4. Al Hasan, M., Chaoji, V., Salem, S., Zaki, M.: Link prediction using supervised learning. In: Proceedings of SDM Workshop of Link Analysis, Counterterrorism and Security (2006) 5. Bilgic, M., Namata, G.M., Getoor, L.: Combining collective classification and link prediction. In: Proceedings of the Workshop on Mining Graphs and Complex Structures at ICDM Conference (2007) 6. Wang, C., Satuluri, V., Parthasarathy, S.: Local probabilistic models for link prediction. In: Proceedings of International Conference on Data Mining, ICDM 2007 (2007)
7. Doppa, J.R., Yu, J., Tadepalli, P., Getoor, L.: Chance-constrained programs for link prediction. In: Proceedings of Workshop on Analyzing Networks and Learning with Graphs at NIPS Conference (2009) 8. Taskar, B., Wong, M.F., Abbeel, P., Koller, D.: Link prediction in relational data. In: Proceedings of Neural Information Processing Systems, NIPS 2003 (2003) 9. Popescul, A., Ungar, L.H.: Statistical relational learning for link prediction. In: Proceedings of Workshop on Learning Statistical Models from Relational Data at IJCAI Conference (2003) 10. Popescul, A., Ungar, L.H.: Structural logistic regression for link analysis. In: Proceedings of Workshop on Multi-relational Data Mining at KDD Conference (2003) 11. Sarukkai, R.R.: Link prediction and path analysis using Markov chain. In: Proceedings of the Ninth World Wide Web Conference, WWW 2000, pp. 377–386 (2000) 12. De Verdière, É.C.: Algorithms for embedded graphs. MPRI (2013–2014) 13. Ma, X., Sun, P., Wang, Y.: Graph regularized nonnegative matrix factorization for temporal link prediction in dynamic networks. Sci. Direct 496, 121–136 (2018). www.elsevier.com/ locate/physa
Engineering Teaching: Simulation, Industry 4.0 and Big Data
Jesús Silva1(&), Mercedes Gaitán2, Noel Varela3, and Omar Bonerge Pineda Lezama4
1 Universidad Peruana de Ciencias Aplicadas, Lima, Peru
[email protected]
2 Corporación Universitaria Empresarial de Salamanca – CUES, Barranquilla, Colombia
[email protected]
3 Universidad de la Costa (CUC), Calle 58 # 55-66, Atlantico, Barranquilla, Colombia
[email protected]
4 Universidad Tecnológica Centroamericana (UNITEC), San Pedro Sula, Honduras
[email protected]
Abstract. In the educational field there is a paradigm shift: the focus is now on the student and their active role in the training process, moving from the teaching of content to the training of competences. This necessarily requires higher education institutions to articulate innovation processes that involve the entire academic community. On the other hand, in the current context of technological development and innovation, companies, particularly manufacturing companies, are committed to reviewing and adapting their processes to what has been called Industry 4.0, a circumstance that creates demand for new professional profiles with competencies that are not only technological but, fundamentally, those that will allow them to be competitive in a world where technology is renewed at an ever-increasing speed. This work presents the implementation of innovation strategies in the teaching methodology through the integration of simulation software under the educational format of Problem-Based Teaching, which aims on the one hand to develop various competencies and on the other to increase levels of motivation and satisfaction.
Keywords: Innovation · Simulation · Teaching · Engineering · Big data
1 Introduction
There is agreement in academic circles on the need for a change in traditional teaching methods; however, it is not easy to innovate within the pedagogical models established in the different institutions [1]. Trends in education promote a change from a passive learning approach to an active one, in which the participation and interaction of students and teachers is what is really significant [2]. These trends are highly positive in the training of engineers because, given the role they play, it is not enough for them to have knowledge of basic and technological sciences; it is essential
that during their training they develop the capacity to apply such knowledge in practice. In the field of engineering teaching, the paradigm of competency-based training, together with teaching and evaluation methods that contribute to this achievement, is now established. On the other hand, from the productive sector comes the concept of Industry 4.0, a new stage of technological development in manufacturing processes; it is a relatively recent concept that refers to the introduction of digital technologies into the manufacturing industry [3, 4]. The concept emerged in 2011 in Germany to refer to a government economic policy based on high-tech strategies [5], such as automation, process digitization and the use of electronics applied to manufacturing [6]. These changes are geared towards the personalization of production, the provision of services and high value-added businesses, as well as towards interaction and the exchange of information between people and machines [7]. Although conceptualizations of the so-called Industry 4.0 are recent, it can be defined in terms of physical systems and devices with sensors and software that work in a network and make it possible to predict, control and better plan business and organizational outcomes [8]. Hence the interest in the emergence of an industry based on the development of such systems, the Internet of Things (IoT) and the internet of people and services [9], along with additive manufacturing, 3D printing, simulation, reverse engineering, big data and analytics, artificial intelligence and virtual reality, among others. It is not new that discrete event simulation has great potential to contribute to the work of engineers and managers in strategic and operational planning. These techniques improve the ability to analyze complex processes and facilitate decision-making [10]. According to specialists in the field, simulation seeks to describe the actual behavior of a system or part of it, and also contributes to the construction of theories and hypotheses by allowing the observation of impacts and risks [11]. It also allows new physical devices or systems to be tested without compromising resources or work positions. Simulation time can be sectioned, expanded or compressed, allowing the analysis of tasks, long-term scenarios or complex phenomena [13]. Thus, in manufacturing processes, simulation has been used increasingly to identify risks and impacts and support preventive decisions in the sector [14]. In short, to talk about Industry 4.0 is to address an innovative approach to products and processes arising from smart factories that integrate into work networks and foster new forms of collaboration. In this context, the concept of intelligent manufacturing arises, which implies the possibility of digitally representing every aspect of the process with the support of computer-aided design and manufacturing software (CAD, CAM), product lifecycle management systems (PLM) and simulation software (CATIA, DELMIA, among others) [15]. As noted, simulation is one of the pillars of change in the productive areas. While the contribution of simulation technology to education is undeniable, there are some restrictions in this regard: on the one hand, the need to explore and experiment with the various tools available on the market (commercial or open source), and fundamentally to develop case studies.
It is a complex activity, which requires intense
interaction between specialists in the subject matter, on the one hand, and those with experience in operating the software, on the other. Just as simulation techniques have important added value in industry, in education the use of simulation software allows students to acquire skills in operating systems very similar to those they will encounter in their work environment [16]. In the same vein, [17] considers the use of simulation particularly useful from the perspective of the teaching and learning process, since through these techniques the student is able to operate models of a real system, carry out experiments and assess the consequences and limitations of its application.
2 Materials and Methods
The case presented here, Robotic Simulation, was developed with the DELMIA Quest software by the technical team of the Laboratory of Simulation and Computational Modeling of the University of Mumbai, India, in 2017. It has been applied in the subject Logical Processes and has been integrated into the university's Bank of Business Cases. Its aim is to perform movements of an anthropomorphic robot and, in turn, save points in space so that they can later be used by the robot. For the practice, a KUKA KR6 R900 robot with 6 degrees of freedom is used, equipped with a pneumatic gripper for clamping parts (see Fig. 1).
Fig. 1. KUKA KR6 R900 robot with 6 degrees of freedom.
To achieve this objective, several steps were considered: (a) system definition, (b) model formulation, (c) variable identification, (d) data collection, (e) model validation, (f) implementation and interpretation. Digital Enterprise Lean Manufacturing Interactive Application (DELMIA) is a technology solution focused on digital production and manufacturing [18]. It is a tool that supports the engineering and manufacturing areas of companies: through it, actions can be planned and oriented towards continuous improvement, starting with the generation of 3D simulations of different possible scenarios. Based on the information it provides, at the industry level it supports critical decision-making on production lines and therefore helps ensure line efficiency. In the DELMIA environment, different software add-ons can be distinguished (Table 1).

Table 1. DELMIA add-ons. Source: DELMIA Products and Services.

Complement: Delmia Quest [19]
Use: Discrete event simulator focused on generating 3D scenarios that provide certainty for production line engineers
Advantages: Detection of bottlenecks; productivity assessment; obtaining statistics; validation of resource utilization and material handling systems with various mixtures of products

Complement: Delmia Robotics [20]
Use: Production cell simulator with virtual robots to recreate plant operation behaviors
Advantages: Simulation of robots of various brands; simulation of product handling with robots; simulation of spot welding; simulation of arc welding

Complement: Delmia Ergonomics [21–26]
Use: Workstation simulator in which company staff are involved
Advantages: It seeks to meet two main objectives: ergonomic evaluation, where anthropometric characteristics can be established to obtain results of the RULA, NIOSH, and SNOOK and CIRIELLO type; and evaluation of the workspace, where the activities of the operators in their work environment are reproduced
Students receive in advance a guide describing the case, a simplified instruction manual for operating the software and the assignment to be carried out (Fig. 1). At the end of the practice, and before leaving the Laboratory, they are asked to complete a survey structured around three axes: preparation prior to the laboratory class, development of the activity in the laboratory, and development of the practical work. The results obtained are processed systematically on a regular basis.
3 Results
Since the interest of the project is focused on evaluating this innovation in the teaching and learning process, at the end of the laboratory activity a survey is distributed that gathers information to guide the continuous improvement of the practice. Since 2017 the survey has been applied to 4171 students, distributed as follows: 2017 (1452 students); 2018 (1587 students); 1st quarter of 2019 (1132 students). With respect to the profile of the respondents,
the average age of the students is slightly above 28 years (28.55). In terms of sex, 83.2% are men and 18.3% are women, while 1.7% do not specify it. Within the surveys it is possible to identify some questions related to the understanding, use and application of the software seen in the laboratory; these were selected on the basis of the interest pursued in this work. They include:
1. Existence of a training instance in the use and scope of the software: the percentage of affirmative answers to this question lies in the range [87.33%; 94.12%] with a confidence of 95%. No practices were identified in which this question received negative answers.
2. Teacher's concern to clarify doubts about the use of the software: the percentage of affirmative answers lies in the range [96.88%; 99.11%] with a confidence of 95%.
3. Adequacy of the time allotted to develop the practice: the information received through the surveys made it possible to adapt the practice so that, without losing its essence, the concepts are consolidated and the programming competence is acquired to the extent expected. This possibility of readjusting the practices based on the survey results favored the indicator, which this year (2019) stands at 96.66%. The percentage of affirmative responses for this variable lies in the range [91.14%; 98.60%] with a confidence of 95%.
4. Level of understanding of the class: this variable was measured on a Likert scale (1–5, with 1 meaning unsatisfied and 5 very satisfied); all quarters average above 4.26.
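The paper reports these results as 95% confidence intervals for proportions of affirmative answers. The exact interval formula is not stated, so the sketch below assumes the standard normal approximation; the counts used are illustrative, not the study's raw data.

import math

def proportion_ci(successes, n, z=1.96):
    # Normal-approximation confidence interval for a proportion (z = 1.96 for 95%).
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# Illustrative counts only: 1100 affirmative answers out of 1132 respondents.
low, high = proportion_ci(1100, 1132)
print("[{:.2%}; {:.2%}]".format(low, high))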
4 Conclusions
Based on the results obtained, it can be observed that the innovation in the teaching strategy implemented through the Laboratory of Simulation and Computational Modeling has been very well accepted by the students. It is an experience that motivates them and, at the same time, they value the tool as an approach to the world of work through open problems whose solution requires interacting with colleagues. They also value the dedication and concern of the teachers in approaching them and providing additional explanations aimed at solving the problem posed; it is also noted that the prior induction to the software is an important and necessary part of their training process. In relation to the level of understanding of the class and satisfaction with the software, we also find a satisfactory level of response. In all cases it is observed that the values of the responses have improved over time: improvements in the teaching process appear to drive improvements in the overall assessment of the experience [27]. One recurring observation is that the time devoted to the case is insufficient for students from two points of view: (a) on the one hand, in relation to the practice itself, for which work has been done to readjust its scope so that the case can be solved in less time; and (b), the greater issue, in relation to the scant time they consider is devoted to this type of activity during the degree program. In this regard, it is important to note the restrictions that the
institutions face, in both buildings and equipment, which are very rigid factors whose modification requires long-term planning. Also, in relation to the software used, another limitation is the number of licenses available to the Academic Unit, given that it is proprietary software. These aspects are being worked on in order to expand the coverage of subjects involved, by developing cases that address different topics and by providing the Laboratory with a critical mass of trained technical staff for structuring the cases proposed by the various chairs. Analyzing the experience longitudinally, a positive trend is observed in the students' perception, and this is taken as a signal for the institutions to continue working on the scope and characteristics of a sustained process towards the continuous improvement of the ongoing educational innovation.
References 1. Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J., Ghalsas, A.: Cloud computing—the business perspective. Decis. Support Syst. 51(1), 176–189 (2011) 2. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010) 3. Mell, P., Grance, T.: The NIST definition of cloud computing. NIST Special Publication 800-145 (2011) 4. Zeithaml, V.A., Parasuraman, A., Berry, L.L.: Total Quality Management Services. Diaz de Santos, Bogota (1993) 5. Sommer, L.: Industrial revolution—Industry 4.0: are German manufacturing SMEs the first victims of this revolution? J. Ind. Eng. Manag. 8, 1512–1532 (2015) 6. Tao, F., Zuo, Y., Xu, L., Zhang, L.: IoT-based intelligent perception and access of manufacturing resource toward cloud manufacturing. IEEE Trans. Ind. Inf. 10(2), 1547–1557 (2014) 7. UNESCO: Engineering: Issues Challenges and Opportunities for Development (2010) 8. Yajma, K., Hayakawa, Y., Kashiwaba, Y., Takahshi, A., Oiguchi, S.: Construction of active learning environment by the student project. Procedia Comput. Sci. 96, 1489–1496 (2016). https://doi.org/10.1016/j.procs.2016.08.195 9. Izquierdo, N.V., Lezama, O.B.P., Dorta, R.G., Viloria, A., Deras, I., Hernández-Fernández, L.: Fuzzy logic applied to the performance evaluation. Honduran coffee sector case. In: Tan, Y., Shi, Y., Tang, Q. (eds.) Advances in Swarm Intelligence, ICSI 2018. LNCS, vol. 10942. Springer, Cham (2018) 10. Amelec, V.: Increased efficiency in a company of development of technological solutions in the areas commercial and of consultancy. Adv. Sci. Lett. 21(5), 1406–1408 (2015) 11. Viloria, A., Lis-Gutiérrez, J.P., Gaitán-Angulo, M., Godoy, A.R.M., Moreno, G.C., Kamatkar, S.J.: Methodology for the design of a student pattern recognition tool to facilitate the teaching-learning process through knowledge data discovery (big data). In: Tan, Y., Shi, Y., Tang, Q. (eds.) Data Mining and Big Data, DMBD 2018. LNCS, vol. 10943. Springer, Cham (2018) 12. May, R.J.O.: Diseño del programa de un curso de formación de profesores del nivel superior. Mérida de Yucatán, UADY, p. 99 (2013)
13. Medina Cervantes, J., Villafuerte Díaz, R., Juarez Rivera, V., Mejía Sánchez, E.: Simulación en tiempo real de un proceso de selección y almacenamiento de piezas. Revista Iberoamericana de Producción Académica y Gestión Educativa (2014). http://www.pag.org.mx/index.php/PAG/article/view/399. ISSN 2007-8412 14. Minnaard, C., Minnaard, V.: Sumatoria de Creatividad + Brainstorming + la Vida Cotidiana: Gestar problemas de Investigación/Summary of Creativity + Brainstorming + Everyday Life: Gestating Research Problems. Revista Internacional de Aprendizaje en la Educación Superior, España (2017) 15. Thames, L., Schaefer, D.: Software defined cloud manufacturing for industry 4.0. Procedia CIRP 52, 12–17 (2016) 16. Gilchrist, A.: Industry 4.0: The Industrial Internet of Things. Bangken, Nonthaburi (2016) 17. Arango Serna, M., Zapata Cortes, J.: Mejoramiento de procesos de manufactura utilizando Kanban/Manufacturing process improvement using the Kanban. Revista Ingenierías Universidad de Medellín 14(27), 2248–4094, 1692–3324 (2015) 18. Banks, J.: Introduction to simulation. In: Proceedings of the Simulation Conference, pp. 7–13 (1999) 19. Cataldi, Z.: Metodología de diseño, desarrollo y evaluación de software educativo. UNLP, p. 74 (2000) 20. Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A.V., Rong, X.: Data mining for the internet of things: literature review and challenges. Int. J. Distrib. Sens. Netw. 11, 103–146 (2014) 21. Gilbert, S., Lynch, N.: Perspectives on the CAP theorem. Comput. IEEE Comput. Soc. 45, 30–36 (2012) 22. Gouda, K., Patro, A., Dwivedi, D., Bhat, N.: Virtualization approaches in cloud computing. Int. J. Comput. Trends Technol. (IJCTT) 12, 161–166 (2014) 23. Platform Industrie 4.0: Digitization of Industrie – Platform Industrie 4.0, pp. 6–9, April 2016 24. Schrecker, S., et al.: Industrial internet of things volume G4: security framework. Industrial Internet Consortium, pp. 1–173 (2016) 25. An Industrial Internet Consortium and Plattform Industrie 4.0: Architecture alignment and interoperability (2017) 26. OpenStack: Introduction to OpenStack, chap. 2. Brief Overview. http://docs.openstack.org/training-guides/content/module001-ch002-brief-overview.html 27. OpenStack: Introduction to OpenStack, chap. 4. OpenStack Architecture. http://docs.openstack.org/training-guides/content/module001-ch004-openstack-architecture.html
Association Rule Data Mining in Agriculture – A Review
N. Vignesh(&) and D. C. Vinutha
Department of Information Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India [email protected]
Abstract. Agriculture is one of the most important occupations, carried out by people for centuries, and it is the backbone occupation of our nation. It is the occupation that fulfills people's food needs. In this paper we examine certain data mining techniques (association rules) in the field of agriculture. Techniques such as the Apriori algorithm, K-nearest neighbor and K-means are explained, and their implementation in this field is presented. Association rules have proved effective in this domain. Considering agricultural soil details and other data sets, this paper presents the role of data mining (association rules) and surveys the various algorithms and data mining techniques used in agriculture.
Keywords: Agriculture · Data mining · Apriori · K-means · K-nearest neighbor
1 Introduction
Agriculture contributes nearly 10–15% of GDP to India's economy [1]. The datasets in agriculture are enormous. Due to population growth, the demand for food is increasing day by day; therefore scientists, trainers, researchers and governments continually work to increase food supplies, applying various techniques for this purpose. We discuss the conversion of an enormous dataset into knowledge. One major problem, however, is the selection of appropriate data mining techniques and algorithms [2]. A dataset can be converted into knowledge of traditional techniques and even of current trends in agriculture [3]. One example is knowledge of crops: when a crop should be planted and when harvested, and even the losses that can occur or the profit a crop can yield. In this field it is very important to know how the soil reacts with particular fertilizers and to know the strength of the soil. Here, data mining techniques are essentially statistics, through which such questions can be properly answered [4]. In agriculture, the data can be of two types, fundamental and technical [3], and both can be transformed into knowledge and above-normal profits. The association rule is very useful where the data are enormous, which makes it very appropriate in this area. Since the data never stop arriving, the analysis is never final: new techniques and patterns are continually being evaluated. Data mining is therefore used to incorporate the evolving data while keeping the old data together. Future forecasts are very important in agriculture [2], because this field depends entirely on the future. Data mining for agriculture has many techniques, but here we use a few, namely the Apriori algorithm, K-nearest neighbor and K-means, which are sufficient to explain the purpose of this paper.
2 Background of the Work
Data mining is the process of discovering patterns in large data sets, drawing on statistics, machine learning and database systems. It also incorporates patterns newly discovered in previously stored data sets; in this paper, however, we concentrate only on statistics. The main goal of data mining is to extract knowledge from the available data, present it in a simple, comprehensible form, and use it in prediction methods [1].
3 Data Mining Techniques
The techniques used here are briefly explained below. Data mining techniques can be divided into two types, clustering and classification. Classification is based on assigning unknown samples to groups learned from classified samples; the classified samples form the training sets with which classification techniques are trained. K-nearest neighbor is called a lazy learner because it does not build a model from the training set but simply classifies the given data into the most suitable group, using the training set anew every time a classification has to be made [6]. If classification is not applicable, or no previous data are available, clustering techniques such as K-means are used, in which even unknown data can be grouped into clusters; this is the most widely used technique for that purpose. These techniques are used in many fields and suit the agricultural sector very well, because unknown samples are constantly arising. This paper demonstrates the different mining techniques and algorithms that can be used in agriculture. Since agriculture involves very large data sets, processing them is a difficult task, so we use efficient techniques such as the Apriori algorithm, K-nearest neighbors, K-means and bi-clusters. The details of these techniques are explained below.
3.1 Apriori Algorithm
Apriori is a bottom-up approach. The Apriori algorithm is an influential method for mining frequent itemsets for Boolean association rules and is intended to work on transactions; in this field it works on soil, crops, income transactions, profits and losses. Pre-processing of the numerical data sets is much needed for association rule mining, as it increases efficiency (the result is also known as a training set). After pre-processing, the data are used to compare attributes across the sessions of a day, the days of a week and the weeks of a month. Values can be derived from the minimum price, the maximum price and even the average price of the trending crops on the market. Because of the combinatorial explosion, the relationship between the sessions in a day is considered only with a minimum support of more than 10. Once the frequent patterns have been obtained, it is easy to create association rules with a confidence greater than or equal to 20.
The Apriori algorithm is thus a mining technique mainly used to find the most frequent patterns. In this paper we store the data set of crop and soil activities, together with the activities that took place during the sessions of the day. Consider the crops that are trending during a day's session; let the itemset size be k, let the crop itemset be Fk and the work itemset be Wk. Apriori searches and scans the data set and derives the frequent trends of size 1 on a particular marketing day, and then:
1. Generates Wk+1, the work performed on the frequent trend in a day for certain seeds, where the specific market size is k and the generated size is k + 1 [2].
2. Verifies the data set and the support of the work performed on a day-to-day basis for all types of crops in a specific market [2].
3. Combines the day-to-day work for a market crop that meets the minimum support requirement in the market.
The Apriori algorithm works in two steps, creating Wk+1 from Fk:
1. Join step: generate Rk+1, the day-to-day trend for crops of market size k + 1, by joining two day-to-day trends for crops of market size k.
2. Prune step: examine the trends of the morning, noon and evening sessions for all crops of market size k in Rk+1, which yields the regular (frequent) trends for crops of market size k in Rk+1.
The prune step counts the frequent trends that occurred recently or in a given period of time, across the three sessions of the day (morning, noon and evening). The crop itemset is taken as market size k, and the frequent trend that occurred on the same plot is taken as k + 1. The frequent trend that occurred at a particular time in a particular session of the day is counted. The data collected previously from earlier frequent trends are then examined; a trend may occur in any of the three sessions. The contribution of each trend that occurred at a particular time of day in the given market is computed. The final step adds up all the trends that occurred at a particular time, that is, in a particular session of a particular day, for the specific crop in the specific market; this gives the support those trends provide to the demand required by Fk+1. If the minimum demand is not satisfied, the steps are repeated until a result is obtained.
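As a concrete illustration of the join and counting logic described above, the following minimal Python sketch mines frequent itemsets from hypothetical market-session transactions. The data, the absolute support threshold and the implementation are assumptions for illustration, not the authors' code.

from itertools import combinations

# Hypothetical transactions: crops traded together in one market session.
transactions = [
    {"rice", "wheat", "maize"},
    {"rice", "wheat"},
    {"rice", "maize"},
    {"wheat", "maize"},
    {"rice", "wheat", "maize"},
]
min_support = 3  # absolute count, echoing the text's minimum-support idea

def apriori(transactions, min_support):
    items = {i for t in transactions for i in t}
    freq, k_sets = [], [frozenset([i]) for i in items]
    while k_sets:
        # Count the support of each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        survivors = [c for c, n in counts.items() if n >= min_support]
        freq += [(set(c), counts[c]) for c in survivors]
        # Join step: build (k+1)-candidates from surviving k-itemsets.
        k_sets = list({a | b for a in survivors for b in survivors
                       if len(a | b) == len(a) + 1})
    return freq

for itemset, support in apriori(transactions, min_support):
    print(itemset, support)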
3.2 K-nearest Neighbor
K-nearest neighbor (KNN) is one of the methods used in data mining for classification and regression. It is also referred to as a lazy learning algorithm and as a non-parametric technique [5]. The underlying assumption of the KNN algorithm is that samples close to one another should share the same classification; this is mainly a theoretical assumption and does not always match practical data sets. The parameter k specifies the number of known samples used to classify an unknown one, and the classification occurring most often among the k nearest neighboring samples is assigned to the unknown sample. The results of KNN are useful in the practical world [6, 7]. The KNN method provides a very classic and simple classification rule, but it is expensive to apply: as noted, for each unknown sample the distance to all the nearby known samples must be calculated, and this calculation is costly [3]. The KNN method uses the training set directly but does not build an abstraction from it.
for each unknown sample Un-Sample(i)
    for each known sample Sample(j)
        compute the distance between Un-Sample(i) and Sample(j)
    end for
    find the k smallest distances
    locate the corresponding samples Sample(j1), ..., Sample(jk)
    assign Un-Sample(i) to the group that appears most often
end for   [1]
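A runnable Python counterpart of the pseudocode above; the soil samples and labels are hypothetical, and squared Euclidean distance is assumed.

from collections import Counter

def knn_classify(unknown, samples, k=3):
    # samples: list of (feature_vector, label); distance is squared Euclidean.
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(samples, key=lambda s: dist(unknown, s[0]))[:k]
    # Assign the label that appears most often among the k nearest samples.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training set: (pH, nitrogen) -> soil class.
training = [((6.5, 0.4), "fertile"), ((6.8, 0.5), "fertile"),
            ((4.9, 0.1), "poor"), ((5.1, 0.2), "poor")]
print(knn_classify((6.4, 0.35), training))  # -> "fertile"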
3.3 K-Means Algorithm
K-means is a clustering method based on vector quantization, originally from signal processing, and is the most commonly used technique in data mining cluster analysis. The system is given a data set whose classes are unknown. The main aim of the K-means algorithm is to group similar data into the same cluster, using an efficient distance measure. Each sample is initially assigned at random to one of the clusters and the center of each cluster is calculated. While the clusters are unstable, the distance between each sample and the cluster centers is calculated, the center closest to each sample is found, the sample is reassigned accordingly, and the center of the changed cluster is recomputed [6–8].
assign each object sample at random to one of the k clusters S(j)
find the center c(j) of each cluster S(j)
while (clusters unstable)
    for each Sample(i)
        find the distance from each center c(j) to Sample(i)
        find j* such that c(j*) is closest to Sample(i)
        assign Sample(i) to the cluster S(j*)
        re-compute the centers
    end for
end while   [1]
Many variants of the k-means algorithm have been created and applied in the field of data mining [6]. However, k-means does not allow empty clusters. Some variants are parameter-independent, with the value of k assigned at execution time (Table 1).
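A minimal Python version of the k-means loop above; the sample points are invented, and an empty cluster simply keeps its previous center here.

import random

def kmeans(points, k, iters=20):
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each sample to its closest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster; skip empty clusters.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = tuple(sum(col) / len(c) for col in zip(*c))
    return centers, clusters

points = [(1.0, 1.1), (0.9, 1.0), (5.0, 5.2), (5.1, 4.9)]
centers, clusters = kmeans(points, k=2)
print(centers)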
Table 1. Comparison between the different association rule data mining algorithms in agriculture

Parameters                                Apriori   k-NN   k-Means
Rain prediction                           Yes       No     No
Disease prediction                        Yes       No     No
Pattern of crops                          No        Yes    Low changes
Trends in sessions that are counted       No        No     Yes
Specifying market crops                   Yes       No     No
Dependency of crops                       No        Yes    No
Specifying sessions in a particular day   Yes       No     No
Soil pattern                              No        Yes    Yes
Nutrients in soil                         No        Yes    Yes
4 Application of Data Mining Techniques in Agriculture
Agriculture produces giant datasets that are very difficult to understand, yet it has been practiced for ages to meet humanity's basic food needs. Although farming began long ago, that does not mean it is already fully developed: even if a system works properly, it does not follow that it has no room for improvement [9, 10]. This paper therefore discusses a few techniques that can be implemented in the field of agriculture. We discuss crops, soil activities and much more, including the problems that can arise in future forecasting.
4.1 Application of Apriori Algorithm
The Apriori algorithm is used to find frequently occurring patterns. We can mine the cultivation patterns that people have traditionally followed; the pattern has to be decided according to the richness of the soil, and this is where the Apriori algorithm helps. The crop activity pattern across the three daily sessions should be taken into account. Apriori searches and scans the data set, a task that would otherwise be very laborious and time-consuming, and estimates crop activity and market value, which also helps in future forecasting.
4.2 Application of k-Nearest Neighbors
Technology has greatly improved in every field, and many of these advances can be applied to agriculture; in this paper we discuss soil and fertility. Using satellite sensors to observe land and soil activity is very easy in this field, and researchers use k-means and k-NN techniques, both of which are commonly very efficient. Sensors can observe and provide enormous data sets over long periods, which is much needed in agriculture [6, 7].
4.3 Application of k-Means Algorithm
This algorithm is used to determine the fertility of the soil. To find the fertility, AHP is first used to determine the soil nutrient values; the k-means algorithm is then applied. Weighting the nutrients improves the clustering algorithm's efficiency: weighted k-means clustering was more accurate and more efficient than unweighted clustering. The changes that can occur in the soil, and which fertilizer the soil responds to well, must be studied in advance. Using these methods, the soil nutrient value will improve over the years. The result is that improved clustering is an efficient method for determining soil fertility [6, 7].
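The weighting idea can be sketched by scaling each feature before clustering. The AHP-style weights below are invented for illustration (the cited study's actual weights are not given here), and kmeans() refers to the sketch shown after the pseudocode in Sect. 3.3.

# Hypothetical AHP-style weights for (pH, nitrogen, phosphorus).
weights = (0.5, 0.3, 0.2)

soil = [(6.5, 0.40, 0.30), (6.6, 0.42, 0.28), (4.8, 0.10, 0.05), (5.0, 0.12, 0.07)]

# Scaling each feature by sqrt(weight) makes plain k-means behave as weighted
# k-means, since squared distances then pick up the weight factor.
scaled = [tuple(x * w ** 0.5 for x, w in zip(p, weights)) for p in soil]

centers, clusters = kmeans(scaled, k=2)  # kmeans() from the sketch above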
5 Conclusion
Agriculture is one of the major occupations of the Indian people, and in developing countries such as India technology must be implemented in this field. Implementing technology will greatly improve the field and will also help farmers in a variety of ways, replacing hard work with intelligent work. Data mining plays an important role in this area, identifying difficulties and helping farmers make decisions. We discuss soil and crop activities in this paper. The characteristics of the soil can thus be studied effectively: instead of old techniques, modern techniques can be used to study these activities, and the processes of growing and of forecasting future cultivation can be carried out efficiently. Revenue can therefore be improved and potential losses avoided.
Acknowledgements. The authors express gratitude for the assistance provided by Accendere Knowledge Management Services Pvt. Ltd. in preparing the manuscript. We also thank our mentors and faculty members who guided us throughout the research and helped us achieve the desired results.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data.
References 1. Jaganathan, P., Vinothini, S., Backialakshmi, P.: A study of data mining techniques to agriculture. Int. J. Res. Inf. Technol. 2, 306–313 (2014) 2. Thakkar, R.G., Kayasth, M., Desai, H.: Rule based and association rule mining on agriculture dataset. Int. J. Innov. Res. Comput. Commun. Eng. 2, 6381–6384 (2014) 3. Khan, F., Singh, D.: Association rule mining in the field of agriculture: a. Int. J. Sci. Res. Publ. 4, 1–4 (2014) 4. Nasrin Fathima, G., Geetha, R.: Agriculture crop pattern using data mining techniques. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4, 781–786 (2014) 5. Ramesh Babu, P., Rajesh Reddy, M.: An analysis of agricultural soils by using data mining techniques. Int. J. Eng. Sci. Comput. 7, 15167–15177 (2017) 6. Mucherino, A., Papajorgji, P., Pardalos, P.M.: A survey of data mining techniques applied to agriculture. Oper. Res. 9, 121–140 (2009)
7. Patel, H., Patel, D.: A brief survey of data mining techniques applied to agricultural data. Int. J. Comput. Appl. 95, 6–8 (2014) 8. Ramesh, V., Ramar, K., Babu, S.: Parallel K-means algorithm on agricultural database. Int. J. Comput. Sci. 10, 710–713 (2013) 9. Bharatha Krishnan, K., Suganya, S.S.: Application of data mining in agriculture. Int. J. Res. Comput. Appl. Robot. 5, 18–21 (2017) 10. Ayesha, N.: A study of data mining tools and techniques to agriculture with applications. Spec. Issue Publ. Int. J. Trend Res. Dev. 1–4 (2017)
Human Face Recognition Using Local Binary Pattern Algorithm - Real Time Validation
P. Shubha(&) and M. Meenakshi
Department of Electronics and Instrumentation Engineering, Dr. Ambedkar Institute of Technology, Bangalore 560056, India
[email protected], [email protected]
Abstract. A real-time face recognition system using the LBP algorithm and image processing techniques is proposed. A face image is represented using information about shape and texture. To represent the face effectively, the face area is split into small regions, histograms of Local Binary Patterns (LBP) are extracted from them, and these are then combined into a single histogram. Recognition is then carried out in the computed feature space using a nearest neighbor classifier. The developed algorithm is validated in real time with a prototype model built on a Raspberry Pi single-board computer, and also in simulation mode using MATLAB software; the two sets of results match each other. The recognition time of the prototype model is longer than that of the simulation because of hardware limitations. The real-time experimental results demonstrate that the face recognition rate of the LBP algorithm is 89%.
Keywords: Local Binary Pattern (LBP) · Histogram · Classifier · Matlab · Raspberry Pi · Real time face recognition
1 Introduction
Face recognition is a popular image processing application through which a person can be identified and verified against a database. The identification process involves face recognition, which compares an unknown face image with a large database of face images. The value obtained is a measure of the similarity between two images, and the face is recognized when this value is equal to or greater than a threshold. The recognition process includes three stages. First, face representation carries out the modeling of a face. Second, feature extraction extracts unique and useful features of the face. Finally, classification compares the face image with the database images. Face recognition is applied in various fields such as banking, airports, passport offices, government offices and security systems. There are two phases: a training phase, in which faces are saved in the database, and a verification phase, in which the exact face image is retrieved. Face recognition is considered challenging due to variations in size, color, pose, illumination and face rotation. To overcome these problems, the Local Binary Pattern (LBP) algorithm is proposed. The paper is organized as follows: existing face recognition methods are given in Sect. 2; Sects. 3 and 4 explain the proposed method and the face recognition algorithm; Sects. 5 and 6 give the working principle and the experimental results; finally, conclusions are drawn in Sect. 7.
2 Existing Techniques of Face Recognition
Various methods are available for face recognition, including Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In the PCA algorithm [2], the number of variables is reduced statistically and the relevant features are extracted from the face. The images in the training set are represented as combinations of weighted eigenvectors, known as eigenfaces. The covariance matrix of the training images provides the eigenvectors, known as basis functions, and the corresponding weights are determined by choosing a set of relevant eigenfaces. Retaining more eigenvalues yields higher recognition accuracy. Classification of images according to a group of features is carried out by LDA [2, 3], in which the between-class scatter matrix (SB) and within-class scatter matrix (SW) over all classes of all samples are defined, and the differences between classes are used for recognition. These methods cannot capture complete facial information due to loss of information, and they are sensitive to noisy images and to lighting conditions [4]. The PCA algorithm also fails to distinguish identical twin faces.
3 Proposed Method
To overcome the above problems, the Local Binary Pattern algorithm is proposed [5]. The algorithm is less sensitive to illumination and scaling variations, and LBP is a very efficient approach to feature extraction: it encodes differences between pixel intensities in the local neighborhood of a pixel, and tiny patterns in a face image are identified by applying the LBP operator [6]. The face recognition system includes the following modules. First, the image acquisition module gathers the image details and takes the samples for analysis. Second, the feature extraction module extracts only the significant features from the image. Finally, the classification module, with its trained classifier database, classifies the obtained samples.
3.1 Face Detection
Face detection is represented by the block diagram in Fig. 1. A face is detected using the Haar cascade classifier from OpenCV. Detection of the various features of the face is carried out by the Haar cascade classifier, which operates based on the AdaBoost algorithm. The detected faces are stored in the FR100 database.
Fig. 1. Block diagram representing detection of a face
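A minimal OpenCV sketch of the detection stage in Fig. 1; the input file name is a placeholder, and the cascade file is the stock frontal-face model shipped with OpenCV, not necessarily the one the authors used.

import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("test.jpg")               # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect faces; scaleFactor and minNeighbors are typical default choices.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)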
3.2 Face Recognition
The face recognition technique is depicted in Fig. 2. The binary pattern of a face is calculated after converting the image to gray scale [7]. The LBP operator is not restricted to a fixed radius or neighborhood size, so the algorithm can be applied to images of different sizes and textures. The face is described through a feature vector: histograms are calculated for each region, and the combined histogram represents an overall description of the face [8].
Fig. 2. The block diagram representation of the face recognition technique.
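The basic LBP operator described above can be sketched as follows. This assumes a NumPy gray-scale image and fixes the classic 3x3, 8-neighbor setting, whereas the paper notes the operator is not restricted to a fixed radius and neighborhood.

import numpy as np

def lbp_3x3(gray):
    # Basic 8-neighbour LBP: threshold each neighbour against the centre pixel
    # and pack the resulting bits into one byte per pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    centre = gray[1:-1, 1:-1]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh >= centre).astype(np.uint8) << bit
    return out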
3.3 Database
The database FR100 was created to recognize human faces. It stores 100 images of each face, with different facial expressions and postures in each image. During acquisition, the database images are converted to gray-scale images for feature extraction. Each image then undergoes normalization to remove noise and align the face in the proper position, which improves the recognition results.
4 Block Diagram and Working Principle of Hardware Setup
Fig. 3. Block diagram representation of hardware setup
The hardware setup is shown in Fig. 3, and the working principle of the face recognition system is depicted by the block diagram in Fig. 4. The hardware module consists of a Raspberry Pi single-board computer powered through a micro-USB socket at 5 V, 2.5 A. The LBP algorithm runs in a Python IDE. The image is captured by a camera, and the obtained color image is converted into a gray-scale image for further processing. The required facial region is cropped and divided into several blocks. A unique ID is assigned to each image, and a database is thus created. The LBP texture features are then extracted from the test image. A face is recognized using the Haar cascade classifier and recognizer [9, 10]: the classifier compares the database images with the test image, and face recognition is successful only when the test image matches an image in the database [11].
5 Face Recognition Algorithm
The Local Binary Pattern algorithm is applied to perform face recognition. The proposed algorithm is described below:
1. Input the face image.
2. Divide the face image into blocks.
3. Calculate a histogram for each block.
4. Combine the LBP histograms into a single histogram.
5. Compare the face image with the database images.
6. Recognition of the face is successful only when the test image matches an image in the database.
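Steps 2 to 6 can be sketched as below. The 8x8 grid and the chi-square distance are assumptions for illustration; the paper specifies a nearest neighbor classifier but not the exact block grid or distance measure.

import numpy as np

def lbp_histogram(lbp_img, grid=(8, 8)):
    # Steps 2-4: split the LBP image into blocks, take a 256-bin histogram
    # per block, and concatenate them into a single feature vector.
    feats = []
    for r in np.array_split(lbp_img, grid[0], axis=0):
        for block in np.array_split(r, grid[1], axis=1):
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

def recognize(test_feat, database):
    # Steps 5-6: chi-square nearest neighbour over the stored histograms.
    chi2 = lambda a, b: np.sum((a - b) ** 2 / (a + b + 1e-10))
    return min(database, key=lambda item: chi2(test_feat, item[1]))[0]

# database: list of (person_id, feature_vector) built from the FR100 images.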
Fig. 4. Working principle for recognition of a face
6 Results and Discussions
The algorithm was tested using different test images, both in the simulation environment and on the hardware setup. Before applying the algorithm, the required face region is cropped, as depicted in Fig. 5(a). The input image and the LBP image are illustrated in Fig. 5(b), and the recognition result in Fig. 5(c). The hardware setup result is shown in Fig. 6.
Fig. 5. Simulation results of LBP algorithm (a) Cropped image, (b) Original image and LBP image, (c) Detected image
The overall recognition rate of the LBP algorithm is 89%, as shown in Table 1. The experiment was also extended to images with varying positions of the face. From Table 2 it can be concluded that the recognition rate for a face facing the front is greater than for other positions of the face. The results in Table 3 show that the recognition rate of the LBP algorithm is higher than that of other face recognition algorithms such as PCA, 2D-PCA and LDA; this is also represented in Fig. 7.
Fig. 6. Face detected from experimental setup

Table 1. Rate of recognition of FR100 database
No. of images in the database: 5500
No. of input faces tested: 55
Recognized faces: 49
Unrecognized faces: 06
Recognition rate (%): 89

Table 2. Recognition rate (%) of FR100 database at various positions of the face
Left: 58   Front: 89   Right: 45

Table 3. Face recognition rate of standard face recognition algorithms [12]
Method    Recognition rate (%)
LBP       89
PCA       64
2D-PCA    63.1
LDA       55
Fig. 7. Comparison of various algorithms
7 Conclusion
This paper presented a simple and efficient way of performing face recognition using the LBP algorithm, covering the design and validation of real-time face recognition. The proposed system was developed using the LBP algorithm and image processing techniques. MATLAB software was used to validate the results in offline mode, while the experimental setup validated the results in real time; the two sets of results match. The recognition rate of the LBP algorithm is 89%, showing that it compares favorably with other face recognition algorithms. The same work can be extended to an autonomous robot.
References 1. Singh, S., Kaur, A., Taqdir, A.: Face recognition technique using local binary pattern method. Int. J. Adv. Res. Comput. Commun. Eng. 4(3), 165–168 (2015) 2. Shubha, P., Meenakshi, M.: Design and development of image processing algorithms - a comparison. In: 1st International Conference on Advances in Science & Technology ICAST-2018 in Association with IET, India, Mumbai, 6th–7th April 2018 (2018). ISSN 0973-6107 3. Riyanka: Face recognition based on principal component analysis and linear discriminant analysis. IJAREEIE 4(8) (2015) 4. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Prentice Hall, Pearson Education, Upper Saddle River (2008) 5. Brunelli, R., Poggio, T.: Face recognition: features versus templates. IEEE Trans. Pattern Anal. Mach. Intell. 15(10), 1042–1052 (1993) 6. Khanale, P.B.: Recognition of Marathi numerals using artificial neural network. J. Artif. Intell. 3(3), 135–140 (2010) 7. Rahim, A., Hossain, N., Wahid, T., Azam, S.: Face recognition using local binary patterns. Glob. J. Comput. Sci. Technol. Graph. Vis. 13(4), 1–9 (2013) 8. Huang, D., Shan, C., Ardabilian, M., Wang, Y., Chen, L.: Local binary patterns and its application to facial image analysis: a survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 41(6), 765–781 (2011) 9. Gudipathi, V.K., Barman, O.R., Gaffoor, M., Harshagandha, A., Abuzneid, A.: Efficient facial expression recognition using AdaBoost and Haar cascade classifiers. In: IEEE 2016 Annual Connecticut Conference on Industrial Electronics, Technology & Automation (CT-IETA) (2016) 10. Wilson, P.I., Fernandez, J.: Facial feature detection using Haar classifiers. JCSC 21(4), 127–133 (2006) 11. The Facial Recognition Technology (FERET) Database: National Institute of Standards and Technology (2003). http://www.itl.nist.gov/iad/humanid/feret/ 12. Vishnu Priya, T.S., Vinitha Sanchez, G., Raajan, N.R.: Facial recognition system using local binary patterns (LBP). Int. J. Pure Appl. Math. 119(15), 1895–1899 (2018)
Recreation of 3D Models of Objects Using MAV and Skanect
S. Srividhya1(&), S. Prakash2, and K. Elangovan3
1 Department of ISE, BNM Institute of Technology, Bangalore, India [email protected]
2 Department of CSE, Dr. Ambedkar Institute of Technology, Bangalore, India [email protected]
3 Er. Perumal Manimekalai College of Engineering, Hosur, India [email protected]
Abstract. Mapping static objects in an indoor environment is a process for obtaining a description of that environment. Simultaneous localization and mapping (SLAM) is an active area of research in mobile robotics and localization technology. To achieve enhanced performance with low-cost sensors, a Microsoft Kinect has been mounted on a developed aerial platform in association with Skanect. Skanect transforms the Kinect structure sensor into an inexpensive 3D scanner that can create 3D meshes. Experimental results show that the proposed approach is able to produce reliable 3D reconstructions from the Kinect data.
Keywords: SLAM · Quadrotor · Skanect · RANSAC · SURF

1 Introduction
One of the most important applications of mobile robots is to reach and explore terrain that is unreachable or considered unsafe for humans. Such surroundings are commonly encountered in search and rescue scenarios, where prior information about the environment is unknown but necessary before any rescue procedure can be organized. A small mobile robot equipped with a proper sensor bundle can be viewed as the best guide in such situations. The robot is expected to navigate itself through the area and produce maps of the surroundings, which can be used by human rescuers for navigation and localization of victims. In spite of the wealth of research in planar robot mapping, the limitations of traditional 2D feature maps in providing constructive and comprehensible information in these situations have pushed research efforts towards the generation of richer and more descriptive 3D maps instead [1]. Pioneering work carried out by the research team of Prof. Vijay Kumar, director of the GRASP Laboratory, University of Pennsylvania, has pushed MAV technology almost to its thresholds of applicability, especially in collaborative autonomous missions by swarms of nano quadcopters. One of the striking features of this work is the demonstration of 3D image reconstruction and live transmission through the integration of versatile visual/motion sensors in MAVs, which triggered the idea for this research.
In this article, the development of a quadrotor with a payload capacity of up to 500 g is shown and explained, integrating the readily available Kinect® sensor hardware onto an MAV platform and creating 3D views of targeted objects in an indoor environment using Skanect. Capturing a color 3D model of an object, a person or a room has traditionally not been affordable. Skanect transforms the Structure Sensor or Microsoft Kinect camera into an inexpensive 3D scanner that can create a 3D mesh of a real scene in a few minutes. The functionalities of subsystems such as the 3D scanner, camera and microchip would be individually studied with the intent of removing extraneous elements and reducing the system weight [2].
2 Goals of the Study It’s a challenging task to map the internal environment of an environment. In that capacity, it is compulsory to assess the necessities of this task, as: • • • •
Device cost Practical application of the method Computational cost Easy to use and exploitation of the attained results. The following parameters can be considered to accomplish these necessities:
• Less expensive 3D sensing equipment • Less expensive computer 2.1
Principles of 2D/3D Reconstruction
2.1.1 Sensing
SLAM can use different types of sensors to obtain information, each with potentially different errors. To cope with metric bias and with noisy measurements, statistical independence is important. Such optical sensing equipment may consist of 2D or 3D cameras. Owing to the increasing use of cameras in mobile phones, there has been extensive research in visual SLAM using mainly visual sensors. As a complement to unreliable wireless measurements, recent methods apply quasi-optical wireless ranging for multilateration (RTLS) or multiangulation in conjunction with SLAM. SLAM has also been applied to human pedestrians by means of shoes fitted with an inertial measurement unit as the major sensor, in order to avoid walls. This method, known as FootSLAM, is used to build floor plans of buildings automatically, which can then be used by an indoor localization system.
2.1.2 Positioning
The outcome of sensing feeds the algorithms for localization or positioning. Geometrically, the sensing should include at least one iteration and (n + 1) estimating equations for an n-dimensional problem. It should also make use of additional information, such as a priori knowledge about the adjustment of results or coordinate systems with rotation and mirroring.
2.1.3 Modeling
Mapping can be used for 2D modeling and its corresponding representation, or for 3D modeling with a 2D projection. To improve the estimates from sensing under intrinsic conditions and noise, the kinematics of the robot is included as an element of the model itself. The dynamic model balances the contributions of the various sensors and their partial error models, and it finally encompasses a sharp virtual description as a map containing the position and heading of the robot. Mapping is the final representation of such a model, and the map should be either a descriptive or a conceptual design of the model.
2.2 Need for Custom Platforms
We propose to utilize a stripped-down Kinect sensor, custom-fitted onto a vibration-proof chassis and mounted on top of a quadrotor. Kinect is a structured-light depth sensor by Microsoft for the Xbox 360 and Xbox One video game consoles and Microsoft Windows PCs. A 3D scanner of this kind measures the 3D shape of an object using projected light patterns and a camera. It works by projecting a narrow band of light onto a 3D surface, producing a line of illumination that appears distorted from viewpoints other than that of the projector; this distortion can be used for an accurate geometric reconstruction of the surface shape.
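The depth recovery behind this principle reduces to triangulation between the projector and the camera. The sketch below uses the standard baseline-focal-disparity relation; the numeric values are illustrative, not the Kinect's actual calibration.

def depth_from_disparity(baseline_m, focal_px, disparity_px):
    # Classic triangulation: depth = baseline * focal_length / disparity.
    return baseline_m * focal_px / disparity_px

# Illustrative values only: 7.5 cm emitter-camera baseline, 580 px focal length.
print(depth_from_disparity(0.075, 580.0, 20.0))  # about 2.2 m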
3 Development of the Mobile Platform
3.1 Selection of Mobile Platform
The quadrotor is an emerging micro aerial vehicle (MAV) with nearly unbounded applications. Recent quadrotors are evolving into miniature, agile vehicles. Emerging research on miniature vehicles allows quadrotors to communicate intelligently with other autonomous vehicles, to explore unknown environments and to move through dense environments at high speed with prior knowledge. These advances will let quadrotors accomplish missions such as surveillance and search and rescue operations. Although it is rotor-based, the quadrotor differs from the helicopter, which makes it particularly attractive as an MAV. With its four rotors it is highly reliable and manoeuvrable, and it can also be used in applications such as multi-craft communication and environment exploration, increasing the feasibility of quadrotors. A quadrotor is a kind of rotorcraft that has two pairs of counter-rotating, fixed-pitch blades for lift. The propellers are connected to four individual motors; because the blades are fixed-pitch, no pitch control is needed. The motors are arranged in an 'X' configuration, and the battery and microcontrollers are placed near the center of the craft to power and control the rotors. The altitude and attitude of the craft are changed by varying the speeds of the individual rotors. Figure 1 shows the layout of these components and the rotor changes used to adjust the state of the craft. With this basic design it is easy to build vehicles much smaller than conventional rotorcraft. It is also important to choose the payload and size based on the requirements of the task [3].
Fig. 1. Control scheme of quadrotor design
Due to its intrinsic stability, the quadrotor design provides excellent flight characteristics. As each pair of rotor blades spins in the opposite direction, the torques cancel out, keeping the quadrotor flying straight and accurately. The propellers rotating in the anti-clockwise direction increase efficiency and flight time, and no extra propeller (such as a tail rotor) is required to balance unwanted rotation (Table 1).

3.1.1 Mobile Aerial Platform Control and Dynamics
Reference Frames. This segment explains the different reference frames and coordinate systems used to describe the position and orientation of the platform. It is essential to use several different coordinate systems for the following reasons:
i. Newton's equations of motion are written with respect to an inertial coordinate frame attached to the aerial platform's environment.
ii. Aerodynamic forces and torques are applied in the body frame.
iii. On-board sensors such as accelerometers and rate gyros measure quantities with respect to the body frame.
iv. Hover points and flight trajectories are specified in the inertial frame.
v. Several coordinate systems are therefore used for a quadrotor; the frames are discussed below.
3.1.2 Forces and Moments
The forces and moments on the quadrotor are generated by gravity and by the four propellers.
Table 1. Selected quadrotor technical specifications (DragonFly)

Standard configuration dimensions: Width 64.5 cm (25.4 in.), Length 64.5 cm (25.4 in.), Top diameter 78.5 cm (30.9 in.), Height 21 cm (8.3 in.)
Dimensions without rotor blades or landing gear: Width 36 cm (14 in.), Length 36 cm (14 in.), Top diameter 51 cm (20 in.), Height 7.5 cm (3 in.)
Dimensions without rotors: Width 36 cm (14.1 in.), Length 36 cm (14.1 in.), Top diameter 51 cm (20 in.), Height 21 cm (8.3 in.)
Rotor blades: four counter-rotating blades; material: molded carbon fiber
Electric motors: 4 brushless motors; configuration: direct drive (one motor per rotor); 2 ball bearings per motor; integrated rotor mounting points; voltage 14.8 V
Rechargeable helicopter battery: lithium polymer; voltage 14.8 V; capacity 2700 mAh; integrated balance and power connectors; recharge time approx. 30 min (after a typical flight)
Weight and payload: helicopter weight (with battery) 680 g (24 oz); payload capability 250 g (8.8 oz); maximum gross take-off weight 980 g (33 oz)
Landing gear: molded carbon fiber; installed height 17 cm (7 in.); stance width 26 cm (12 in.); skid length 31 cm (12 in.)
Materials: carbon fiber, glass-filled injected nylon, aluminum, stainless steel fasteners; RoHS compliant
The whole force acting on the quadrotor is:

F = Ff + Fr + Fb + Fl

Figure 2 shows the top view of the quadrotor. Each motor generates an upward force F and a torque τ. The front and rear motors spin in the clockwise direction, and the right and left motors spin in the anti-clockwise direction (Fig. 3). By Newton's third law, the drag of the propellers produces a yawing torque on the body of the quadrotor, in the direction opposite to the rotation of the propeller. Consequently the total yawing torque is given by Eq. (1):

τψ = τr + τl − τf − τb    (1)
Fig. 2. Top angle view of the quadrotor
Fig. 3. Forces and Torques in the Quadrotor
The force and torque of each motor can be represented as:

Fs = k1 δs    (2)
τs = k2 δs    (3)

where k1 and k2 are constants that have to be identified experimentally, δs is the command signal of motor s, and s stands for f, r, b, or l. The forces and torques on the quadrotor can then be written in matrix form (the sign pattern follows the motor layout of Fig. 2, as in [4]):

[F, τφ, τθ, τψ]ᵀ = M [δf, δr, δb, δl]ᵀ, with

      ⎡  k1     k1     k1     k1 ⎤
M  =  ⎢  0    −l·k1    0    l·k1 ⎥
      ⎢ l·k1    0    −l·k1    0  ⎥        (4)
      ⎣ −k2     k2    −k2     k2 ⎦

where l is the distance from each motor to the center of mass.
The control strategies derived in the succeeding sections produce commanded forces and torques. The actual motor commands can then be recovered as:
[δf, δr, δb, δl]ᵀ = M⁻¹ [F, τφ, τθ, τψ]ᵀ    (5)
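As a hedged illustration of how Eqs. (4)–(5) are used in software, the following Python sketch inverts an assumed mixing matrix M to turn a commanded force and torques into per-motor commands. The constants k1, k2 and the arm length l are placeholders to be identified experimentally, and the sign pattern follows the standard layout in [4]; this is a sketch, not the authors' implementation.

```python
import numpy as np

# Placeholder constants; in practice k1, k2 and the arm length l are
# identified experimentally for the specific airframe.
k1, k2, l = 2.0, 0.1, 0.25

# Mixing matrix M from Eq. (4): maps motor commands (df, dr, db, dl)
# to total thrust F and torques (tau_phi, tau_theta, tau_psi).
M = np.array([
    [  k1,    k1,    k1,    k1],   # total thrust
    [ 0.0, -l*k1,   0.0,  l*k1],   # roll torque
    [l*k1,   0.0, -l*k1,   0.0],   # pitch torque
    [ -k2,    k2,   -k2,    k2],   # yaw torque (propeller drag reaction)
])

def motor_commands(F, tau_phi, tau_theta, tau_psi):
    """Invert Eq. (4) to recover per-motor commands, as in Eq. (5)."""
    delta = np.linalg.solve(M, np.array([F, tau_phi, tau_theta, tau_psi]))
    # PWM commands must lie between 0 and 1 (see the text below).
    return np.clip(delta, 0.0, 1.0)

print(motor_commands(F=4.0, tau_phi=0.0, tau_theta=0.0, tau_psi=0.05))
```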
The PWM commands are required to lie between 0 and 1. In addition to the motor forces, gravity also acts on the quadrotor. The gravity force acting on the center of mass, expressed in the vehicle frame, is

fg^v = [0, 0, mg]ᵀ    (6)

Transforming it to the body frame gives

fg^b = R_v^b [0, 0, mg]ᵀ = [−mg sinθ, mg cosθ sinφ, mg cosθ cosφ]ᵀ    (7)

The Kinect sensor is fitted on the developed MAV to extract the 3D view of an environment using Skanect, as shown in Fig. 4 [5]:
Fig. 4. Quadrotor Fitted with a Kinect Sensor
4 Methodology
This part deals with the method used to create the 3D mapping of an object in a room using the MAV in association with Skanect [6]. Up to 30 frames per second of dense 3D information can be obtained for a scene. If the structure sensor is moved around, Kinect can capture a full set of viewpoints and a 3D mesh is obtained in real time [7, 8]. The configuration details are:

To sense: Microsoft Kinect v1
To localize: RGBDSLAM algorithm
To characterize: 3D point cloud

RGBDSLAM [9] is a SLAM solution dedicated to RGB-D cameras. Using a hand-held Kinect camera, 3D models of objects and interior views can be captured easily. Key points are detected using visual features such as SURF, SIFT, or ORB [10, 11], which are then used in the feature matching step; the results are key points localized in 2D together with a descriptor summarizing their surrounding area. The 3D transformation between successive frames is estimated using RANSAC (Random Sample Consensus). Calculating the key points takes considerable processing power, so general-purpose computing on the GPU is used to calculate them in less time. To reduce the errors, the resulting pose graph is optimized with hierarchical optimization for pose graphs on manifolds (Table 2). Figure 5a, b, c shows the depth images of objects such as a chair, a monitor and a helmet.
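The following Python sketch illustrates the feature-matching step of this pipeline with OpenCV: ORB key points (one of the detectors listed above) are matched between two frames, and RANSAC rejects outlier correspondences. This is a simplified 2D illustration, not the RGBDSLAM implementation itself; the file names are placeholders, and a full RGB-D pipeline would lift the inlier matches to 3D using the depth image before estimating the rigid-body transform.

```python
import cv2
import numpy as np

# Two consecutive RGB frames (placeholder file names).
img1 = cv2.imread("frame_prev.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_curr.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB key points and compute binary descriptors.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance is the appropriate metric for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# RANSAC fits a model while separating inliers from outlier matches.
H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 5.0)
print("inlier matches:", int(inlier_mask.sum()))
```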
Table 2. Number of points and their storage size

Place     No. of points (millions)   Storage size (MB)
Office    17.0                       271
Corridor  12.2                       202
Lab       18.1                       298
Chair     5.0                        101
Monitor   3.6                        88
Helmet    2.4                        71
Fig. 5. (a). Depth image of a chair (b). Depth image of a monitor (c). Depth image of a helmet
5 Conclusion
In this paper an approach for creating the 3D view of objects in an indoor environment using a robotic method is presented. We have shown an illustration of the actual use of a Kinect sensor fitted on a MAV platform to map an interior environment. The inexpensive Kinect sensor in association with Skanect emerges as a good combination for executing this assignment. The research also exposed an essential aspect that has to be dealt with: in terms of memory, the data volumes acquired from the Kinect and RGBDSLAM are huge, and a high-end graphics card with adequate processing power is needed to generate the point clouds.

Compliance with Ethical Standards.
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Zhou, W., Jaime, V., Dissanayake, G.: Information efficient 3D visual SLAM for unstructured domains. IEEE Trans. Robot. Autom. 24, 1078–1087 (2008) 2. Kinect for Windows: (n.d.). Microsoft Developer Network. http://msdn.microsoft.com/enus/library/jj131033.aspx 3. DragonFly: (n.d.). DragonFly X4. http://www.draganfly.com/uav-helicopter/draganflyer-x4/ specifications/. Accessed Sept 2013 4. Beard, R.: Quadrotor dynamics and control rev 0.1. (2008) 5. “gumstix: finally! - a very small linux machine”. gumstix.org. 8 April 2004. Archived from the original on 8 April 2004. http://web.archive.org/web/20040408235922, http:// www.gumstix.org/index.html. Accessed 29 July 2009 6. https://skanect.occipital.com/ 7. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981) 8. Genevois, T., Zielinska, T.: A simple and efficient implementation of EKF-based SLAM relying on laser scanner in complex indoor environment. J. Autom. Mob. Robot. Intell. Syst. 8, 58–67 (2014). https://doi.org/10.14313/JAMRIS_2-2014/20 9. Endres, F., Hess, J., Engelhard, N., Sturm, J., Cremers, D., Burgard, W.: An evaluation of the RGB-D SLAM system. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2012) 10. Srividhya, S., Prakash, S.: Performance evaluation of various feature detection algorithms in VSLAM. PARIPEX Indian J. Res. 6(2), 386–388 (2017) 11. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: European Conference on Computer Vision. Springer, Heidelberg (2006)
Inferring Machine Learning Based Parameter Estimation for Telecom Churn Prediction J. Pamina(&), J. Beschi Raja, S. Sam Peter, S. Soundarya, S. Sathya Bama, and M. S. Sruthi Sri Krishna College of Technology, Coimbatore, India [email protected], [email protected]
Abstract. Customer churn is an important issue and a major concern for many companies. This trend is more noticeable in the telecom field. Telecom operators require an essentially proactive method to prevent customer churn. Existing works fail to adopt the best feature selection for designing the model. This work contributes to developing a churn prediction model which helps telecom operators identify the customers who are about to churn. The significance of the recall evaluation measure, which actually addresses the real-time business problem, is highlighted. The prominent goal of this churn analysis is to perform binary classification on customer records and figure out who is likely to cancel in the future. Fifteen machine learning models with different parameters are employed. The performance of the models is evaluated by various measures like Accuracy, Recall, Precision, F-score, ROC and AUC. Our aim in this work is to produce the highest recall value, which has a direct impact on real-world business problems. Based on the experimental analysis, we observed that Decision Tree model 3 outperforms all other models.

Keywords: Telecom · Machine learning · Parameters · Churn prediction · Recall
1 Introduction
Due to the wide range of global development, Information Technology has seen a high increase in different sorts of service providers, which finally leads to great rivalry among them. Tackling customer churn and satisfying customers are their most common challenges [1, 2, 22]. Because of its high impact on the revenues of the companies, predicting the factors that maximize customer churn is necessary to reduce it. Churn occurs when a customer terminates the service of one service provider and switches to another for some reason. This activity heavily impacts company revenues [22]. So it is always better to design an effective churn model to predict the specific customer cluster who are likely to churn and to acknowledge their fidelity before they cancel the subscription. Current research trends are emerging in analysing customers and developing patterns from their data, especially in the telecom field. Churn prediction is a major factor because it shows a direct effect on the profit and revenues of the company [3, 4]. The structure of this article is as follows: Sect. 1 describes related works on churn prediction using ML techniques. Section 2 covers the pre-processing using
PCA [7–10]. Section 3 describes the various machine learning techniques with different parameters. In Sect. 4, we discuss the cross-validation used for accumulating results. Section 5 presents the description of the dataset and its features. Section 6 presents the summary and results of the 15 ML algorithms. Finally, Sect. 7 concludes this paper.
2 Pre-processing
The pre-processing work includes converting object data types to numeric types. The dataset contains two different kinds of customers: those who left the telecom service and those who stayed [12–15]. The averages of both groups are depicted in Fig. 1, and Table 1 shows the statistics of the customers who left versus those who stayed. It is observed that on average those who left had higher monthly charges and lower tenure. We also noticed that only 7032 observations have values for Total Charges, so we filled the NaN values with the median. After creating the dummy variables, we noticed that the impact of churn is very small for gender and phone service.
Fig. 1. Availability of churn
Table 1. Statistics on churners

Churn   Senior citizen   Tenure      Monthly charges
No      0.128721         37.569965   61.265124
Yes     0.254682         17.979133   74.441332
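A minimal pandas sketch of the pre-processing just described might look as follows, assuming the usual column names of the IBM Watson telecom churn CSV (TotalCharges, Churn, tenure, ...); the file name is a placeholder.

```python
import pandas as pd

# Load the churn dataset (placeholder file name).
df = pd.read_csv("telco_churn.csv")
df = df.drop(columns=["customerID"], errors="ignore")  # drop the identifier if present

# TotalCharges is read as an object dtype because of blank entries;
# coerce it to numeric and fill the resulting NaNs with the median.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Compare churners vs. non-churners on average, as in Table 1.
print(df.groupby("Churn")[["SeniorCitizen", "tenure", "MonthlyCharges"]].mean())

# One-hot encode the categorical variables ("dummy variables").
X = pd.get_dummies(df.drop(columns=["Churn"]), drop_first=True)
y = (df["Churn"] == "Yes").astype(int)
```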
2.1 Principal Component Analysis (PCA)
Principal component analysis is used for dimensionality reduction; it involves zeroing out the smallest components, resulting in a lower-dimensional projection of the data that preserves the maximal data variance. In PCA, the relationships in the data are computed by finding a list of principal axes that describe the dataset. It is noticed that the first principal component explains approximately 21% of the variance within the dataset.
PCA removes the information along the least important principal axes, leaving only the high-variance components of the data. The fraction of variance that is cut out is a rough measure of how much information is discarded during the dimensionality reduction. We nevertheless chose to include all features when designing our models. Figure 2 shows the points spread proportionally around the fitted principal axes. The train and test data features are then normalized, transforming the features by scaling them to the range between zero and one.
Fig. 2. PCA analysis
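A short scikit-learn sketch of this step, under the assumption that X is the pre-processed feature matrix from above:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Scale features to [0, 1] as described in the text.
X_scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Fit PCA and inspect how much variance each principal axis explains.
pca = PCA()
pca.fit(X_scaled)

# The text reports the first component explaining roughly 21% of variance.
print("explained variance ratios:", pca.explained_variance_ratio_[:5])
```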
3 Machine Learning Models

3.1 KNN Models
We created three KNN models by changing the weights and the neighbour-search algorithms. The KNN algorithm is transparent, simple, consistent, straightforward, and easy to implement when the data is distributed with no prior knowledge [17]: if kx > ky for all classes y ≠ x, where kc denotes the number of the k nearest neighbours belonging to class c, then the test sample f is put in class x. Three types of KNN models were created by changing the parameters and algorithms. The K range was searched between 20 and 50; since our main goal is to increase Recall (Churn = yes), we set K to 27 for all models, because this point produces the highest AUC value. The first KNN model was created by setting the algorithm to auto and the weights to uniform; with uniform weights, the data points in every neighbourhood are weighted equally. The second model was designed by setting the algorithm to auto and the weights parameter to distance, which weights points by the inverse of their distance, so closer neighbours have a higher impact than neighbours far away. The third KNN model was created by setting the algorithm to kd_tree and the weights to uniform; a k-d tree is a binary tree that can be constructed very fast by recursively partitioning the parameter space along the data axes. Evaluation measures such as the confusion matrix, cross-validation, recall, accuracy, precision, AUC-ROC and F1 score were computed using standard packages. After evaluating the measures for the three models, the third model with the k-d tree algorithm gives the best scores on all evaluation metrics, so we selected the third KNN model as the best KNN from our experiments (Fig. 3).
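The three KNN variants described above can be sketched in scikit-learn as follows (X_scaled and y are assumed to come from the earlier steps; the train/test split parameters are illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42, stratify=y)

# K = 27 in all three models, as chosen in the text.
models = {
    "KNN-1": KNeighborsClassifier(n_neighbors=27, algorithm="auto", weights="uniform"),
    "KNN-2": KNeighborsClassifier(n_neighbors=27, algorithm="auto", weights="distance"),
    "KNN-3": KNeighborsClassifier(n_neighbors=27, algorithm="kd_tree", weights="uniform"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "recall:", recall_score(y_test, model.predict(X_test)))
```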
Fig. 3. KNN model 3 ROC curve
3.2 Decision Tree
Next, we created three decision tree models (a sketch follows below). The decision tree is a standard method for decision making; the common idea is a multistage method that divides a complex decision into a sequence of simpler decisions [18]. Training proceeds as follows. Cast every training instance on the root of the decision tree and flag the current node as the root node. For every attribute:
• Partition every data instance based on the attribute value.
• Calculate the information gain ratio based on the partitioning.
The feature that contributes the highest information gain ratio is identified and set as the splitting criterion. If the highest information gain ratio is 0, the current node is flagged as a leaf and returned. Otherwise, according to the prominent feature, all instances are divided based on the attribute value, and every partition is represented as a child node of the current node. For each child node: if it is pure, mark it as a leaf and return; else mark the child node as the current node and go back to the attribute-partitioning step.

We chose the Max_depth parameter to increase the recall metric. We noticed that the model overfits for greater values: the tree correctly fits the training data but fails to predict new data. Based on this, we chose a Max_depth value of 5 to decrease overfitting. When testing the model with the test data, we observed an accuracy of 78.2% with a recall of 0.60, which exceeds all three KNN models. Next, we designed Decision Tree model 2 using the Min_samples_leaf parameter; we selected min_samples_split = 0.225, which produced the highest AUC score for the test and train data. For Decision Tree model 3 we selected min_samples_split = K, where K represents the minimum number of samples. Decision Tree model 3 gives a lower accuracy than DT 1 and DT 2 but achieves an increased recall of 0.71 (Fig. 4).
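A sketch of the three decision-tree variants in scikit-learn, using the parameter values the text gives; the min_samples_split value for model 3 is a placeholder, since the text leaves its K unspecified:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score, accuracy_score

dt1 = DecisionTreeClassifier(max_depth=5, random_state=42)
dt2 = DecisionTreeClassifier(min_samples_split=0.225, random_state=42)
# 0.3 is a placeholder for the unspecified K of model 3.
dt3 = DecisionTreeClassifier(min_samples_split=0.3, random_state=42)

for name, model in {"DT-1": dt1, "DT-2": dt2, "DT-3": dt3}.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, pred),
          "recall:", recall_score(y_test, pred))
```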
3.3 Random Forest Model
A random forest is a combination of decision trees such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees [19]. The Random Forest 1 model was created using the Max_Depth parameter. With the default parameters, all trees are fully expanded on this data, which leads to overfitting. We observed that as max_depth increases, the train curve approaches 1.0 while the test curve begins to level off at a depth of 7 (Fig. 5).
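A corresponding random-forest sketch; max_depth = 7 follows the level-off point mentioned above, while n_estimators and min_samples_leaf are assumed placeholders:

```python
from sklearn.ensemble import RandomForestClassifier

rf1 = RandomForestClassifier(max_depth=7, random_state=42)
rf2 = RandomForestClassifier(n_estimators=200, max_depth=7, random_state=42)
rf3 = RandomForestClassifier(n_estimators=200, max_depth=7,
                             min_samples_leaf=5, random_state=42)
rf1.fit(X_train, y_train)
print("RF-1 test accuracy:", rf1.score(X_test, y_test))
```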
Fig. 4. Decision tree model 1 ROC curve
Fig. 5. Random forest 1 ROC curve
3.4 Stochastic Gradient Descent Model
Stochastic Gradient Descent is a technique for optimizing an objective function. The term "stochastic" refers to the randomly selected samples used to estimate the gradients [20]. In stochastic gradient descent, the exact gradient of G(x) is approximated by the gradient at a single example:

x := x − γ ∇Gi(x)

In this work, three Stochastic Gradient Descent models were created. The first SGD model was designed by setting loss = 'modified_huber', a smooth loss which helps handle outliers and probability estimates. For the second model we fixed loss = 'log', which yields a logistic regression classifier. The third model was designed by fixing loss = 'log' and penalty = 'l1', which helps with feature selection. From the experiments, we selected the second model as the best, since it gives the highest recall value among the three (Fig. 6).
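The three SGD variants can be sketched as follows (note that recent scikit-learn versions spell the logistic loss 'log_loss' rather than the 'log' used in the text):

```python
from sklearn.linear_model import SGDClassifier

sgd1 = SGDClassifier(loss="modified_huber", random_state=42)  # smooth loss, tolerant of outliers
sgd2 = SGDClassifier(loss="log_loss", random_state=42)        # logistic regression classifier
sgd3 = SGDClassifier(loss="log_loss", penalty="l1", random_state=42)  # L1 penalty for feature selection

for name, model in {"SGD-1": sgd1, "SGD-2": sgd2, "SGD-3": sgd3}.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```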
Fig. 6. Stochastic gradient 2 ROC curve
3.5 Neural Network Models
A neural network model is a collection of units called neurons which are connected to each other. A neuron receives a signal for processing, and its weights can be increased or reduced according to the problem [21]. The multilayer perceptron is a classifier belonging to the family of feed-forward artificial neural networks. It combines several perceptrons to create a non-linear decision boundary. A perceptron consists of weights, a bias, a summation processor and an activation function, and each perceptron provides a non-linear mapping into a new dimension. Consider a multilayer perceptron consisting of an input layer, a hidden layer and an output layer. The input layer represents the input data and passes the nodes' weighted values and bias to the next layer. The intermediate or hidden layer next to the input layer applies an activation function to make sense of the given input. Nodes in the hidden layer apply an activation function such as the sigmoid (logistic) function, which is used to predict probabilities ranging between 0 and 1; if values range from −1 to 1, the hyperbolic tangent function can be used instead. The sigmoid function is

f(xi) = 1 / (1 + e^(−xi))

Finally, nodes in the output layer apply the SoftMax function to find the exact class of the given input:

f(xi) = e^(xi) / Σ_{k=1}^{N} e^(xk)
where N in the output layer corresponds to the number of classes. The multi-layer perceptron model was used to design the neural network models. The first MLP model was designed using hidden layers of sizes (20, 8), with the activation function set to 'tanh' and solver = 'adam'. The second model was designed using the 'relu' activation function with 15 hidden units. The last model was created by changing to solver = 'sgd' with 30 hidden units (Fig. 7).
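A sketch of the three MLP variants; the hidden-layer sizes follow the text where given and are otherwise assumptions:

```python
from sklearn.neural_network import MLPClassifier

mlp1 = MLPClassifier(hidden_layer_sizes=(20, 8), activation="tanh",
                     solver="adam", max_iter=500, random_state=42)
mlp2 = MLPClassifier(hidden_layer_sizes=(15,), activation="relu",
                     solver="adam", max_iter=500, random_state=42)
mlp3 = MLPClassifier(hidden_layer_sizes=(30,), activation="relu",
                     solver="sgd", max_iter=500, random_state=42)

for name, model in {"MLP-1": mlp1, "MLP-2": mlp2, "MLP-3": mlp3}.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```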
Fig. 7. MLP model-2 ROC curve
4 Cross Validation
The models in this paper are tested with 10-fold cross-validation to accumulate the best results. The k-fold CV is otherwise called rotation estimation. Cross-validation is used to assess the results of a statistical analysis on an independent dataset [15]. It is sometimes called out-of-sample testing and is mainly used to analyse the performance of various predictive models. The whole dataset S is divided randomly into k folds (subsets) S1, S2, S3, ..., Sk of approximately equal size. The procedure estimates accuracy as the overall proportion of correct classifications. The cross-validation estimate is a random number that depends on the split into folds; following [15], the cross-validation accuracy can be written as

acc_cv = (1/n) Σ_{(xi, yi) ∈ S} δ( I(S \ S(i), xi), yi )

where S(i) is the test fold containing the instance (xi, yi), I(S \ S(i), xi) is the label predicted for xi by the classifier trained on the remaining folds, and δ(a, b) equals 1 when a = b and 0 otherwise. Using 10-fold cross-validation, the errors and variance are minimal and the best results are obtained.
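In scikit-learn the 10-fold procedure reduces to a single call; dt3 here refers to the decision-tree sketch above:

```python
from sklearn.model_selection import cross_val_score

# 10-fold cross-validation used to report the CV scores in Table 2.
scores = cross_val_score(dt3, X_scaled, y, cv=10, scoring="accuracy")
print("10-fold CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```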
5 Dataset
The experiments are carried out with the standard benchmark IBM Watson telecom churn dataset. The dataset contains the following information about the services each customer has opted for: multiple lines, device protection, internet, online security, tech support, and streaming TV and movies. The second type of information concerns the customer account and contains contract, paperless billing, payment method, monthly charges and total charges. The last type of information contains the demographic content of the customer, which includes age, gender and marital status.
6 Results and Comparison
The experimental analysis was conducted with standard benchmark algorithms such as KNN, Multilayer Perceptron, Stochastic Gradient Descent, Decision Tree and Random Forest on the IBM Watson telecom churn dataset. In the first phase, three KNN models were designed by using diverse parameters for the algorithm and weights. The highest value is produced by KNN model 3, which uses the k-d tree algorithm; the recall and accuracy produced by KNN model 3 are 0.57 and 0.78 respectively. The second phase presents the evaluation of the Decision Tree models. Among them, Decision Tree model 3, which adds the Min_Samples_Split parameter, achieves the highest recall of 0.71 with an accuracy of 0.73. The third phase contains the creation of three Random Forest models using various parameters; Random Forest models 2 and 3 achieve a recall of 0.49 and an accuracy of 0.79. The next phase presents the creation and analysis of the Stochastic Gradient Descent models; all three SGD models provide the highest accuracy of 0.79, but SGD model 2 (loss = 'log') yields the highest recall of 0.54. The final phase consists of creating the ANN models using the multi-layer perceptron. The first and third models provide the highest accuracy of 0.79, and the third model (solver = 'sgd') achieves the highest recall value of 0.53. After the experimental analysis of 15 models, we selected Decision Tree model 3 as the best model for the real-time business problem, since it secured the highest recall value compared to the other models. The values and experimental analysis for the 15 models are shown in Table 2. This model has therefore been selected for feature selection to find the impact of churn.
Table 2. Experimental results and summary of 15 ML models

No  Method  Parameters                                  Accuracy  AUC   Recall  CV score  Precision  F1 score
1   KNN-1   Algorithm = 'Auto', Weights = 'Uniform'     0.78      0.80  0.52    0.76      0.59       0.55
2   KNN-2   Algorithm = 'Auto', Weights = 'Distance'    0.76      0.79  0.47    0.75      0.56       0.51
3   KNN-3   Algorithm = 'Kd_Tree', Weights = 'Uniform'  0.78      0.82  0.57    0.77      0.60       0.58
4   DT-1    Max_Depth                                   0.78      0.82  0.60    0.78      0.58       0.59
5   DT-2    Min_Samples_Leaf                            0.74      0.75  0.61    0.75      0.51       0.55
6   DT-3    Min_Samples_Split                           0.73      0.50  0.71    0.73      0.50       0.71
7   RF-1    Max_Depth                                   0.79      0.83  0.48    0.79      0.65       0.55
8   RF-2    Estimators, Max_Depth                       0.79      0.83  0.49    0.79      0.65       0.56
9   RF-3    Estimators, Max_Depth, Min_Samples_Leaf     0.79      0.83  0.49    0.79      0.65       0.56
10  SGD-1   Loss = 'Modified_Huber'                     0.79      0.83  0.52    0.79      0.64       0.57
11  SGD-2   Loss = 'Log'                                0.79      0.83  0.54    0.80      0.63       0.59
12  SGD-3   Loss = 'Log', Penalty = 'L1'                0.79      0.83  0.52    0.79      0.64       0.57
13  MLP-1   Solver = 'adam'                             0.79      0.82  0.49    0.78      0.64       0.55
14  MLP-2   Activation = 'relu'                         0.78      0.81  0.50    0.78      0.60       0.55
15  MLP-3   Solver = 'sgd'                              0.79      0.83  0.53    0.79      0.63       0.58
Fig. 8. Decision tree visualization
7 Conclusion
This work concentrates on designing an efficient churn prediction model that helps the telecom sector predict the customers who are going to churn. Principal Component Analysis is employed for the pre-processing and normalization of the data. A comparative experimental analysis of 15 models with different parameters was carried out on the churn dataset. The recall metric makes a strong impact on the real-time business goal of reducing churn. Decision Tree model 3 secured the highest recall value and hence is used for the feature selection that identifies the factors impacting customer churn. From Fig. 8, the variables "contract" and "online security" are the prominent factors impacting churn: online security refers to whether the person subscribes to online security or not, and contract refers to the term and duration of a person's contract. The extension of this work is intended toward a comparative analysis with many machine learning methods on larger volumes of records for other applications.

Compliance With Ethical Standards.
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Verbeke, W., et al.: Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Syst. Appl. 38(3), 2354–2364 (2011) 2. Zhao, L., et al.: K-local maximum margin feature extraction algorithm for churn prediction in telecom. Clust. Comput. 20(2), 1401–1409 (2017) 3. Idris, A., Khan, A., Lee, Y.S.: Genetic programming and adaboosting based churn prediction for telecom. In: 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE (2012) 4. De Bock, K.W., Van den Poel, D.: Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models. Expert Syst. Appl. 39(8), 6816–6826 (2012) 5. Keramati, A., Ardabili, S.M.S.: Churn analysis for an Iranian mobile operator. Telecommunications Policy 35(4), 344–356 (2011) 6. Lee, H., et al.: Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model. Decis. Support. Syst. 52(1), 207–216 (2011) 7. Chen, Z.-Y., Fan, Z.-P., Sun, M.: A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur. J. Oper. Res. 223(2), 461–472 (2012) 8. Zhu, B., Baesens, B., vanden Broucke, S.K.L.M.: An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf. Sci. 408, 84–99 (2017) 9. Bi, W., et al.: A big data clustering algorithm for mitigating the risk of customer churn. IEEE Trans. Ind. Inform. 12(3), 1270–1281 (2016) 10. Babu, S., Ananthanarayanan, N.R.: Enhanced prediction model for customer churn in telecommunication using EMOTE. In: Dash, S., Das, S., Panigrahi, B. (eds.) International Conference on Intelligent Computing and Applications. Advances in Intelligent Systems and Computing, vol. 632 (2018)
11. Qi, J., et al.: ADTreesLogit model for customer churn prediction. Ann. Oper. Res. 168(1), 247 (2009) 12. Karahoca, A., Karahoca, D.: GSM churn management by using fuzzy c-means clustering and adaptive neuro fuzzy inference system. Expert Syst. Appl. 38(3), 1814–1822 (2011) 13. Idris, A., Iftikhar, A., ur Rehman, Z.: Intelligent churn prediction for telecom using GP-AdaBoost learning and PSO undersampling. Clust. Comput., 1–15 (2017) 14. Vijaya, J., Sivasankar, E.: An efficient system for customer churn prediction through particle swarm optimization-based feature selection model with simulated annealing. Clust. Comput., 1–12 (2017) 15. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. Ijcai 14(2), 1137–1145 (1995) 16. García, S., Luengo, J., Herrera, F.: Data Preprocessing in Data Mining. Springer, New York (2015) 17. Adeniyi, D.A., Wei, Z., Yongquan, Y.: Automated web usage data mining and recommendation system using K-Nearest Neighbor (KNN) classification method. Appl. Comput. Inform. 12(1), 90–108 (2016) 18. Safavian, S.R., Landgrebe, D.: A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 21(3), 660–674 (1991) 19. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001) 20. Mei, S.: A mean field view of the landscape of two-layer neural networks. Proc. Natl. Acad. Sci. 115(33), E7665–E7671 (2018). https://doi.org/10.1073/pnas.1806579115. PMC 6099898. PMID 30054315 21. Zhang, G.P.: Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159–175 (2003) 22. Pamina, J., et al.: An effective classifier for predicting churn in telecommunication. J. Adv. Res. Dyn. Control Syst. 11, 221–229 (2019)
Applying a Business Intelligence System in a Big Data Context: Production Companies Jesús Silva1(B) , Mercedes Gaitán2 , Noel Varela3 , Doyreg Maldonado Pérez3 , and Omar Bonerge Pineda Lezama4 1 Universidad Peruana de Ciencias Aplicadas, Lima, Peru
[email protected] 2 Corporación Universitaria Empresarial de Salamanca – CUES, Baranquilla, Colombia
[email protected] 3 Universidad de la Costa (CUC), Calle 58 # 55-66, Atlantico, Baranquilla, Colombia
{nvarela1,dmaldona}@cuc.edu.co 4 Universidad Tecnológica Centroamericana (UNITEC), San Pedro Sula, Honduras
[email protected]
Abstract. Industry 4.0 promotes the automation of the manufacturing industry through computer systems, and its objective is the Smart Factory. Its development is considered a key factor in the strategic positioning not only of companies, but of regions, countries and continents in the short, medium and long term. Thus, it is no surprise that governments such as that of the United States and the European Commission are already taking this into consideration in the development of their industrial policies. This article presents a case of the implementation of a BI system in an industrial food environment with Big Data characteristics, in which information from various sources is combined to provide information that improves decision-making. Keywords: Industry 4.0 · Big Data · Business Intelligence (BI) · Food industry · ERP systems
1 Introduction
Industry 4.0 envisages within the same paradigm the latest advances in hardware and software at the service of industry, such as the Internet of Things (with machines collaborating with other machines) [1], cyber-physical systems (in which the human being interacts with the machine naturally and safely), the 'doer' or maker culture (with customization devices such as 3D printers), simulation systems (such as augmented reality and the simulation of productive processes for predicting the behaviour of environments and productive assets), and Big Data technology for the processing and representation of large volumes of data [2, 3]. Big Data can be simply defined as a system that allows the collection and interpretation of datasets that, because of their large volume, cannot be processed with conventional tools for capture, storage, management and analysis. Big Data technologies are designed to extract value from large volumes of data, from different types
of sources, quickly and cost-effectively [4–6]. The objective is to support the information-based decision-making process by managing the five "V"s [7]: Volume, Variety, Velocity, Veracity and Value. A Big Data system can be structured into three layers: (1) the infrastructure layer (such as a computer storage and distribution service); (2) the compute layer (with middleware that includes tools for data integration and management, along with the programming model); and (3) the application-visualization layer (with business applications, web services, multimedia, etc.). Big Data has begun to be the present and will hold the future of employment in the world; several sources refer to the employment opportunities that exist around Big Data. In addition to Big Data, BI systems are part of Industry 4.0 and include the applications, infrastructure, tools and best practices that enable access to and analysis of information to improve both the decision-making process and its impact [8–10]. In [11], BI systems are defined as the set of techniques and tools for transforming large amounts of raw data into consistent and useful visual information for business analysis in seconds. Big Data and BI systems are changing businesses and industry [12], although their degree of implementation is not, at the moment, as high as one would expect [13, 14].
2 Materials and Methods
The present implementation of a BI system in a Big Data context was developed in a multinational of the production sector with approximately 5800 employees spread over 6 production plants, which during 2015 invoiced 1254.2 million thanks to selling about one trillion product units on several million customer order lines. The multinational handles information covering at least 5 financial years for its different business areas in its decision-making processes, which gives an idea of the amount of data being handled. For example, for the quality of the products it manufactures, each unit must be traceable individually in the face of any potential quality incident, extracting from which raw materials and production processes it was obtained in order to establish possible correlations, statistical analyses, etc. In reference to customer service, all of the above order lines have different service policies and penalties that must be analysable individually in the event of any potential service failure that the customer wishes to claim. These are just some of the examples showing that the multinational needs to comply with and meet the premises that make up the 5 "V"s of Big Data mentioned above.

2.1 Methods
The multinational has deployed its BI tool (Qlikview®) for data analysis both centrally (for purchasing, logistics, sales, etc.) and in a decentralized manner (to meet the demand for plant information). To do this, the method applied has been Rapid Prototyping which, as explained in [15], consists of acting before final results are available and letting the observed results drive their refinement. Rapid Prototyping consists of the concurrent performance of the design and development stages of a product so that the project team can test it experimentally, observe it and even feel it as soon as possible in order to improve it.
The implementation methodology followed a cyclical process of selecting the area (purchases, production, etc.) to be implemented and then concurrently producing requirements, conceptual designs, functional designs and prototypes, which users tested for validation or tailoring. The tests were performed without integrating the development with the rest of the company's systems (i.e., with experimental data in a specific hardware and software test environment, not interacting with the other developments made so far). If necessary the development was adapted until its acceptance, and at that point it was tested in a test environment that already integrated the development with the other areas of the company. Once the solution was completed, it was closed and moved to a maintenance phase while the IT team approached another area of the company. Broadly speaking, the deployment of the company's BI system includes the development of 5 On-Line Analytical Processing (OLAP) cubes, including Supply Chain and Corporate. These OLAP cubes collect up-to-date information from the servers, copying and compressing it to a BI data server that is then used by the BI engine itself (located on another server) for near-instant querying and viewing. Each of these areas supports decision-making on issues as diverse as the management of purchase contracts, the management of products blocked by quality, claims regarding low service rates, etc.

2.2 Materials
The deployment shown in the following sections of the article was performed on a number of servers used to (a) support the ERPs (Enterprise Resource Planning systems), (b) host additional applications and documents, (c) run the BI tool management server, and (d) run the BI data management server. The set of BI cubes of the case is fed mainly from the ERPs of the plants and marketing companies, which were initially found on all types of ERP platforms (SAP R3, Baan, Navision and AX) and which are in a migration process to AX2012. This information is complemented by different types of files found on other types of platforms (e.g. Sharepoint) and applications of various kinds (e.g. SAP BPC). Additionally, the group's ERPs interact one-way or two-way with hardware and applications of various kinds, such as barcode scanning guns, invoice scanning systems, etc. [16, 17].

2.3 Features
Once the model is built, the BI tool periodically reads the data and visualizes the cross-data deemed appropriate. In this case, the tool is used to filter and obtain any data available in the filter area of the BI screen. The tool displays as a result (in the middle of the same screen) all kinds of requested information, such as: (a) who has produced a pallet with a particular SSCC (Serial Shipper Container Code, the identification number of the logistics unit produced, in this case the pallet), together with when it was produced, (b) which workers have produced more or fewer units in a period, or (c) as shown in Fig. 1, what product quantities have been produced in a period on a line, both in total (at the bottom) and by reference (at the top of the image).
Fig. 1. Chart of units produced on a production line per anonymized period
The advantages of having the system implemented are obvious: prior to its implementation, and with reference to what is shown in Fig. 1, the visualization of this type of information was neither graphical nor as easy to filter as with the BI tool. With regard to the unit traceability of finished products, used to investigate who had made each product in the face of incidents, in the past the process was 100% manual and paper-based (neither agile nor operational), while such information is now obtained instantly.
3 Results
The Supply Chain cube is shown in Fig. 2. This screen is used to track the rate of service provided on time and in full (On Time In Full, OTIF) per plant, customer, reference, etc. for the period studied, also showing an evolution of the finished-product stock both in days of stock and in tons (chart at the bottom right). Finally, Fig. 3 shows a screenshot of part of the Corporate BI cube, which tracks the group's 75 strategic improvement projects: they are tracked to see the extent to which the expected improvement ('Target Annual Savings (K€)') is being achieved through the current progress ('Actual Annual Savings (K€)'). As part of the Spanish continuous market, the multinational must make its results public on a quarterly basis. As noted in the company's quarterly release of results, the improvements made in the area of operations (production and Supply Chain) have favored an improvement of the business both in the previous year and in the current one. Specifically, the release justifies the results obtained on the basis of "the improvement in direct production costs, mainly in personnel costs and transport costs, together with the improvement of other operating expenses". It is understood that BI has served to support decision-making and has therefore influenced the outcome. It is worth noting, for example, that as shown in the top center of Fig. 2 the OTIF service rate improved by 6.42% from the previous year to the current one, even while the costs associated with transport were reduced. To the quantitative results obtained can be added another series of advantages arising from the implementation, since the system has served,
Fig. 2. Supply chain BI cube screen
Fig. 3. Corporate BI cube screen
among others, to: (a) discriminate between customers based on data-driven business criteria, (b) prioritize customers when a limited amount of resources was available and it was not possible to serve everyone in the short term, (c) act agilely in the face of fluctuations in market availability of certain raw materials, and (d) improve the monitoring of products that could become obsolete.
4 Conclusions
This article presents the potential of Big Data and BI systems within the framework of Industry 4.0, noting, however, that the literature does not contain a significant number of success stories or good practices of their application in the manufacturing industry, and that no such cases have been found concerning the food industry. The article then describes the deployment and implementation of BI in an industrial food group that can be considered a Big Data case. This deployment sets out in detail the development of the solution and the functionalities obtained for the specific case of a production management control system, and shows, as part of its overall capacity, some of the
5 cubes created in the group. Subsequently, the results and advantages obtained through the implementation of the system are presented, among which are an improvement in the overall turnover, an increase in the rate of service to customers, and a reduction in production and transport costs. At the pedagogical level it is worth noting that the developed case has served to produce pedagogical examples in the degree and Master's engineering classes that the authors teach. To do this, the free version of the software applied in the company (Qlikview®) has been used, offering students modified versions of the data handled in reality and setting them the challenge of building their own cubes. The results have been encouraging, as in some cases students have suggested to teachers improvements that the company has considered appropriate to implement. As for future lines, these are oriented to analysing to what extent the processes and information systems of the different parts of the company need to be harmonized, for example with a common ERP system in all areas, so that the information obtained is fully comparable, as the use of different processes and platforms makes it difficult to compare different plants and delegations.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Blanchet, M., Rinn, T., Von Thaden, G., Georges, D.T.: Industry 4.0 The new industrial revolution How Europe will succeed. Rol. Berger Strateg. Consult. 1–24 (2014) 2. Bauer, W., Schlund, S., Ganschar, O., Marrenbach, D.: Industrie 4.0 3. Volkswirtschaftliches Potenzial für Deutschland. Bitkom, Fraunhofer Inst., pp. 1–46 (2014) 4. E. y P. Comisión Europea: Foro de políticas estratégicas sobre el emprendimiento digital, Dirección General de Mercado Interno, Industria. Report: Digital Transformation of European Industry and Enterprises (2015) 5. Luftman, J., Ben-Zvi, T.: Key issues for IT executives 2009: difficult economy’s impact on IT. MIS Q. Exec. 9(1), 203–213 (2009) 6. Mazuin, E., Yusof, M., Othman, M.S., Omar, Y., Rizal, A.: The study on the application of business intelligence in manufacturing: a review 10(1), 317–324 (2013) 7. Ur-Rahman, N.: Textual data mining for next generation intelligent decision making in industrial environment: a survey. Eur. Sci. J. 11(24), 1857–7881 (2015) 8. Fitriana, R., Djatna, T.: Progress in Business Intelligence System research: a literature review. Int. J. Basic Appl. Sci. IJBAS-IJENS 11(03), 96–105 (2011) 9. Fitriana, R., Djatna, T.: Business intelligence design for decision support dairy agro industry medium scaled enterprise. Int. J. Eng. Technol. 12(05), 1–9 (2012) 10. Rashid Al-Azmi, A.-A.: Data text and web mining for business intelligence: a survey. Int. J. Data Min. Knowl. Manag. Process. 3(2), 1–21 (2013) 11. Izquierdo, N.V., Lezama, O.B.P., Dorta, R.G., Viloria, A., Deras, I., Hernández-Fernández, L.: Fuzzy logic applied to the performance evaluation. Honduran coffee sector case. In: Tan, Y., Shi, Y., Tang, Q. (eds.) Advances in Swarm Intelligence, ICSI 2018. Lecture Notes in Computer Science, vol. 10942. Springer, Cham (2018)
12. Amelec, V.: Increased efficiency in a company of development of technological solutions in the areas commercial and of consultancy. Adv. Sci. Lett. 21(5), 1406–1408 (2015) 13. Viloria, A., Lis-Gutiérrez, J.P., Gaitán-Angulo, M., Godoy, A.R.M., Moreno, G.C., Kamatkar, S.J.: Methodology for the design of a student pattern recognition tool to facilitate the teaching - learning process through knowledge data discovery (big data). In: Tan, Y., Shi, Y., Tang, Q. (eds.) Data Mining and Big Data, DMBD 2018. Lecture Notes in Computer Science, vol. 10943. Springer, Cham (2018) 14. May, R.J.O.: Diseño del programa de un curso de formación de profesores del nivel superior, p. 99. UADY, Mérida de Yucatán (2013) 15. Medina Cervantes, J., Villafuerte Díaz, R., Juarez Rivera, V., Mejía Sánchez, E.: Simulación en tiempo real de un proceso de selección y almacenamiento de piezas. Revista Iberoamericana de Producción Académica y Gestión Educativa (2014). ISSN 2007 – 8412. http://www.pag. org.mx/index.php/PAG/article/view/399 16. Minnaard, C., Minnaard, V.: Sumatoria de Creatividad + Brainstorming + la Vida Cotidiana: Gestar problemas de Investigación / Summary of Creativity + Brainstorming + EverydayLife: GestatingResearchProblems. Revista Internacional de Aprendizaje en la Educación Superior. España (2017) 17. Thames, L., Schaefer, D.: Software defined cloud manufacturing for industry 4.0. Procedía CIRP 52, 12–17 (2016)
Facial Expression Recognition Using Supervised Learning V. B. Suneeta(B) , P. Purushottam, K. Prashantkumar, S. Sachin, and M. Supreet Department of ECE, BVBCET, Hubballi 580031, India suneeta [email protected]
Abstract. Facial Expression Recognition (FER) draws much attention in present research discussions. The paper presents a comparative analysis of facial expression recognition systems. Facial expression recognition is generally carried out in three stages: detection of the face, extraction of features, and classification of expressions. The proposed work presents a face detection and feature extraction method based on Haar cascade features, where the classifier is trained using many positive and negative images and the features are extracted from them; the standard Haar features are used like convolutional kernels. Next, the Fisherface classifier, a supervised learning method, is applied to the COHN-KANADE database to obtain a facial expression classification system with eight possible classes (seven basic emotions along with neutral). A recognition rate of 65% on the COHN-KANADE database is achieved. Keywords: Facial expression recognition · Supervised learning · Fisher face · Haar cascade · Feature extraction · Classification · PCA
1 Introduction
FER is important in the development of interpersonal relationships. Intentions or situations can be anticipated by interpreting facial expressions in the social environment, allowing people to respond appropriately. Human facial expressions can reveal the feelings and emotions that control and monitor behaviour between people. Fast-growing technology is leading to human-robot communication in which robots are instructed through facial expressions. FER of humans by machines has been a very active emerging area of research. The major issues identified in FER are the following: variations in expressions, fast processing, and product-specific applications. Presently FER has become an essential research field due to its large range of applications, and efforts are directed towards increasing FER accuracy by using facial expression databases involving a large span of ages. Universally there are eight basic expressions, i.e., disgust, sadness, fear, surprise, happiness, anger, contempt and neutral; these expressions can be found in all societies and cultures. For any FER system, the following are the necessary steps: (A) Face Detection, (B) Feature Extraction, (C) Facial Expression Classification.
Authors have carried out extensive work on FER techniques using various feature extraction methods, and various techniques for face recognition and FER have been presented; however, each of them has its own limitations and advantages. The Eigenface and Fisherface algorithms, which are built on PCA, the more recent 2D PCA, and Linear Discriminant Analysis (LDA), are a few examples of holistic methods. Various authors have studied these methods, but local descriptors have caught attention because of their robustness to illumination and pose variations. Some of the other approaches are discussed in the forthcoming sections. The authors of [1] discussed automated facial expression recognition using features of salient patches; that paper presented facial expression recognition for six universal expressions. The authors of [2] discussed FER dealing with facial expressions under aging; here, Gabor and log-Gabor filters are employed for extracting facial features. The use of the log-Gabor filter shows superior results compared to the Gabor filter, but its processing time is greater. The accuracy of implementations using Gabor or log-Gabor filters with an SVM classifier is greater when compared to natural human prediction on standard as well as synthetic databases. The authors of [3] discussed facial expression recognition dealing with aging, using two databases from the psychology community, Lifespan and FACES, and gave a detailed explanation of the causes of aging and their influence on FER. The authors of [4] discussed FER using feed-forward neural networks; the Cohn-Kanade dataset was used for recognition tests, and it gave a better recognition rate compared to conventional linear PCA and histogram equalization. The authors of [5] discussed the performance of the most recent face recognition algorithms; it was inferred from the results that the latest algorithms are 10 times better than the face recognition algorithms of 2002 and 100 times better than those of 1995. Some of the algorithms were capable of surpassing human participants in recognizing faces and could even identify identical twins. The authors of [6] discussed the method of Vytautas Perlibakas (2004) using Principal Component Analysis and Wavelet Packet Decomposition, which permits PCA-based face recognition with an enormous number of training images and faster training than the plain PCA method. The authors of [7] introduced a face recognition and classification algorithm based on linear subspace projection, where the subspace is established using neighbourhood component analysis (NCA), a recently introduced dimensionality reduction method. The authors of [8] discussed a feature fusion method using the Fisher Linear Discriminant (FLD); the technique extracts features by employing Two-Dimensional PCA (2DPCA) and Gabor wavelets, and then fuses the extracted features with FLD. The literature review shows that during the 1980s typical classification techniques were used for facial expression recognition, while in recent years the extraction of features like the eyes, mouth, etc. has been developed using neural network classifiers.
The organization of the paper is as follows: Sect. 2 explores the proposed methodology, Sect. 3 presents the comparison of the different methods used, and Sect. 4 elaborates on the results and their analysis. The paper is concluded in Sect. 5. The contributions of the paper are to,
2
Proposed Framework for Feature Extraction
In this section the architecture of facial expression recognition is being summarized, where each stage enhances the application of image data set with a brief discussion on them. 2.1
Face Detection
Face detection is a technology which recognizes human faces in images. To obtain a proper classification of expressions building an efficient algorithm is required. From the survey it is clear that fisher face classifier for classification and Haar Cascade technique for feature extraction yield good results. The data set is first sorted and images with different expressions are put into separate baskets. Then these sorted images are re-scaled into same size. After this the image sets are manually labeled as happy, sad, etc. into eight different labels as shown in Fig. 1. 2.2
Features Extraction
When the face is detected, key features like eyes, eyebrows, lips etc are extracted from the facial image. Extraction of features is related to dimensionality reduction. Feature extraction starts from initial stage of the given data and it constructs derived features to interpret informative and non redundant features. Feature extraction using Haar Cascade algorithm is applied to these labeled set of images as shown in Fig. 2. Calculation speed of Haar like features is high over other feature extraction techniques. Use of integral images provides the easy calculation of Haar-like features of any size in constant time. Every feature type can specify the availability or non availability of certain characteristics in the image, such as edges or modifications in texture. Feature extraction algorithm yields specific features for each eight different classes. Through this feature matrix X is obtained. This, along with label Y is sent to fisher-face classifier algorithm to obtain hypothesis hθ (X) for each class.
278
2.3
V. B. Suneeta et al.
Facial Expressions Classification
Last step of the automatic FER system is the classification that can be implemented either by attempting recognition or by interpretation as shown in Fig. 4. After the extraction of these important features, the expression is categorized by making the comparison between the image with the images in the training dataset using fisher face classifier. Classification is a general process related to categorization in which objects/images are identified, differentiated and understood. Classification of expressions using fisher-face classifier gives high success rate than Eigen-face classifier and bag of features technique. LDA is used in
Fig. 1. A framework to train the machine learning algorithm for seven facial expressions using the image data set as training data set.
Fig. 2. A framework to train and test the machine learning algorithm for eight facial expressions using the Haar cascade features to get extracted features [3].
LDA is used in Fisherface classification to reduce the dimension, thereby reducing the number of features required to classify facial expressions. To determine the Fisherfaces, the data in every category is assumed to be normally distributed, following the multivariate normal distribution $N_i(\mu_i, V_i)$ with mean $\mu_i$ and covariance matrix $V_i$, whose pdf is given by

$$f_i(x \mid \mu_i, V_i) = \frac{1}{(2\pi)^{p/2}\,|V_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^T V_i^{-1}(x-\mu_i)\right) \qquad (1)$$
In the C-class problem we have $N_i(\mu_i, V_i)$ with $i = 1, \ldots, C$. Given these normal distributions and their class prior probabilities $P_i$, the classification of a test sample $x$ is given by comparing the log-likelihoods of

$$f_i(x \mid \mu_i, V_i)\, P_i \qquad (2)$$

i.e., $x$ is assigned to the class

$$\arg\min_{1 \le i \le C} d_i(x) \qquad (3)$$

where

$$d_i(x) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \ln|\Sigma_i| - 2\ln P_i \qquad (4)$$
However, in the case where all the covariance matrices are identical, $V_i = V$ for all $i$, the quadratic parts of $d_i$ cancel out, yielding linear classifiers known as linear discriminant bases. Assume that $C = 2$ and that the classes are homoscedastic normals (Fig. 3). Then the number of wrongly classified samples in the original space of $p$ dimensions and in this subspace of just one dimension is the same, which can be verified easily. In general there will be more than two classes; in that case the problem is reformulated as that of decreasing within-class differences and increasing between-class distances. Within-class differences can be determined using the within-class scatter matrix, given by

$$S_w = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \mu_j)(x_{ij} - \mu_j)^T \qquad (5)$$
where $x_{ij}$ is the $i$-th sample of class $j$, $\mu_j$ is the mean of class $j$, and $n_j$ is the number of samples in class $j$.
Fig. 3. A framework to test the machine learning algorithm for seven facial expressions using the image data set as testing set.
3 Comparison of Different Methods
Since correlation techniques are computationally expensive and need a large quantity of storage, it is essential to follow dimensionality reduction schemes.

3.1 Eigen Face Mathematical Equations
(1) $N$ sample images $x_1, x_2, \ldots, x_N$
(2) $C$ classes $X_1, X_2, \ldots, X_C$
(3) New feature vector $y_k \in \mathbb{R}^m$:

$$y_k = W^T x_k \qquad (6)$$

where $k = 1, 2, \ldots, N$ and $W$ is a matrix with orthonormal columns.
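A minimal NumPy sketch of the projection in Eq. (6): W is obtained here via PCA on synthetic data standing in for flattened face images, so its columns are orthonormal by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4096))        # stand-in for N=100 flattened images

mean = X.mean(axis=0)
Xc = X - mean                           # center the sample images

# Rows of Vt are the principal directions (eigenfaces); taking the
# first m of them gives a p x m matrix W with orthonormal columns.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
m = 50
W = Vt[:m].T

Y = Xc @ W                              # y_k = W^T x_k for every sample
print(Y.shape)                          # (100, 50)
```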
3.2 Fisher Face Mathematical Equations
(1) Between-class scatter matrix:

$$S_b = \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^T \qquad (7)$$
where $\mu$ represents the mean of all classes.
(2) Within-class scatter matrix:

$$S_w = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \mu_j)(x_{ij} - \mu_j)^T \qquad (8)$$
In Eigenface recognition, the scatter being maximized includes not only the between-class scatter but also the within-class scatter. In Fisherface, by contrast, the ratio of the determinant of the between-class scatter matrix of the projected samples to that of the within-class scatter matrix of the projected samples is maximized. Hence Fisherface is preferred.
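The scatter matrices of Eqs. (7) and (8) and the resulting Fisher projection can be sketched as follows; the data here is synthetic and assumed to be already PCA-reduced, as is usual for Fisherfaces so that S_w is invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
# C = 3 classes of PCA-reduced feature vectors (p = 10), 20 samples each.
classes = [rng.normal(loc=c, size=(20, 10)) for c in range(3)]

mu = np.vstack(classes).mean(axis=0)         # mean of all classes
Sw = np.zeros((10, 10))                      # within-class scatter, Eq. (8)
Sb = np.zeros((10, 10))                      # between-class scatter, Eq. (7)
for Xj in classes:
    mu_j = Xj.mean(axis=0)
    D = Xj - mu_j
    Sw += D.T @ D
    d = (mu_j - mu)[:, None]
    Sb += d @ d.T

# Fisher directions maximize the between/within scatter ratio: they are
# the leading eigenvectors of Sw^{-1} Sb (at most C - 1 of them).
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
order = np.argsort(evals.real)[::-1]
W = evecs[:, order[:2]].real                 # the two Fisherface directions
print((np.vstack(classes) @ W).shape)        # projected feature matrix
```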
4 Results and Discussions
Face expressions are classified into their respective classes using the Fisherface classifier algorithm. The results and the optimization technique are discussed below in three sub-sections: A. Face detection and feature extraction; B. Classification; C. Performance variation with test size. A comparison of the Fisherface and Eigenface classifier algorithms is also made in this section and a conclusion is drawn.

4.1 Face Detection and Feature Extraction
Face detection with feature extraction is the first step performed in the FER system. In this step, faces with different expressions are detected from the image data set, and then all important features of those faces are extracted using Haar cascade features. Detected faces with different expressions and the corresponding facial features are shown below.
Fig. 4. The six different facial expressions of a human [7].
Fig. 5. Figure shows the images of faces with happy expressions classified into class HAPPY.
4.2 Classification
Using the feature vector of a projected test image, the hypothesis for each expression is computed and the image is put into the respective class, as shown in Figs. 5 and 6, depending on the label Y found in this step. The Fisherface classifier gives a higher success rate than the Eigenface classifier: Eigenface uses only the between-class scatter matrix, whereas Fisherface uses both the between-class and within-class scatter matrices, which contributes to its higher success rate.

4.3 Performance Variation on Test Size
Success rate is defined as the number of images rightly classified among the test image set, expressed as a percentage.
Fig. 6. Figure shows the images of faces with anger expressions classified into class ANGER [6]
Here the dependency of success rate on test size variation is discussed. In general, for a fixed training data set, the success rate decreases initially as test size increases and, after a certain test size, increases with further increase in test size.

Table 1. Performance variation of classifier algorithm at fixed train size = 95%

Test size   Success rate
1%          96.2%
3%          96%
5%          93.5%
10%         60%
20%         82.5%
30%         88.9%
40%         90.6%
50%         93.7%
The increasing success rate at the later stage is due to the repeated occurrence of images in both the test and train sets. During the initial stage, up to a test size of 10%, the system gets hardly a few images for testing, so the rate of rightly classifying them is high. The subsequent dip is explained by the bias of standard classifier algorithms towards classes with a larger number of instances: the tendency is to predict only the majority-class data, while minority-class features are treated as noise and are often neglected. This leads to a high probability of misclassification of these minority classes in comparison with the majority classes.
Table 2. Performance variation of classifier algorithm at fixed train size = 85%

Test size   Success rate
1%          92.1%
3%          91.5%
5%          89.4%
10%         58.5%
20%         64%
30%         75.6%
40%         81%
50%         84%
Table 3. Performance variation of classifier algorithm at fixed train size = 75%

Test size   Success rate
1%          89.3%
3%          88%
5%          85.6%
10%         56.7%
20%         60%
30%         65%
40%         73.6%
50%         78.1%
By observing Tables 1, 2 and 3, the system has a good mixture of training and test data in Table 2. Hence the data with a train size of 85% is preferred.
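The dependence of success rate on test size can be reproduced in outline with scikit-learn; the Olivetti faces dataset and the LDA classifier below are stand-ins for the authors' data and Fisherface pipeline, not their actual setup.

```python
from sklearn.datasets import fetch_olivetti_faces
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

faces = fetch_olivetti_faces()        # stand-in face dataset (downloads once)
X, y = faces.data, faces.target

for test_size in (0.10, 0.20, 0.30, 0.40, 0.50):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0)
    clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)  # Fisherface-style LDA
    print(f"test size {test_size:.0%}: "
          f"success rate {clf.score(X_te, y_te):.1%}")
```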
5 Conclusion
An image processing and classification method has been implemented in which face images are used to train a dual-classifier predictor that predicts the seven basic human emotions given a test image. The predictor is relatively successful at predicting test data from the same dataset used to train the classifiers. However, it is consistently poor at detecting the expression associated with contempt. This is likely due to a combination of a lack of training and test images that clearly exhibit contempt, poor pre-training labelling of the data, and the intrinsic difficulty of identifying contempt.
Facial Expression Recognition Using Supervised Learning
285
Compliance with Ethical Standards. All authors declare that there is no conflict of interest. No humans/animals were involved in this research work. We have used our own data.
References
1. Fu, Y., Guo, G.-D., Huang, T.S.: Age synthesis and estimation via faces: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 1955–1976 (2010)
2. Flynn, P.J., Scruggs, T., Bowyer, K.W., Worek, W., Phillips, P.J.: Preliminary face recognition grand challenge results. In: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition, pp. 15–24, April 2016
3. Perlibakas, V.: Distance measures for PCA-based face recognition. Pattern Recogn. Lett. 25(6), 711–724 (2004)
4. Butalia, M.A., Ingle, M., Kulkarni, P.: Facial expression recognition for security. Int. J. Mod. Eng. Res. (IJMER) 2, 1449–1453 (2012)
5. Zhang, Z., Zhang, J.: A new real-time eye tracking for driver fatigue detection. In: 6th IEEE International Conference on Telecommunications Proceedings, pp. 181–188, April 2006
6. Guo, Y., Tian, Y., Gao, X., Zhang, X.: Micro-expression recognition based on local binary patterns from three orthogonal planes and nearest neighbor method. In: International Joint Conference on Neural Networks (IJCNN), pp. 3473–3479 (2014)
7. Tian, Y.L., Kanade, T., Cohn, J.F.: Facial expression recognition. In: Li, S., Jain, A. (eds.) Handbook of Face Recognition, pp. 487–519. Springer, London (2011)
8. Pawaskar, S., Budihal, S.V.: Real-time vehicle-type categorization and character extraction from the license plates. In: International Conference on Cognitive Informatics and Soft Computing 2017, Advances in Intelligent Systems and Computing, Springer book chapter, VBIT, Hyderabad, pp. 557–565, July 2017
9. Wu, Y.W., Liu, W., Wang, J.B.: Application of emotional recognition in intelligent tutoring system. In: First IEEE International Workshop on Knowledge Discovery and Data Mining, pp. 449–452 (2008)
10. Youssif, A., Asker, W.A.A.: Automatic facial expression recognition system based on geometric and appearance features. Comput. Inf. J. 4(2), 115–124 (2011)
11. Gomathi, V., Ramar, K., Santhiyaku Jeevakumar, A.: A neuro fuzzy approach for facial expression recognition using LBP histograms. Int. J. Comput. Theory Eng. 2(2), 245–249 (2010)
12. Hawari, K., Ghazali, B., Ma, J., Xiao, R.: An innovative face detection based on skin color segmentation. Int. J. Comput. Appl. 34(2), 6–10 (2011)
Optimized Machine Learning Models for Diagnosis and Prediction of Hypothyroidism and Hyperthyroidism Manoj Challa(B) Department of Computer Science and Engineering, CMR Institute of Technology, Bengaluru, India [email protected]
Abstract. Thyroid disease is a major complaint that occurs due to a lack of thyroid hormone in human beings. The diagnosis report of a thyroid test is composed of T3 (Triiodothyronine, T3-RIA by radioimmunoassay), T4 (Thyroxine), FT4 (Free Thyroxine), FTI (Free Thyroxine Index, T7) and TSH (Thyroid Stimulating Hormone). Manual evaluation of these factors is tedious, so machine learning approaches are used for the diagnosis and prediction of hypothyroidism and hyperthyroidism from larger datasets. In this proposed research, the optimized machine learning classification approaches Decision Tree, Random Forest, Bayesian classification and K-nearest neighbor are used for predicting thyroid disorder. The performance measures of all these approaches are estimated and compared using the R programming tool. Keywords: Thyroid · Hypothyroidism · Hyperthyroidism · Diagnosis · R
1 Introduction
The thyroid gland is located below the Adam's apple, as shown in Fig. 1, wrapped around the trachea (windpipe). A thin area of tissue in the gland's middle, known as the isthmus, joins the two thyroid lobes on each side. When the thyroid produces either too much or too little of its hormones, the gland works improperly, leading to disorders such as hyperthyroidism and hypothyroidism. The risk factors of hypothyroidism and hyperthyroidism are depicted in Table 1.
Fig. 1. Thyroid Disorder [6]
Table 1. Hypothyroidism and hyperthyroidism risk factors

Hypothyroidism                        Hyperthyroidism
Age and gender                        Gender
Preexisting condition                 Family or personal history of autoimmune disorders
Pituitary gland disorder
Vitamin D and selenium deficiencies
Pregnancy                             Current pregnancy
1.1 Factors that Affect Thyroid Function
Figure 2 depicts the routine factors that raise the possibility of thyroid complaints; some of them are listed below.
• Smoking.
• Psychological stress.
• Injury to the thyroid.
• High amounts of lithium and iodine usage.
1.2 Thyroid Function Tests
Thyroid function diagnosis is composed of the following parameters:
(1) T3 (Triiodothyronine, T3-RIA) by radioimmunoassay
(2) T4 (Thyroxine)
(3) FT4 (Free Thyroxine)
(4) FTI (Free Thyroxine Index, T7)
(5) TSH (Thyroid Stimulating Hormone)
Fig. 2. Factors that affect Thyroid Function [5]
2 Proposed Classification Models
2.1 Decision Tree
Decision tree is one of the most widespread machine learning approaches. A decision tree is a supervised machine learning algorithm which resembles an inverted tree: each node in the tree represents a predictor variable (feature), the link between the nodes denotes a decision, and each leaf node denotes an outcome (response variable).
2.2 Random Forest
Random Forest (RF) is a classification approach that comprises several decision trees and outputs the class that is the mode of the classes output by the individual trees. Random forests are frequently used when there are very large training datasets and a very large number of input variables. A random forest model is normally made up of tens or hundreds of decision trees.
2.3 Bayesian Classification
Bayesian classification is based on Bayes' theorem. This approach can predict the probability of class membership, for example the probability that a given tuple belongs to a particular class. A simple Bayesian classifier known as the naïve Bayes classifier has been found to be comparable in performance with decision tree and selected neural network classifiers. Bayesian classifiers achieve a high degree of accuracy and speed.
2.4 K-NN (Nearest-Neighbor Method)
The nearest-neighbor classification approach is based on learning by analogy: it compares a given test instance with the training instances that are most similar to it.
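The authors implement these models in R; as a rough Python/scikit-learn equivalent, the sketch below trains the four classifier families on a CSV export of the UCI thyroid data. The file name hypothyroid.csv and the class column name are assumptions, and categorical attributes are assumed to be numerically encoded already.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("hypothyroid.csv")            # hypothetical CSV export
X, y = df.drop(columns=["class"]), df["class"] # assumed label column
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Naive Bayes": GaussianNB(),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)                      # train each classifier
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```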
3 Implementation
The dataset is extracted from the UCI Machine Learning Repository and contains 3163 instances, of which 151 are hypothyroid cases and 3012 are negative cases. The attributes are shown in Table 2.
Table 2. Hypothyroidism and hyperthyroidism attributes

S. No  Attribute name            Possible values  S. No  Attribute name  Possible values
1      AGE                       NUMERIC          15     HYPOPITUITARY   F, T
2      SEX                       M (0), F (1)     16     PSYCH           F, T
3      ON THYROXINE              F, T             17     TSH MEASURED    F, T
4      QUERY ON THYROXINE        F, T             18     TSH             NUMERIC
5      ANTITHYROID MEDICATION    F, T             19     T3 MEASURED     F, T
6      SICK                      F, T             20     T3              NUMERIC
7      PREGNANT                  F, T             21     TT4 MEASURED    F, T
8      THYROID SURGERY           F, T             22     TT4             NUMERIC
9      I131 TREATMENT            F, T             23     T4U MEASURED    F, T
10     QUERY HYPOTHYROID         F, T             24     T4U             NUMERIC
11     QUERY HYPERTHYROID        F, T             25     FTI MEASURED    F, T
12     LITHIUM                   F, T             26     FTI             NUMERIC
13     GOITRE                    F, T             27     TBG MEASURED    F, T
14     TUMOR                     F, T             28     TBG             NUMERIC
The following Table 3 depicts the vital attributes that play a major role in both hypothyroidism and hyperthyroidism disorders.

Table 3. Mean, Median, Standard Deviation, Range, Skewness and Kurtosis

S. No  Target Attribute Name  Min Value  Max Value  Mean    Median  Standard Deviation  Range  Skewness  Kurtosis
1      AGE                    1          98         51.15   54.00   19.29               97     −0.16     −0.96
18     TSH                    0          530        5.92    0.70    23.90               530    10.21     153.65
20     T3                     0          10.20      1.94    1.80    1.00                10.20  2.12      10.25
22     TT4                    2          450        108.85  104     45.49               448    1.51      6.94
24     T4U                    0          2.21       0.98    0.96    0.23                2.21   0.92      3.41
26     FTI                    0          881        115.40  107     60.24               881    5.10      47.10
28     TBG                    0          122        31.28   28.00   19.22               122    2.82      8.96
4 Results and Discussion
The Naïve Bayes, Decision Tree, KNN and SVM classification approaches were implemented using the R programming tool with thyroid datasets imported from the UCI machine learning repository. The results were estimated and evaluated, and it is clearly visible that the Decision Tree approach shows the highest accuracy for the prediction of hyperthyroidism when compared with the other classification algorithms (Fig. 3 and Table 4).

Table 4. Performance evaluation for hyperthyroidism

Hyperthyroidism  Recall  Precision  Sensitivity  Specificity  F-Measure
Naïve Bayes      0.444   0.400      0.444        0.976        0.421
Decision Tree    0.778   1          0.778        1            0.875
KNN              0.680   0.960      0.100        0.100        0.960
SVM              0.060   0.870      0.110        0.810        0.880
Fig. 3. Performance evaluation of hyperthyroidism
The Naïve Bayes, Decision Tree, KNN and SVM classification approaches were likewise implemented using the R programming tool with thyroid datasets imported from the UCI machine learning repository. The results were estimated and evaluated, and it is clearly visible that the Decision Tree approach shows the highest accuracy for the prediction of hypothyroidism when compared with the other classification algorithms (Figs. 4, 5 and Tables 5, 6).
Table 5. Performance evaluation for hypothyroidism

Hypothyroidism  Recall  Precision  Sensitivity  Specificity  F-Measure
Naïve Bayes     0.500   0.455      0.500        0.976        0.476
Decision Tree   0.900   0.600      0.900        0.976        0.720
KNN             0.680   0.960      0.100        0.100        0.960
SVM             0.060   0.870      0.110        0.810        0.880
Fig. 4. Performance evaluation of hypothyroidism
4.1 Accuracy of Classification Methods
The accuracy of the classification approaches on the thyroid dataset for the Naïve Bayes, Decision Tree, KNN and SVM models is depicted in Table 6.

Table 6. Accuracy of classification models

Models         Accuracy
Naïve Bayes    91.60
Decision Tree  96.90
KNN            95.10
SVM            96.00
Fig. 5. Accuracy of classification models
From their performance it is observed that the Decision Tree classification approach consistently produces higher accuracy than the other three approaches for both hypothyroidism and hyperthyroidism prediction.
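For reference, the reported measures follow directly from a binary confusion matrix, as in this small sketch (the toy labels are illustrative only):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Recall (sensitivity), precision, specificity and F-measure,
    with class 1 taken as the positive (disorder) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, specificity, f_measure

print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```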
5 Conclusion
In this research paper the authors have described the implementation of four classification approaches, namely Naïve Bayes, Decision Tree, KNN and SVM, on the UCI Machine Learning Repository thyroid data set for accurate prediction of hypothyroidism and hyperthyroidism. The best classification approach was the decision tree in all the implementations performed. Future work will focus on the diagnosis and prediction of the risk factors that affect thyroid diseases and on testing more machine learning approaches for the classification of vital human diseases. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Bologna, G.: A model for single and multiple knowledge based networks. Artif. Intell. Med. 28, 141–163 (2003) 2. Chang, W.W., Yeh, W.C., Huang, P.C.: A hybrid immune-estimation distribution of algorithm for mining thyroid gland data. Artif. Intell. Med. 41, 57–67 (2007)
3. Chen, H.L., Yang, B., Wang, G., Liu, J., Chen, Y.D., Liu, D.Y.: A three stage expert system based on support vector machines for thyroid disease diagnosis. J. Med. Syst. 36, 1953–1963 (2012) 4. Dogantekin, E., Dogantekin, A., Avci, D.: An automatic diagnosis system based on thyroid gland. ADSTG. Expert Syst. Appl. 37, 6368–6372 (2010) 5. Dogantekin, E., Dogantekin, A., Avci, D.: An expert system based on Generalized Discriminant analysis and Wavelet Support Vector Machine for diagnosis of thyroid disease. Expert Syst. Appl. 38(1), 146–150 (2011) 6. Duch, W., Adamezak, R., Grabezewski, K.: A new methodology of extraction, optimization and application of crisp and fuzzy logic rules. IEEE Trans. Neural Netw. 12, 277–306 (2001) 7. Temurtas, F.: A comparative study on thyroid disease diagnosis using neural networks. Expert Syst. Appl. 36, 944–949 (2009) 8. Jiawei, H., Micheline, K., Jian, P.: Data Mining Concepts and Techniques. Morgan Kaufmann, Tokyo (2006) 9. Keles, A., Keles, A.: ESTDD: expert system for thyroid disease diagnosis. Expert Syst. Appl. 34, 242–246 (2008) 10. Kele¸s, A., Kele¸s, A.: ESTDD: expert system for thyroid diseases diagnosis. Expert Syst. Appl. 34(1), 242–246 (2008) 11. Kodaz, H., Ozsen, S., Arslan, A., Gunes, S., Arslan, A., Gunes, S.: Medical application of information gain based artificial immune recognition system (AIRS): diagnosis of thyroid disease. Expert Syst. Appl. 36, 3086–3092 (2009) 12. Kumari, M., Godara, S.: Comparative study of data mining classification methods in cardiovascular disease prediction 1. IJCST 2 (2011). ISSN 2229-4333 13. Lavarello, R.J., Ridgway, B., Sarwate, S., Oelze, M.L.: Imaging of follicular variant papillary thyroid carcinoma in a rodent model using spectral-based quantitative ultrasound techniques. In: IEEE 10th International Symposium on Biomedical Imaging (ISBI), pp. 732–735 (2013) 14. Li, L.N., Ouyang, J.H., Chen, H.L., Liu, D.Y.: A computer aided diagnosis system for thyroid disease using extreme learning machine. J. Med. Syst. 36, 3327–3337 (2012) 15. Liu, D.Y., Chen, H.L., Yang, B., Lv, X.E., Li, L.N., Liu, J.: Design of an enhanced fuzzy k-nearest neighbour classifier based computer aided diagnostic system for thyroid disease. J. Med. Syst. 36, 3243–4354 (2012) 16. Ng, S.K., McLachlan, G.J.: Extension of mixture-of-experts networks for binary classification of hierarchical data. Artif. Intell. Med. 41, 57–67 (2007) 17. Ozyilmaz, L., Yildirim, T.: Diagnosis of thyroid disease using artificial neural network methods. In: Proceedings of ICONIP 2002 9th International Conference on Neural Information Processing, Orchid Country Club, Singapore, pp. 2033–2036 (2002) 18. Pasi, L.: Similarity classifier applied to medical data sets. In: 10 sivua. Fuzziness in Finland 2004. International Conference on Soft Computing, Helsinki, Finland & Gulf of Finland & Tallinn, Estonia (2004) 19. Polat, K., Sahan, ¸ S., Güne¸s, S.: A novel hybrid method based on artificial immune recognition system (AIRS) with fuzzy weighted pre-processing for thyroid disease diagnosis. Expert Syst. Appl. 32(4), 1141–1147 (2007) 20. Parimala, R., Nallaswamy, R.: A study of spam e-mail classification using feature selection package. Global J. Comput. Sci. Technol. 11, 44–54 (2011) 21. Godara, S., Singh, R.: Evaluation of predictive machine learning techniques as expert systems in medical diagnosis. Indian J. Sci. Technol. 910 (2016) 22. 
Savelonas, M.A., Iakovidis, D.K., Legakis, I., Maroulis, D.: Active contours guided by echogenicity and texture for delineation of thyroid nodules in ultrasound images. IEEE Trans. Inf. Technol. Biomed. 13(4), 519–527 (2009)
Plastic Waste Profiling Using Deep Learning Mahadevan Narayanan, Apurva Mhatre, Ajun Nair(B) , Akilesh Panicker, and Ashutosh Dhondkar Department of Computer Engineering, SIES Graduate School of Technology, Navi Mumbai, India {mahadevan.narayanan16,apurva.mhatre16,ajun.nair16, akilesh.panicker16,ashutosh.dhondkar16}@siesgst.ac.in
Abstract. India is the second most populous country in the world, with a rough population of 1.35 billion. Around 5.6 million tonnes of plastic waste are generated every year, approximately 15,342 tonnes per day [1]; India produces more plastic than it can recycle. Multinational companies like Frito-Lay and PepsiCo are accountable for much of the waste generated. The Government of India has decided to make India plastic-free by 2022; although the rules and regulations for reaching that goal are not robust enough to effect any change, the citizens can make this change. We explore a potential solution using machine learning, image processing and object detection to assist the smooth profiling of the waste in our locality. Keywords: Deep Learning · Object detection · Image processing · Plastic waste · Convolutional Neural Network · TensorFlow Object Detection API · Faster RCNN
1 Introduction
The plastic pollution crisis in India is at its apex, and the issue is that India does not have an efficient waste management system, so the country produces more waste than it can recycle. Half of the waste generated comes from major metropolitan cities like Mumbai, Delhi and Kolkata. The majority of the waste consists of packaged plastic consumables, including chips, biscuits, chocolates and bottled drinks, and the companies manufacturing these are in a way responsible for the current circumstances. EPR (Extended Producer Responsibility) is a law issued by the government which states that it is the responsibility of the producer to manage and recycle post-consumer goods [2]. Most multinational companies do not take responsibility for the waste they produce, and food manufacturing companies are the major contributors: of the total plastic waste generated, most is generated by Nestle, PepsiCo, Coca-Cola and many similar multinational companies [3]. This waste is the root cause of the underlying plastic pollution in India; the worst time is during the monsoon, when this waste accumulates in the sewage system, causing the water level to rise and hence floods. We have developed a
solution to restrain this problem by empowering citizens to contribute to controlling plastic pollution. Another perspective on this project is to help multinational companies set up collection points for plastic waste strategically so as to incur lower costs.
2 Survey of Object Detection Methods
Object detection is a common term in computer vision for the classification and localization of objects in an image. Object detection in recent times is largely based on convolutional neural networks (CNNs), and most deep learning models use a CNN at some point in their architecture. Some of the models that are trending today are Faster R-CNN, YOLO (You Only Look Once) and the Single Shot Detector (SSD) [9]; some of these models recognize objects with an accuracy greater than the human eye. It is difficult to make a fair comparison among different object detection models, and there is no single answer as to which model is best overall. For real-life applications like plastic waste brand detection, we had to make choices to balance accuracy, speed, object size, number of proposals and input image resolution. Besides these, many other choices impact performance, such as data augmentation, feature extractors, training data, number of proposals, strides, etc. [7]. YOLO runs a single convolutional network on the whole input image to predict bounding boxes with confidence scores for each class simultaneously; this approach is simple and hence the YOLO model is extremely fast (compared with Faster R-CNN and SSD). The Single Shot Detector also has lightning speed, though less than YOLO, as it takes only one shot of the image to detect multiple objects. Methods such as YOLO and SSD work very fast, but this comes at a decrease in accuracy. On the other hand, Faster R-CNN achieves higher accuracy but is a bit more
Fig. 1. Accuracy vs Time, with colours denoting feature extractor and shapes denoting metaarchitecture. Each (feature extractor, meta-architecture) pair can correspond to more than one points due to changing strides, input sizes, etc.
expensive to run. R-FCN (Region-based Fully Convolutional Network) uses position-sensitive score maps, which help to increase processing speed while maintaining accuracy close to Faster R-CNN. R-FCN and Faster R-CNN demonstrate a clear advantage in accuracy if real-time speed is not required by the application, while SSD works well for real-time processing (Fig. 1). The next important factor is the selection of a good feature extractor: Inception v2 is the most accurate and MobileNet the fastest, with ResNet falling somewhere in between; however, when the number of proposals with ResNet was reduced, its accuracy improved. Another important factor is how a model performs on objects of different sizes. Not surprisingly, all models give pretty good results on large objects, but SSD models with any architecture typically have below-par performance on small objects, although they are on par with R-FCN and Faster R-CNN in detecting large objects. In our use case we want to label plastic brands lying around in streets or public spaces, so our model needs good accuracy in detecting small plastic wrappers as well. It has been observed by other authors that input resolution can significantly impact object detection accuracy: the higher the resolution, the better the detection of small objects, while resolution matters less for large objects. For R-FCN and Faster R-CNN, we can alter the number of proposals given by the region proposal network; the number of proposals is the number of regions in the feature map (the output of the feature extractor CNN) identified by the region proposal network as possibly containing an object. It was observed that when the number of proposals was set to 100, the speed and accuracy of Faster R-CNN with ResNet became comparable to those of R-FCN models (which use 300 proposals). Memory analysis in this experiment showed that more powerful feature extractors used more memory; Faster R-CNN with the Inception-ResNet architecture used far more memory than with the ResNet101 architecture. To sum up, YOLO, SSD and R-FCN models are faster on average but cannot beat Faster R-CNN in accuracy when speed is not a prime factor in the desired application, and with ResNet101 as the feature extractor, Faster R-CNN gave a commendable speed and a good trade-off between accuracy, speed and computational power [4, 10] (Fig. 2).

2.1 Working of Faster RCNN in Detecting Plastic Waste
Our final goal is to obtain the following:
1. Bounding boxes around the logo of the plastic brand.
2. A label assigned to each bounding box indicating the brand.
3. A probability for each label of being a particular brand.
The first step in the working of the model is the feature extractor (ResNet). Images given as input to the model are in the form of multidimensional arrays (also known as tensors) of Height × Width × Depth, which are passed through a pre-trained CNN. After going down a few layers, an intermediate layer called the convolutional feature map is reached, which is used in the next part.
Fig. 2. Faster R-CNN architecture
Every convolutional layer in a CNN creates abstractions based on information from the previous layer. The first layers usually learn edges or borders, the second layer identifies patterns in edges, and as the network deepens it starts to learn more complex features that humans cannot visualize. A convolutional feature map is thus created that retains spatial information but is much smaller than the original image, with greater depth. In our case ResNet is used as the feature extractor. Next in the architecture is the Region Proposal Network (RPN). Using the convolutional feature map computed by the feature extractor in the earlier stage, the RPN locates regions in the original image (bounding boxes) which may contain objects; the number of these regions is fixed in advance, and the boxes may or may not contain the whole object. One of the hardest issues in deep learning for object detection is generating a variable-length list of bounding boxes: when the RPN proposes the existence of an object at a particular spot, the bounding box is not exact. The variable-length problem is solved in the RPN by using anchors. Anchors are a set of many bounding boxes of varying sizes placed throughout the image and used as references while predicting the locations of objects. The RPN takes in all the anchors (reference boxes) and returns a set of good proposals (where objects are located). The RPN computes two outputs used in the next step. The first is the probability that the content of an anchor is some relevant object; the RPN does not take into account what class the object belongs to, only whether it looks like an object, so it gives two scores to every anchor: the score of it being background (not an object) and the score of it being foreground (an actual object). The second output is the bounding-box adjustment for each anchor to better locate the object in the image [11]. The final score of each anchor is used to compute the object coordinates, and thus a good set of proposals for objects in the image is obtained.
Then region of interest (RoI) pooling is applied to the feature map generated by the CNN in the earlier stage and to the anchors (bounding boxes) with relevant objects from the RPN. The relevant features corresponding to these objects are extracted from the feature map, and a new tensor is formed from them. At last, the R-CNN module uses the extracted features to assign a class to the object in each bounding box and to adjust its coordinates, using two fully connected layers as in a CNN.
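As an illustration of the anchor mechanism described above (not the authors' code), the following sketch generates a Faster R-CNN-style anchor grid with NumPy; the stride, scales and aspect ratios are common defaults, not values taken from the paper.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Reference boxes (x1, y1, x2, y2) centred on every feature-map
    cell, one per (scale, ratio) pair, as used by an RPN."""
    anchors = []
    for cy in (np.arange(feat_h) + 0.5) * stride:
        for cx in (np.arange(feat_w) + 0.5) * stride:
            for s in scales:
                for r in ratios:              # r is the aspect ratio h/w
                    h, w = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

A = make_anchors(38, 50)      # e.g. a 600x800 input with a stride-16 backbone
print(A.shape)                # (38 * 50 * 9, 4) = (17100, 4)
```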
3 Data Collection
Here the main goal was to target the plastic waste most commonly found on streets and produced by multinational companies. Our application would push them to reduce the plastic they produce and help them recycle the same plastic, which otherwise ends up in streets and water bodies, clogging canals and causing every form of pollution there is; some plastics are burnt, which is hazardous to the atmosphere. Many online resources and surveys were also considered. Lays, Kurkure, Sprite, Vimal and Pepsi were the brands most commonly found on Indian streets and in public places [3]. Frito-Lay, Inc. is a subsidiary of PepsiCo (the Pepsi manufacturer) that manufactures and sells snack foods under the Lays brand and also manufactures Kurkure; Sprite is manufactured by The Coca-Cola Company. All these companies have a huge market share in India and are also among the biggest plastic polluters in the world as well as in India. We web-scraped images of all the above brands using a simple Python script. These images needed filtering, and we ended up with approximately 100 images per brand. The problem with these images was that they were full-resolution, clean images, most with a white background. For a robust model, the training data should contain crumpled plastic waste, torn or discoloured logos, and distorted images with the varied backgrounds where people may find waste plastic. To create a good dataset, the only way to collect data was to take pictures of trash, dustbins and plastics lying around in streets, near railway stations, dumping grounds and other public places. More than 200 images of each brand were collected. The application was designed with citizens as the end users, so the training data covered almost all the possible places from which citizens could take images. In total, more than 1800 images were collected.
4 Data Augmentation
Data augmentation is a powerful technique that significantly increases the diversity and quantity of data available for training as well as testing without actually collecting any new data. Data augmentation was performed on the data that we manually collected from various places. The following techniques were applied using a simple Python script.
Flip: Flipping images horizontally and vertically.
Rotation: Simple rotation of the image by 90°.
Cropping: Some portion of the image is cropped and then resized to the size of the original image, also known as random cropping.
Gaussian Noise: Over-fitting of a deep learning model is most likely to happen when the neural network tries to learn patterns that occur most frequently. One method to counter this is salt-and-pepper noise: random black and white pixels spread through the image. Adding these grains to images decreases the chances of over-fitting. These data augmentation techniques increase the size of the relevant dataset to 4–5 times the original number of images.
Scale: The training images are scaled outward (zoom in) or inward (zoom out) to get diversified data. For scaling outwards, most image frameworks crop out some part of the new image, but the end image is of a size equal to the original image.
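A compact sketch of these augmentations using OpenCV and NumPy (the file names are hypothetical):

```python
import cv2
import numpy as np

def augment(img, rng=np.random.default_rng(0)):
    """Yield flipped, rotated, randomly cropped and noisy variants."""
    yield cv2.flip(img, 1)                          # horizontal flip
    yield cv2.flip(img, 0)                          # vertical flip
    yield cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)  # 90-degree rotation

    h, w = img.shape[:2]                            # random crop, resized back
    y0, x0 = rng.integers(0, h // 4), rng.integers(0, w // 4)
    yield cv2.resize(img[y0:y0 + 3 * h // 4, x0:x0 + 3 * w // 4], (w, h))

    noise = rng.random(img.shape[:2])               # salt-and-pepper noise
    noisy = img.copy()
    noisy[noise < 0.01] = 0                         # pepper pixels
    noisy[noise > 0.99] = 255                       # salt pixels
    yield noisy

img = cv2.imread("wrapper.jpg")                     # hypothetical training image
for i, aug in enumerate(augment(img)):
    cv2.imwrite(f"wrapper_aug{i}.jpg", aug)
```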
5 Implementation
5.1 Object Detection with Customized Model
The TensorFlow Object Detection API can be used with a variety of pre-trained models. In this work, a Faster R-CNN model with ResNet101 (faster_rcnn_resnet101_coco) was chosen. The model had been trained on the MS COCO dataset, which consists of 2.5 million labelled object instances in 328,000 images covering 91 object types. The entire dataset was labelled using the LabelImg tool; LabelImg is a simple annotation tool that creates an XML file for each image containing the annotation coordinates. Training of the model was performed with 85% of the total dataset; another 10% was used for testing and 5% for validating the results. The pre-trained Faster R-CNN model (Faster R-CNN ResNet101) was fine-tuned for our dataset using the manually labelled images. A provided configuration file (faster_rcnn_resnet101.config) was used as a basis for the model configuration (after testing different configuration settings, the default values for the configuration parameters were used). The provided checkpoint file for faster_rcnn_resnet101_coco was used as the starting point for the fine-tuning process. Training the model took about 48 h on an Nvidia 940MX GPU. The total loss value decreased rapidly because training started from the pre-trained checkpoint file [6, 8].
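A minimal inference sketch for a model exported as a TF1-style frozen graph with the TensorFlow Object Detection API is given below; the tensor names are the API's standard export names, while the file paths and the 60% score threshold mirror the authors' setup only loosely and should be treated as assumptions.

```python
import numpy as np
import tensorflow.compat.v1 as tf   # TF1-style graph API
from PIL import Image

graph = tf.Graph()
with graph.as_default():            # load the exported frozen graph
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

image = np.array(Image.open("street_waste.jpg"))   # hypothetical test image

with tf.Session(graph=graph) as sess:
    fetches = {name: graph.get_tensor_by_name(name + ":0")
               for name in ("detection_boxes", "detection_scores",
                            "detection_classes", "num_detections")}
    out = sess.run(fetches, feed_dict={
        graph.get_tensor_by_name("image_tensor:0"): image[None, ...]})

keep = out["detection_scores"][0] >= 0.60   # 60% confidence threshold
print(out["detection_classes"][0][keep])
print(out["detection_boxes"][0][keep])      # normalized [ymin, xmin, ymax, xmax]
```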
5.2 User Interface
Image Input
A desktop application written in Python was developed with options to upload an image and detect plastic item brands in it. Uploaded images are stored in an archive folder from which results can be viewed. Images sent by a user are loaded into a folder and passed to the neural network for plastic waste detection. The location of the device from which the image was taken is also recorded, along with the date and time. The result of object detection is the input image with a bounding box surrounding the logo of each plastic waste item found in it, together with a confidence percentage beside the box; this is always greater than 60, as the threshold is set at 60%. The output image is stored in the archive folder, while the output stored in the database is the number of instances (count) of each plastic brand identified in the image, along with the date, time and location.
Video Input
Input can also be given in video format, via a separate option in the menu for uploading a video. The video is converted into images by a simple Python script. To avoid duplication and speed up processing, one frame is captured out of every ten consecutive frames; for example, if the video has 20 frames per second, only 2 frames per second are considered, and these frames are sent as image input to the neural network. The output for each frame is the frame with bounding boxes surrounding the plastic brand logos, and the count of each brand in each frame is recorded. Unless there is an increase in the count of a certain brand across consecutive frames, the resultant count for that brand is not updated; if a decrease is found in a subsequent frame, the previous greater value remains the final count for that brand and is written to the database. Duplicate entries are avoided in this manner. The output frames are then collated to generate a video, which is stored in another archive folder. A sketch of the frame-sampling step follows this section.
Plastic Waste Analysis and Output Visualization
As soon as the output is generated, whether from an image or a video, the plastic waste counts are updated in the database using SQL queries. The database has the following columns: image name, date, time, location and the count of each plastic brand in that image. For easier visualization, the count of each plastic brand recorded in the database over a certain period and location can be viewed as a line graph, with an option to select a particular location and date range for a more precise view; a cumulative count for a particular location can be visualized as a bar graph. For further analysis, the database for a specific location and period can be exported in CSV format from the application directly to an email inbox (Figs. 3, 4 and 5).
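A minimal sketch of the frame-sampling step (paths hypothetical; the frames/ directory is assumed to exist):

```python
import cv2

cap = cv2.VideoCapture("uploaded_video.mp4")   # hypothetical upload
frame_idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 10 == 0:                    # keep 1 of every 10 frames,
        cv2.imwrite(f"frames/frame_{saved:05d}.jpg", frame)
        saved += 1                             # e.g. 2 per second at 20 fps
    frame_idx += 1
cap.release()
print(f"{saved} frames extracted for detection")
```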
Fig. 3. Interface to visualize the stored database of different plastic brand identified in between a span of the dates entered by the analyst.
Fig. 4. Visualization using bar graph where X - axis represents brand names and Y- axis represents count of plastic waste generated at a particular location.
Fig. 5. Visualization using line graph where X-axis represents a span of number of days entered by the user and Y- axis represents count of plastic waste obtained in that span. Colour of the line indicates the different brands of plastic waste.
Fig. 6. Input image taken from a garbage dump that has various plastic items [5]
Fig. 7. Output after processing the image with bounding boxes around plastic brand logos. The model successfully predicted two thumbs up bottles, a wrapper of vimal and one sprite bottle.
6 Results
The model performed well on the validation data: around 150 validation images were tested and the results were fairly accurate, with the model making correct detections in almost all cases. The model failed mainly on images where the plastic was torn to pieces or the logos were covered with dust, of which very few examples were present in the training data; otherwise it predicted well on test images taken in garbage dumps, streets and gardens, providing accurate results 90% of the time on the test dataset. There is scope for
improvement in accuracy by adding more training images through crowdsourcing. More brands of plastic can be added to make this application practically viable (Figs. 6 and 7).
7 Conclusion
The goal of this project was to create a profile of the companies that cause plastic waste pollution. The audience is the citizens, who send images of plastic waste lying around in public spaces. Creating a profile of such waste will help to curb the multinational companies that are causing pollution, and government authorities can take action against those companies not abiding by the law. This deep learning model can be trained to detect many brands of plastic, which in turn increases the scope of the application. The solution was implemented as a proof of concept, and it can be expanded to a larger scale where it could significantly reduce plastic pollution in our country and throughout the world. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. https://india.mongabay.com/2018/04/india-to-galvanise-greater-action-against-plasticwaste-on-world-environment-day/ 2. https://en.wikipedia.org/wiki/Extended_producer_responsibility 3. https://www.downtoearth.org.in/news/africa/coca-cola-pepsico-nestl-worst-plasticpolluters-in-global-cleanups-brand-audits-61834 4. Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S.: Speed/accuracy trade-offs for modern convolutional object detectors. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017) 5. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 2–5 (2017) 6. https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnetand-resnext-for-state-of-the-art-image-cf51669e1624 7. https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/ detection_model_zoo.md 8. https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-andmore-666091488df5 9. Ren, S., He, K., Girshick, R., Zhang, X., Sun, J.: Object detection networks on convolutional feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1476–1481 (2017) 10. https://mc.ai/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-andyolo/ 11. Tang, C., Feng, Y., Yang, X., Zheng, C., Zhou, Y.: The object detection based on deep learning. In: 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, pp. 723–728 (2017)
Leaf Disease Detection Using Image Processing and Artificial Intelligence – A Survey H. Parikshith(B) , S. M. Naga Rajath, and S. P. Pavan Kumar Department of Computer Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, Karnataka, India [email protected], [email protected], [email protected]
Abstract. In today's world, crop diseases are one of the main threats to crop production and also to food safety. Traditional disease detection methods are not very accurate: current phenotyping methods for plant disease are predominantly visual and are therefore slow and sensitive to human error and variation. Accuracy can be improved using technologies such as artificial intelligence, IoT, rule-based algorithms, machine learning regression techniques, image processing, transfer learning, hyper-spectral imagery, and leaf extraction and segmentation. Keywords: Leaf disease · Artificial intelligence · Image processing · Crop
1 Introduction
Crop diseases remain a major threat to the world's food supply. Due to advances in technology, new methods have been developed for the early detection and diagnosis of plant diseases. These methods are mainly based on deep learning techniques or models that include convolutional neural networks (CNNs), transfer learning, support vector machines (SVMs) and genetic algorithms for hyper-spectral band selection. A convolutional neural network (CNN or ConvNet) is a special type of multi-layer neural network that recognizes visual patterns directly from pixel images with minimal preprocessing; CNNs are inspired by natural biological processes similar to the visual cortex of animals. AlexNet and GoogleNet are the basis of many CNN models. AlexNet is the name given to Alex Krizhevsky's CNN model, containing a total of eight layers: the first five are convolutional layers and the last three are fully connected layers. AlexNet uses the non-saturating ReLU activation function. Transfer learning is a popular machine learning method that uses pre-trained models as a starting point for computer vision and natural language processing. Transfer learning and imaging techniques are used in disease detection; image recognition with transfer learning avoids the complex process of extracting features from images to train models. Another approach used to address the problem of disease detection in plants is the use of a genetic algorithm as an optimizer and a support vector machine as the classifier. These methods include the acquisition of hyper-spectral images for classification and
the use of the charcoal rot protocol, where disease progression was manually evaluated by measuring the length of external, internal and dead-tissue lesions on soybean. Segmenting leaves from natural images is a difficult problem. To overcome it, a unique method including leaf extraction, segmentation, leaf structure refinement and morphological processing was designed and developed; the refinement of the leaf structure consists of primary and secondary vein detection.
2 Literature Review
Sarangdhar et al. [1] proposed a system that uses support-vector-based regression to identify cotton leaf diseases and that also uses a Gabor filter, a median filter and an Android application with Raspberry Pi integration to display and control the disease. Morco et al. [2] developed a mobile application called e-Rice that provides farmers with information to identify rice problems, using rule-based algorithms to detect diseases and prescribe possible control methods. Dutta et al. [3] proposed an approach to detect salad leaf disease using hyper-spectral sensing based on machine learning to acquire different vegetation spectrum profiles; they used portable high-resolution ASD spectroradiometers and PCA and LDA classifiers to classify salad leaves affected by disease. Deep learning and robotics were used by Durmus et al. [4] to detect leaf diseases in tomato plants, with AlexNet and SqueezeNet as the deep learning network architectures; the diseases they examined caused physical changes in the leaves, which RGB cameras could capture. Ashourloo et al. [5] presented a paper on the detection of leaf rust disease using machine learning regression techniques, investigating the PLSR, v-SVR and GPR regression methods for wheat leaf rust detection. Mohanty et al. [6] used deep learning to detect plant disease: they used a convolutional neural network to identify 14 crop species and 26 diseases, aiming to develop smartphone-assisted disease diagnostics to better serve farmers. Ramcharan et al. [7] presented a paper on detecting viral cassava diseases using transfer learning that is efficient, fast, affordable and easy to deploy on mobile devices. Nagasubramanian et al. [8] presented a paper on the identification of charcoal rot disease in soybean through hyper-spectral band selection; they showed that the selected bands gave more information than RGB images and could be used in multispectral cameras to identify the disease remotely. Anantrasirichai et al. [9] proposed a system that exploits the leaf vein structure in conjunction with other low-level features; they developed primary and secondary vein detection and removal of the protrusion clamp to refine the extracted leaf. Ramcharan et al. [10] presented a paper on a mobile deep learning model for the detection of plant diseases, with each disease category tested at two symptom severities, mild and pronounced.
Table 1. Comparison of different leaf disease detection methods using image processing and artificial intelligence

#1
Approach: Regression system based on the support vector machine to identify and classify five cotton-leaf diseases.
Methods: Machine-learning-based regression system, Support Vector Machine, Gabor filter, median filter, Raspberry Pi, Android app.
Datasets: –
Result: The current system gives 83.26% accuracy for disease detection.
Advantages: The Android app is designed to display disease and sensor information together with the relay's ON/OFF state. The app also handles the movement of the entire system from one location to another, so that the farmer can move the system to check the soil and control the relay condition.
Recommendation: –

#2
Approach: Application to help farmers identify problems in rice plants and provide practical advice on detecting and diagnosing diseases in rice plants and prescribing possible control options.
Methods: Rule-based algorithm as a classification method to generate rules.
Datasets: –
Result: The developed mobile application system provides information on how to determine the possible causes of diseases such as bacteria, fungi or viruses, and information about rice plants that suffer from symptom-based malnutrition.
Advantages: The application reacts well with its functionality.
Recommendation: Include rice plant varieties in the country, and include insects and nematodes as factors affecting rice plants, to broaden the scope of diagnosis; calculating the cost of controlling rice crop problems may also be included to help farmers decide which type of fertilizer or chemicals to use.

#3
Approach: A machine-learning-based approach together with a high-resolution spectroradiometer.
Methods: Machine learning algorithms and hyper-spectral sensing with a portable high-resolution ASD FieldSpec spectroradiometer: [1] Principal Component Analysis (PCA), Multi-Statistics feature ranking and Linear Discriminant Analysis (LDA); [2] evaluating the impact of training sample size on the results; [3] comparisons between the performances of SVIs and machine learning techniques.
Datasets: –
Result: First, an early evaluation of the effectiveness of using a hyper-spectral spectroradiometer in the detection of salad leaf disease; secondly, the spectral data collected from field experiments could be used to formulate a multi-classification problem based on the linear separation of classes.
Advantages: A machine-learning-based classification paradigm built on PCA, Multi-Statistics and LDA is a highly effective methodology for the rapid detection of salad leaf diseases.
Recommendation: Capture a variety of salad leaves and related diseased leaves to test the long-term effectiveness of this newly developed detection technology in the context of real-life farming.

#4
Approach: Using deep learning and robotics for the detection of tomato leaf diseases.
Methods: Deep learning and precision farming; the deep learning network architectures were AlexNet and SqueezeNet.
Datasets: PlantVillage dataset.
Result: Trained models are tested on the GPU; on the validation set, AlexNet did a little better than SqueezeNet.
Advantages: SqueezeNet is a good mobile deep learning candidate because of its light weight and low computing requirements.
Recommendation: A leaf extraction study to complete the system against complex backgrounds.

#5
Approach: Machine learning techniques have been used to estimate vegetation parameters and detect diseases.
Methods: Machine learning with partial least squares regression (PLSR), v-support vector regression (v-SVR) and Gaussian process regression (GPR) for wheat leaf rust disease detection; a spectroradiometer covering the electromagnetic region from 350 to 2500 nm.
Datasets: Spectral dataset, ground truth dataset.
Result: The experiment was conducted under controlled conditions to study the various effects of symptoms on leaf reflection.
Advantages: This study shows that GPR trained on the dataset results in greater precision than the other implemented methods.
Recommendation: In order to be used in the field, PLSR, v-SVR and GPR must be tested on different sensors and different types of wheat.

#6
Approach: Applying the deep CNNs AlexNet and GoogleNet to classification problems.
Methods: [1] Dataset description: analysis of the dataset; [2] measurement of performance: running experiments across the training dataset.
Datasets: ImageNet dataset, PlantVillage dataset.
Result: Pre-trained AlexNet and GoogleNet were better equipped to train on publicly available dataset images.
Advantages: –
Recommendation: A more diverse set of training data is needed to improve accuracy; since many diseases do not occur on the upper side of the leaves, images from many different perspectives should be obtained.

#7
Approach: Application of deep CNN transfer learning to a cassava image dataset using the latest version of the Inception model based on GoogleNet.
Methods: Cassava leaf images were captured and screened for co-infections to limit the number of images with multiple diseases.
Datasets: Cassava dataset, leaflet cassava dataset.
Result: The overall accuracy on the original cassava dataset ranged from 73% to 91%; on the leaflet cassava dataset the accuracy ranged from 80% to 93%.
Advantages: –
Recommendation: The aim is to develop mobile applications that allow technicians to monitor the prevalence of disease quickly.

#8
Approach: Used a genetic algorithm as an optimizer and support vector machines as the classifier.
Methods: Used hyper-spectral image acquisition, where a hyper-spectral line-scanning imager was used for imaging, together with the charcoal rot rating protocol.
Datasets: –
Result: A binary classification of healthy and infected samples achieved 97% accuracy and an F1 score of 0.97 for the infected class.
Advantages: Field inoculations of various soybean genotypes are imaged using a multispectral camera with the wavebands selected by the GA-SVM model for the early identification of charcoal rot disease, to understand the resistance of specific genotypes to the disease; this study provides an efficient method for selecting the most valuable wavebands from hyper-spectral data for early detection of disease.
Recommendation: Use field inoculations and assessments to advance this technology.

#9
Approach: The leaf closest to the camera is isolated, as it is not obscured by other leaves, and lesion segmentation is further applied to the leaf segment.
Methods: [1] Leaf extraction; [2] leaf structure refinement; [3] segmentation; [4] morphological processing.
Datasets: –
Result: The proposed method is tested with 100 random leaf images, with an average precision and recall of 0.92 and 0.90.
Advantages: –
Recommendation: Develop automatic lesion segmentation and classification of the leaf disease.

#10
Approach: Use the model in a mobile app and test its performance on mobile images and videos of 720 leaflets with disease.
Methods: Convolutional neural network with the SSD (Single Shot Detection) algorithm as the object detection algorithm.
Datasets: COCO (Common Objects in Context) dataset, cassava leaf dataset.
Result: The CNN (Convolutional Neural Network) model achieves an accuracy of 80.6 ± 4.10% for symptomatic leaves and 43.2 ± 20.4% for mildly symptomatic leaves.
Advantages: –
Recommendation: –
3 Comparison of Different Leaf Disease Detection Using Image Processing and Artificial Intelligence Table 1 summarizes the methodologies used by various authors in the field of leaf disease detection using image processing and artificial intelligence. It also lists the recommendations which, in our view, could be implemented in these systems in the future.
4 Conclusion Food losses from pathogens caused by crop infections have been a persistent problem in agriculture throughout the world for centuries. In order to prevent and detect these plant diseases early, this paper describes disease detection methods, such as training a CNN model based on AlexNet on a plant image data set. Other methods include the support vector regression technique, transfer learning, a genetic algorithm for hyperspectral band selection, and morphological leaf processing used to train the network. The present system, which includes the above techniques for the detection of diseases in plants, proves its efficiency in the detection and control of leaf disease, improving crop production for farmers. Acknowledgements. The authors express gratitude towards the assistance provided by Accendere Knowledge Management Services Pvt. Ltd. in preparing the manuscripts. We also thank our mentors and faculty members who guided us throughout the research and helped us in achieving the desired results. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Sarangdhar, A.A., Pawar, V.R.: Machine learning regression technique for cotton leaf disease detection and controlling using IoT. In: International Conference on Electronics, Communication and Aerospace Technology, pp. 449–454 (2017) 2. Morco, R.C., Bonilla, J.A., Corpuz, M.J.S., Angeles, J.M.: e-RICE: an expert system using rule-based algorithm to detect, diagnose, and prescribe control options for rice plant diseases in the Philippines. In: CSAI 2017, pp. 49–54 (2017) 3. Dutta, R., Smith, D., Shu, Y., Liu, Q., Doust, P., Heidrich, S.: Salad leaf disease detection using machine learning based hyperspectral sensing. In: IEEE SENSORS 2014 Proceedings, pp. 511–514 (2014) 4. Durmuş, H., Güneş, E.O., Kırcı, M.: Disease detection on the leaves of the tomato plants by using deep learning. In: 6th International Conference on Agro-Geoinformatics (2017) 5. Ashourloo, D., Aghighi, H., Matkan, A.A., Mobasheri, M.R., Rad, A.M.: An investigation into machine learning regression techniques for the leaf rust disease detection using hyperspectral measurement. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9, 4344–4351 (2016)
6. Mohanty, S.P., Hughes, D.P., Salathé, M.: Using deep learning for image-based plant disease detection. Front. Plant Sci. 7, 1419 (2016) 7. Ramcharan, A., Baranowski, K., McCloskey, P., Legg, J., Hughes, D.: Using transfer learning for image-based cassava disease detection. Front. Plant Sci. (2017) 8. Nagasubramanian, K., Jones, S., Sarkar, S., Singh, A.K.: Hyperspectral band selection using genetic algorithm and support vector machines for early identification of charcoal rot disease in soybean. arXiv preprint, 1–20 (2017) 9. Anantrasirichai, N., Hannuna, S., Canagarajah, N.: Automatic leaf extraction from outdoor images. arXiv preprint, 1–13 (2017) 10. Ramcharan, A., McCloskey, P., Baranowski, K., Mbilinyi, N., Mrisho, L.: Assessing a mobile-based deep learning model for plant disease surveillance. arXiv preprint (2018)
Deep Learning Model for Image Classification Sudarshana Tamuly(&), C. Jyotsna, and J. Amudha Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India [email protected], {c_jyotsna,j_amudha}@blr.amrita.edu
Abstract. Starting from images captured on mobile phones and advertisements popping up in our internet browsers, to e-retail sites displaying flashy dresses for sale, every day we deal with a large quantity of visual information, and finding a particular image or piece of visual content can become a very tedious task. By classifying images into different logical categories, our quest to find an appropriate image becomes much easier. Image classification is generally done with the help of computer vision, eye tracking and similar methods. What we intend to implement is the use of deep learning for classifying images into pleasant and unpleasant categories. We propose the use of deep learning for image classification because it can give us a deeper understanding of how a subject reacts to a certain visual stimulus when exposed to it. Keywords: Image classification · Computer vision · Eye tracking · Heat map · Deep learning
1 Introduction Image classification refers to the process of categorizing images logically and methodically into subgroups. This grouping is done depending upon the traits exhibited by the images, with images sharing common traits being stacked into one subgroup. This indexing of images into logical groups is very helpful because it makes maintaining image databases much easier. Companies such as giant e-commerce sites and websites delivering wallpapers benefit greatly from image classification. Till now, computer vision and eye tracking have been the notable methods used for the classification of images. However, we propose the use of deep learning for the classification of images. Deep learning employs different machine learning algorithms to get a clearer picture of how a subject is reacting to a given visual stimulus. It helps us get a deeper understanding of the various subtle interactions going on between the subject and the displayed visual stimulus, which is something we cannot accurately monitor using computer vision or eye tracking alone. Extraction of multiple features from each layer of the neural network is possible with the help of deep learning algorithms. One popular deep learning algorithm, the Convolutional Neural Network (CNN), is extensively used because it provides high image classification accuracy. Following a
hierarchical model, CNN is laid out like neurons built on a network, resulting in the formation of a fully connected layer. The main objective of this paper is to develop a deep learning model for classifying images using the VGG-16 architecture. The system overview of this architecture is depicted in Sect. 3, including the stimuli used and the methodology adopted in Sects. 3.1 and 3.2 respectively. Section 4 contains results from the tests conducted using the VGG-16 architecture. Three image datasets were considered for these tests, namely IAPS, NUSEF and MIT300, and the recorded results were compared with those obtained by using only computer vision and eye tracking for image classification. With the help of deep learning and its associated tools, this paper develops a method to automatically categorize pictures as pleasant, unpleasant and neutral. For doing this, the first challenge is to identify which eye feature gets triggered/activated upon exposure to different scenes or images, since different images cause the eye to respond differently. Tools such as TensorFlow, Keras and the VGG-16 architecture were used for analyzing and classifying the test images according to the three categories: pleasant, unpleasant and neutral.
2 Related Works Image classification can be done by applying deep learning algorithms such as the Convolutional Neural Network (CNN) [1–3]. CNN was tested on the UC Merced Land Use Dataset and the SUN database [1], and each dataset was in turn tested for different numbers of iterations. In total 10 experiments were carried out: face vs. non-face image classification, dense residential vs. forest, dense residential vs. agricultural, constructed regions vs. green regions, agriculture vs. forest, building vs. dense residential, classification from large aerial images, garden scene vs. street scene, road scene classification and war scene classification. Classification accuracies were found to be very high in the experiments where the two classes were very distinct. In experiments such as building vs. dense residential and agricultural vs. forest, the classification accuracies were found to be low because the sets of images were similar. So, fairly good classification accuracy was achieved using the CNN. A miniature CNN was built to function well on a CPU [2]. A subset of the Kaggle Dog-Cat Dataset was used to train the network. Here an image classifier which identifies and separates the images of dogs and cats was built based on CNN. For training and testing, 2000 and 600 images were used respectively. The whole dataset was trained for 50 iterations. TensorFlow was used to classify the input images as dog or cat. The classification accuracy was found to be 75%, so comparatively the Convolutional Neural Network worked well on a CPU. The MNIST and CIFAR-10 datasets were trained using CNN for image classification [3]. Different optimization algorithms and learning-rate methods were analyzed. Errors can be reduced to a greater extent with the help of Stochastic Gradient Descent (SGD) by increasing the number of iterations. The results after applying CNN on the MNIST dataset were not that high compared to existing systems, but it imposes less computational cost and the structure is very simple. Convolutional neural networks (CNN) have proven to be particularly useful in tasks involving computer vision, like detection, classification and segmentation of
objects and likewise. Extensive use of CNN is seen amongst medical researchers nowadays to diagnose diseases such as Alzheimer's. For the classification of brain MRI scans, a transfer-learning-based mathematical model, PFSECTL, is used [4]. The VGG-16 architecture trained on the ImageNet database is used as the feature extractor in the CNN structure. To predict the ripeness of dates, CNN was used [5]. The architecture used was based on VGG-16, and the CNN model also had a classifier block containing max-pooling, dropout, batch normalization, and dense layers. The image dataset with which the model was trained contained four categories of dates, namely Rutab, Tamar, Khalal and defective dates. The CNN model achieved 96.98% classification accuracy overall. The conclusion drawn was that classification methods employing CNN performed much better than traditional classification methods. A system to detect vehicles from aerial images alone, without the intervention of a human operator, was proposed [6]. The feature extraction capabilities of the VGG-16 and AlexNet neural network models were compared with classical feature extraction techniques like Histogram of Oriented Gradients and Singular Value Decomposition. The problem was converted to a binary class by segregating all the vehicles as one class and the rest as non-vehicles. On the VEDAI dataset, features extracted using the neural network achieved a classification accuracy of 99%. To improve classification accuracy when classifying art collections, fine art objects and similar items, a two-stage approach to image classification was proposed [7]. Six different CNNs, namely ResNet-50, VGG-16, VGG-19, GoogleNet, AlexNet and Inception-v3, were pre-trained and used as the first-stage classifiers, and a shallow neural network was used for the second stage. Individual patches influenced the categorization of images in the first stage, while the second stage further categorized the images based on their artistic style. Here, instead of using images, the second stage uses probability vectors and is trained independently of the first stage, making it very effective in classifying images as well as compensating for any mistakes made in the first stage. Three datasets were used in the experiments, and in all three the proposed method showed impressive results in classifying images. Eye tracking technology has also been employed in various other fields, such as the development of applications and evaluating stress levels, among others [15–18].
3 System Overview A deep learning network in the form of the VGG-16 architecture was employed for the classification of images, because deep learning architectures obtain a higher degree of accuracy in image classification and are known to work better with large datasets. One of the more widely known techniques employed for the classification of images is the Convolutional Neural Network (CNN). The overview of the system is represented in Fig. 1. The VGG-16 neural network, which has convolution layers of size 3*3, max pooling layers of size 2*2 and fully connected layers at the end, making up a total of 16 layers,
Fig. 1. System overview
was the deep learning architecture used here. An RGB image of size 224*224*3 is the form of input. A representation of the various layers of the VGG-16 architecture is shown in Fig. 2. Convolutional Neural Networks are used to identify visual patterns directly from pixel images with very little preprocessing. A CNN contains two visible layers (input and output) along with several concealed layers. These hidden layers comprise pooling, convolutional and normalization layers, a RELU layer (the activation function) and fully connected layers. During training, an RGB image of fixed size (224 × 224) is used as the input to the ConvNets. For preprocessing, only the mean RGB value computed over the training set is subtracted from each pixel. Filters with a receptive field size of 3 × 3 are used in the stack of convolutional layers through which the images are passed. The work of the pooling layer is to progressively decrease the spatial size of the representations; the reduction is done to bring down the computation and parameter count in the network. The operation of a pooling layer on each feature map is independent. Of max pooling and average pooling, max pooling is generally preferred. Three Fully-Connected (FC) layers follow the stack of convolutional layers, and their arrangement is the same in all networks: 4096 channels are present in each of the first two, while the third has 1000 channels (one for each class), since it performs the 1000-way ILSVRC classification. The last layer is the soft-max layer. The two classes into which our images are classified are pleasant and unpleasant. After splitting the images into training and testing sets, 1484 images were considered as training samples and 637 images as test samples. A Keras-style sketch of such a model follows.
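As a rough illustration (not the authors' exact code), the architecture described above can be assembled in Keras; the optimizer, loss and the two-class head are assumptions made for this sketch.

```python
# A minimal sketch of a VGG-16-based pleasant/unpleasant classifier in Keras.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG16(
    weights=None,               # trained from scratch on the paper's stimuli
    include_top=False,          # drop the original 1000-way ILSVRC head
    input_shape=(224, 224, 3))  # fixed-size RGB input, as described above

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),  # two FC layers with 4096 channels
    layers.Dense(4096, activation="relu"),
    layers.Dense(2, activation="softmax"),  # pleasant vs. unpleasant
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```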
Fig. 2. Layout of VGG-16 architecture.
3.1
Stimuli
The stimuli included images from the IAPS, MIT-300 and NUSEF datasets. IAPS, which stands for "International Affective Picture System", is a picture database widely used in psychological research. The IAPS dataset has 1037 images in total, of which 614 are pleasant and 423 are unpleasant. The NUSEF dataset consists of images of various orientations, scales and illuminations, among other factors. It has 329 pleasant and 102 unpleasant images. MIT300, which contains 300 natural indoor and outdoor images, is used as a benchmark test set because it was the first data set with held-out human eye movements. So a total of 2121 images were used as the stimuli, containing 1484 pleasant and 637 unpleasant images. 3.2
Methodology
A deep learning network is used for the image categorization, because deep learning architectures obtain a higher degree of accuracy in image classification and are known to work better with large datasets. One of the more widely known techniques employed for the classification of images is the Convolutional Neural Network (CNN), a kind of multi-layered neural network used to identify visual patterns directly from pixel images with very little preprocessing. It contains two visible layers (input and output) along with several concealed layers, which comprise pooling layers. An RGB image of fixed size (224 × 224) is used as the input to the ConvNets. For preprocessing, only the mean RGB value computed over the training set is subtracted from each pixel. Filters with a receptive field size of 3 × 3 are used in the stack of
convolutional layers through which the images are passed. The two classes into which the images are classified are pleasant and unpleasant. After splitting the images into training and testing sets, 1484 images were considered as training samples and 637 images as test samples. From the eye tracking technology, heat maps and focus maps were extracted. Similar eye features were considered for image classification using machine learning techniques [12]. The focus map, as shown in Fig. 3(c), displays the areas which are given less visual attention by the user and thus gives us an idea about the regions which were not seen by the user. The heat map, as shown in Fig. 3(b), is another representation in which evaluation is based upon how frequently the user's gaze lands on different regions of an image. The regions which have been gazed at more are known as "hot zones". For eye tracking studies, heat maps are one of the most widely used visualization techniques.
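As a rough illustration of how such a heat map can be computed (not the authors' implementation; the Gaussian width and image size are assumptions), fixation points can be accumulated on a grid and smoothed:

```python
# Hedged sketch: build a gaze heat map by accumulating fixation points
# and smoothing with a Gaussian; sigma and the grid size are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def heat_map(fixations, shape=(224, 224), sigma=15):
    """fixations: iterable of (row, col) gaze positions on the image."""
    acc = np.zeros(shape)
    for r, c in fixations:
        acc[int(r), int(c)] += 1.0
    hm = gaussian_filter(acc, sigma=sigma)
    peak = hm.max()
    return hm / peak if peak else hm  # "hot zones" are values near 1
```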
Fig. 3. (a) Pleasant image [4]. (b) Heat map, (c) Focus map
4 Result and Analysis A deep learning architecture was applied on the IAPS dataset, and the accuracy and the loss were calculated. The architecture used was VGG-16. The training accuracy achieved after applying it to the 1037 images was 61% and the testing accuracy was 54%. As seen in Fig. 4, the accuracy was calculated for both training and testing samples across 10 epochs.
Fig. 4. Accuracy graph for IAPS dataset
Again the architecture was applied on the IAPS, MIT-300 and NUSEF datasets, and the accuracy and the loss were calculated. The architecture used was VGG-16. The training accuracy achieved after applying it to the 2121 images was 71% and the testing accuracy was 65%. As seen in Fig. 5, the accuracy was calculated for both training and testing samples across 5 epochs. Training accuracy is the accuracy we get on the training data, while test accuracy is the accuracy of the model on new images not seen earlier during training.
Fig. 5. Accuracy graph for IAPS, NUSEF and MIT-300
After the extraction of the focus maps and heat maps from the 20 images for 11 participants, 220 images were obtained, of which 110 were pleasant and 110 were unpleasant. On these 220 images the deep learning architecture was applied and the corresponding accuracy and loss graphs were produced. The training accuracy was found to be 88% and the testing accuracy 80%, as shown in Fig. 6.
Fig. 6. Accuracy graph
The proposed model of VGG-16 architecture was applied on the IAPS dataset. And the training accuracy achieved after applying on IAPS dataset of 1037 images was 61% and the testing accuracy was 54%. Again it was applied on the IAPS, NUSEF and MIT-300 datasets and the achieved training accuracy was 71% and the testing accuracy was 65%. Lastly, the training accuracy achieved after applying on 20 images from IAPS dataset and its corresponding heat and focus maps of 220 images was 88% and the testing accuracy was 80%. The future enhancement in this project can be determining the derived parameters and using those, the classification of the images can be done in an efficient way. Derived parameters from the eye features like saccade slope and saliency map can be considered to enhance the chances for obtaining a better classification.
5 Conclusion and Future Scope It can be concluded from the various studies done till now that deep learning algorithms can be used for the better classification of images. A model was proposed with VGG-16 architecture and applied on the IAPS dataset and the training accuracy achieved after applying on IAPS dataset of 1037 images was 61% and the testing accuracy was 54%. Again it was applied on the IAPS, NUSEF and MIT-300 datasets and the achieved training accuracy was 71% and the testing accuracy was 65%. Lastly, the training accuracy achieved after applying on 20 images from IAPS dataset and its corresponding heat and focus maps of 220 images was 88% and the testing accuracy was 80%. The future enhancement in this project can be determining the derived parameters and using those, the classification of the images can be done in an efficient way. Derived parameters from the eye features like saccade slope and saliency map can be considered to enhance the chances for obtaining a better classification. Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Jaswal, D., Vishvanathan, S., Kp, S.: Image classification using convolutional neural networks. Int. J. Adv. Res. Technol. 3(6), 1661–1668 (2014) 2. Shanmukhi, M., Lakshmi Durga, K., Mounika, M., Keerthana, K.: Convolutional neural network for supervised image classification. Int. J. Pure Appl. Math. 119(14), 77–83 (2018) 3. Guo, T., Dong, J., Li, H., Gao, Y.: Simple convolutional neural network on image classification. In: 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA), Beijing, pp. 721–724 (2017) 4. Jain, R., Jain, N., Aggarwal, A., Jude Hemanth, D.: Convolutional neural network based Alzheimer’s disease classification from magnetic resonance brain images. Cogn. Syst. Res. 57, 147–159 (2019)
5. Nasiri, A., Taheri-Garavand, A., Zhang, Y.-D.: Image-based deep learning automated sorting of date fruit. Postharvest Biol. Technol. 153, 133–141 (2018) 6. Mohan, V.S., Sowmya, V., Soman, K.P.: Deep neural networks as feature extractors for classification of vehicles in aerial imagery. In: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, pp. 105–110 (2018) 7. Sandoval, C., Pirogova, E., Lech, M.: Two-stage deep learning approach to the classification of fine-art paintings. IEEE Access 7, 41770–41781 (2019) 8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the NIPS, pp. 1097–1105 (2012) 9. Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning deep features for scene recognition using places database. In: Proceedings of the NIPS, Montreal, QC, Canada, pp. 487–495 (2014) 10. Rosenblum, M., Yacoob, Y., Davis, L.S.: Human expression recognition from motion using a radial basis function network architecture. IEEE Trans. Neural Netw. 7, 1121–1138 (1996) 11. Xin, M., Wang, Y.: Research on image classification model based on deep convolution neural network. EURASIP J. Image Video Process. (2019) 12. Tamuly, S., Jyotsna, C., Amudha, J.: Tracking eye movements to predict the valence of a scene. In: 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2019, presented) 13. Cetinic, E., Lipic, T., Grgic, S.: Fine-tuning convolutional neural networks for fine art classification. Expert Syst. Appl. 114, 107–118 (2018) 14. Chu, W.-T., Wu, Y.-L.: Image style classification based on learnt deep correlation features. IEEE Trans. Multimed. 20(9), 2491–2502 (2018) 15. Venugopal, D., Amudha, J., Jyotsna, C.: Developing an application using eye tracker. In: IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). IEEE (2016) 16. Jyotsna, C., Amudha, J.: Eye gaze as an indicator for stress level analysis in students. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1588–1593. IEEE (2018) 17. Pavani, M.L., Prakash, A.B., Koushik, M.S., Amudha, J., Jyotsna, C.: Navigation through eye-tracking for human–computer interface. In: Information and Communication Technology for Intelligent Systems, pp. 575–586. Springer, Singapore (2019) 18. Uday, S., Jyotsna, C., Amudha, J.: Detection of stress using wearable sensors in IoT platform. In: Second International Conference on Inventive Communication and Computational Technologies (ICICCT). IEEE (2018)
Optimized Machine Learning Approach for the Prediction of Diabetes-Mellitus Manoj Challa1(&) and R. Chinnaiyan2 1
2
Department of Computer Science and Engineering, CMR Institute of Technology, Bengaluru, India [email protected] Department of Information Science and Engineering, CMR Institute of Technology, Bengaluru, India [email protected]
Abstract. Diabetes is one of the most common disorders in modern society. In general, diabetes mellitus refers to a metabolic disorder involving malfunction in insulin secretion and action. The proposed optimized machine learning models, both decision tree and random forest, presented in this paper predict diabetes mellitus based on factors like blood pressure (BP), body mass index (BMI) and glucose level (GL). The results built from the data sets are precise and crisp and can be applied in health care sectors. The proposed model is suitable for optimized decision making in a health care environment. Keywords: Diabetes · Metabolic disorder · Insulin · Malfunction · Machine learning · Health care
1 Introduction In recent times, machine learning methods are often used in diabetes prediction. The decision tree is one of the most widely used machine learning methods in the health care sector, offering high classification accuracy, while a random forest generates many decision trees. In this research paper, optimized decision tree and random forest (RF) machine learning methods are adopted for predicting diabetes mellitus.
2 Related Works In this section we carry out a systematic literature review of existing literature related to diabetes prediction. Kumari and Singh [4] clearly explained the hi-tech approaches for the prediction of diabetes mellitus. They emphasized the role of data mining and how effectively it helps to predict the disease, and used software like MATLAB and neural networks to build a model that detects whether a person is suffering from diabetes or not. Sankaranarayanan and Pramananda Perumal [6] mentioned that primary patient datasets can be processed for predicting diabetes. They also noted that rule classification and decision trees were used as the primary methods for their prediction model.
3 Architecture The diagram below (Fig. 1) explains the high-level framework for building and processing the data model for diabetes:
• Data Source
• Data Munging
• Data Processing
• Data Modeling
Fig. 1. High level framework
3.1
Data Source
The data may reside either directly in a database or in a flat file that feeds the data analytics tool. 3.2
Data Munging
The process of cleansing the data prior to analysis is called data munging. It helps to remove anomalies and data quality errors before building the model. 3.3
Data Processing
In this process we need to normalize and do label encoding (transform the data from non-numerical or categorical to numerical labels). 3.4
Data Modeling
This process is an integration of mining and probability for classifying the labels based on pre-determined labels. The models are composed of predictors, the attributes which may influence future outputs. Once the dataset is procured, the applicable forecasters and numerical models are formulated.
4 Proposed Methodology 4.1
Decision Tree
Decision Tree is one of the most widespread machine learning approaches. It is a supervised machine learning algorithm which resembles an inverted tree. Every node in the tree represents a predictor variable (feature), a link between nodes denotes a decision, and each leaf node denotes an outcome (response variable).
Optimized Decision-Tree Algorithm
INPUT: Diabetes Database
OUTPUT: Optimized Decision Tree for Diabetes Prediction
Decision-Tree:
1: The diabetes dataset is initially processed using the R tool
2: Missing values are replaced and normalization is done
3: The processed diabetes data is passed through feature selection (FS), where unnecessary attributes are removed
4: The final processed dataset is uploaded in R Studio
5: The Decision Tree algorithm is executed
6: Model creation is done
Fig. 2. Flowchart for optimized decision tree
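The paper implements these steps in R; as a rough cross-check, here is a hedged Python/scikit-learn sketch of the same pipeline. The file name, imputation strategy and number of selected features are assumptions, not taken from the paper.

```python
# Hedged sketch of the optimized decision-tree pipeline (steps 1-6 above).
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

df = pd.read_csv("pima_indians_diabetes.csv")   # assumed file name
X, y = df.drop(columns="Outcome"), df["Outcome"]

model = make_pipeline(
    SimpleImputer(strategy="median"),  # step 2: replace missing values
    MinMaxScaler(),                    # step 2: normalization
    SelectKBest(chi2, k=5),            # step 3: drop unnecessary attributes
    DecisionTreeClassifier(),          # steps 5-6: build the model
)
model.fit(X, y)
```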
4.2
Random Forest
Random Forest is a classification approach that comprises several decision trees. Random forests can be used when the problem has an enormous volume of training data and input variables.
Optimized Random Forest Algorithm
INPUT: Diabetes Database
OUTPUT: Optimized Random Forest for Diabetes Prediction
Random Forest (RF):
1: From the diabetes dataset, mine a fresh sample set with the bootstrap technique, repeated N times
2: Construct a decision tree with the results from step 1
3: Repeat steps 1 and 2; the resulting trees comprise a random forest
4: For each tree in the forest, vote for Xi
5: Compute the average votes for each class; the class with the most votes is the classification label for X
6: The accuracy of RF is the percentage of correct classifications
Fig. 3. Flowchart for optimized random forest
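A hedged scikit-learn sketch of the same procedure follows; bootstrap sampling (steps 1-3) and majority voting (steps 4-5) are built into the library, and the tree count is an assumption.

```python
# Hedged sketch of the optimized random-forest procedure above.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, bootstrap=True)
rf.fit(X, y)                          # X, y prepared as in the earlier sketch
print("accuracy:", rf.score(X, y))    # step 6: fraction correctly classified
```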
4.3
Data Set
The Pima Indians Diabetes data set is imported from the Machine Learning Repository, and the R programming tool is used for the implementation of classification. The number of patients (n) in the database is 768, each with 9 attributes, as depicted in Table 1.
Table 1. Pima Indian Diabetic data set attributes: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigree, Age, Outcome
(See Table 2).

Table 2. Min, Max, Mean and Median of PID dataset

Attribute | Minimum | Maximum | Mean | Median
1 | 0 | 17 | 3.845 | 3.000
2 | 0 | 199 | 120.9 | 117.0
3 | 0 | 122 | 69.11 | 72.00
4 | 0 | 99 | 20.54 | 23.00
5 | 0 | 84 | 7 | 3
6 | 0 | 67.10 | 31.99 | 32.00
7 | 0.780 | 2.4200 | 0.4719 | 0.3725
8 | 21 | 81 | 33.24 | 29.00
5 Results and Discussion The classifiers' performances are assessed by accuracy, sensitivity and specificity, calculated with the following formulas:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

The confusion-matrix performance of the various machine learning classification algorithms on the Pima dataset is shown in Table 3 and depicted in Fig. 4.
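As a direct transcription of the three formulas above, the following sketch computes them from confusion-matrix counts; the counts in the demonstration are hypothetical, not taken from Table 3.

```python
# The three performance measures, written exactly as defined above.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical counts, for illustration only.
print(accuracy(50, 40, 5, 5), sensitivity(50, 5), specificity(40, 5))
```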
Table 3. Confusion matrix of PID dataset (rows: actual class; columns: predicted class)

Classifier | Actual | Predicted Positive | Predicted Negative
Decision tree | Positive | 242 | 26
Decision tree | Negative | 131 | 369
SVM | Positive | 239 | 29
SVM | Negative | 142 | 358
KNN | Positive | 208 | 60
KNN | Negative | 60 | 387
Random forest | Positive | 417 | 83
Random forest | Negative | 110 | 158
Fig. 4. Confusion matrix
The performance measures of the various machine learning classification algorithms on the Pima dataset are shown in Table 4 and depicted in Figs. 5 and 6.
Table 4. Accuracy, Sensitivity and Specificity of PID dataset

Classification algorithm | Accuracy | Sensitivity | Specificity
Decision tree | 78.25 | 65.31 | 80.00
SVM | 77.73 | 43.75 | 77.33
KNN | 77.47 | 64.10 | 78.90
Random forest | 77.90 | 71.3 | 79.90
(See Figs. 5 and 6).
Fig. 5. Sensitivity and specificity
Fig. 6. Accuracy of classification approaches
6 Future Scope and Conclusion The accuracies of the Decision Tree, SVM, KNN and Random Forest classification models were evaluated and found to be 78.25 for Decision Tree, 77.73 for SVM, 77.47 for KNN and 77.90 for Random Forest. Thus the Decision Tree and Random Forest classification models were chosen as the best models for predicting the occurrence of diabetes in the PIMA database. The benefit of this optimized machine learning model is that it is suitable for patients with diabetes mellitus, and it can be applied in all fields of the health care environment. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. National Diabetes Information Clearinghouse (NDIC). http://diabetes.niddk.nih.gov/dm/ pubs/type1and2/#signs 2. Global Diabetes Community. http://www.diabetes.co.uk/diabetes_care/blood-sugar-levelranges.html 3. Han, J., Kamber, M.: Data Mining Concepts and Techniques. Morgan Kaufmann Publishers, Burlington (2001) 4. Kumari, S., Singh, A.: A data mining approach for the diagnosis of diabetes mellitus. In: Proceedings of Seventh International Conference on Intelligent Systems and Control, pp. 373–375 (2013) 5. Velu, C.M., Kashwan, K.R.: Visual data mining techniques for classification of diabetic patients. In: 3rd IEEE International Advance Computing Conference (IACC) (2013) 6. Sankaranarayanan, S., Pramananda Perumal, T.: Predictive approach for diabetes mellitus disease through data mining technologies. In: World Congress on Computing and Communication Technologies, pp. 231–233 (2014) 7. Ganji, M.F., Abadeh, M.S.: Using fuzzy ant colony optimization for diagnosis of diabetes disease. In: Proceedings of ICEE 2010, 11–13 May 2010 (2010) 8. Jayalakshmi, T., Santhakumaran, A.: A novel classification method for diagnosis of diabetes mellitus using artificial neural networks. In: International Conference on Data Storage and Data Engineering, pp. 159–163 (2010) 9. Kumari, S., Singh, A.: A data mining approach for the diagnosis of diabetes mellitus. In: Proceedings of 71st lnternational Conference on Intelligent Systems and Control (ISCO 2013) (2013) 10. Bhargava, N., Sharma, G., Bhargava, R., Mathuria, M.: Decision tree analysis on J48 algorithm for data mining. Proc. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3(6) (2013) 11. Feld, M., Kipp, M., Ndiaye, A., Heckmann, D.: Weka: practical machine learning tools and techniques with Java implementations 12. White, A.P., Liu, W.Z.: Technical note: bias in information-based measures in decision tree induction. Mach. Learn. 15(3), 321–329 (1994)
Performance Analysis of Differential Evolution Algorithm Variants in Solving Image Segmentation V. SandhyaSree(&) and S. Thangavelu Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India [email protected], [email protected]
Abstract. Image segmentation is the activity of dividing an image into multiple segments. Thresholding is a typical step in image analysis, pattern recognition and computer vision. A threshold value can be calculated using the histogram as well as using a Gaussian mixture model, but those threshold values are not the exact solution for the image segmentation. To overcome this problem and to find the exact threshold value, the differential evolution algorithm is applied. Differential evolution is considered a meta-heuristic search and is useful in solving optimization problems; DE algorithms can be applied to image segmentation by viewing it as an optimization problem. In this paper, different differential evolution (DE) algorithms are used to perform the image segmentation, and their performance in solving image segmentation is compared. Both 2-class and 3-class segmentation are applied and the algorithm performance is analyzed. Experimental results show that the DE/best/1/bin algorithm outperforms the other DE variants. Keywords: Image segmentation · Gaussian mixture model · Differential evolution · Mutation strategies
1 Introduction Image segmentation is the activity of subdividing an image into its constituent parts. The reason for subdividing an image is to detect an object which will be useful for the high-level process. Image segmentation is an important study in pattern recognition and computer vision that is useful in understanding the image. The tonal distribution of a digital image can be represented graphically by the image histogram [1]. Thresholding is the fundamental task in achieving a segmented image; the threshold point is determined so as to minimize the average segmentation error. Thresholding can be either bi-level or multi-level [2]. In bi-level thresholding, segmentation of an image is performed based on a single selected limit value. Multi-level thresholding isolates the given image into several regions and determines more than one threshold value [3]. The Gaussian mixture model is a clustering algorithm that is used to group similar pixels, and each group is considered as a class [4]. In recent years, differential evolution algorithms have proven useful in solving problems in various domains, and many attempts have been made to improve the algorithm's performance [5, 6].
Differential Evolution is a population-centric, efficient and robust searching method. In this paper, segmenting an image is performed using the DE algorithm, selected for its faster convergence towards the threshold point. Different DE algorithms are used to perform the image segmentation, and the performance of those algorithms in the image segmentation process is analyzed in this paper. Segmenting the foreground and background objects in an image is achieved through bi-level thresholding. Traditional approaches and intelligent approaches have some problems [7] when expanding their schemes to multi-level thresholding; the DE approach resolves these problems in finding the threshold values. The DE algorithm depends completely on the mutation operation, and DE has a faster convergence rate than other evolutionary algorithms. Binomial and exponential are the two crossover methods of DE; they are also called the uniform and two-point modulo methods, respectively [8, 9]. The binomial method draws, for each parameter, a uniformly distributed random number between 0 and 1; if it is less than or equal to a fixed predefined parameter called the crossover rate (Cr), the mutant vector content is copied into the trial vector, otherwise the target vector content is copied into the trial vector. This scheme is used in this paper, Eq. (1.6). Exponential crossover copies the content of the mutant vector into the trial vector as long as the generated random number is less than Cr; thereafter, the rest of the contents from the target vector are copied into the trial vector [10]. Finding the parameters that minimize the squared error is a critical problem because the equations involved are non-linear. Since an analytical solution is not available, the DE algorithm uses an iterative numerical method based on gradient information. Adopting a gradient descent search has the problem of falling into local minima, and the initial values play a major role in finding the final solution. Evolutionary algorithms provide satisfying performance in solving various problems of image processing. Hence, finding the suitable parameters and mapping them to their respective threshold values is achieved through the DE algorithm. DE algorithms are used for solving a variety of applications in image processing [11], data mining etc. The rest of the paper is organized as follows: image segmentation related work is described in Sect. 2, followed by the proposed work in Sect. 3; results and analysis are discussed in Sect. 4; and finally Sect. 5 concludes the work.
2 Related Work There are various segmentation techniques in image processing [12, 13]. Those techniques fall into two categories: contextual techniques and non-contextual techniques. Contextual techniques include different types, viz. simple thresholding, adaptive thresholding and colour thresholding. Non-contextual technique types include pixel segmentation, region segmentation, edge segmentation and model segmentation. Region segmentation again has two types [14]: region growing, and region splitting and merging. Using machine learning, segmentation can be
achieved by supervised and unsupervised algorithms, where the K-means clustering algorithm is widely used [15]. Image segmentation falls under two approaches: discontinuity based and similarity based. The discontinuity approach is based on intensity level and gray level. This approach is mainly interested in the identification of isolated points, lines, and edges; all these together are called masking, and the decision where an operation is performed depends on the given coefficient values. In the similarity based approach, grouping is done for those pixels which are similar [16]. Thresholding is considered a pre-processing step for image segmentation. Automatic thresholding is an iterative process, where the image is divided into two groups, the first containing intensity values which are similar to each other, while the others fall in a different group [17]. Based on these groups the mean, standard deviation, and variance are calculated. To attain the optimal threshold, a Probability Distribution Function (PDF) model is fitted to the histogram data using a mean-squared error approach:

$$h(g) = \frac{n_g}{N}, \quad h(g) \ge 0, \quad N = \sum_{g=0}^{L-1} n_g, \quad \text{and} \quad \sum_{g=0}^{L-1} h(g) = 1 \qquad (1.1)$$
In image processing, the tonal distribution of the image has L gray levels [0, …, L − 1]. Considering the histogram as a PDF, histogram normalization is applied to each pixel in the image. The histogram h(g) can be modeled by a mixture of Gaussians (a Gaussian mixture model), as defined by [18]:

$$p(x) = \sum_{i=1}^{K} P_i\, p_i(x) = \sum_{i=1}^{K} \frac{P_i}{\sqrt{2\pi}\,\sigma_i} \exp\!\left[-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right] \qquad (1.2)$$
ð1:2Þ
In Eq. (1.2), K represents the total number of classes in the image, i = 1, …, k, the priori probability Pi of class i, the PDF of gray-level random variable x, in class i is pi (x). li is the mean and ri is the standard deviation of the ith PDF. In addition, Pk i¼1 Pi ¼ 1 must be contended. The estimation of the three parameters ðPi ; li ; ri Þ, i = 1, …, K classes is done by using Mean square error. The mean square error between pðxj Þ, the Gaussian mixture model function and hðxj Þ, the histogram function of the experimental values is defined in Eq. (1.3). [19] ! X n K 2 1X E¼ p xj hðxj Þ þ x Pi 1 i¼1 n i¼1
ð1:3Þ
Here $\omega$ is the penalty associated with the constraint $\sum_{i=1}^{K} P_i = 1$. For every candidate of a population, the above formula is applied to calculate the fitness value (a short sketch follows). DE is an evolutionary algorithm; it starts with an initial population of size $N_p$ (vectors).
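As a rough illustration (not the authors' code), the fitness of Eq. (1.3) can be written in a few lines of numpy; the flat candidate encoding mirrors Eq. (1.13) but is otherwise an assumption.

```python
# Hedged numpy sketch of the Eq. (1.3) fitness: MSE between the candidate
# Gaussian mixture and the normalized histogram, plus the penalty term.
import numpy as np

def fitness(candidate, hist, omega=1.5):
    """candidate: flat array [P1, s1, m1, P2, s2, m2, ...]; hist: h(g)."""
    P, s, m = candidate[0::3], candidate[1::3], candidate[2::3]
    x = np.arange(len(hist))[:, None]          # gray levels, one column per class
    mix = np.sum(P / (np.sqrt(2 * np.pi) * s)
                 * np.exp(-(x - m) ** 2 / (2 * s ** 2)), axis=1)
    return np.mean((mix - hist) ** 2) + omega * abs(P.sum() - 1.0)
```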
The dimensional vector values are distributed randomly between two boundaries, the lower initial predefined parameter bound $x_{j,low}$ and the upper initial predefined parameter bound $x_{j,high}$:

$$x_{j,i,t} = x_{j,low} + \operatorname{rand}(0,1)\,(x_{j,high} - x_{j,low}), \quad j = 1,\dots,D,\; i = 1,\dots,N_p,\; t = 0 \qquad (1.4)$$
Considering t as the index of a generation, and j, i as the parameter and population indexes respectively, the mutant vector $v_{i,t}$ is generated by Eq. (1.5):

$$V_{i,t} = X_{best,t} + F\,(X_{r1,t} - X_{r2,t}), \quad r1, r2 \in \{1, 2, \dots, N_p\} \qquad (1.5)$$
In Eq. (1.5), $x_{best,t}$ is the best candidate of the current population and F is the scaling control parameter, usually between 0 and 1. The randomly selected index vectors r1 and r2 must be distinct from each other and from the population index i (specifically, $r1 \ne r2 \ne i$). Next, the trial vector $u_{j,i,t}$ is developed from the target vector elements $x_{j,i,t}$ and the donor vector elements $v_{j,i,t}$; the mixture of target vector and donor vector is the trial vector shown in Eq. (1.6).
$$u_{j,i,t} = \begin{cases} v_{j,i,t}, & \text{if } \operatorname{rand}(0,1) \le C_r \text{ or } j = j_{rand} \\ x_{j,i,t}, & \text{otherwise} \end{cases} \qquad (1.6)$$
The contents of the donor vector enter the trial vector $u_{j,i,t}$ with probability $C_r$ (0 < $C_r$ < 1). Here $C_r$ is the crossover constant and $j_{rand}$ is a random integer from (1, 2, …, D); $j_{rand}$ ensures that $u_{j,i,t} \ne x_{j,i,t}$. The trial vector takes over the donor vector parameter at the random index $j_{rand}$ to make sure that it differs by at least one parameter from the vector it will be compared to, $x_{j,i,t}$ [20]. The target vector $x_{i,t}$ and the trial vector $u_{i,t}$ are compared, and the one with the lowest function value is admitted to the next generation, as shown in Eq. (1.7).
$$X_{i,t+1} = \begin{cases} U_{i,t}, & \text{if } f(U_{i,t}) \le f(X_{i,t}) \\ X_{i,t}, & \text{otherwise} \end{cases} \qquad (1.7)$$
Representing f as the cost function: if f of $u_{i,t}$ is less than or equal to f of $x_{i,t}$, then $u_{i,t}$ replaces $x_{i,t}$ in the next generation; otherwise $x_{i,t}$ remains in the population for the next generation. The threshold values are calculated by sorting the data classes by their mean values, i.e. $\mu_1 < \mu_2 < \dots < \mu_K$. The overall probability error for two adjacent Gaussian mixture model functions is as follows [21]:

$$E(T_i) = P_{i+1}\,E_1(T_i) + P_i\,E_2(T_i), \quad i = 1, 2, \dots, K-1 \qquad (1.8)$$
Performance Analysis of Differential Evolution Algorithm Variants
Z E1 ðTi Þ ¼
Ti
pi þ 1 ð xÞdx
1
Z
E2 ðTi Þ ¼
333
ð1:9Þ
1
ð1:10Þ
piðxÞdx; Ti
$E_1(T_i)$ is the probability of misclassifying pixels of the (i+1)th class into the ith class, and $E_2(T_i)$ is the probability of erroneously classifying pixels of the ith class into the (i+1)th class. $T_i$ is the threshold value between the ith and the (i+1)th classes, and it is selected such that $E(T_i)$ is minimal. The $P_j$'s are the prior probabilities within the combined PDF. Differentiating $E(T_i)$ with respect to $T_i$, Eq. (1.11) provides the optimum threshold value $T_i$:

$$A\,T_i^2 + B\,T_i + C = 0 \qquad (1.11)$$
where

$$A = \sigma_i^2 - \sigma_{i+1}^2, \qquad B = 2\,(\mu_i\,\sigma_{i+1}^2 - \mu_{i+1}\,\sigma_i^2),$$
$$C = (\sigma_i\,\mu_{i+1})^2 - (\sigma_{i+1}\,\mu_i)^2 + 2\,(\sigma_i\,\sigma_{i+1})^2 \ln\!\left(\frac{\sigma_{i+1}\,P_i}{\sigma_i\,P_{i+1}}\right) \qquad (1.12)$$
The quadratic equation has two solutions; only one is viable (positive and lying within the bounds of the two class means). A small numerical helper is sketched below.
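A minimal sketch of this root selection, using the coefficients of Eq. (1.12); the helper below is illustrative rather than the authors' implementation.

```python
# Hedged helper for Eqs. (1.11)-(1.12): compute both roots and keep the
# one lying between the two adjacent class means, as the text requires.
import numpy as np

def threshold(P_i, mu_i, s_i, P_j, mu_j, s_j):
    """Optimal threshold between class i and the adjacent class j = i+1."""
    A = s_i**2 - s_j**2
    B = 2.0 * (mu_i * s_j**2 - mu_j * s_i**2)
    C = ((s_i * mu_j)**2 - (s_j * mu_i)**2
         + 2.0 * (s_i * s_j)**2 * np.log(s_j * P_i / (s_i * P_j)))
    for t in np.roots([A, B, C]):
        if np.isreal(t) and mu_i < t.real < mu_j:
            return t.real
    return None  # no viable root
```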
3 Proposed Work Essentially, image segmentation is attained by finding the threshold value using the DE algorithm. Initially, foreground and background values are found using the histogram, but the histogram yields only two classes. To rectify that, the a priori probability, mean and standard deviation are calculated for each gray level (0 to 255) using the Gaussian mixture model of Eq. (1.2). Using the Gaussian mixture model, multiple classes are found, but the output of the Gaussian mixture model does not give the exact threshold value in every iteration; to overcome this, the differential evolution algorithm is used. To find the exact threshold value for each class, a limit bound is set within the class; the bound values are fixed using the groups isolated by the Gaussian mixture model clustering based on the intensity distribution. Histogram normalization is calculated by Eq. (1.1). The mean square error is calculated by taking the difference of the Gaussian function and the histogram; this is considered as the fitness function shown in Eq. (1.3). DE begins by initializing the population with Np candidates, with D related to the number of classes, in which the values are randomly distributed between the lower bound and upper bound using Eq. (1.4). For each candidate of the population, the Gaussian mixture is applied to cluster the image pixels and the histogram is generated.
A child is generated for every member of the population by following the DE algorithm steps (viz. mutation and crossover). The better one among the parent and child moves to the next generation. This process is repeated for 10 generations, and the best two candidates from the last two generations are the leading candidates. The whole procedure is repeated 30 times, and the average of all best candidates is used to find the optimum threshold value for the image segmentation. The various mutation strategies [22] used in this work for the comparative analysis are listed below (a brief implementation sketch follows the list).
• DE/rand/1: $V_{i,t} = X_{r1,t} + F\,(X_{r2,t} - X_{r3,t})$
• DE/best/1: $V_{i,t} = X_{best,t} + F\,(X_{r1,t} - X_{r2,t})$
• DE/current-to-best/1: $V_{i,t} = X_{i,t} + F\,(X_{best,t} - X_{i,t}) + F\,(X_{r1,t} - X_{r2,t})$
• DE/best/2: $V_{i,t} = X_{best,t} + F\,(X_{r1,t} - X_{r2,t}) + F\,(X_{r3,t} - X_{r4,t})$
• DE/rand/2: $V_{i,t} = X_{r1,t} + F\,(X_{r2,t} - X_{r3,t}) + F\,(X_{r4,t} - X_{r5,t})$
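A minimal numpy sketch of these five strategies together with the binomial crossover of Eq. (1.6) follows; it is illustrative rather than the authors' code, and everything except F = 0.25 and Cr = 0.8 is an assumption.

```python
# Hedged sketch of the five DE mutation strategies plus binomial crossover.
import numpy as np

rng = np.random.default_rng(0)

def mutate(pop, best, i, strategy, F=0.25):
    """Build a donor vector for target i with the named strategy."""
    # distinct random indices, all different from the target index i
    idx = rng.choice([k for k in range(len(pop)) if k != i], size=5, replace=False)
    r1, r2, r3, r4, r5 = (pop[k] for k in idx)
    if strategy == "rand/1":
        return r1 + F * (r2 - r3)
    if strategy == "best/1":
        return best + F * (r1 - r2)
    if strategy == "current-to-best/1":
        return pop[i] + F * (best - pop[i]) + F * (r1 - r2)
    if strategy == "best/2":
        return best + F * (r1 - r2) + F * (r3 - r4)
    if strategy == "rand/2":
        return r1 + F * (r2 - r3) + F * (r4 - r5)
    raise ValueError(strategy)

def binomial_crossover(target, donor, Cr=0.8):
    """Mix donor and target; j_rand guarantees at least one donor gene."""
    mask = rng.random(len(target)) <= Cr
    mask[rng.integers(len(target))] = True
    return np.where(mask, donor, target)
```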
4 Results and Analysis The performance of the presented algorithms is estimated and the algorithms are compared with each other, considering the image "The Cameraman" shown in Fig. 1(a). The goal is to perform the image segmentation using different mutation strategies of the DE algorithm. In this process, the population size is fixed at Np = 90 individuals, and every candidate holds 6 dimensions for 2 classes and 9 dimensions for 3 classes, as shown in Eq. (1.13).
Fig. 1. (a) Image of “The Cameraman” [7] and (b) its concurrent histogram
$$I_N = \{P_{N1}, \sigma_{N1}, \mu_{N1}, P_{N2}, \sigma_{N2}, \mu_{N2}, P_{N3}, \sigma_{N3}, \mu_{N3}\} \qquad (1.13)$$
The parameters $(P, \mu, \sigma)$ are initialized at random, where $\mu$ must be between 0 and 255; the DE algorithm parameters are set as $\omega$ = 1.5, F = 0.25, Cr = 0.8. After several iterations, the Gaussian mixture model provides the output shown in Fig. 2(a) and (b), using Eq. (1.2). From this, P, $\mu$ and $\sigma$ are calculated for each gray level, but the exact threshold value is not obtained; hence the output calculated from this equation alone cannot be considered the exact solution. Using DE with the Gaussian mixture model improves the performance.
Fig. 2. (a) segmented image using Gaussian mixture model for 2 classes (b) segmented image using Gaussian mixture model for 3 classes
Table 1 shows the results of image segmentation for the five different DE algorithmic variants. Both 2-class and 3-class segmentation were done using the algorithms. It is clear from the results shown in Table 1 that DE/best/1 performs well in both 2-class and 3-class image segmentation, with lower error values obtained from Eq. (1.8). The respective threshold values obtained for the image segmentation are also shown in Table 1. The resultant image is shown in Fig. 3 for both 2-class and 3-class segmentation, along with the corresponding histograms. The performance can be improved further by mixing the algorithms in a distributed environment and exchanging the best values between them [23].

Table 1. Comparative analysis of all mutation strategies

Mutation algorithm | Error value (class 2) | Error values (class 3) | Threshold (class 2) | Thresholds (class 3)
DE/rand/1 | 0.00093 | 0.0011, 0.0008 | 30 | 25, 140
DE/best/1 | 0.00050 | 0.0003, 0.0011 | 33 | 26, 135
DE/current-to-best/1 | 0.00077 | 0.0004, 0.0002 | 32 | 26, 137
DE/best/2 | 0.00054 | 0.0009, 0.0002 | 38 | 30, 139
DE/rand/2 | 0.00077 | 0.0007, 0.0003 | 34 | 26, 142
Fig. 3. (a) segmented image by using DE/best/1 for 2 classes, (b) its concurrent histogram and (c) segmented image using DE/best/1 for 3 classes, (d) its concurrent histogram
5 Conclusion An effective method for image segmentation using differential evolution is employed in this paper. The method involves different mutation strategies. Essentially, the histogram is used for 2-class problems, and Gaussian mixture model clustering with the DE algorithm is used for the multi-class problem. Results show that the DE/best/1 algorithm outperforms the other algorithms. Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Kaur, H., Sohi, N.: A study for applications of histogram in image enhancement. Int. J. Eng. Sci. 6, 59–63 (2017) 2. Kotte, S., Kumar, P.R., Injeti, S.K.: An efficient approach for optimal multilevel thresholding selection for gray scale images based on improved differential search algorithm (2016) 3. Chen, B., Zeng, W., Lin, Y., Zhong, Q.: An enhanced differential evolution based algorithm with simulated annealing for solving multiobjective optimization problems (2014) 4. Farnoosh, R., Yari, G., Zarpak, B.: Image segmentation using Gaussian mixture model. Int. J. Eng. Sci. 19, 29–32 (2008)
5. Tang, L., Dong, Y., Liu, J.: Differential evolution with an Individual-dependent mechanism. IEEE Trans. Evol. Comput. 19(4), 560–574 (2015) 6. Huang, Z., Chen, Y.: An improved differential evolution algorithm based on adaptive parameter. J. Control Sci. Eng. 2013, 5 (2013). Article ID 462706 7. Cuevas, E., Zaldívar, D., Perez-Cisneros, M.A.: Image Segmentation Based on Differential Evolution Optimization, pp. 9–21. Springer International Publishing, Switzerland (2016) 8. Tvrdik, J.: Adaptive differential evolution and exponential crossover. IEEE (2008) 9. Weber, M., Neri, F.: Contiguous Binomial Crossover in Differential Evolution. SpringerVerlag, Heidelberg (2012) 10. Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution–an updated survey. Swarm Evol. Comput. 27, 1–30 (2016) 11. Haritha, K.C., Thangavelu, S.: Multi-focus region-based image fusion using differential evolution algorithm variants. In: Computational Vision and Biomechanics. LNCS, vol. 28, pp.579–592. Springer, Netherlands (2018) 12. Suganya, M., Menaka, M.: Various segmentation techniques in image processing: a survey. Int. J. Innov. Res. Comput. Commun. Eng. 2(1), 1048–1052 (2014) 13. Kaur, A., Kaur, N.: Image segmentation techniques. Int. Res. J. Eng. Technol. 02(02), 944– 947 (2015) 14. Zaitouna, N.M., Aqelb, M.J.: Survey on image segmentation techniques. In: International Conference on Communication Management and Information Technology. Elsevier (2015) 15. Choudhary, R., Gupta, R.: Recent trends and techniques in image enhancement using DE – a survey. Int. J. Adv. Res. Comput. Sci. 7(4), 106–112 (2017) 16. Kaur, B., Kaur, P.: A comparitive study of image segmentation techniques. Int. J. Comput. Sci. Eng. 3(12), 50–56 (2015) 17. Rahnamayan, S., Tizhoosh, H.R., Salama, M.M.: Image thresholding using differential evolution. In: Proceedings of International Conference on Image Processing, Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 244–249 (2006) 18. Osuna-Enciso, V., Cuevas, E., Sossa, H.: A Comparison of nature inspired algorithms for multi-thresholding image segmentation. Expert Syst. Appl. 40(4), 1213–1219 (2013) 19. Ochoa-Monitel, R., Carrasco Aguliar, M.A., Sanchez-Lopez, C.: Image segmentation by using differential evolution with constraints handling. IEEE (2017) 20. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11, 341–359 (1997) 21. Ali, M., Siarry, P., Pant, M.: Multi-level image thresholding based on hybrid DE algorithm. Application of medical images. Springer-Verlag Gmbh, Germany (2017) 22. Leon, M., Xiong, N.: Investigation of mutation strategies in differential evolution for solving global optimization problems, vol. 8467, pp. 372–383. Springer International Publishing, Switzerland (2014) 23. Thangavelu, S., ShanmugaVelayutham, C.: An investigation on mixing heterogeneous differential evolution variants in a distributed framework. Int. J. Bio-Inspired Comput. 7(5), 307–320 (2015)
Person Authentication Using EEG Signal that Uses Chirplet and SVM Sonal Suhas Shinde(&) and Swati S. Kamthekar Department of Electronics and Telecommunication Engineering, Pillai HOC College of Engineering and Technology, Rasayani, India [email protected],[email protected]
Abstract. Brain wave signals, the product of neuronal activity, can be used for authentication and offer benefits over traditional biometric systems. Considerable work exists on brainwave-based authentication, covering numerous pre-processing steps, extracted features, and classification methods. This study focuses on EEG (electroencephalography) based authentication with high precision. The paper addresses identification and authentication of a person from the EEG signal, using a database of EEG recordings from 29 subjects with five trials per subject. The Chirplet transform is used for feature extraction, and a support vector machine is used to improve the efficiency of the EEG-based recognition system. The goal is to achieve excellent classification rates and consequently help build a secure and reliable biometric scheme based on brain wave signals. This study achieves 99% precision. Keywords: Brain wave · Biometric authentication · Neurons · EEG electrodes · Wavelet transform · Alpha · Beta · Gamma · Theta · Delta
1 Introduction
The human brain contains billions of neurons that carry electrical charge when the brain operates. The sum of these charges generates an electric field in the µV range, which is measured with sensors called electrodes; changes in thinking alter the interconnections of neurons, which in turn produce changes in the brain wave pattern. Person recognition is now a very popular research topic. Older authentication systems rely on credentials such as passwords, PINs, and codes, and biometrics such as fingerprints and retina scans are also used for authentication. Brain wave based authentication is a newer approach in which the EEG signal is used to confirm personal identity. Biometric systems based on fingerprints, retina, face, and the like suffer from attacks that degrade their performance; compared to these existing biometrics, EEG offers better robustness and safety, as it is very difficult to replicate. The EEG-based authentication technique includes pre-processing, feature extraction, and pattern recognition, where data collection is carried out with the help of electrodes; the collected data contains the EEG signal as well as noise, usually artifacts, which is removed during pre-processing.
The EEG signal comprises gamma, beta, alpha, theta and delta components, from which characteristics are obtained and compared against the pattern recognition database. The rest of this paper is organized as follows: Sect. 2 covers related work, the general architecture of an EEG based authentication system is discussed in Sect. 3, the proposed system is described in Sect. 4, Sect. 5 presents the results, and Sect. 6 concludes the paper.
2 Related Work
Eye-closed and eye-open conditions have been used for the authentication process, where signal stability was a major issue; performance is good when considering the eye-closed (EC) condition only [1]. Another work analyzed the EEG signal using power spectral density as a feature and a k-nearest neighbors classifier, comparing the precision obtained from the alpha band (8–13 Hz), the beta band (13–30 Hz), mixed alpha and beta bands (8–30 Hz), and combined theta, alpha, beta and gamma bands (4–40 Hz). Overall accuracy was above 80%, and the highest accuracy, about 89.21%, was obtained by combining the theta, alpha, beta and gamma bands [2]. Furthermore, it is common in this research area to record brain waves under relaxed or eye-closed conditions; however, evaluating verification performance using such brain waves is not realistic [5]. A convolutional neural network (CNN) has been used to mine the characteristics that separate persons [6]. A wavelet based feature extraction technique has been suggested to analyze visually and auditorily evoked potentials in the EEG signal [7]. In another system, sample entropy features were obtained from 64-channel EEG signals and frequency bands were evaluated for identification [8]. Changes in alpha activity (8–13 Hz) during eyes open and closed were collected for each channel as features; alpha power increases when subjects close their eyes, a result that should be improved with more advanced authentication algorithms in future studies [9]. In another approach, the user is requested to visualize a pre-decided number and the respective EEG signal is captured; that study focused on combined frequency bands such as alpha and beta across all EEG channels, and the maximum accuracy obtained was 96.97% [10]. Finally, one paper explores the best set of electrodes for human authentication using VEP (visual evoked potential) extraction techniques such as cross-correlation, coherence and amplitude mean for classification [11].
3 General Architecture of EEG Based Authentication
Figure 1 shows the general architecture of an EEG based authentication system. The EEG signal is captured through a brain-computer interface (BCI), typically electrodes placed on the scalp. The raw EEG signal contains noise and artifacts that need to be handled in the pre-processing stage; these artifacts are removed using a
filter. In the feature extraction stage, features such as the mean, power spectral density, and energy are extracted from the filtered signal, and they are then classified with the help of a classifier. Finally, the classified output is compared with the database already stored in the system.
Fig. 1. General architecture of EEG based authentication system
4 Proposed System
4.1 Signal Acquisition
Dataset Description. The dataset was downloaded from the UCI Machine Learning Repository; it was obtained by performing experiments with a commercial device on 29 subjects, conducted under the highest ethical norms. The tests are visual experiments, so the most interesting signals come from the O1 and O2 electrodes. Each subject performed the tests, and the respective EEG signals were captured with 16 electrodes (Fig. 2).
Fig. 2. Proposed system.
Person Authentication Using EEG Signal that Uses Chirplet and SVM
341
Attribute Information. In total, 16 attributes are available: the first two are the time-domain and interpolated signals, and the last 14 are the signals from the electrodes.
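As an illustration only, splitting the attributes described above might look as follows; the file name and column labels are hypothetical, since the exact layout of the repository's export is not given here:

```python
import pandas as pd

# Hypothetical file name; the actual UCI export may differ.
df = pd.read_csv("eeg_subject01_trial1.csv")

time_cols = df.columns[:2]        # time-domain and interpolated-signal columns
electrode_cols = df.columns[2:]   # the 14 electrode channels

# The experiments are visual, so the occipital electrodes matter most.
occipital = df[[c for c in electrode_cols if str(c).upper() in ("O1", "O2")]]
```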
4.2 Preprocessing
Since EEG signals are recorded non-invasively, the signal is affected by noise. With a sampling frequency of 173 Hz, a low-pass filter is applied to obtain a noise-free EEG signal in the 0–80 Hz range, and a notch filter is then used to restrict the EEG signal to frequencies up to 50 Hz.
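A minimal SciPy sketch of this stage, assuming the stated 173 Hz sampling rate; the filter orders and cutoffs are illustrative choices, and the 50 Hz notch is interpreted here as standard power-line interference removal:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 173.0  # sampling frequency in Hz, per the dataset description

def preprocess(eeg):
    # Low-pass filter keeping the band used in the later analysis (<= 50 Hz).
    b, a = butter(4, 50.0, btype="low", fs=FS)
    out = filtfilt(b, a, eeg)
    # Notch at 50 Hz, read here as power-line interference removal.
    bn, an = iirnotch(w0=50.0, Q=30.0, fs=FS)
    return filtfilt(bn, an, out)

clean = preprocess(np.random.randn(2048))  # stand-in signal for illustration
```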
4.3 Feature Extraction
The EEG signal pattern changes with changes in thought, which alter neuronal activity. The wavelet transform can provide time and frequency information simultaneously. The EEG signal has five basic components: gamma (γ) at 30–40 Hz, beta (β) at 13–30 Hz, alpha (α) at 8–13 Hz, theta (θ) at 4–8 Hz and delta (δ) at 0.5–4 Hz. In this work the wavelet transform is used to separate the EEG signal into these five frequency bands. The decomposition level is chosen as 4; as a result, the EEG signal is broken down into the detail coefficients D1–D4 and a final approximation A4. The assumption of a stationary signal cannot always be made; in practice, non-stationary signals must be considered [12]. Non-stationarity is much closer to real life, as with the EEG signal, signals from rotating equipment during non-stationary events, human speech signals, and so on. For non-stationary events, a general time-frequency representation of the signal is insufficient, and the notion of instantaneous frequency (IF) plays a crucial role in characterizing time-varying signals. Recent advanced time-frequency analysis (TFA) techniques can provide more accuracy, but their limitations cannot be ignored; a TFA method should be able to (1) represent a linear multi-component signal well, (2) build on the mathematical model underlying the original TFA technique, and (3) admit a tractable mathematical model. The Chirplet transform is a general form of the Fourier and short-time Fourier transforms that provides a flexible time-frequency window; the linear Chirplet transform gives a better time-frequency representation, is among the best techniques for characterizing non-linear information, and is robust against noise [3]. This research uses the Chirplet transform to extract features from the alpha, beta, gamma, theta and delta frequency bands, analyzing the frequency components over particular time intervals.
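A sketch of the four-level decomposition with PyWavelets; the mother wavelet (db4 here) is an assumption, as the paper does not name one:

```python
import numpy as np
import pywt

def decompose(eeg):
    # 4-level DWT: detail coefficients D1-D4 plus the final approximation A4.
    a4, d4, d3, d2, d1 = pywt.wavedec(eeg, wavelet="db4", level=4)
    # At a 173 Hz sampling rate the sub-bands align only roughly with
    # gamma/beta/alpha/theta/delta; treat the mapping as indicative.
    return {"A4": a4, "D4": d4, "D3": d3, "D2": d2, "D1": d1}

bands = decompose(np.random.randn(1024))  # stand-in signal for illustration
```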
4.4 Classification
The extracted features are applied to the classifier. The support vector machine (SVM) is a classifier of low complexity that handles nonlinear conditions well. In this study, 725 feature vectors are extracted (29 subjects × 5 trials × 5 frequency bands) to classify the data. The SVM's primary function is to map the input data into a high-dimensional space, which results in the separation of the data by a hyperplane between the different classes. Hence the largest
margin hyperplane is selected as the decision boundary to differentiate the different classes.
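A minimal classification sketch with scikit-learn, assuming the 725 feature vectors have already been assembled into X with subject labels y; the kernel and parameters are illustrative, not the authors' exact settings:

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (725, n_features) Chirplet-derived feature vectors, y: subject labels;
# both are assumed to have been produced by the earlier stages.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling keeps the margin well conditioned; the RBF kernel handles nonlinearity.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```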
5 Result and Discussion
The database obtained from the UCI Machine Learning Repository consists of VEP signals from 29 subjects. The signal is filtered at a sampling frequency of 173 Hz, removing the noise and producing an EEG signal that is further decomposed into five bands (Fig. 3).
Fig. 3. Input signal.
It is essential to select the number of decomposition levels in the wavelet transform properly. The levels are selected so that the wavelet coefficients retain those components of the signal that are useful for classification. In this research, as the EEG signals have no useful frequency content above 50 Hz, the number of decomposition levels was selected as 4. As a result, the EEG signals are broken down into the details D1–D4 and a final approximation A4, as shown in Fig. 4.
Fig. 4. Decomposition of signal
The decomposed signal is then classified; the proposed study uses a support vector machine as the classifier. Owing to its precision and capacity to handle large amounts of data, the SVM is well suited to this classification task: nonlinear decision boundaries can be used without much additional computational effort, and SVM performance is highly competitive with other techniques. Using the SVM, this research achieves 99% accuracy (Table 1).

Table 1. Comparison with existing work

Existing method | Accuracy (%)
7 subjects, EC condition, 7 features, NN [2] | 95
26 subjects, music-induced potential, 60 features, SVM [3] | 92.73
4 subjects, VEP, P300 classification [4] | 89
Rapid serial visual presentation (RSVP), CNN [6] | 91.27
109 subjects, sample entropy feature, cross-validation framework [8] | 98.31
101 subjects, VEP, Fuzzy-Rough Nearest Neighbour (FRNN) [11] | 91
Proposed system: 29 subjects + 5 trials, VEP, LCT, SVM | 99
6 Conclusion
Older biometric systems such as fingerprint scans, retina scans and PIN locks suffer from various attacks. This study proposed authentication of a person using the EEG signal, exploiting the physiological and behavioral features of the individual. The precision achieved is very good: the system authenticates known individuals and rejects unknown ones. This paper describes a technique for improving the effectiveness of current EEG-based person identification technologies, thereby encouraging the future development of EEG-based person recognition for security control.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Dmitriev, N.A., Belik, D.V., Zinevskaya, M.S.: Bioengineering system used to restore the patient’s brain fields activity after stroke (2016) 2. Lee, C., Kang, J.H., Kim, S.P.: Feature selection using mutual information for EEG-based biometrics (2016) 3. Lin, Y.P., Wang, C.H., Wu, T.L., Jeng, S.K., Chen, J.H.: Support vector machine for EEG signal classification during listening to emotional music (2008) 4. Moonwon, Y., Kaongoen, N., Jo, S.: P300-BCI-based authentication system (2016)
5. Nakanishi, I., Yoshikawa, T.: Brain waves as unconscious biometrics towards continuous authentication. In: 2015 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 9–12 November 2015 6. Wu, Q., Zeng, Y., Lin, Z., Wang, X.: Real-time EEG-based person authentication system using face rapid serial visual presentation. In: 8th International IEEE EMBS Conference on Neural Engineering Shanghai, China, 25–28 May 2017 7. Sundararajan, A., Pons, A., Sarwat, A.I.: A generic framework for EEG-based biometric authentication. In: 12th International Conference on Information Technology - New Generations (2015) 8. Thomas, K.P., Vinod, A.P.: Biometric identification of persons using sample entropy features of EEG during rest state. In: IEEE International Conference on Systems, Man, and Cybernetics, 9–12 October 2016 9. Toshiaki, K.A., Mahajan, R., Marks, T.K., Wang, Y.: High-accuracy user identification using EEG biometrics (2016) 10. Virgílio, B., Paula, L., Ferreira, A., Figueiredo, N.: Advances in EEG-based brain-computer interfaces for control and biometry (2008) 11. Siaw, H., Yun, H.C., Yin, F.L.: Identifying visual evoked potential (VEP) electrodes setting for person authentication. Int. J. Adv. Soft Comput. Appl. 7(3), 85–99 (2015) 12. Gang, U., Yiqui, Z.: Mechanical systems and signal processing (2015)
Artificial Neural Network Hardware Implementation: Recent Trends and Applications
Jagrati Gupta¹ and Deepali Koppad²
1 CMR Institute of Technology, Bengaluru, India
[email protected]
2 Ramaiah Institute of Technology, Bengaluru, India
[email protected]
Abstract. The human brain far exceeds contemporary supercomputers in tasks such as pattern recognition while consuming only a millionth of the power. To bridge this gap, recently proposed neural architectures such as the Spiking Neural Network Architecture (SpiNNaker) use over a million ARM cores to mimic the brain's biological structure and behaviour. Such alternative computing architectures exhibit massive parallelism and fault tolerance in addition to being energy efficient with respect to the traditional Von Neumann architecture. This has led to the emergence of neuromorphic hardware that can exceed the performance of traditional computing devices in AI-driven remote monitoring embedded devices, robotics, biomedical devices, imaging systems, and more. In this review paper, we focus on research conducted in the field of neuromorphic hardware development and its applications. The paper surveys the training algorithms used in the hardware design of neuromorphic systems, then surveys existing neuromorphic circuits and devices, and finally discusses their applications. Keywords: Artificial neural network · Neuromorphic · Hardware implementation · Memristor · Spintronics · Brain machine interface
1 Introduction
Neuromorphic engineering is the field in which an artificial unit called a neuron, built from analog and digital structures, imitates the biological neuron. The human brain has billions of neuron cells that process information and work as a processor. In a biological neuron, the interaction between two neurons takes place at a junction called a synapse. Recently, much effort has gone into modeling these synapses using the recently discovered memristor [1, 2]. Memristors can serve as synapses and can thus shrink the area and complexity of neuromorphic circuits [3]. Artificial neural networks operate similarly to the brain in terms of intelligence and information processing. Human intelligence keeps evolving through past experience; likewise, an artificial neural network requires learning and training algorithms so that it can mimic human intelligence. The need arises from the requirement to compute and process large amounts of data. In this
review paper, we focus on research conducted in the domain of hardware implementation of neuromorphic systems. Much advancement has taken place in this field over the past decade in order to deal with the complexity of computation, since the power budget of the human brain is far better than that of the best supercomputers [4].
2 Training Algorithms
This section presents a study of some of the training algorithms widely used in the implementation of neural network hardware.
2.1 Backpropagation (BP)
BP seeks the minimum of the error function via gradient descent; the weight combination that minimizes the error function is accepted as the solution of the training problem. Neural networks are categorized as static or dynamic. Static networks contain no feedback elements and no delays, whereas in dynamic networks the output depends not only on the current input to the network but also on past inputs and outputs. A general framework for describing dynamic neural networks has been proposed in [5]. The backpropagation algorithm comprises three steps: (i) forward propagation, (ii) backward propagation, and (iii) calculation of the updated weight values. In backpropagation, $w_{ij}$ denotes the weight pointing from node $i$ to node $j$. The information fed into the subnetwork in the forward pass is $x_i w_{ij}$, where $x_i$ is the buffered output of unit $i$. The BP step computes the derivative of the error $E$ with respect to this quantity, $\partial E / \partial (x_i w_{ij})$, and we finally have

$$\frac{\partial E}{\partial w_{ij}} = x_i \, \frac{\partial E}{\partial (x_i w_{ij})} \qquad (1)$$
The error backpropagated at the $j$-th node is $\delta_j$, so the partial derivative of $E$ with respect to $w_{ij}$ is $x_i \delta_j$. The weight $w_{ij}$ is then updated by adding the error term $\Delta w_{ij} = \gamma \, x_i \, \delta_j$, where $\gamma$ is the learning rate; this constitutes the training of the neural network. Prior work proposes a hardware architecture implementing both recall and training for multilayer feed-forward spiking neural networks [6]. In spiking neural networks, information is passed among neurons via spikes or pulses. The learning process can be categorized as off-chip and on-chip learning; off-chip learning is slower, so it is sped up by on-chip implementation of the training hardware. Recently, analog backpropagation learning hardware has been proposed for various memristor-based architectures [7]. One application of backpropagation is a neural-network-based sensor interpreter, where backpropagation is used to train on data generated from a CHEMFET (chemically sensitive field-effect transistor) sensor [8]. Implementing backpropagation using analog circuits remains a challenge, so recent efforts have designed analog circuits realizing the backpropagation algorithm for use with memristive crossbar arrays in neural networks [9]. On-chip training is faster, so recent work presents an on-chip backpropagation training algorithm for an FPGA-
based 2–2–1 neural network architecture [10]. A new memristor crossbar architecture has been proposed in which the backpropagation algorithm trains a 3-bit parity circuit [11]. Efforts have also been made to develop mathematical models of the various non-idealities found in crossbar implementations, such as neuron and source resistance and device variation, and to analyze their effects on fully connected and convolutional neural networks trained with backpropagation [12]. Thus, prior works have used backpropagation to train various neural networks, as it is simple, fast and easy to program.
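To make Eq. (1) and the update rule concrete, here is a minimal NumPy sketch of one weight update for a single layer; the shapes and the learning rate γ are illustrative:

```python
import numpy as np

def backprop_update(x, w, delta, gamma=0.1):
    """One weight update, w_ij += gamma * x_i * delta_j.

    x:     (n_in,)  buffered outputs x_i of the sending layer
    delta: (n_out,) backpropagated errors delta_j at the receiving nodes
    w:     (n_in, n_out) weight matrix
    """
    return w + gamma * np.outer(x, delta)  # outer product covers every (i, j)

# Tiny example with arbitrary numbers:
w = np.zeros((3, 2))
w = backprop_update(np.array([1.0, 0.5, -0.2]), w, np.array([0.1, -0.3]))
```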
2.2 Spike Timing Dependent Plasticity (STDP)
Biologically, STDP is a phenomenon in which the relative timing of spikes directly causes variation in the strength of a synapse [13]. It is an emerging area of research and is being implemented in neuromorphic networks. The learning rule can be modeled as [14]:

$$K(\Delta t) = \begin{cases} S_{+} \, e^{\Delta t / \tau_{s+}}, & \Delta t < 0 \\ -S_{-} \, e^{-\Delta t / \tau_{s-}}, & \Delta t \ge 0 \end{cases} \qquad (2)$$
where $K(\Delta t)$ determines the synaptic modification and $\tau_{s+}$ and $\tau_{s-}$ set the ranges of pre-to-postsynaptic interspike intervals over which potentiation and depression act. The rule implies that a neuron embedded in a neural network can single out which neighboring neurons are worth listening to by potentiating those inputs that predict its own spiking activity [15]. STDP has been implemented on neuromorphic hardware such as SpiNNaker, using a pre-synaptic, event-driven scheme that modifies the weights only when a pre-synaptic spike appears [16]. STDP has recently been applied, for example, to visual feature training with dynamic vision sensor cameras [17]. A memory-efficient method has also been proposed to perform causal and acausal weight updates using only forward lookup access to the synaptic connectivity table [18]. The STDP weight update rule is shown in Fig. 1, where the leftmost block shows acausal updates produced by new presynaptic input events, the middle block shows causal updates processed just before acausal updates, and the rightmost block shows causal updates produced by completion events.
Fig. 1. STDP weight update rule [18].
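A sketch of the pair-based rule in Eq. (2) as a weight update; the parameter values are illustrative, and Δt is taken as t_pre − t_post following the convention above:

```python
import numpy as np

def stdp_dw(dt, s_plus=0.01, s_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """K(Δt) of Eq. (2) for a pre/post spike pair; dt = t_pre - t_post in ms.

    dt < 0 (pre before post) potentiates; dt >= 0 depresses.
    """
    dt = np.asarray(dt, dtype=float)
    return np.where(dt < 0,
                    s_plus * np.exp(dt / tau_plus),
                    -s_minus * np.exp(-dt / tau_minus))

print(stdp_dw(-5.0))  # pre 5 ms before post: positive weight change
print(stdp_dw(+5.0))  # pre 5 ms after post: negative weight change
```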
2.3 Spike Driven Synaptic Plasticity (SDSP)
SDSP has an advantage over spike-timing-dependent plasticity (STDP) in that it can be used to implement condensed digital synapses. Recent research proposes an SDSP-based 3-bit asynchronous programmable synapse [19] with an area of 25 µm². The synaptic efficacy J(t) is defined through an analog internal variable X(t), given as the variation in post-synaptic depolarization caused by one pre-synaptic spike [20]. Other research describes circuit implementations of various plasticity rules including SDSP [21], and a spike-driven threshold training algorithm has been proposed for mixed memristor-based systems [22]. Table 1 compares the training algorithms used in neural network hardware implementation.

Table 1. Comparison of training algorithms used for implementing neural network hardware.

Ref. | Training algorithm | Advantage/Accuracy | Hardware | Mode | Applications
[7] | BP | MNIST: 93%, YALE: 78.9% | Memristive crossbar arrays | Non-spiking | Near-sensor processing
[8] | BP | R factor = 0.9106 | CHEMFET (sensor) | Non-spiking | Classification of target ion concentration in mixed solution
[17] | STDP | Small delay (9.9 µs) for online learning | FPGA | Spiking | Dynamic vision sensor cameras
[18] | STDP | Reconfigurable synaptic connectivity | FPGA | Spiking | –
[19] | SDSP | Area of 25 µm² | FDSOI CMOS process | Spiking | Digital synapse implementation
3 Neuromorphic Circuits and Devices
3.1 Memristors
The memristor is a two-terminal solid-state electrical component that controls the flow of current in a circuit; it is non-volatile in nature and mimics synaptic behaviour. Prior work discusses the design of various types of memristor-based neural architecture, such as spiking, convolutional, multilayer, and recurrent neural networks [23], while other work proposes learning algorithms for various kinds of memristive architecture [6]. Full circuit-level designs using crossbar memristive structures can be simulated for convolutional neural networks [24]. Much recent research focuses on device-level implementations and the fabrication of memristor crossbar arrays [25], including MoS2/graphene-based memristors that mimic the artificial neuron [26]. Other research
shows that the memristor is mainly used to implement synapses [27]. Crossbar arrays of memristors suffer from data leakage, so synapses have recently been developed using nanoporous memristors capable of self-mending [28]. Using memristors to implement circuits also has shortcomings, since the fabrication process is still in its development phase [29].
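The appeal of the crossbar is that a synaptic layer reduces to Ohm's and Kirchhoff's laws: the output currents form a conductance-matrix-vector product. A toy NumPy sketch, with conductance and voltage values chosen arbitrarily for illustration:

```python
import numpy as np

# Each crosspoint stores a weight as a memristor conductance G_ij (siemens).
G = np.array([[1.0e-6, 5.0e-6],
              [2.0e-6, 0.5e-6],
              [4.0e-6, 3.0e-6]])        # 3 input rows x 2 output columns

v_in = np.array([0.2, 0.0, 0.1])        # voltages applied to the rows

# Kirchhoff's current law sums I_j = sum_i G_ij * V_i on each column:
# the analog vector-matrix product that a crossbar computes in one step.
i_out = G.T @ v_in
print(i_out)                            # column currents, in amperes
```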
3.2 Spintronics Devices
These are nanoelectronic devices implanted with magnetic impurities and materials, which allows the spin of electrons, along with their charge, to be used for transferring information. Recently a spin-orbit torque (SOT) device exhibiting resistive switching has been shown to act as an artificial synapse in neuromorphic systems [30]. Other recent work proposes a compound spintronic synapse made up of multiple magnetic tunnel junctions (MTJs), which has the advantage of attaining multiple stable resistance states, and describes the structure of an all-spin artificial neural network [31]. The most recent research reviews the development of spintronic devices usable as synapses in neural networks [32]. A neural computing system has been proposed that models the neural transfer function of an MTJ from its device physics [33]. Nanoscale devices suffer from instability and noise, and experimental research has shown that nanoscale MTJ oscillators can nevertheless achieve good accuracy [34]. Some papers review the application of spintronic devices in bio-inspired computing [35] and various spintronic structures that open the path toward area-efficient implementations of neurons and synapses [36].
3.3 Floating Gate MOS Devices
The motivation for using floating gate MOS (FGMOS) devices in neural network implementations is their low power dissipation, slow memory discharge due to the charge-retention property, use of standard fabrication processes, and compatibility with CMOS technology. Prior work includes an operational transconductance amplifier (OTA) using multiple-input FGMOS that can serve as a programmable neuron activation function [37], the design and implementation of neural networks using FGMOS [38], an analog synapse using complementary floating gate MOSFETs [39], and an excitatory synapse circuit using a quasi-floating-gate MOS transistor [40]. Non-linear neural networks have been implemented using FGMOS-based neurons, primarily for image processing applications [41]. Other research discusses convolutional neural networks and their hardware implementation using FGMOS at the architecture level [42], whereas at the circuit level a multiple-input FGMOS differential pair has been used in a cellular neural network to reduce power consumption [43]. For pattern recognition, hardware for a generalized n-input neuron circuit has been implemented using an FGMOS-based neuron model [44]. Table 2 compares neuromorphic devices and circuits.
Table 2. Comparison of neuromorphic devices and circuits.

Ref. | Device | Mode | Level | Applications | Remarks
[26] | Memristor | Spiking | Device | Can be used for random number generation and hardware security | Novelty: fabrication of the device by forming V-MoS2 on a CVD graphene monolayer
[27] | Memristor | Non-spiking | Circuit | Pattern recognition | Proposed CMOS memristor synapses
[28] | Memristor | Spiking | Device | Pattern recognition | Fabricated crossbar array; accuracy of 89.08% after 15 epochs
[34] | Spintronics (MTJ) | Spiking | Circuit | Spoken digit recognition | Experiment conducted to recognize digits independent of speaker
[44] | Floating gate MOS | Non-spiking | Circuit | Pattern recognition | Designed FGMOS-based neuron model
4 Applications
ANN hardware is being implemented for a range of applications, including biomedical systems, pattern recognition, robotics and brain machine interfaces.
4.1 Biomedical
Prior work proposed a neural network architecture (ARANN) that can configure itself automatically to deal with unexpected errors, analogous to the self-healing and self-recovery of the human nervous system [45]. Other work proposes FPGA-based hardware for the detection of nematodes in sheep flocks [46]. Further applications include a hardware implementation of a convolutional neural network for seizure detection [47] and a network-on-chip design for real-time cerebellum simulation [48].
4.2 Signal Processing and Classification/Pattern Recognition
Prior research demonstrates a pattern recognition system for the detection of sulphate-reducing bacteria [49]. A vibroacoustic signal classifier was designed and tested on an FPGA-implemented neural network [50]. A great deal of work focuses on pattern recognition itself [23, 25, 27, 28, 30, 31, 33, 40, 44], some designs target spoken digit recognition [34], and another application is grayscale image processing [41].
4.3 Robotics
A hardware-based implementation of an artificial neural network has been validated for controlling locomotion in a quadruped walking robot [43]. Other work demonstrates a low-power silicon spiking neuron for real-time control of an articulated robot [51] and a self-repairing robot vehicle using an FPGA [52].
4.4 Brain Machine Interface
Recent research proposes a hardware processing unit for brain machine interfaces implemented on an FPGA [53], a real-time brain machine interface controller [54], and a brain machine interface system implemented on an FPGA for simulating mental actions [55].
5 Conclusion
In this review paper, we focused on research in the hardware implementation of neuromorphic systems. We introduced the neuromorphic concept in Sect. 1, and the following sections surveyed training algorithms. Recent literature shows that much research on neural systems applies variants of backpropagation and synaptic plasticity rules. The section on neuromorphic circuits and devices then covered the devices usable for hardware implementation of neural networks. A great deal of work relies on memristive devices, leading to applications in low-power, adaptive and compact neuromorphic systems; FGMOS-based systems offer very good compatibility with CMOS circuits, and research on implementing artificial neural networks with spintronic devices is gathering pace. These designs and their hardware implementations are currently used to validate neural network models, but many more applications are expected in robotics, biomedical systems, signal processing, healthcare, brain machine interfaces and image processing.
References 1. Hu, M., Li, H., Chen, Y., Wu, Q., Rose, G.S., Linderman, R.W.: Memristor crossbar-based neuromorphic computing system: a case study. IEEE Trans. Neural Netw. Learn. Syst. 25, 1864–1878 (2014) 2. Liu, C., Hu, M., Strachan, J.P., Li, H.: Rescuing memristor-based neuromorphic design with high defects. In: IEEE Design Automation Conference, June 2017 3. Nafea, S.F., Dessouki, A.A., El-Rabaie, S., El-Sayed.: Memristor Overview up to 2015. Menoufia J. Electron. Eng. Res. (MJEER), 79–106 (2015) 4. Camilleri, P., Giulioni, M., Dante, V., Badoni, D., Indiveri, G., Michaelis, B.: A neuromorphic a VLSI network chip with configurable plastic synapses. In: International Conference of HIS, September 2007
5. Jesus, O.D., Hagan, M.T.: Backpropagation algorithms for a broad class of dynamic networks. IEEE Trans. Neural Netw. 18, 14–27 (2017) 6. Nuno-Maganda, M.A., Arias-Estrada, M., Torres-Huitzil, C., Girau, B.: Hardware implementation of spiking neural network classifiers based on backpropagation-based learning algorithms. In: IEEE International Joint Conference on Neural Networks, June 2009 7. Krestinskaya, O., Salama, K.N., James, A.P.: Learning in memristive neural network architectures using analog backpropagation circuits. IEEE Trans. Circuits Syst. I 66(2), 719–732 (2019) 8. Aziz, N.A., Latif, M.A.K.A., Abdullah, W.F.H., Tahir, N.M., Zolkapli, M.: Hardware implementation of backpropagation algorithm based on CHEMFET sensor selectivity. In: IEEE International Conference on Control System, Computing and Engineering, January 2014 9. Krestinskaya, O., Salama, K.N., James, A.P.: Analog backpropagation learning circuits for memristive crossbar neural networks. In: IEEE International Symposium on Circuits and Systems (ISCAS), May 2018 10. Vo, H.M.: Implementing the on-chip backpropagation learning algorithm on FPGA architecture. In: IEEE International Conference on System Science and Engineering (ICSSE), July 2017 11. Vo, H.M.: Training on-chip hardware with two series memristor based backpropagation algorithm. In: IEEE International Conference on Communications and Electronics (ICCE), July 2018 12. Chakraborty, I., Roy, D., Roy, K.: Technology aware training in memristive neuromorphic systems for nonideal synaptic crossbars. IEEE Trans. Emerg. Top. Comput. Intell. 2(5), 335–344 (2018) 13. Shouval, H.Z., Wang, S.S.-H., Wittenberg, G.M.: Spike timing dependent plasticity: a consequence of more fundamental learning rules. Front. Comput. Neurosci. 4, 19 (2010) 14. Song, S., Miller, K.D., Abbott, L.F.: Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat. Neurosci. 3(9), 919–926 (2000) 15. Markram, H., Gerstner, W., Sjostrom, P.J.: Spike-timing-dependent plasticity: a comprehensive overview. Front. Synaptic Neurosci. 4(2) (2012) 16. Jin, X., Rast, A., Galluppi, F., Davies, S., Furber, S.: Implementing spike-timing-dependent plasticity on SpiNNaker neuromorphic hardware. In: IEEE World Congress on Computational Intelligence (WCCI), July 2010 17. Yousefzadeh, A., Masquelier, T., Serrano-Gotarredona, T., Linares-Barranco, B.: Hardware implementation of convolutional STDP for on-line visual feature learning. In: IEEE International Symposium on Circuits and Systems (ISCAS), May 2017 18. Pedroni, B.U., Sheik, S., Joshi, S., Detorakis, G., Paul, S., Augustine, C., Neftci, E., Cauwenberghs, G.: Forward table-based presynaptic event-triggered spike-timing-dependent plasticity. In: IEEE Biomedical Circuits and Systems Conference (BioCAS), October 2016 19. Frenkel, C., Indiveri, G., Legat, J.-D., Bol, D.: A fully-synthesized 20-gate digital spike-based synapse with embedded online learning. In: IEEE International Symposium on Circuits and Systems (ISCAS), May 2017 20. Fusi, S., Annunziato, M., Badoni, D., Salamon, A., Amit, D.J.: Spike-driven synaptic plasticity: theory, simulation, VLSI implementation. J. Neural Comput. 12(10), 2227–2258 (2000) 21. Azghadi, M.R., Iannella, N., Al-Sarawi, S.F., Indiveri, G., Abbott, D.: Spike-based synaptic plasticity in silicon: design, implementation, application, and challenges. Proc. IEEE 102, 717–737 (2014)
22. Covi, E., George, R., Frascaroli, J., Brivio, S., Mayr, C., Mostafa, H., Indiveri, G., Spiga, S.: Spike-driven threshold-based learning with memristive synapses and neuromorphic silicon neurons. J. Phys. D Appl. Phys. 51(34), 344003 (2018) 23. Zhang, Y., Wang, X., Friedman, E.G.: Memristor-based circuit design for multilayer neural networks. IEEE Trans. Circuits Syst.–I 65(2), 677–686 (2018) 24. Alom, M.Z., Taha, T.M., Yakopcic, C.: Memristor crossbar deep network implementation based on a convolutional neural network. In: International Joint Conference on Neural Networks (IJCNN), July 2016 25. Bayat, F.M., Prezioso, M., Chakrabarti, B., Nili, H., Kataeva, I., Strukov, D.: Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits. Nat. Commun. 9, 2331 (2018) 26. Krishnaprasad, A., Choudhary, N., Das, S., Kalita, H., Dev, D., Ding, Y., Tetard, L., Chung, H.-S., Jung, Y., Roy, T.: Artificial neuron using vertical MoS2/Graphene threshold switching memristors. Sci. Rep. 9, 53 (2019) 27. Rosenthal, E., Greshnikov, S., Soudry, D., Kvatinsky, S.: A fully analog memristor-based neural network with online gradient training. In: IEEE International Symposium on Circuits and Systems (ISCAS), May 2016 28. Choi, S., Jang, S., Moon, J.H., Kim, J.C., Jeong, H.Y., Jang, P., Lee, K.J., Wang, G.: A self-rectifying TaOy/nanoporous TaOx memristor synaptic array for learning and energy-efficient neuromorphic systems. NPG Asia Mater. 10, 1097–1106 (2018) 29. Chen, Y., Li, H., Yan, B.: Challenges of memristor based neuromorphic computing system. Sci. China Inf. Sci. 61, 060425 (2018) 30. Fukami, S., Borders, W.A., Kurenkov, A., Zhang, C., DuttaGupta, S., Ohno, H.: Use of analog spintronics device in performing neuromorphic computing functions. In: IEEE Berkeley Symposium on Energy Efficient Electronic Systems and Steep Transistor Workshop (E3S), October 2017 31. Zhang, D., Zeng, L., Cao, K., Wang, M., Peng, S., Zhang, Y., Zhang, Y., Klein, J.-O., Wang, Y., Zhao, W.: All spin artificial neural networks based on compound spintronic synapse and neuron. IEEE Trans. Biomed. Circuits Syst. 10(4), 828–836 (2016) 32. Fukami, S., Ohno, H.: Perspective: spintronic synapse for artificial neural network. J. Appl. Phys. 124(15), 151904 (2018) 33. Sengupta, A., Parsa, M., Han, B., Roy, K.: Probabilistic deep spiking neural systems enabled by magnetic tunnel junction. IEEE Trans. Electron Device 63(7), 2963–2970 (2016) 34. Torrejon, J., Riou, M., Araujo, F.A., Tsunegi, S., Khalsa, G., Querlioz, D., Bortolotti, P., Cros, V., Yakushiji, K., Fukushima, A., Kubota, H., Yuasa, S., Stiles, M.D., Grollier, J.: Neuromorphic computing with nanoscale spintronic oscillators. Nat. Lett. 547, 428–431 (2017) 35. Grollier, J., Querlioz, D., Stiles, M.D.: Spintronic nano-devices for bio-inspired computing. Proc. IEEE 104, 2024–2039 (2016) 36. Sengupta, A., Yogendra, K., Roy, K.: Spintronic devices for ultra-low power neuromorphic computation. In: IEEE International Symposium on Circuits and Systems (ISCAS), May 2016 37. Babu, V.S., Rose Katharine, A.A., Baiju, M.R.: Adaptive neuron activation function with FGMOS based operational transconductance amplifier. In: IEEE Computer Society Annual Symposium on VLSI, May 2016 38. Keles, F., Yildirim, T.: Low voltage low power neuron circuit design based on subthreshold FGMOS transistors and XOR implementation. In: International Workshop on Symbolic and Numerical Methods, Modeling and Applications to Circuit Design (SM2ACD) (2010) 39. Sridhar, R., Kim, S., Shin, Y.-C., Bogineni, N.C.: Programmable Analog Synapse and Neural Networks Incorporating Same, United States (1994)
40. Fernandez, D., Villar, G., Vidal, E., Alarcon, E., Cosp, J., Madrenas, J.: Mismatch-tolerant CMOS oscillator and excitatory synapse for bioinspired image segmentation. In: IEEE International Symposium of Circuits and Systems, May 2005 41. Flak, J., Laihot, M., Halonen, K.: Binary cellular neural/nonlinear network with programmable floating-gate neurons. In: IEEE International Workshop on Cellular Neural Networks and their Applications, May 2005 42. Lu, D.D., Liang, F.-X., Wang, Y.-C., Zeng, H.-K.: NVMLearn: a simulation platform for non-volatile-memory-based deep learning hardware. In: IEEE International Conference on Applied System Innovation (ICASI), May 2017 43. Nakada, K., Asai, T., Amemiya, Y.: Analog CMOS implementation of a CNN-based locomotion controller with floating-gate devices. IEEE Trans. Circuits Syst.–I 52(6), 1095–1103 (2005) 44. Keles, F., Yildirim, T.: Pattern recognition using N-input neuron circuits based on floating gate MOS transistors. In: IEEE EUROCON, May 2009 45. Jin, Z., Cheng, A.C.: A self-healing autonomous neural network hardware for trustworthy biomedical systems. In: IEEE International Conference on Field Programmable Technology, December 2011 46. Rahnamaei, A., Pariz, N., Akbarimajd, A.: FPGA implementation of an ANN for detection of anthelmintics resistant nematodes in sheep flocks. In: IEEE Conference on Industrial Electronics and Applications (ICIEA), May 2009 47. Heller, S., Hugle, M., Nematollahi, I., Manzouri, F., Dumpelmann, M., Schulze-Bonhage, A., Boedecker, J., Woias, P.: Hardware implementation of a performance and energy-optimized convolutional neural network for seizure detection. In: IEEE Annual International Conference of IEEE Engineering in Medicine and Biology Society (EMBC), July 2018 48. Luo, J., Coapes, G., Mak, T., Yamazaki, T., Tin, C., Degenaar, P.: Real-time simulation of passage-of-time encoding in cerebellum using a scalable FPGA-based system. IEEE Trans. Biomed. Circuits Syst. 10(3), 742–753 (2016) 49. Tan, E.T., Halim, Z.A.: Development of an artificial neural network system for sulphate-reducing bacteria detection by using model-based design technique. In: IEEE Asia Pacific Conference on Circuits and Systems, December 2012 50. Dabrowski, D., Jamro, E., Cioch, W.: Hardware implementation of artificial neural networks for vibroacoustic signals classification. Acta Physica Polonica Ser. A 118(1), 41–44 (2010) 51. Menon, S., Fok, S., Neckar, A., Khatib, O., Boahen, K.: Controlling articulated robots in task-space with spiking silicon neurons. In: IEEE International Conference on Biomedical Robotics and Biomechatronics RAS/EMBS, August 2014 52. Liu, J., Harkin, J., McDaid, L., Halliday, D.M., Tyrrell, A.M., Timmis, J.: Self-repairing mobile robotic car using astrocyte neuron networks. In: IEEE International Joint Conference on Neural Networks (IJCNN) (2016) 53. Wang, D., Hao, Y., Zhu, X., Zhao, T., Wang, Y., Chen, Y., Chen, W., Zheng, X.: FPGA implementation of hardware processing modules as coprocessors in brain-machine interfaces. In: IEEE Annual International Conference of Engineering in Medicine and Biology Society (EMBS) (2011) 54. Kocaturk, M., Gulcur, H.O., Canbeyli, R.: Toward building hybrid biological/in silico neural networks for motor neuroprosthetic control. Front. Neurorobotics 9(8), 496 (2015) 55. Malekmohammadi, A., Mohammadzade, H., Chamanzar, A., Shabany, M., Ghojogh, B.: An efficient hardware implementation for a motor imagery brain computer interface system. Scientia Iranica 26, 72–94 (2019)
A Computer Vision Based Fall Detection Technique for Home Surveillance
Katamneni Vinaya Sree and G. Jeyakumar
Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected], [email protected]
Abstract. In this modern era, where population and life expectancy are continuously increasing, the demand for advanced healthcare is growing at an unprecedented rate. This paper presents a novel and cost-effective fall detection system for home surveillance that detects falls from surveillance video. The advantage of the proposed system is that the person does not need to carry or wear a device. The system uses background subtraction to detect the moving object and marks it with a bounding box. A few rules are then framed on measures extracted from the bounding box and the contours around the moving object, and these rules drive the transitions of a finite state machine (FSM) that detects the fall through posture and shape analysis using two measures, viz. height and falling speed. An alarm is sent when the fall is confirmed. The proposed approach is tested on three datasets: URD, FDD and Multicam. The obtained results show that the proposed system works with an average accuracy of 97.16% and excels over previous approaches. Keywords: Fall detection · Home surveillance · Finite state machine · Health care
1 Introduction
The elderly portion of the population is growing day by day, and this increase demands improvements in health care [1]. From 2011 to 2028, deaths due to falls are expected to increase by 138% [2]. One of the major concerns for elderly people is injury due to falls: according to WHO statistics, accidental falls affect 35% to 42% of elderly people each year. People experience frequent falls as they become infirm and weak from age-related biological challenges. According to United Nations statistics, almost 40% of old people currently live independently, and this aged population is estimated to exceed two billion by 2050. Such a growing population needs assistance to lead an independent life; if unmanned, such assistance can make their lives easier, and surveys say that elderly people are ready to welcome the concept of such surveillance for their improved security and
safety. The amount spent on these injuries is expected to increase by 70% between 2018 and 2035. When developing a fall detection system, one should understand when people fall and in what kind of scenarios, since this shapes the sample cases and impacts the performance of the designed system. The study in [3], which contains more than 75,000 falls, provides proper support for these two questions; age, activity, type of floor and footwear are the factors analyzed. 62% of the falls took place in residents' rooms and 22% in common areas, although the data did not reveal how much time residents spent in each of these places. The study states that an elderly person falls once or twice a year [7]. One important observation is that most falls occur in the morning, when they carry 60% more risk because caretakers are unavailable; another reason might be the sedative medication taken the previous night to give residents good sleep. Studies related to falls can be divided into fall prevention, fall detection/identification and injury reduction. It is difficult to prevent people from falling, as there is very little time to react, but a solution can be drawn similar to a biker wearing a helmet for protection in case of a fall [4]. Since falls cannot readily be prevented, all that can be done is to identify/detect a fall accurately and try to reduce the resulting harm. The rest of the paper is arranged as follows: Sect. 2 discusses previous approaches to fall detection, Sect. 3 introduces the design and implementation of the proposed approach, Sect. 4 summarizes the experimental results and compares them with previous works, and the conclusion and future work are briefed in Sect. 5.
2 Related Works
A large variety of products is available for detecting falls, but almost all of them are designed to send an alert only after the fall has occurred [5]. This section discusses three such systems to give a brief view of this kind of product. The most commonly used product, from Sweden, is a wearable arm device with an alarm button: when a person falls, they must press the button to get help from the nearest base station, and the drawback of this method is that if the person is unable to press the button, the base station cannot be alerted about the fall. A company from the USA has designed a system that is almost the same as the above product, but uses an accelerometer instead of a manual alarm [6]. Its advantage is that it detects falls automatically, by analyzing whether the user is moving after the fall before deciding to set off the alarm; its disadvantage is a limited range of up to 180 m from the base station.
A third company, also from Sweden, offers a similar product, with the difference that the system runs on the user's own cellphone. This suits the user's lifestyle, as carrying a cellphone is very common nowadays, and it additionally tracks the user's GPS location; however, the system cannot detect a fall if the person is not carrying the phone. The working environments of the solutions discussed above differ. The solution presented in this paper has a specific range only, due to the positioning of the camera, so it is not intended to work entirely on its own. The study carried out gave a better understanding of how existing solutions to the fall detection problem work, which led to the novel idea used in the proposed system; a brief summary of the studied methodologies follows, and all the referred works are listed in the references at the end. The most common techniques use videos along with accelerometers. Some approaches [13, 14] and [15] use different types of sensing devices and detect the abnormal motions that relate to falls in daily life, but the drawback of sensors is that they need a lot of data to determine normal behavior. To overcome these issues, a vision based approach is used in the proposed system to monitor and detect falls: the system continuously tracks the moving person in the surveillance video and detects the fall based on the extracted parameters. Taking into view all the studied methods and approaches [8, 11] and [16] for the fall detection problem, a novel low-cost approach based on vision computing is proposed; its design and implementation are discussed in the next section. The proposed method uses video from a surveillance camera and detects the occurrence of a fall. The method is tested on three different datasets and the results are discussed.
3 Design and Implementation
Figure 1 shows the proposed system architecture. The surveillance video is first divided into frames; background subtraction and pre-processing are applied to each frame, and the required features are then extracted. Based on these features, conditions are framed and a finite state machine decides whether a fall has occurred. The vital task in human fall detection is to detect the human object accurately. The proposed system employs background subtraction and contour finding for this tracking: it first obtains the moving objects in the video using background subtraction, performs foreground segmentation, and then eliminates any remaining noise using morphological operations. Whenever a moving object is identified, the system tracks it continuously. In the proposed system, the object of interest is a human being; the moving person is continuously tracked and marked with a bounding box. From [12, 17] and [18] we studied how moving objects can be detected in video and formulated the procedure to detect and extract the required features from the bounding box.
Fig. 1. Architecture of proposed system
Background Subtraction - As video is the input to the proposed system, the input frames are extracted and a background subtraction algorithm is applied. The system uses the MOG2 background subtractor from OpenCV, which employs a Gaussian mixture model so that there is less variance in differentiating background and foreground pixels. Based on the output, pixels are partitioned and marked as foreground or background.
Pre-Processing of the Extracted Frames - A morphological model is employed to remove remaining noise, such as by filling holes in the moving objects. Major noise has already been removed by background subtraction, so the region of interest, a moving object in the foreground, is identified.
Feature Extraction - After the required frames are extracted and pre-processed, the system extracts contours and all the required measures from the bounding box, the rectangle drawn around the moving object. The features considered are aspect ratio, horizontal and vertical gradients, fall angle, acceleration, height and falling speed; their use in fall detection is explained below.
• Aspect Ratio - The aspect ratio is strongly affected when a person's posture changes from standing to any abnormal position, so this measure is taken into consideration.
• Horizontal and vertical gradients - The measures most affected when a person falls are the horizontal and vertical gradients, in simple terms the X- and Y-axis values, so these are considered.
• Fall Angle - The fall angle is measured from the centroid of the falling object with respect to the horizontal gradient. When the fall angle is less than 45°, there is a major chance that the person has fallen. But this
measure alone cannot determine the fall, since the fall angle can also drop below 45° when a person is sitting or bending.
• Change in height - Whenever the posture changes, the height of the human body varies, so this measure is worth considering; it is calculated as the difference between the mean height of the human body and the height of the detected person in the current frame.
Based on the extracted features, a rule based state machine is designed to capture and analyze the bounding box values continuously (a code sketch of the detection and measurement stage follows Fig. 2). The fall detection system is designed as a three-state finite machine with 8 rules in total that decide the state transitions. Figure 2 describes the finite state machine and the states used: Fall_detection, Posture_changed and Fall_confirmation.
Fig. 2. FSM for detection of fall
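A condensed OpenCV sketch of the detection and measurement stage described above; the video path and thresholds are placeholders, not the authors' exact settings:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.mp4")            # hypothetical path
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
kernel = np.ones((5, 5), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    # Drop shadow pixels (marked 127 by MOG2), keep confident foreground.
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    # Morphological opening removes speckle noise in the foreground mask.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        continue
    person = max(contours, key=cv2.contourArea)       # largest blob = person
    x, y, w, h = cv2.boundingRect(person)
    aspect_ratio = w / float(h)                       # rises sharply in a fall
    cx, cy = x + w / 2.0, y + h / 2.0                 # bounding-box centroid
```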
State Transition - The rules framed for the FSM are as follows:
1. Rule 1 - Let theta be the fall angle measured from the centroid coordinates (C1, C2), the center of mass of the object. This rule is satisfied if theta is less than 45°.
2. Rule 2 - Acceleration is a good factor to consider when detecting a fall: when a person falls, the acceleration will be lower than 130 cm/s² [7].
3. Rule 3 - Height is also a good factor, as the height of the person decreases during a fall; a threshold of 75 cm is used, and this rule is satisfied when the detected person's height is less than 75 cm.
4. Rule 4 - This rule checks the posture of the person based on the height measure. Earlier works [10] differentiated standing and lying postures using the bounding box, whereas this system considers the ratio of the person's height in the frame at time t to the person's mean height. The threshold ranges between 0.4 and 0.45: if the ratio is less than 0.45 it is considered a fall, otherwise a lying posture.
5. Rule 5 - The other posture check uses the falling speed, which covers the scenario where a person simply lies down without falling. To identify this, the
proposed system measures the speed at which the fall occurs, calculated as the time difference between the person standing or sitting and the person being detected as lying down. If the time difference is less than a threshold T it is considered a fall, otherwise a lying posture.
6. Rule 6 - This rule confirms the fall and generates the alarm: the system checks for any movement after a fall, and if there is little or no movement it is considered a fall, otherwise no fall.
At the beginning of the video, the FSM of the system is in an initial state. If any of rules 1 to 5 is satisfied, the FSM moves to the Fall_detection state. From the Fall_detection state, the system looks for a change in posture using rules 6 and 7; if neither rule is satisfied the FSM returns to the initial state, and if satisfied it moves to the Posture_changed state. The system then checks rule 8: if rule 8 is satisfied, the FSM moves to the Fall_confirmation state and signals that a fall occurred, otherwise it moves back to the Fall_detection state.
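A minimal skeleton of the three-state machine just described; the rule_* predicates are placeholders for Rules 1-8 above and would be evaluated from the per-frame measures:

```python
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()
    FALL_DETECTION = auto()
    POSTURE_CHANGED = auto()
    FALL_CONFIRMATION = auto()

def step(state, rules):
    """Advance the FSM by one frame; `rules` maps rule number -> bool."""
    if state is State.INITIAL:
        # Rules 1-5: fall angle, acceleration, height, posture ratio, speed.
        return State.FALL_DETECTION if any(rules[i] for i in range(1, 6)) else state
    if state is State.FALL_DETECTION:
        # Rules 6-7: has the posture actually changed?
        return State.POSTURE_CHANGED if (rules[6] or rules[7]) else State.INITIAL
    if state is State.POSTURE_CHANGED:
        # Rule 8: little or no movement after the fall -> confirm, raise alarm.
        return State.FALL_CONFIRMATION if rules[8] else State.FALL_DETECTION
    return state
```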
4 Results and Discussion

The three datasets on which the proposed method is applied are URD, FDD and Multicam from [18, 19] and [20]. The URD dataset has 70 videos in total, of which 30 are fall events and the other 40 are regular events. The Multicam dataset has 24 videos, of which 22 contain fall events and the remaining 2 contain non-fall events. The FDD dataset contains videos of fall and non-fall events in different environments: office, lecture room, coffee room and home. There are 30 videos in each environment, of which 20 contain falls and 10 are non-fall events. Table 1 presents the results obtained on applying the proposed approach to these datasets. Here TP is the number of fall events identified as fall events and TN is the number of non-fall events identified as non-fall events; FP is the number of non-fall events identified as falls and FN is the number of fall events identified as non-falls.

Table 1. Recognition results

Dataset            No. of events   TP   FP   TN   FN
URD                70              30   2    38   0
Multicam           24              21   0    2    1
FDD Office         30              15   2    13   0
FDD Lecture room   30              14   0    15   1
FDD Coffee room    30              15   0    15   0
FDD Home           30              15   0    15   0
For all six dataset groups above, three evaluation measures are calculated; Table 2 presents those results:
Table 2. Performance of the system

Dataset            No. of events   Accuracy (%)   Sensitivity (%)   Specificity (%)
URD                70              97.14          100               95
Multicam           24              95.83          95.45             100
FDD Office         30              93.33          100               86.67
FDD Lecture room   30              96.67          93.33             100
FDD Coffee room    30              100            100               100
FDD Home           30              100            100               100
The above results show that the proposed approach can detect a fall with an average accuracy of 97.16%, a sensitivity of 98.13% and a specificity of 96.45%. Some frames of fall detection are shown in Fig. 3.
Fig. 3. Sample screenshots of fall detection
The results show that the proposed system achieves an accuracy of 97.16% in detecting falls, even across different environments and camera positions, and that it outperforms the existing approaches [9] and [10].
5 Conclusion and Future Work

The advantage of the proposed method is its high accuracy in detecting falls despite changes in environment or camera position. The approach does not require the person to carry any wearable device; it operates solely on a surveillance camera. Moreover, the system is cost-effective, as it needs no special devices to achieve its objective. A future enhancement of this system could employ deep learning or optimization techniques to speed up the computation. Adding an alert system specific to the person and the venue, enabling immediate action, would be a further enhancement.
Compliance with Ethical Standards
• All authors declare that there is no conflict of interest.
• No humans/animals involved in this research work.
• We have used our own data.
References
1. Den aldrande befolkningen. Karin Modig (2013). http://ki.se/imm/den-aldrande-befolkningen
2. Büchele, G., Becker, C., Cameron, I.D., Köning, H.-H., Robinovitch, S., Rapp, K.: Epidemiology of falls in residential aged care: analysis of more than 75,000 falls from residents of Bavarian nursing homes. JAMDA 15(8), 559–563 (2014)
3. Hövding. Hövding den nya cykelhjälmen (2015). http://www.hovding.se/
4. Tao, J., Turjo, M., Wong, M., Wang, M., Tan, Y.: Fall incident detection for intelligent video surveillance. In: Fifth International Conference on Information, Communication and Signal Processing (2005)
5. Luo, S., Hu, Q.: A dynamic motion pattern analysis approach to fall detection. In: IEEE International Workshop in Biomedical Circuit and Systems (2004)
6. Rougier, C., Meunier, J.: Demo: fall detection using 3D head trajectory extracted from a single camera video sequence. J. Telemed. Telecare 11(4), 7–9 (2018)
7. Vishwakarma, V., Mandal, C., Sural, S.: Automatic detection of human fall in video. In: Pattern Recognition and Machine Intelligence, PReMI. Lecture Notes in Computer Science, vol. 4815. Springer, Heidelberg (2007)
8. Nasution, A.H., Emmanuel, S.: Intelligent video surveillance for monitoring elderly in home environments. In: Proceedings of the IEEE 9th International Workshop on Multimedia Signal Processing (MMSP 2007), Crete, Greece, pp. 203–206, October (2007)
9. Rougier, C., Meunier, J., St-Arnaud, A., Rousseau, J.: Fall detection from human shape and motion history using video surveillance. In: Proceedings of the 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW 2007), pp. 875–880, May (2007)
10. Meunier, J., St-Arnaud, A., Cleveert, D., Unterthiner, T., Povysiil, G., Hochreiter, S.: Rectified factor networks for biclustering of omics data. Bioinformatics 33(14), i59–i66 (2017)
11. Sreelakshmi, S., Vijai, A., Senthil Kumar, T.: Detection and segmentation of cluttered objects from texture cluttered scene. In: Proceedings of the International Conference on Soft Computing Systems, vol. 398, pp. 249–257. Springer (2016)
12. Broadley, R.W., Klenk, J., Thies, S.B., Kenney, L.P.J., Granat, M.H.: Methods for the real-world evaluation of fall detection technology: a scoping review. Sensors (Basel) 18(7), 2060 (2018)
13. Xu, T., Zhou, Y., Zhu, J.: New advances and challenges of fall detection systems: a survey. Appl. Sci. 8, 418 (2018). https://doi.org/10.3390/app8030418
14. Birku, Y., Agrawal, H.: Survey on fall detection systems. Int. J. Pure Appl. Math. 118(18), 2537–2543 (2018)
15. Mastorakis, G., Makris, D.: Fall detection system using Kinect's infrared sensor. J. Real-Time Image Process. 9(4), 635–646 (2014)
16. Krishna Kumar, P., Parameswaran, L.: A hybrid method for object identification and event detection in video. In: National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Jodhpur, India, pp. 1–4. IEEE Xplore (2013)
17. Houacine, A., Zerrouki, N.: Combined curvelets and hidden Markov models for human fall detection. In: Multimedia Tools and Applications, pp. 1–20 (2017)
18. Auvinet, E., Rougier, C., Meunier, J., St-Arnaud, A., Rousseau, J.: Multiple cameras fall dataset. Technical report 1350, DIRO - Université de Montréal, July (2010)
19. Charfi, I., Mitéran, J., Dubois, J., Atri, M., Tourki, R.: Optimised spatio-temporal descriptors for real-time fall detection: comparison of SVM and Adaboost based classification. J. Electron. Imaging (JEI) 22(4), 17 (2013)
20. Kwolek, B., Kepski, M.: Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput. Methods Programs Biomed. 117(3), 489–501 (2014). ISSN 0169-2607
Prediction of Diabetes Mellitus Type-2 Using Machine Learning

S. Apoorva, K. Aditya S, P. Snigdha, P. Darshini, and H. A. Sanjay
Nitte Meenakshi Institute of Technology, Bangalore, India
[email protected], [email protected], [email protected], {adityashastry.k,sanjay.ha}@nmit.ac.in
Abstract. Around 400 million people worldwide suffer from diabetes. Diabetes prediction is challenging because it involves complex interactions and interdependencies between various human organs such as the eye, kidney and heart. Machine learning (ML) algorithms provide an efficient way of predicting diabetes. The objective of this work is to build a system using ML techniques for the accurate forecast of diabetes in a patient; decision tree (DT) algorithms are well suited for this. In this work, we apply the DT algorithm to forecast type 2 diabetes mellitus (T2DM). Extensive experiments were performed on the Pima Indian Diabetes Dataset (PIDD) obtained from the UCI machine learning repository. Based on the results, we observed that the decision tree forecasts more accurately than the SVM algorithm on the diabetes data.

Keywords: Prediction · Machine learning · Decision tree · Diabetes
1 Introduction

Diabetes mellitus can harm several parts of the human body. The pancreas can be affected by diabetes, so adequate insulin is not produced in the body. Diabetes is of two types [1]:
• Type 1: In type-1 diabetes, the pancreatic cells that produce insulin are destroyed. Hence, insulin injections need to be given to the patients, along with restricted nutritional diets and regular blood tests.
• Type 2: In type-2 diabetes, numerous body organs become resistant to insulin, which increases the insulin demand; in addition, the pancreas does not produce the required amount of insulin. Patients suffering from type-2 diabetes need to follow a strict diet and routine exercise along with regular blood glucose monitoring. The main causes are obesity and lack of physical exercise [2]. Patients at risk of type-2 diabetes pass through a pre-diabetes condition, a disorder in which blood glucose levels are higher than normal but not as high as in a diabetic patient [3].

Machine learning (ML) represents a field of artificial intelligence (AI) which enables systems to improve with experience by learning automatically without being
programmed explicitly [4]. ML techniques emphasize the development of programs that are capable of self-learning from data [5]. Here we use ML algorithms that are powerful in both classification and prediction. This study applies different machine learning techniques to predict diabetes mellitus at an early stage so as to save human lives [6]; the algorithms used are SVM and decision tree, with the aim of raising prediction accuracy and performance [7]. Over the past 30 years, with the increasing number of diabetics, people have come to understand that this chronic disease deeply affects every family and everyone's daily life. To study the high-risk groups for diabetes mellitus, advanced information technology is needed, and machine learning is a suitable approach. Sustaining glycemic control in diabetes remains a challenge [8]. Due to its continually increasing prevalence, many people are adversely affected by diabetes mellitus, and most diabetic patients are unaware of the pre-diabetes risks which lead to the actual disease [9].

In this work, our main objective is to build a model centered on machine learning methods for forecasting T2DM. The core difficulties we try to resolve are to improve the accuracy of the prediction model and to make the model adaptive to more than one dataset. The main tasks in our work are the gathering of data, pre-processing of data and development of the prediction model. Benchmark datasets related to diabetes may be obtained from standard websites such as UCI, KEEL and Kaggle. The main contributions of this work are highlighted below:
• Identification of patients suffering from T2DM.
• A decision tree model for forecasting diabetes in pregnant women.
• A study of the analytical model using features that are associated with type-2 diabetes risk factors.

The structure of the paper is as follows. Section 2 describes the literature survey on the different methods and techniques used to analyse the characteristics of type-2 diabetes and decide whether diabetes is present. Section 3 describes the Pima Indian dataset along with its input and output attributes. Section 4 describes the system design and demonstrates the step-by-step procedure of the proposed system. The experimental results are demonstrated in Sect. 5. This is followed by the conclusion and future work.
2 Related Work

This section deliberates on some of the recent work carried out on diabetes prediction using ML algorithms and probabilistic approaches. In [2], the authors used two classification algorithms, decision tree and Support Vector Machine (SVM), to determine which algorithm generates more accurate results. In [3, 4] the WEKA tool was used for executing the algorithms. The experiments were performed on the PIDD from UCI [5], which has 768 medical records with 8 attributes of female patients. The datasets were given to the WEKA software to predict whether a person has diabetes using the classification algorithms. Results were obtained by means of internal cross-validation with
10 folds. Accuracy, F-measure, recall, precision and ROC (receiver operating curve) measures were used for model evaluation. The authors observed that the decision tree demonstrated better accuracy than the SVM. In [6], the authors utilized a decision tree classifier and a Bayesian network to predict whether a person is diabetic or non-diabetic, using two phases, training and testing. In the training phase, each record is associated with a class label, and the training process groups the data into classes 0 and 1. In the testing phase, the algorithm is evaluated on an unseen dataset. Experiments were performed on the PIDD from UCI [6]. In pre-processing, the data are converted into discrete values depending upon the range of each attribute. The class variable is divided into two classes (0 or 1), where '1' indicates a positive test for diabetes and '0' a negative test; the training dataset is the input. On validating the test records, the decision tree performed better than the Bayesian network. In [7], the authors used a decision tree classifier and the 'J48' algorithm available in the WEKA software to predict whether a person is diabetic or non-diabetic. A web-based health center diabetes control system located in Tabriz was used as the data source. The features gender, age, systolic blood pressure, family history of diabetes and BMI were considered for experimentation. Among 60,010 records, 32,044 matched the 6 input feature values in the diagnosis field; the rest were omitted. The authors implemented the 10-fold cross-validation method to evaluate the model and calculate its precision, recall and accuracy. After pre-processing the data, a total of 22,398 records were used for developing the decision tree.
3 Dataset Description

The dataset used for prediction is the "Pima Indians Diabetes Database" from UCI [5]. This dataset was acquired from the National Institute of Diabetes and Digestive and Kidney Diseases. It is used to forecast the occurrence of diabetes in a patient based on the parameters present in the dataset, and consists of 768 patient records; 80% of the data is taken for training and the remaining 20% for testing. The output attribute to be predicted is a binary diagnostic variable that indicates the occurrence of diabetes symptoms in the patient according to the World Health Organization, whose standard states that diabetes occurs in a patient if the post-load plasma glucose level is at least 200 mg/dl. The Pima diabetes dataset has 8 input attributes: number of times pregnant, plasma glucose concentration, diastolic blood pressure (mm Hg), triceps skin fold thickness (mm), serum insulin measured over a 2 h period, body mass index (kg/m²), diabetes pedigree function, and age in years. The output attribute represents a binary class which is 1 if the person has diabetes and 0 otherwise. The dataset is split into two parts, train and test:
– train.csv – 562 records
– test.csv – 206 records
Training Data: Used to train the algorithm. The training data comprises both the input data and the corresponding output. From these data, algorithms such as a decision tree or a support vector machine can learn to produce accurate decisions.
Test Data: Used to determine the accuracy of the results; the predicted outcomes are compared with the known labels to measure how accurate the model is.
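As an illustration, the split above can be reproduced as follows; the file and column names are assumptions, and test_size is given as an absolute count to match the 562/206 partition described above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("pima-indians-diabetes.csv")       # hypothetical file name
X, y = data.drop(columns="Outcome"), data["Outcome"]  # 'Outcome' column name assumed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=206, random_state=0)              # 768 records -> 562 train / 206 test
```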
4 Proposed Work

This section demonstrates the stepwise procedure followed to build the system. In this work, we make use of the decision tree. A DT is a decision-support tool that uses a tree-like structure to represent a series of decisions and their possible outcomes. DTs are supervised learning algorithms and can be used for solving both regression and classification problems. The chief aim of the DT algorithm is to achieve accurate classification with the least number of decisions. The overall design of the diabetes prediction system is depicted in Fig. 1.
Fig. 1. Diabetes prediction analysis model (pipeline: diabetes patient dataset → apply classification algorithm (decision tree) → predicted data sets → performance evaluation using train and test data → analysis of accuracy of overall dataset)
In this diabetes prediction analysis model, the diabetes dataset is collected from the PIMA Indian Diabetes Dataset [5]. The algorithm then makes its predictions based on a feature selection step, in which Glucose and BMI obtain the highest feature-importance scores; decisions are taken on these attributes and the performance is evaluated. We use a confusion matrix, built from the actual and predicted values, to analyze the performance, and from it the overall prediction accuracy is calculated. Finally, an output file presents the predicted outcome (0 or 1), where 0 means the person does not have diabetes and 1 means the person has diabetes. The detailed design is depicted in Fig. 2 below. The flowchart starts with the dataset, on which feature selection is performed: the feature importance score is computed for all attributes, and the attribute with the maximum score is used for the next decision. The decision tree classifier divides the records into Healthy and Diabetic and finally predicts diabetes.
Fig. 2. Detailed design of diabetes prediction system using decision tree (flow: diabetes dataset → training data / test data → compute feature importance of each attribute → select feature with highest score for splitting → repeat until the stopping condition is met → trained model)
The confusion matrix is utilized to estimate the performance of the DT classifier. The steps in the implementation are as follows:
• The training phase involves training the DT on the training data. The training dataset comprises the examples used for learning and fits the input attributes of the PIDD dataset. Most techniques used to search training data for realistic relationships tend to overfit the data, meaning they can identify and exploit apparent relations in the training data that do not hold in general.
• Feature importance provides a score for every data attribute; a high score indicates that the feature is significant with respect to the output attribute. In this work, we observed that the BMI and glucose features had higher feature-importance scores than the other features. Feature importance is computed using the Gini index shown in Eq. (1):

$GI(T) = 1 - \sum_{j=1}^{n} p_j^2$   (1)
where T is a training data set containing samples from n classes and $p_j$ is the relative frequency of class j in T.
• The testing phase consists of evaluating the DT model on the test dataset. The test dataset possesses a similar distribution of samples as the training records but is independent of it. A better fit to the training samples than to the test instances
leads to over-fitting. A test set represents a set of instances used to evaluate the classifier's performance. Cross-validation (CV) is a resampling technique used to assess ML models on a limited number of data samples. The parameter k in the CV procedure is the number of data splits, which is why the method is often referred to as k-fold cross-validation. The test data is 20% of the whole dataset and the remainder is training data, which is further split into sub-parts by applying the 5-fold cross-validation procedure.
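A minimal sketch of the described pipeline using scikit-learn follows; the file names, the "Outcome" column name and the hyper-parameters are assumptions, not the authors' exact setup. Gini impurity is scikit-learn's default splitting criterion and matches Eq. (1).

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")   # 562 records (training portion)
test = pd.read_csv("test.csv")     # 206 records (held-out test portion)
X_train, y_train = train.drop(columns="Outcome"), train["Outcome"]
X_test, y_test = test.drop(columns="Outcome"), test["Outcome"]

# Gini-based decision tree, as in Eq. (1)
clf = DecisionTreeClassifier(criterion="gini", random_state=0)

# 5-fold cross-validation on the training portion, as described above
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("5-fold CV accuracy: %.2f%%" % (100 * cv_scores.mean()))

clf.fit(X_train, y_train)
# Gini-based feature importances; Glucose and BMI scored highest in the paper
for name, score in sorted(zip(X_train.columns, clf.feature_importances_),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
print("Test accuracy: %.2f%%" % (100 * clf.score(X_test, y_test)))
```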
5 Experimental Results

This section describes the experimental results and the comparison between the decision tree and SVM for diabetes prediction. The experiments were conducted using Spyder on the Windows 10 platform. Figure 3 depicts the comparison between the SVM and DT classifiers for diabetes prediction.
Fig. 3. Comparison between DT and SVM (accuracy in %: SVM 50.44, decision tree 75.06)
The graph compares the prediction accuracy of SVM and DT: the accuracy of SVM is 50.44% and that of DT is 75.06%. Since the accuracy of DT is higher, the DT algorithm is better than SVM for the prediction of diabetes.
6 Conclusions and Future Scope

This work deals with the forecasting of T2DM using ML algorithms. A model was developed to predict whether a person has diabetes. Our work applies the DT classifier to detect diabetes; the DT algorithm has been used in many areas due to its ability to solve diverse problems, its ease of use, and its model-free property. The main objective is to construct a diabetes prediction model that forecasts the output value based on various input variables. In the DT algorithm, every leaf signifies a target attribute value while the input variable values are characterized by the path from root to leaf. Our decision tree based prediction system is able to learn and detect whether a person has diabetes. The DT classifier showed an improvement of 24.62 percentage points over the SVM classifier for diabetes prediction. In future, we plan to provide a cloud-based platform for generalized prediction of diseases.

Compliance with Ethical Standards
• All authors declare that there is no conflict of interest.
• No humans/animals involved in this research work.
• We have used our own data.
References
1. Ahmed, A.B.E.D., Elaraby, I.S.: Data mining: a prediction for student's performance using classification method. World J. Comput. Appl. Technol. 2(2), 43–47 (2014)
2. Altaher, A., BaRukab, O.: Prediction of student's academic performance based on adaptive neuro-fuzzy inference. Int. J. Comput. Sci. Netw. Secur. 17(1), 165 (2017)
3. Acharya, A., Sinha, D.: Early prediction of student performance using machine learning techniques. Int. J. Comput. Appl. 107(1), 37–43 (2014)
4. Kaur, P., Singh, M., Josan, G.S.: Classification and prediction based data mining algorithms to predict slow learners in education sector. In: 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015) (2015)
5. Guruler, H., Istanbullu, A., Karahasan, M.: A new student performance analysing system using knowledge discovery in higher educational databases. Comput. Educ. 55(1), 247–254 (2010). https://doi.org/10.1016/j.compedu.2010.01.010. ISSN 0360-1315
6. Vandamme, J.-P., Meskens, N., Superby, J.-F.: Predicting academic performance by data mining methods. Educ. Econ. 15(4), 405–419 (2007). https://doi.org/10.1080/09645290701409939
7. Abuteir, M., El-Halees, A.: Mining educational data to improve students' performance: a case study. Int. J. Inf. Commun. Technol. Res. 2, 140–146 (2012)
8. Baradwaj, B.K., Pal, S.: Mining educational data to analyze students performance. Int. J. Adv. Comput. Sci. Appl. 2(6), 63–69 (2011)
9. Shanmuga Priya, K., Senthil Kumar, A.V.: Improving the student's performance using educational data mining. Int. J. Adv. Netw. Appl. 04(04), 1680–1685 (2013)
Vision Based Deep Learning Approach for Dynamic Indian Sign Language Recognition in Healthcare

Aditya P. Uchil¹, Smriti Jha², and B. G. Sudha³
¹ The International School Bangalore (TISB), Bangalore, India [email protected]
² LNM Institute of Information Technology, Jaipur, India [email protected]
³ Xelerate, Bangalore, India [email protected]
Abstract. Healthcare for the deaf and dumb can be significantly improved with a Sign Language Recognition (SLR) system capable of recognizing medical terms; this paper is an effort to build such a system. SLR can be modelled as a video classification problem, and here we use a vision based deep learning approach. The dynamic nature of sign language poses an additional challenge to the classification. This work explores the use of OpenPose with a convolutional neural network (CNN) to recognize sign language video sequences of 20 medical terms in Indian Sign Language (ISL) which are dynamic in nature. All the videos are recorded using a common smartphone camera. The results show that even without the use of recurrent neural networks (RNN) to model the temporal information, the combined system works well with 85% accuracy. This eliminates the need for a specialized camera with depth sensors or wearables. All training and testing was done on CPU with the support of the Google Colab environment.

Keywords: Indian sign language recognition · Convolutional neural network · OpenPose · Video classification · Pose estimation · Gesture recognition · Medical terms · Healthcare
1 Introduction

India's National Association of the Deaf estimates that 1.8 to 7 million people in India are deaf [1]. To assist the needs of the deaf and dumb community worldwide, different sign language dictionaries have been developed, with American Sign Language (ASL) being the most popular and common of all. A lot of work has been done on ASL recognition so that deaf people can easily communicate with others and others can easily interpret their signs. For Indian Sign Language, researchers have recently started developing novel approaches which can handle static and dynamic gestures.

Signers have to communicate their conditions routinely at health care setups (hospitals). The signs of words used in health care settings comprise a continuous sequence of gestures and are hence termed dynamic gestures. A gesture
recognition system that recognises dynamic gestures and classifies the sequence of gestures (the video of the sign) therefore needs to be developed. As sensor hand gloves and motion tracking devices are expensive and not common in rural health care outlets, we have focused on developing a vision based solution for sign language recognition. Our work is the first attempt to recognize medical terms in ISL. It takes video of the signer captured with a regular smartphone camera as input. This paper is organised as follows: the next section lists the related work on Indian and other sign language recognition, followed by the proposed methodology, results and discussion.
2 Related Works

ISL. Ansari and Harit [2] achieved an accuracy of 90.68% in recognizing 16 alphabets of ISL. Using a Microsoft Kinect camera, they recorded images of 140 static signs and devised a combination strategy using SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), a SURF neural network and a Viewpoint Feature Histogram (VFH) neural network that gave the highest accuracy. Tewari and Srivastava [3] used the 2D Discrete Cosine Transform (2D-DCT) and a Kohonen Self-Organizing Feature Map neural network for image compression and recognition of patterns in static signs. Their dataset consisted of static signs of ISL numerals (0 to 5) and 26 alphabets captured on a 16.0 MP digital camera (Samsung DV300F). Implemented in MATLAB R2010a, their recognition rate was 80% for 1000 training epochs with the least time to process. Their approach did not take non-manual features (body, arm, facial expression) into account and included only hand gesture information, making it suitable for static signs only. Singha and Das [4] worked on live video recognition of ISL. They considered 24 different alphabets for 20 people (a total of 480 images) in the video sequences and attained a success rate of 96.25%. Eigenvalues and eigenvectors were taken as features; for classification they used Euclidean distance weighted on the eigenvalues. Kishore and Kumar [5] used a wavelet based video segmentation technique wherein a fuzzy inference system was trained on features obtained from the discrete wavelet transform and elliptical Fourier descriptors. To reduce the feature vector size, they further processed the descriptor data using principal component analysis (PCA). For their 80-sign dataset they achieved 96% accuracy. Tripathi et al. [6] used a gradient based key frame extraction procedure to recognize a mixture of dynamic and static gestures. They created a database of ten sentences using a Canon EOS camera with 18 megapixels at 29 frames per second. Feature extraction using orientation histograms with principal component analysis was followed by classification based on correlation and Euclidean distance. Dour and Sharma [7] acquired the input from an i-ball C8.0 web camera at a resolution of 1024 × 768 pixels and used adapted versions of fuzzy inference systems (Sugeno-type and Adaptive-Neuro Fuzzy) to recognize the gestures. 26 signs from 5 different volunteers and 100 samples of each gesture were used for training, giving a total of around 2611 training samples. The classification of the input sign gesture was done by a voting scheme according to the clusters formed. The system was able to recognize all alphabets; most of the
misclassified samples correspond to gestures that are similar to each other, like the letters E and F. Rao and Kishore [8] used a smartphone front camera (5 MP) to generate a database of 18 signs in continuous sign language by 10 different signers. Minimum Distance (MD) and Artificial Neural Network (ANN) classifiers were used to train the feature space. The average word matching score (WMS) of this selfie recognition system was around 85.58% for MD and 90% for ANN, with a minute difference of 0.3 s in classification times. They found that the ANN outperformed the MD classifier by upward of 5% of WMS on changing the train and test data.

Other Sign Languages. Camgoz et al. [9] approached the problem as a neural machine translation task. The dataset presented, "RWTH-PHOENIX-Weather 2014T", covered a vocabulary of 2887 different German words translated from 1066 different signs by 9 signers. Using the sequence-to-sequence (seq2seq) learning method, the system learnt the spatio-temporal features, the language model of the signs and their mapping to spoken language. They used different tokenization methods, like state-of-the-art RNN-HMM (Hidden Markov Model) hybrids, and tried to model the conditional probability using attention-based encoder-decoders with CNNs. Konstantinidis et al. [10] took the LSA64 dataset of Argentinian sign language consisting of 3200 RGB videos of 64 different signs by 10 subjects. For uniform video length, they processed the gesture video sequences to consist of 48 frames each. They extracted features using a pretrained ImageNet VGG19 network: this network up to conv4_4 was used for hand skeleton detection, and its initial 10 layers for detecting the body skeleton. They concluded that non-manual features such as body skeletal joints contain important information that, when combined with hand skeletal joint information, gave a 3.25% increase in recognition rate. Their proposed methodology classified both one-handed and two-handed signs of the LSA64 dataset and outperformed all variants by around 65%, irrespective of the type of classifier or features employed. Mahmoud et al. [11] extracted local features from each frame of the sign videos and created a bag of features (BoFs) using clustering to recognize 23 words. Each image was represented by a histogram of visual words to generate a code book. They further designed a two-phase classification model wherein the BoFs was used to identify postures and a bag of postures (BoPs) was used to recognize signs. Hassan et al. [12] used a Polhemus G4 motion tracker and a camera to generate two datasets of 40 Arabic sentences. Features were extracted using window-based statistical features and the 2D-DCT transform, and classified using three approaches: modified KNN (MKNN) and two different HMM toolkits (RASR and GT2K). The Polhemus G4 motion tracker measured hand position and orientation whereas the DG5-V Hand data gloves measured hand position and configuration. They found that in spite of the greater precision of sensor-based data, vision-based data achieved greater accuracy.

Most of the approaches listed above used one or more of the following:
• a specialized camera, like the Microsoft Kinect, which measures depth information, or a motion tracker camera;
• wearables such as hand gloves with sensors;
• extensive image processing to extract and handcraft the features;
• CNN + RNN for temporal information, which is computationally expensive;
• static signs only.
Our proposed method treats the problem with computer vision and deep learning, representing sign language recognition as classification of a sequence of images. The Inception V3 model and OpenPose [13] are used to classify the image frames individually; a voting scheme is then applied which classifies the video based on the maximum vote for a particular category. The contributions of this paper are:
• Creation of a medical dataset comprising 80 videos of 19 dynamic and 1 static sign for medical terms, each performed by two different volunteers.
• A study of the effect of using OpenPose with a CNN to improve the accuracy of recognition of medical signs directly from RGB videos, without specialized hardware, hand gloves, extensive image processing, or a computationally intensive RNN.

Table 1 shows the list of medical terms used in our study. Except for the word 'chest', all the words are dynamic in nature.

Table 1. List of medical terms comprising our dataset

1. Allergy          11. Common Cold
2. Old              12. Cough
3. Heart Attack     13. Diabetes
4. Chest            14. Ear
5. Exercise         15. Eye
6. Bandage          16. Chicken Pox
7. Throat           17. Forehead
8. Blood pressure   18. Injection
9. Pain             19. Mumps
10. Choking         20. Blood
3 Proposed Methodology

The sign language recognition system can be modelled as a video classification problem and can be built using different approaches, as mentioned in [14]. Most approaches use both spatial and temporal information for the classification, but the use of an RNN for storing temporal information is a bottleneck, as it is computationally expensive and requires more storage. The two articles [15] and [16] support this and show that a CNN is all that is needed. Researchers have also shifted focus to attention-based networks and multi-task learning, which form the basis of the OpenPose architecture. OpenPose [13] has proven its ability to identify human poses with face and hand key points, which are crucial for sign language recognition. Hence we leverage the power of CNN and OpenPose for recognising dynamic signs.
Algorithm: Training and Testing using OpenPose and CNN
Input: Sign Video
Output: Label for the Sign Recognised

Training
1. For each of the classes in the training set
2. For each of the videos in the class folders do
   (a) Pass the RGB video to the OpenPose framework to get the video with the stick figure embedded.
   (b) Extract the frames from the OpenPose videos and write them to the images folder under the class folder.
   (c) Extract frames for each of the RGB videos and add them to the images folder.
   (d) Feed the Inception V3 model with all the images in all the class folders.
   (e) Save the model built.

Testing
1. Repeat steps 1 to 2(b) of the training procedure for the test set.
2. Feed the learnt CNN model with all the images in the test set.
3. Get the predictions of the frames of a particular video as a list.
4. Apply the majority voting scheme on the list to get the class prediction of the video.

Advantages of our proposed method:
376
A. P. Uchil et al.
3.1
Data Acquisition
The sign videos are recorded in a controlled environment by a 12 MP smartphone front camera having frame resolution of 640 480 pixels captured at the rate of 30 fps. In the process initially we collected 20 signs of medical terms from Indian sign language. The dataset has been selected from the ISL general, technical and medical dictionaries. Indian Sign Language Research and Training Institute (ISLRTC), New Delhi, [17] has been taken as the standard source for Indian sign language. So, all the videos are performed based on observation from the ISLRTC YouTube channel. The ISLRTC videos were lengthier of the order of 30 s for every word on an average which is very long compared to other sign languages. For example, the Argentinian sign language [18] has words with video lengths of the order of 1 or 2 s. Hence we took only that portion of the video which essentially means the sign performed. So we have fixed 3 s for the signs. Inorder to standardise and simplify processing, the signs are performed with a uniform white background at a constant distance of 1 m right in front of the camera. The signers wore black coloured clothes with sleeves to make sure that only the hands and face part are visible. Each sign is repeated 2–3 times depending on the complexity of the gesture. Due to different recording environments the lighting conditions vary slightly from subject to subject. The dataset hence forth referred to as ‘ISL Medical Dataset’, consists of 1 static and 19 dynamic signs handpicked from ISL. All volunteers were non-native to sign language. The videos were recorded for a period of three seconds consistently for all the signs. The frame rate of the videos captured was 30 fps. Another Research Institute extensively working on building ISL dictionary – the Ramakrishna Mission Vivekananda Educational and Research Institute (RKMVERI), Coimbatore Campus has their ISL videos listed in [19]. The ISL videos given by RKMVERI were also used as reference by the signers. 3.2
Data Pre-processing
(i) Preparation of training set The data pre-processing stage is also pictorially represented in Fig. 1. The video acquired above was fed to OpenPose. OpenPose requires GPU for processing. Hence Google Colab environment which is freely available for academic use [20] was used to get the pose estimation of hands and face keypoints. Thus we get an equivalent video with the keypoints marked in the video. Each video consisted of 90 frames at a frame rate of 30 fps. This key points video was fed to OpenCV for frame extraction to extract every 5th frame. Thus we get a total of 18 frames. Usually the first and last few fractions of a second do not contain useful information. The frames are then cropped on the sides to ensure that the signer is prominent in all the frames and also to reduce the dimensionality for training. Hence the first two and last two frames were omitted from the training set. Thus a total of 14 image frames for each video sequence comprise our training set. Thus for 20 classes, there were 1120 image frames. To study the effect of using OpenPose with CNN, the training set was prepared in two phases. In the first phase, the original videos without OpenPose pose estimations formed our training set. In the second phase, the training frames were extracted from
Vision Based Deep Learning Approach for Dynamic Indian SLR in Healthcare
377
Fig. 1. Overall architecture of the proposed CNN+OpenPose system
the output of OpenPose pose estimation. Testing was done after each of these training phases. The results are shown later in the results section. (ii) Preparation of test set To test the performance of our gesture recognition approach we collected videos of each sign from the official YouTube channel of ISLRTC, New Delhi (Indian Sign Language Research and Training Centre). The signs performed by signers involved repetition of the same gesture multiple times in a video to clearly explain how the sign is performed. The videos also involved the signer explaining the context of the sign so each sign language reference video was observed to be around 30–40 s which was too lengthy for our model to detect features and recognise the gestures. Consequently, we clipped the videos to display the signer performing the key gesture once or twice at maximum and obtained our shortened test videos which were fed for frame extraction. 3.3
Fine Tuning Inception V3 Model
The frames extracted from the two phases of pre-processing with and without OpenPose were fed to the Inception V3 model with all other layers frozen except for the top layer. Hence fine tuning of the model was done using our training data for all the twenty categories. The number of training steps was 500 with a learning rate of 0.01. The cross entropy results and accuracy of the training set is shown is Figs. 2 and 3 with OpenPose frames.
378
A. P. Uchil et al.
Fig. 2. Cross entropy results of training
Fig. 3. Training accuracy of the model
Cross entropy was used for calculating loss and from Fig. 2 we can observe that the loss reduced when approaching 500 steps in training. The training accuracy with the augmentation of OpenPose frames was 99% as shown in Fig. 3. 3.4
Classification of Sign Video
(i) Prediction of individual frames The test frames from ISLRTC videos were fed to the trained/learnt model to classify each frame individually. The probability score of each frame is given as output. This is done by using the features learnt from the CNN model. The features are
Fig. 4. Data flow pipeline for prediction of class label
Vision Based Deep Learning Approach for Dynamic Indian SLR in Healthcare
379
represented in Fig. 4 as stick figures from OpenPose embedded in the RGB frames. Thus these feature embedded frames give a body part mapping which is very useful in matching the frame to a corresponding frame with similar pose estimation in training set. The same is explained in Fig. 4. (ii) Prediction of sign video The predictions of each of the 14 frames are given as input to a majority voting scheme which give the predicted label of the video sequence based on the probability scores of individual frames. The formula for majority voting is given in Eq. (1). y ¼ mode CðX1 Þ; CðX2 Þ; CðX3 Þ; . . . :; CðX14Þ
ð1Þ
where ‘y’ is the class label of the sign video X1, X2, … X14 represent the frames extracted for a single sign video and C(X1), C(X2), … C(X14) represent the class label predicted for each frame. The Fig. 4 shows an example of bandage class where the OpenPose video is fed to the model for testing. The individual frames are extracted and classified individually. Both OpenPose and RGB frames are combined for training. As it can be seen, all the frames are classified as ‘bandage’ class with a probability score of 0.75 on an average. Hence, the test video or the sign is recognised as ‘bandage’ with high confidence.
4 Results and Discussion The test results given in Table 2 show that without OpenPose key points, the CNN fails to recognise the class correctly. But with OpenPose, the results are amazing, given the large difference in probability score as seen below for the allergy class from 23.09 for old (wrong class) to allergy (correct class) with 87.81 for frame f5. Table 2. Improvement in results with and without OpenPose trained samples
380
A. P. Uchil et al.
Table 3 shows the test results with exclusive RGB and exclusive OpenPose frames and the combined set of both RGB and OpenPose frames for training. It can be readily seen that when only RGB images are taken for building the model, the features from the CNN on RGB images were not able to model the data accurately and it fails on 12 of the 20 classes. The training set with only OpenPose images did a decent job of classifying more that 50% of the frames correctly. For example, for the Blood Pressure sign the model trained with only OpenPose images classified 10 frames correctly of the total 14 frames in test images. The misclassification reduced further and when both the OpenPose and RGB images were given for training. Thus it was able to classify 13 out of 14 frames correctly as shown (OP + RGB training) below in Table 3. Table 3. Pure RGB vs Pure OpenPose vs Hybrid results comparison
Thus our final training set had a total of 2240 frames with the augmentation of OpenPose frames. The inception v3 model pretrained on our RGB image data used for retraining the model. The test results were positive with five more classes correctly identified. Table 4 shows the test results of the top 6 classes for which it performed extremely well. As it can be seen, for most of these classes, except for two or three frames, our model labelled all the frames correctly with increased probability. Table 4. Successful cases (top 6)
Table 5 shows the failure cases when the system failed. This can be attributed to the absence of key points for the ear, forehead and neck. For example, the ‘EAR’ class gets confused with the ‘Exercise’ class because the hands go up and there are no key points for the ear. OpenPose frames does not prove to be useful for such cases and hence the result. Similarly, the ‘Forehead’ class gets confused with exercise class for the same reason. The ‘Mumps’ class has one hand near the shoulders similar to the ‘Pain’ class and hence most of the predictions come out to be ‘Pain’.
Vision Based Deep Learning Approach for Dynamic Indian SLR in Healthcare
381
Table 5. Failure cases (Misclassified classes)
The test results show that of the 20 medical sign videos, 17 were classified correctly and the above three classes were misclassified. Thus our system has an overall accuracy of 85%. The confusion matrix for the test results is shown in Fig. 5 below. The misclassification rate can be reduced if more keypoints are included as features. Also training set can be increased to give more variations in the data.
A - Allergy B - Old C - Heart Attack D - Chest E - Exercise F - Bandage G -Throat H - Blood pressure I - Pain J – Choking
K - Common Cold L - Cough M - Diabetes N - Ear O - Eye P - Chicken Pox Q - Forehead R - Injection S - Mumps T - Blood
Fig. 5. Test Accuracy of the proposed system shown as confusion matrix
Limitations • Video classification is not performed in real time. • Our system depends on OpenPose for improved accuracy. Future Enhancements Currently our work involves classification of 20 medical terms (19 dynamic and 1 static). The training set can be increased with more videos performed by different signers. The Bag of visual features model can be used to classify each of the patches separately for every frame to capture the local information along with a global classifier. A mobile application can be developed to classify the video in real-time.
382
A. P. Uchil et al.
5 Conclusion Real time recognition of Indian sign Language in a healthcare setting is important and crucial for the improvement in quality of life for the deaf and dumb. But to realise this in a resource limited clinic is a challenge. We have addressed this challenge with minimal system requirements so that the computational and storage requirements are not a bottleneck. If this system could be made available in a smartphone, the applications are endless from condition monitoring to assisting the disabled. We have shown that OpenPose with CNN is all we need for a sign language recognition system. Compliance with Ethical Standards. ✓ All authors declare that there is no conflict of interest ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. National Public Radio (US). https://www.npr.org/sections/goatsandsoda/2018/01/14/575921716/ a-mom-fights-to-get-an-education-for-her-deaf-daughters 2. Ansari, Z.A., Gaurav, H.: Nearest neighbour classification of Indian sign language gestures using kinect camera. Sadhana 41(2), 161–182 (2016) 3. Tewari, D., Srivastava, S.: A visual recognition of static hand gestures in indian sign language based on kohonen self-organizing map algorithm. Int. J. Eng. Adv. Technol. (IJEAT) 2(2), 165–170 (2012) 4. Singha, J., Das, K.: Recognition of Indian sign language in live video. arXiv preprint: arXiv: 1306.1301 (2013) 5. Kishore, P.V.V., Rajesh Kumar, P.: A video based Indian sign language recognition system (INSLR) using wavelet transform and fuzzy logic. Int. J. Eng. Technol. 4(5), 537 (2012) 6. Tripathi, K., Nandi, B.N.G.C.: Continuous Indian sign language gesture recognition and sentence formation. Procedia Comput. Sci. 54, 523–531 (2015) 7. Dour, S., Sharma, M.M.: Review of literature for the development of Indian sign language recognition system. Int. J. Comput. Appl. 132(5), 27–34 (2015) 8. Rao, G.A., Kishore, P.V.V.: Selfie video based continuous Indian sign language recognition system. Ain Shams Eng. J. 9(4), 1929–1939 (2018) 9. Cihan Camgoz, N., Hadfield, S., Koller, O., Ney, H., Bowden, R.: Neural sign language translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7784–7793 (2018) 10. Konstantinidis, D., Dimitropoulos, K., Daras, P.: Sign language recognition based on hand and body skeletal data. In: 2018-3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1–4. IEEE (2018) 11. Zaki, M.M., Shaheen, S.I.: Sign language recognition using a combination of new vision based features. Pattern Recogn. Lett. 32(4), 572–577 (2011) 12. El-Soud, A.M., Hassan, A.E., Kandil, M.S., Shohieb, S.M.: A proposed web based framework e-learning and dictionary system for deaf Arab students. In: IJECS2828, 106401 (2010)
Vision Based Deep Learning Approach for Dynamic Indian SLR in Healthcare
383
13. Cao, Z., Hidalgo, G., Simon, T., Wei, S., Sheikh, Y.: OpenPose: real time multi-person 2D pose estimation using Part Affinity Fields. arXiv preprint: arXiv:1812.08008 (2018) 14. Five video classification methods implemented in Keras and Tensorflow. https://blog.coast. ai/five-video-classification-methods-implemented-in-keras-and-tensorflow-99cad29cc0b5 15. Chen, Q., Wu, R.: CNN is all you need. arXiv preprint: arXiv:1712.09662 (2017) 16. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.-F.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014) 17. Youtube ISLRTC, New Delhi. https://www.youtube.com/channel/UC3AcGIlqVI4nJWCw HgHFXtg/videos?disable_polymer=1 18. Ronchetti, F., Quiroga, F., Estrebou, C.E., Lanzarini, L.C., Rosete, A.: LSA64: an Argentinian sign language dataset. In: XXII CongresoArgentino de Ciencias de la Computación (CACIC 2016) (2016) 19. Indian Sign Language, FDMSE, RKMVERI, CBE (Deemed University). http://www. indiansignlanguage.org/indian-sign-language/ 20. Openpose Colab Jupyter notebook. https://colab.research.google.com/github/tugstugi/dlcolabnotebooks/blob/master/notebooks/OpenPose.ipynb
A Two-Phase Image Classification Approach with Very Less Data B. N. Deepankan(&) and Ritu Agarwal Department of Information Technology, Delhi Technological University, Delhi, India {bn.deepankan,ritu.jeea}@gmail.com
Abstract. Usually, neural networks training requires large datasets. In this paper, we have proposed a two-phase learning approach that differentiates the multiple image datasets through transfer learning via pre-trained Convolutional Neural Network. Initially, images are automatically segmented with the Fully connected network to allow localization of the subject through creating a mask around it. Then, the features are extracted through a pre-trained CNN in order to train a dense network for classification. Later, those saved weights of the dense network are loaded to train the model once again by fine-tuning it with a pretrained CNN. Finally, we have evaluated our proposed method on a well-known dog breed dataset, and bird species dataset. The experimental results outclass the earlier methods and achieve an accuracy of 95% to 97% for classifying these datasets. Keywords: Batch normalization Transfer learning
Deep learning Image segmentation
1 Introduction Image classification means classifying the subjects using images. It is performed on a daily basis for differentiating the traffic signals or in choosing apple from oranges. Although the process seems to be an easy task for an individual, but the machine cannot classify on the go. However, with the latest advancements in deep learning techniques, training the machine for accurate classification is made possible. Though learning a machine for efficient classification can be conducted, it will require a huge dataset to do so. However, for some problems sufficient data are not available to train the model precisely. In those situations, data augmentation methods such as rotation, scaling and inverting the available images on the dataset will increase the classification accuracy. Although, excessive training can lead to the overfitting of data, which can cause some inaccurate classifications. Unlike simple image classification between cats from dogs, differentiating between similar objects remains as a difficult task due to the variety of matching shapes, colour, texture and appearance. Furthermore, images of dog or birds usually contain similar surrounding such as grass, leaves etc. For my experiment, I have considered a dataset containing 8,351 images of 133 dog breeds [1], another dataset with 6,033 images of 200 bird species [2]. These datasets contain images from similar background moreover, © Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 384–394, 2020. https://doi.org/10.1007/978-3-030-37218-7_44
A Two-Phase Image Classification Approach with Very Less Data
385
each type is comparable in shape, size, and light. However, manual classification is possible by a trained professional but it can be a tedious task for a pedestrian. Hence, learning a machine for performing such classification can reduce the hassle involved. There are many techniques through which features can be extracted from the image which can be used for classification purposes. It involves scale-invariant feature transform (SIFT) [3], speeded up robust features [4] etc. However, it cannot achieve higher accuracy of classification due to the wide range of images belonging to similar classes as well as overhead complexity involved. This occurs because the features extracted from these techniques are similar due to comparable shapes and background of images which can become a cumbersome task to segregate the classes. In this work, we have used the latest advancement in the deep learning methods to tackle the image classification problem with very fewer data. In our approach, we have performed localization of the subject through a fully connected network which removes the background through minimum bounding box around the object. Dataset of segmented images is used for extracting features from pre-trained CNN which trains the dense network for differentiating between classes. Later, those saved weights are loaded to the same dense network which is kept on top of the convolutional base of pretrained CNN for further training. Our proposed method is divided into two phases: phase one consist of feature extraction through the pre-trained network and training the dense network. The second phase involves loading those saved weights fine-tuning it with pre-trained CNN. We have trained our network twice to converge it towards for better classification. Moreover, we have given methodologies to overcome the problem for overfitting while converging it rapidly. The experimental results show that the proposed technique outclasses the earlier methods and achieves an accuracy of 95% for bird species dataset while 97% for dog breed dataset. The rest of the paper is structured as follows: Sect. 2 gives a brief background about the recent work on image classification and segmentation approaches. Section 3 will provide a detailed explanation of our approach. The experimental results of our approach are shown in Sect. 4. We then conclude our work in Sect. 5.
2 Related Work

In the following section, we give a brief description of image segmentation and classification applications. We then present related background work that addresses the image classification and image segmentation tasks.
2.1 Image Segmentation
A variety of methods have been proposed for image segmentation that perform binary segmentation: 0 for the background and 1 for the object region(s). One of them is the YOLO (You Only Look Once) algorithm [5] for object classification and localization using bounding boxes. It consists of a single CNN with 24 convolutional layers followed by a dense network at the end. To reduce the training period, one can opt for Fast YOLO (9 convolutional layers instead of 24).
Even when data are available, there may be too few training samples to train a segmentation model effectively. U-Net [6] is one method for training with fewer annotated labels, but it can only process 2D images. V-Net [7] performs 3D segmentation based on a volumetric, fully convolutional neural network. Furthermore, various other methods use CNNs for object localization, such as Mask R-CNN [8]. Overall, all of the mentioned methods use the concept of the fully convolutional network (FCN) for semantic segmentation.
2.2 Image Classification
Several methods have been proposed to classify the dog breed and bird species datasets, the majority of them based on machine learning. The work in [9] coarsely classifies dog breeds into 5 groups and, for each group, uses principal component analysis (PCA) as a feature extractor. The extracted features are compared with a feature template derived while classifying the dog images into groups, and an image is assigned to the breed that gives the minimum distance between the two feature vectors. In [10], dog classification is modelled on a manifold using facial shapes and their associated geometry; a Grassmann manifold represents the geometry of the different dog breeds. After feature extraction, an SVM [11] assigns the features to the corresponding category. An improved dog breed classification is proposed in [12] using click-through logs and pre-trained models: more dog images are mined from the web with dog-related keywords to gather more information, important information is then derived using a pre-trained deep CNN by keeping related neurons in the last layers, and those features are finally classified through a dense neural network. Several methods have also been proposed to classify bird species. In [13], the authors extract the textural content of spectrogram images through three digital image processing techniques: Gabor features [14], Local Binary Patterns (LBP) [15], and Local Phase Quantization (LPQ) [16]. An SVM classifier then classifies this textural content, and the results were reported with 10-fold cross-validation. Although basic feature extraction methods can provide valuable information for categorizing images, they cannot achieve classification accuracy comparable to deep learning techniques. Hybrid techniques have also been used to classify bird species, as in [17], which combines a decision tree and an SVM classifier. In addition, bird species have been classified from bird voices using a CNN [18].
3 Proposed Approach

We conduct a two-phase experiment in which the best weights of the dense network are saved in the first phase. In the second phase, those saved weights are loaded into the dense network, which is then fine-tuned together with the pre-trained CNN.
3.1 Pre-trained CNN
Pre-trained networks consist of weights trained on a particular dataset. If a model is built from scratch, each synapse is initially assigned random weights; after hours of training on a training dataset, the model may converge to perform the desired task. The accuracy can then be measured by testing on a blind dataset. Those weights can later be reused for different problems through fine-tuning, which reduces the overhead involved in training a model from scratch. In our approach, we use the VGG-16 network [19], a pre-trained convolutional neural network trained on the ImageNet database. It consists of 5 blocks of convolutional layers, each ending in a max-pooling layer, followed by a dense network, as visualized in Fig. 3. It has approximately 138 million trainable parameters, with weights assigned as per the ImageNet database.
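As a concrete illustration (a sketch, not the authors' code), the convolutional base of VGG-16 can be loaded with its ImageNet weights in a few lines of Keras; the input size shown is an assumption:

```python
from tensorflow.keras.applications import VGG16

# Load the VGG-16 convolutional base with ImageNet weights;
# include_top=False drops the original dense classifier so that a
# task-specific dense network can be placed on top.
conv_base = VGG16(weights="imagenet",
                  include_top=False,
                  input_shape=(224, 224, 3))
conv_base.summary()  # five conv blocks, each ending in max pooling
```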
3.2 First Phase of Image Classification
The first phase follows the process shown in Fig. 1, in which the bird image dataset is passed to a fully convolutional network [20] for localization of the object by creating a mask around it. This operation removes background features that could cause the network to learn unwanted features and ultimately classify inaccurately. The segmented image dataset is given to the VGG-16 model for feature extraction, which yields bottleneck features. Those significant bottleneck features are then used to train the dense network. The dense network is a two-layer perceptron: the first dense layer has 256 units with the ReLU activation function. Batch normalization [21] is applied to the output of the first dense layer so that the data have a mean of 0 and a standard deviation of 1. Training a neural network on normalized inputs helps it learn; the idea is that, instead of just normalizing the inputs to the network, we normalize the inputs to layers within the network. The normalized output is passed to the second dense layer, with 20 units and the softmax activation function, to determine the probability of each bird species. Also, 50% of the network activations are dropped at each step so that the model does not succumb to overfitting. The architecture is similar for dog breed classification; however, there is a slight difference in the dense network used to categorize the dog breeds, as shown in Fig. 2: it consists of a single-layer dense network with 133 units and the softmax activation function. For dog breed classification we do not use batch normalization, because it did not improve the classification accuracy. The extracted features are used to train the dense network for image classification until it converges or fails to improve for five epochs. The best weights of the dense network are saved for fine-tuning in the second phase of the experiment.
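A minimal sketch of this first phase, continuing the loading snippet above; the array names `segmented_images` and `labels` are placeholders for the pre-processed data, not the authors' variables:

```python
from tensorflow.keras import layers, models

# Extract bottleneck features once from the frozen VGG-16 base.
features = conv_base.predict(segmented_images)  # placeholder inputs

# Dense head for the bird task: 256 ReLU units with batch
# normalization and 50% dropout, then a 20-way softmax.
head = models.Sequential([
    layers.Flatten(input_shape=features.shape[1:]),
    layers.Dense(256, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(20, activation="softmax"),
])
head.compile(optimizer="rmsprop",
             loss="categorical_crossentropy",
             metrics=["accuracy"])
head.fit(features, labels, epochs=25, batch_size=10)
head.save_weights("bottleneck_head.h5")  # reused in phase two
```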
Fig. 1. First phase of proposed approach for bird classification [7, 9]
3.3 Second Phase of Image Classification
In the second phase of bird classification, as shown in Fig. 3, the dense layer is loaded with the best saved weights from the first phase of the experiment. The last layers of the VGG-16 network remain trainable while the other layers are frozen, since the whole network has a large entropic capacity that could lead to overfitting. The features learned by the later blocks are more abstract than those of the earlier blocks, so it is sensible to keep the last layers of the model trainable while keeping the earlier layers untrainable. Finally, the network is trained once again with images rescaled through data augmentation. For dog classification, the architecture is the same as shown above, but the dense network follows Fig. 2.
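A sketch of this fine-tuning step under the same assumptions as the snippets above (the optimizer, learning rate, and directory layout are illustrative choices, not values from the paper):

```python
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Reload the phase-one weights; keep only the last VGG-16
# convolutional block trainable and freeze everything earlier.
head.load_weights("bottleneck_head.h5")
for layer in conv_base.layers:
    layer.trainable = layer.name.startswith("block5")

model = models.Sequential([conv_base, head])
model.compile(optimizer=optimizers.SGD(learning_rate=1e-4, momentum=0.9),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Retrain once more on rescaled images via data augmentation.
train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=10)
model.fit(train_gen, epochs=25)
```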
Fig. 2. First phase of proposed approach for dog classification [6, 8]
4 Experimental Results

We conducted experiments on two datasets. First, we experimented with 20 different bird species, with a total of 1,115 images: the training set consists of 789 images, the validation set contains 267 images, and the test set contains 59 images. For each species there are only around 49 training images, which is far too few to train a model efficiently. Table 1 shows the accuracy of various methods, with or without image segmentation, on the bird image dataset; the dense network is the same as shown in Fig. 1 for each method. Table 2 compares the methods with respect to batch normalization on the segmented image dataset; the methods shown use the same neural network model discussed above, although the batch normalization layer is skipped in some to gauge the effect of that layer. Second, the dataset used for dog classification consists of 133 different dog breeds with a total of 8,351 images. The training set consists of 6,680 images, while the validation set
Fig. 3. Second phase of the proposed approach for bird classification

Table 1. Bird species image classification

Method | Image segmentation (Y/N) | Accuracy (%)
VGG-16 pre-trained network with dense network | N | 49.23
VGG-16 pre-trained network with dense network | Y | 54.62
VGG-19 pre-trained network with dense network | N | 44.43
VGG-19 pre-trained network with dense network | Y | 46.15
Proposed two-phase method | Y | 94.89
Table 2. The difference between using batch normalization for bird classification

Method | Batch normalization (Y/N) | Accuracy (%)
VGG-16 pre-trained network with dense network | N | 50.00
VGG-19 pre-trained network with dense network | N | 42.31
Proposed two-phase method | N | 89.84
consists of 835 images and the test set contains 836 images. Table 3 shows a comparison between various methods with or without image segmentation.
Table 3. Dog breed image classification

Method | Image segmentation (Y/N) | Accuracy (%)
VGG-16 pre-trained network with dense network | N | 74.61
VGG-16 pre-trained network with dense network | Y | 86.34
ResNet50 pre-trained network with dense network | N | 82.46
ResNet50 pre-trained network with dense network | Y | 88.95
Proposed two-phase method | Y | 97.12
Table 4 shows the effect of not using the batch normalization layer on the segmented image dataset for dog breed classification.

Table 4. The difference between using batch normalization for dog classification

Method | Batch normalization (Y/N) | Accuracy (%)
VGG-16 pre-trained network with dense network | N | 77.27
ResNet50 pre-trained network with dense network | N | 84.26
Proposed two-phase method | N | 93.87
The bird species model is trained with a batch size of 10 for 25 epochs, with early stopping enabled: if the model does not improve for 5 epochs, training stops to avoid overfitting. The dog breed model is trained with a batch size of 20 for 20 epochs under the same early-stopping criterion; a sketch of this configuration is given below. Some experimental results of the proposed method are shown in Figs. 4 and 5.
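The early-stopping behaviour described here maps directly onto standard Keras callbacks; in this sketch the monitored metric, file name, and `val_gen` validation generator are assumptions:

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop when validation loss has not improved for 5 epochs.
    EarlyStopping(monitor="val_loss", patience=5),
    # Keep only the best weights seen during training.
    ModelCheckpoint("best_weights.h5", monitor="val_loss",
                    save_best_only=True, save_weights_only=True),
]
model.fit(train_gen, validation_data=val_gen,
          epochs=25, callbacks=callbacks)
```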
5 Conclusion

Classification between similar images is a difficult task for an algorithm. However, using the proposed deep learning technique, we achieved a classification accuracy of 94.89% for 20 types of bird species and 97.12% for 133 types of dog breeds. While many other methods use hand-crafted features, the proposed deep-learning approach learns distinctive features through the convolutional neural network. Convolutional image classification techniques use a plethora of features extracted from the images with the aim of improving classification performance [12]. In this paper, we
Fig. 4. Some of the experimental results for Bird species classification [5]
Fig. 5. Some of the experimental results of dog breed classification [8]
have discussed the methods involved in fine-tuning a pre-trained neural network to categorize images with higher accuracy, and we have described an image segmentation technique to remove the background from the subject being classified. The experimental results on the bird and dog datasets are significantly better than those of other models when classifying species and breeds that are similar in shape and luminance and have complex backgrounds. The proposed methodology also provides a novel way to train the model proficiently, making it robust against erroneous images containing multiple subjects. Furthermore, we used very little data to train the model while achieving high accuracy, as shown in the experimental results. The proposed two-phase algorithm is trained twice on the same dataset through a data augmentation technique that transforms the images to prevent overfitting; the CNN also learns rotation-variant information to categorize the images efficiently. The proposed algorithm can be extended to classifying medical images, where sufficient images for training a neural network efficiently are scarce, and to several other classification problems similar to bird and dog classification.

Compliance with Ethical Standards.
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Khosla, A., Jayadevaprakash, N., Yao, B., Li, F.-F.: Novel dataset for fine-grained image categorization. In: First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
2. Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-UCSD Birds 200. California Institute of Technology. CNS-TR-2010-001 (2010)
3. Geng, C., Jiang, X.: SIFT features for face recognition. In: 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, pp. 598–602 (2009)
4. Zhou, D., Hu, D.: A robust object tracking algorithm based on SURF. In: 2013 International Conference on Wireless Communications and Signal Processing, Hangzhou, pp. 1–5 (2013)
5. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 779–788 (2016)
6. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). arXiv:1505.04597
7. Milletari, F., Navab, N., Ahmadi, S.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, pp. 565–571 (2016)
8. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell.
9. Chanvichitkul, M., Kumhom, P., Chamnongthai, K.: Face recognition based dog breed classification using coarse-to-fine concept and PCA. In: 2007 Asia-Pacific Conference on Communications, Bangkok, pp. 25–29 (2007)
10. Wang, X., Ly, V., Sorensen, S., Kambhamettu, C.: Dog breed classification via landmarks. In: 2014 IEEE International Conference on Image Processing (ICIP), Paris, pp. 5237–5241 (2014)
11. Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intell. Syst. Appl. 13(4), 18–28 (1998)
12. Xie, G., Yang, K., Bai, Y., Shang, M., Rui, Y., Lai, J.: Improve dog recognition by mining more information from both click-through logs and pre-trained models. In: 2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Seattle, WA, pp. 1–4 (2016)
13. Lucio, D.R., Maldonado, Y., da Costa, G.: Bird species classification using spectrograms. In: 2015 Latin American Computing Conference (CLEI), Arequipa, pp. 1–11 (2015)
14. Kamarainen, J.: Gabor features in image analysis. In: 2012 3rd International Conference on Image Processing Theory, Tools and Applications (IPTA), Istanbul, pp. 13–14 (2012)
15. Meena, K., Suruliandi, A.: Local binary patterns and its variants for face recognition. In: 2011 International Conference on Recent Trends in Information Technology (ICRTIT), Chennai, Tamil Nadu, pp. 782–786 (2011)
16. Zhai, Y., Gan, J., Zeng, J., Xu, Y.: Disguised face recognition via local phase quantization plus geometry coverage. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, pp. 2332–2336 (2013)
17. Qiao, B., Zhou, Z., Yang, H., Cao, J.: Bird species recognition based on SVM classifier and decision tree. In: 2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS), Harbin, pp. 1–4 (2017)
18. Narasimhan, R., Fern, X.Z., Raich, R.: Simultaneous segmentation and classification of bird song using CNN. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, pp. 146–150 (2017)
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
20. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, pp. 3431–3440 (2015)
21. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
A Survey on Various Shadow Detection and Removal Methods

P. C. Nikkil Kumar and P. Malathi

Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected], [email protected]
Abstract. Shadows play an inevitable part in an image and are also a major source of hindrance in computer vision analysis. Shadow detection is a performance-enhancement process that increases the accuracy of computer vision algorithms for object tracking, object recognition, image segmentation, surveillance, scene analysis, stereo, etc. Shadows limit the stability of these algorithms, and hence detecting and eliminating shadows are important pre-processing techniques for improving the execution of vision algorithms. To address the challenges under various environmental conditions, research has been carried out to develop various algorithms and techniques for shadow detection. The objective of this paper is to provide a comparative analysis of shadow detection techniques with their pros and cons.

Keywords: Shadow · Shadow detection · Image enhancement · Image processing
1 Introduction

Shadows are recurring disturbances for many computer vision algorithms, decreasing their efficiency in problem solving and in detecting and mapping objects. To overcome this effect, many shadow detection algorithms have been introduced. Different algorithms work best under different conditions, such as indoor and outdoor lighting. This paper discusses different algorithms, their pros and cons, and their efficiency under demanding conditions. Detecting false shadows and scattered shadows, however, remains a quite demanding task. Generally, shadows are formed when light from a source is blocked. Cast shadows and self-shadows are the two types of shadows. Shadows produced when objects block direct illumination are called cast shadows; penumbra and umbra are the two parts of a cast shadow, and shadow detection is complex in the penumbra but easy in the umbra. Shadows that occur where objects are not exposed to direct light are self-shadows. This paper focuses on interpreting 13 different shadow detection and removal techniques with their main motives, pros, and cons. These shadow detection methods are accurate under various conditions, such as outdoor and indoor scenes.
2 Shadow Detection and Removal Methods

This section presents a comparative analysis of shadow detection techniques with their pros and cons; the comparison of the various shadow detection and removal methods is shown in Table 1. The comparison makes clear that different methods and algorithms produce different results and outputs. In the illumination recovery method [1], a novel approach is used for single images and aerial images: the input image is decomposed into overlapping patches with respect to the shadow distribution, and an illumination recovery operator is built using the correspondence between shadow patches and illuminated patches. It is simple and effective and can produce a texture-rich image as output. In Multiple ConvNets [2], an automatic framework is used to detect and remove shadows in real-world scenes; it has the ability to learn features at the super-pixel level. A Bayesian formulation is used for shadow extraction, and a removal framework recovers the original image from the shadow matte. The automatic soft shadow detection method [3] merges the histogram splitting of two data peaks with the resultant image to achieve smooth detection results; its main advantage is that it can rebuild objects inside the shadow region and can reinstate objects even in high-resolution shadow images. In 3D intensity surface modeling [4], a smoothing filter is used to erase texture information using a technique called image decomposition; 3D modeling helps to obtain the proper intensity of the surface and the illumination in shadow areas. In the kernel least-squares support vector machine (LSSVM) approach [5], the main operation is predicting the label of each region, and the LSSVM separates shadow regions from non-shadow regions; a Markov random field (MRF) framework is used for performance enhancement. In the ESRT model [6], the first step is to convert the image to HSV, and 26 parameters are taken; shadows are removed using morphological processing, and the method has a better detection rate and accuracy. In local color constancy computation [7], the illumination match is done using a random field model that contains color shade descriptors and also produces coherent shadow fields; it uses anisotropic diffusion to determine the illuminant locally for each image pixel in shadow. The tricolor attenuation model [8] can extract shadows even from complex outdoor scenes, but only one image at a time; the method is useful for video sequences and complex outdoor images. Near-infrared information [9] exploits the normally high built-in sensitivity of digital camera sensors to detect shadows automatically; moreover, sensor sensitivity is lower in the near-infrared (NIR). Ratio edge [10] uses geometric heuristics and an intensity constraint to maximize performance; it actively detects moving shadows and is very effective and useful for that purpose.
Table 1. Comparative analysis of various shadow detection and removal methods

Sr. no | Method | Key idea | Advantages | Disadvantages | Author
1 | Illumination recovering optimization method | Constructing an optimized illumination recovery operator | The shadow image is processed into a high-quality, texture-enriched image with non-uniform shadows handled | The processing stage of the algorithm is large | Ling Zhang, Qing Zhang, Chunxia Xiao
2 | Convolutional deep neural networks (ConvNets) | Features are trained at a higher level using super-pixels, and the trained model is fed to a conditional random field model to produce a refined shadow mask | A Bayesian formulation is used for extracting the shadow matte and removing shadows | Cannot process video sequences as effectively as single images | Salman H. Khan, Mohammed Bennamoun
3 | Automatic soft shadow detection method | Merging the histogram splitting of two data peaks with the resultant image achieves smooth detection results | Can reinstate even objects in high-resolution shadow images | The method requires a two-level process for shadow removal | Ye Zhang, Shu Tian, Yiming Yan, Xinyuan Miao
4 | 3D intensity surface modeling | The image intensity surface with a local smoothness pattern is retained by an edge-preserving filter; the shadow region is obtained by a surface intensity illumination technique and 3D modeling | The ability to restore uniform objects in shadow areas | Can deal only with cast shadows | Kai He, Rui Zhen, Jiaxing Yan, Yunfeng Ge
5 | Kernel least-squares support vector machine (LSSVM) | To predict the label of each region, the LSSVM is used to divide shadow and non-shadow regions | A Markov random field (MRF) framework is used to improve the performance of the region classifier | Less accurate for normal pixels than for core pixels of shadow regions | Tomas F. Yago Vicente, Minh Hoai, Dimitris Samaras
6 | ESRT (enhanced streaming random tree) model | 1. An HSV image with 26 parameters is taken for measurement; 2. the algorithm with a trained dataset is used for segmentation; 3. shadow removal is performed through morphological processing | Better in terms of accuracy and detection rate when compared with a Bayesian classifier | Not very effective for outdoor conditions | Vicky Nair, Parimala Geetha Kosal Ram, Sundaravadivelu Sundararaman
7 | Local color constancy computation | Color shade descriptors are used in a conditional random field model that finds illumination pairs and coherent shadow regions | An improved local color constancy computation; capable of finding the illuminant at each image pixel in shadow | Does not work with images having even shadow surfaces | Xingsheng Yuan, Marc Ebner, Zhengzhi Wang
8 | Tricolor attenuation model | A model is formed using image formation theory and the tricolor attenuation model | Extracts shadows even for images with complex outdoor scenes | Can only be applied to a single image at a time | Jiandong Tian, Jing Sun, Yandong Tang
9 | Near-infrared information | Building a shadow map using per-pixel visibility | Automatically detects shadows with high accuracy by using the sensitivity of digital camera sensors | Simultaneously capturing visible and NIR information is not possible with current photography | Dominic Rüfenacht, Clément Fredembach, Sabine Süsstrunk
10 | Ratio edge | A novel method for detecting moving cast shadows | An intensity constraint and geometric heuristics are used to enhance performance | Can only detect cast shadows | Wei Zhang, Xiang Zhong Fang, Xiaokang K. Yang
11 | Stereo photography | The algorithm involves foreground segmentation, shadow masking, detection of a shadow candidate area, and de-shadowing | Solves shadow problems caused by flash photography | Can detect only flash shadow images | Sang Jae Nam, Nasser Kehtarnavaz
12 | Object segmentation | Moving cast shadow detection is introduced and merged into a segmentation algorithm | Cast shadows are detected by temporally integrating regions of covered background objects | Mainly effective for video sequences | Jurgen Stauder, Roland Mech, Jorn Ostermann
13 | Color and edge | Shadow and background use the same color tone value with different intensity values | Well-founded technique for colored images | Takes more time for computation; fails when shadow and background intensity are the same | Maryam Golchin et al.
Stereo photography [11] reduces the shadows that occur during flash photography, but it can detect only flash shadow images. Object segmentation [12] detects cast shadows using temporal integration of covered background regions; a change detection method is employed to extract the object mask from the shadow background. The color-based method [13] relies on shadow and background sharing the same color tone value with different intensity values. It is a well-founded technique for colored images, but it takes more time for computation and fails when the shadow and background intensities are the same.
3 Conclusion

This paper has presented a comparative analysis of the shadow detection and removal techniques mentioned above. Thirteen shadow detection techniques are discussed, and for each technique the method and its corresponding pros and cons are analysed. The final analysis shows that there is no single best algorithm; the choice of the most suitable algorithm depends on the dataset features and on conditions such as outdoor and indoor sequences.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Zhang, L., Zhang, Q., Xiao, C.: Shadow remover: image shadow removal based on illumination recovering optimization. IEEE Trans. Image Process. 24(11), 4623–4636 (2015)
2. Khan, S.H., Bennamoun, M., Sohel, F., Togneri, R.: Automatic shadow detection and removal from a single image. IEEE Trans. Pattern Anal. Mach. Intell. 38(3), 431–446 (2016)
3. Su, N., Zhang, Y., Tian, S., Yan, Y., Miao, X.: Shadow detection and removal for occluded object information recovery in urban high-resolution panchromatic satellite images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 9(6), 2568–2582 (2016)
4. He, K., Zhen, R., Yan, J., Ge, Y.: Single-image shadow removal using 3D intensity surface modeling. IEEE Trans. Image Process. 26(12), 6046–6060 (2017)
5. Vicente, T.F.Y., Hoai, M., Samaras, D.: Leave-one-out kernel optimization for shadow detection and removal. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 682–695 (2018)
6. Nair, V., Kosal Ram, P.G., Sundararaman, S.: Shadow detection and removal from images using machine learning and morphological operations. J. Eng. 2019(1), 11–18 (2019)
7. Yuan, X., Ebner, M., Wang, Z.: Single-image shadow detection and removal using local colour constancy computation. IET Image Process. 9(2), 118–126 (2015). https://doi.org/10.1049/iet-ipr.2014.0242
8. Tian, J., Sun, J., Tang, Y.: Tricolor attenuation model for shadow detection. IEEE Trans. Image Process. 18(10), 2355–2363 (2009)
9. Rüfenacht, D., Fredembach, C., Süsstrunk, S.: Automatic and accurate shadow detection using near-infrared information. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1672–1678 (2014)
10. Zhang, W., Fang, X.Z., Yang, X.K., Wu, Q.M.J.: Moving cast shadows detection using ratio edge. IEEE Trans. Multimed. 9(6), 1202–1214 (2007)
11. Nam, S.J., Kehtarnavaz, N.: Flash shadow detection and removal in stereo photography. IEEE Trans. Consum. Electron. 58(2), 205–211 (2012)
12. Stauder, J., Mech, R., Ostermann, J.: Detection of moving cast shadows for object segmentation. IEEE Trans. Multimed. 1(1), 65–76 (1999). https://doi.org/10.1109/6046.748172
13. Golchin, M., Khalid, F., Abdullah, L., Davarpanah, S.H.: Shadow detection using color and edge information. J. Comput. Sci. 9, 1575–1588 (2013). https://doi.org/10.3844/jcssp.2013.1575.1588
Medical Image Registration Using Landmark Registration Technique and Fusion

R. Revathy, S. Venkata Achyuth Kumar, V. Vijay Bhaskar Reddy, and V. Bhavana

Department of Electronics and Communication Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India
[email protected], [email protected]
Abstract. The process of image registration and fusion is used to obtain a resultant image with better characteristics. The concepts of image registration and fusion can be used in various domains, such as remote sensing, satellite imaging, and medical imaging. Because the smallest of errors can have great consequences in medical images, applying these concepts in medical imaging makes a great difference. Aligning images spatially into one single framework is attained through image registration, and there are various methods and techniques to implement it. Landmark registration, a semi-automatic registration method, is used here: images are spatially aligned by selecting control points in both images and applying a geometric transformation that handles translation, rotation, scaling, and shearing. During the fusion process, traits from the source images are intermixed using the discrete wavelet transform and bicubic interpolation. The resultant image helps doctors obtain a better clinical diagnosis.

Keywords: Image fusion · Image registration · Landmark registration · Discrete wavelet transform · Bicubic interpolation
1 Introduction

A 2D image is represented by a finite set of values called pixels and is known as a digital image. Manipulating these images with devices such as computers is known as digital image processing. Several challenges are faced while monitoring and analysing medical images. Medical imaging has different modalities, broadly classified into two categories: anatomical and functional imaging. Anatomical imaging such as MRI (magnetic resonance imaging) gives the anatomical structure of the body, which helps to determine bone structures, internal abnormalities, etc. Functional imaging such as PET (positron emission tomography) gives physiological information, which helps to analyse changes in metabolism, blood flow, etc. During a PET or MRI scan, signal distortions can arise from several factors, such as variations in the patient's breathing, which can affect the scans. To eradicate such errors and to enhance the features of medical images, the following process is implemented. The first step in the integration process is bringing the different modalities into spatial alignment, called image registration [7].
Image registration is followed by image fusion [1], which is used to display the integrated data from multiple input images; in this work, a functional image (PET) and an anatomical image (MRI) are used. Image registration is the process of finding the best transformation method and aligning the images according to the region or structure of interest. In this work, a landmark-based approach is used for image registration: two sets of points are loaded, one as fixed and the other as moving, and the moving image points are compared with the fixed image points and adjusted relative to them to bring the images to a similar orientation in the spatial domain. The converted images are then used for multimodal image fusion. When fusion is performed, the characteristics of two or more images are combined to produce a single image frame with better information [9]. Image fusion qualitatively and quantitatively increases the quality of the imaging features, which makes medical image fusion more effective. Medical image fusion covers a broad range of techniques to address medical issues reflected in images of the human body, organs, and cells. In this work, image fusion is performed through a 4-level discrete wavelet transform (DWT) [1] and bicubic interpolation [3]. DWT decomposition [10] is performed to acquire the lower- and higher-frequency coefficients. Bicubic interpolation is an advancement of cubic interpolation [3] in which the weighted average of the nearest sixteen neighbours is taken to obtain the new pixel value. Fusion rules [6] are used to fuse the corresponding coefficients at each level, which gives new coefficients. The inverse discrete wavelet transform (IDWT) is then performed on the new coefficients, followed by colour conversion, resulting in the final fused image.
2 Impact of Image Registration on Clinical Diagnosis

Image registration has a huge impact on medical image diagnosis. During diagnosis, various medical images provide different information: bone-related issues are found using CT scans; issues related to tissue structure, including blood vessels and muscles, are found using MRI; and ultrasound images deal with problems regarding luminal structures and organs. When medical images are in digital form, it is easier for doctors to analyse them, so digital image processing has great significance in producing a better image that makes medical diagnosis easier. Registration is important because, during medical diagnosis, there may be a need to correlate two or more images that lie in different spatial frameworks; image processing is used to bring them into the same spatial orientation. This ensures more efficiency and precision during diagnosis.
3 Literature Review

Different transform techniques, such as the discrete wavelet transform and the discrete cosine transform, have been used for wavelet decomposition when fusing medical images. The discrete wavelet transform gives better results [6] than the discrete cosine transform in terms of performance metrics. Fusion rules such as mean, maximum, minimum, and those based on
contrast or energy have been applied to the corresponding detail and approximation coefficients. For approximation and detail coefficients, the mean and maximum rules, respectively, provide a better enhanced image. Table 1 shows the pros and cons of different image registration tools or graphical user interface tools.

Table 1. Pros and cons of image registration tools

Tool | Pros | Cons
OsiriX | Compatible with both desktop (OsiriX MD) and iPhone/iPad (OsiriX HD) platforms. One of the best medical image viewers or GUIs; uses 64-bit computing and multi-threading for better performance | Compatible only with the latest macOS, not with Windows or Linux. A minimum of 6 GB of RAM is required for processing, and a restriction of 128-bit registers is put on all complex operations
FreeSurfer | An open-source software package for analysis of MRI images of the human brain. It works on both Linux and macOS | Additional third-party software has to be installed; as it is not fully automated, human interaction is necessary, and only a few image formats are supported
ANTS | ANTS (Advanced Normalization Tools) is helpful in extracting information from complex datasets, including imaging data | Though very handy and functional, it does not always produce equivalent results; ANTS data servers render unpredictable databases bound by RAM limits, and performance degrades significantly under large user loads
Mango | A non-commercial software package for viewing, analysing, and editing volumetric medical images; compatible with macOS, Windows, and Linux. Mango, Imago, and Papaya are user interfaces for the desktop, iPhone/iPad, and browser, respectively | Imago in particular requires an iPad with iOS 5.0 or higher, whereas Papaya is a medical viewer developed in JavaScript as an online tool
4 Proposed Technique

4.1 Pre-processing
Nowadays, the most widely used images are digital images, and sensors such as CCDs (charge-coupled devices) are used to acquire them. The principle behind medical imaging is that when a light source (e.g., an X-ray source) provides illumination, sensors arranged opposite the source collect the energy that passes through the human body. This is the basis for medical computerized axial tomography (CAT). MRI and PET, the source images used here, also follow the CAT principle. The images thus acquired may
not be of the same size; thus image scaling, the process of resizing (i.e., altering the dimensions of the image), is used to bring them to a desirable size. These images then undergo filtering, which includes smoothing, sharpening, edge detection, and denoising for better image enhancement [3]. A histogram helps in analysing the image and determining whether noise is present; a sketch of these steps follows.
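As an illustration of this stage (the target size and filter kernel below are assumptions, not values from the paper), the steps can be sketched with OpenCV:

```python
import cv2

# Hypothetical pre-processing sketch: resize each modality to a common
# size, smooth to suppress noise, and compute the intensity histogram.
def preprocess(path, size=(256, 256)):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, size, interpolation=cv2.INTER_CUBIC)
    img = cv2.GaussianBlur(img, (3, 3), 0)  # light denoising
    hist = cv2.calcHist([img], [0], None, [256], [0, 256])
    return img, hist

mri_img, mri_hist = preprocess("mri.png")  # placeholder file names
pet_img, pet_hist = preprocess("pet.png")
```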
4.2 Image Registration
Image registration [7] is a crucial step in medical imaging because it aligns different medical images taken at different places, in different situations, at different angles, or from different sensors. It helps physicians analyse the change or growth of a disease in a single image instead of multiple images of a patient at different stages. Image registration is widely used for many anatomical organs of the human body, such as the brain, retina, and chest/lungs. Of the two source images, one is designated as the fixed image (also called the reference or target image) and the other as the moving (sensed) image; Fig. 1 shows the block diagram of image registration. At a time, one source image acts as the reference image and the other as the moving image. Landmark registration consists of four steps. The first step is resizing (to make both images the same size) [8] and pre-processing, followed by identification of features in both the fixed and moving images. Second, the user selects corresponding points from both the moving and fixed images and pairs the feature points. The third, crucial step is selecting the transformation model, such as affine or similarity, and computing the transformation; finally, the transformation is applied to one of the image sets to align it to the other. A sketch of this control-point workflow is given below.
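A minimal sketch of control-point registration, assuming OpenCV; the landmark coordinates and the variables `moving_img` and `fixed_img` are hypothetical placeholders:

```python
import numpy as np
import cv2

# Corresponding landmarks picked by the user in each image (placeholders).
fixed_pts = np.float32([[30, 40], [200, 52], [120, 210]])
moving_pts = np.float32([[34, 48], [198, 60], [118, 222]])

# Estimate an affine transform (translation, rotation, scaling, shearing)
# that maps the moving points onto the fixed points.
M, _ = cv2.estimateAffine2D(moving_pts, fixed_pts)

# Warp the moving image (e.g., MRI) into the fixed image's (PET) frame.
registered = cv2.warpAffine(
    moving_img, M, (fixed_img.shape[1], fixed_img.shape[0]))
```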
Fig. 1. Registration methodology
4.3 Image Fusion
The images obtained after landmark registration are passed through the proposed fusion process, as shown in Fig. 2. Initially, the images undergo segmentation [1] to obtain the low- and high-frequency components by passing them through low-pass and high-
pass filters, respectively. The DWT [7] provides flexible multi-resolution analysis of the image and preserves the important information. In the discrete wavelet transform, the mother wavelet is broken down into smaller blocks called wavelets. A four-level DWT decomposition, as shown in Fig. 3, is applied to the low- and high-frequency components individually. The decomposition can be done using different wavelet filters, such as Haar, Daubechies, and Coiflet; a comparison was made among these, and Haar was chosen for further processing. The DWT coefficients obtained are used in the subsequent stages.
Fig. 2. Proposed image fusion technique
A discrete set of child wavelets $c_{jk}$ is computed from the mother wavelet $\psi_{j,k}(t)$, where $k$ and $j$ are the shift and scale parameters, respectively:

$$\psi_{j,k}(t) = \frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t - k2^{j}}{2^{j}}\right) \qquad (1)$$

$$c_{jk} = \int_{-\infty}^{\infty} x(t)\,\frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t - k2^{j}}{2^{j}}\right) dt \qquad (2)$$
The 4-level DWT process is shown in Fig. 3. To obtain the LL, LH, HL, and HH coefficients of an image, a first-level DWT is performed. The DWT is then applied to the low-frequency coefficients, as they carry the maximum amount of information: applying the DWT to the LL coefficients obtained from the first level yields the LL2, LH2, HL2, and HH2 coefficients. The decomposition continues two more times to obtain the 4-level DWT coefficients. The coefficients obtained after the 4-level DWT [1] are interpolated using bicubic interpolation [3], where $a_{ij}$ represents the sixteen nearest-neighbour coefficients. Fusion rules such as the maximum and mean fusion rules [2] are applied. The maximum fusion rule is applied to the higher-frequency
coefficients at each level of decomposition: every pixel value of the high-frequency coefficients of the two images is compared, and the resultant pixel value is the maximum of the two. The mean fusion rule is applied to the lower-frequency coefficients at each level of decomposition: every pixel value of the low-frequency coefficients of the two images is compared, and the resultant value is the average of the two. Comparing the different fusion rules shows that the maximum fusion rule preserves the edges, while the mean fusion rule preserves the approximations. The bicubic interpolation takes the form

$$g(x, y) = \sum_{i=0}^{3}\sum_{j=0}^{3} a_{ij}\, x^{i} y^{j} \qquad (3)$$
The corresponding approximation coefficients of the 4-level DWT are fused using the mean fusion rule, and the corresponding detail coefficients using the maximum fusion rule [6]; a sketch of this fusion step is given below.
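A minimal sketch of the 4-level Haar DWT fusion using PyWavelets; `mri_img` and `pet_img` are assumed to be registered, same-size arrays, and the bicubic interpolation of coefficients described above is omitted for brevity:

```python
import numpy as np
import pywt

def fuse(img_a, img_b, wavelet="haar", levels=4):
    ca = pywt.wavedec2(img_a, wavelet, level=levels)
    cb = pywt.wavedec2(img_b, wavelet, level=levels)

    # Mean rule for the approximation (low-frequency) coefficients.
    fused = [(ca[0] + cb[0]) / 2.0]
    # Maximum rule, per the text, for each level's detail coefficients.
    for da, db in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.maximum(a, b) for a, b in zip(da, db)))

    return pywt.waverec2(fused, wavelet)  # IDWT reconstructs the image

fused_img = fuse(mri_img.astype(float), pet_img.astype(float))
```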
Fig. 3. Four level discrete wavelet transform
The mean fusion rule is used for the lower-frequency coefficients because they carry most of the detail, and the maximum fusion rule for the higher-frequency coefficients because they contain the vertical, horizontal, and diagonal edges [5]. The inverse DWT is then applied to the fused coefficients to reconstruct the image [7]. The final image obtained after this process is the fused image, which has the characteristics of both the PET and MRI source images.
5 Algorithm for the Proposed Innovation

1. Two images, an MRI image as the moving image and a PET image as the reference image, are considered for registration to obtain MRI registered on PET.
2. The images are pre-processed [11, 12] by resizing, smoothing, sharpening, and filtering to obtain the high- and low-frequency components.
3. Landmark registration is achieved by locating and loading the points of interest for the images.
4. The image formed by mapping the corresponding points is the registered image from landmark registration.
5. To obtain PET registered on MRI, landmark registration is performed with PET as the moving image and MRI as the reference image.
6. The two registered images are processed through the proposed fusion methodology.
7. The registered images undergo the 4-level DWT followed by bicubic interpolation.
8. The obtained coefficients are fused using the fusion rules.
9. IDWT is performed on the fused coefficients to obtain the final fused image.
6 Results and Analysis

The proposed methodology has been implemented and verified with various performance metrics: mutual information (MI), peak signal-to-noise ratio (PSNR), mean square error (MSE), and standard deviation (STD).

Mean square error (MSE): the MSE for an M × N image is the cumulative squared error between the compressed image and the original:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left[I(i,j) - K(i,j)\right]^{2} \qquad (4)$$
where $I$ and $K$ represent the original and noisy images.

Peak signal-to-noise ratio (PSNR): the PSNR is the ratio between the highest possible signal power and the power of the noise corrupting the signal:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}_{I}^{2}}{\mathrm{MSE}}\right) \qquad (5)$$
where $\mathrm{MAX}_{I}$ represents the highest possible pixel value of the image.

Entropy: entropy is a measure of the information content of an image, the average uncertainty of the information source:

$$H = -\sum_{i=1}^{L} P_{i} \log(P_{i}) \qquad (6)$$
where $P_{i}$ represents the probability of occurrence of a pixel value in the image and $L$ represents the number of gray levels.

Standard deviation (STD): the STD measures the amount of variation in a data set:
$$\mathrm{STD} = \sqrt{\frac{1}{MN - 1}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(x(i,j) - x_{\mathrm{mean}}\right)^{2}} \qquad (7)$$
where $x(i,j)$ is the pixel value of the image and $x_{\mathrm{mean}}$ is the mean of all pixel values in an M × N image. The images taken are an MRI image (Fig. 4) with a spatial resolution of 256 × 256 × 3 and a PET image (Fig. 5) with a spatial resolution of 256 × 256 × 3.
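For reference, the four metrics in Eqs. (4)–(7) can be computed directly from the image arrays; this sketch assumes 8-bit grayscale NumPy inputs:

```python
import numpy as np

def mse(i, k):
    return np.mean((i.astype(float) - k.astype(float)) ** 2)   # Eq. (4)

def psnr(i, k, max_i=255.0):
    return 10 * np.log10(max_i ** 2 / mse(i, k))               # Eq. (5)

def entropy(img, levels=256):                                  # Eq. (6)
    p = np.bincount(img.ravel(), minlength=levels) / img.size
    p = p[p > 0]                      # skip empty bins (log undefined)
    return -np.sum(p * np.log2(p))

def std(img):                                                  # Eq. (7)
    x = img.astype(float)
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (x.size - 1))
```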
Fig. 4. MRI image [2]
Fig. 5. PET image [2]
Figure 6 shows the fused image using only the proposed fusion technique, and Fig. 7 shows the fused image using both the proposed registration and fusion methodology.
Fig. 6. Fused image using only fusion methodology [3]
Fig. 7. Fused image using registration and fusion [4]
The images taken are an MRI image (Fig. 8) with a spatial resolution of 300 × 300 × 3 and a PET image (Fig. 9) with a spatial resolution of 300 × 300 × 3. Both images have been scaled to 256 × 256 × 3 for better analysis.
Fig. 8. MRI image
Fig. 9. PET image
Figure 10 shows the fused image using only the proposed fusion technique, and Fig. 11 shows the fused image using both the proposed registration and fusion methodology.
Fig. 10. Fused image using only fusion methodology
Fig. 11. Fused image using registration and fusion
Input MRI entropy = 2.6981; input PET entropy = 2.4391. MRI STD = 78.5347; PET STD = 83.1191.
From Table 2, i.e., fusion without registration, the output entropy is maximum, the MSE is minimum, the PSNR is maximum, the STD is maximum, and the M.I is minimum for the Haar wavelet.

Table 2. Performance indices for the fused image for dataset 1

Wavelets | Output entropy | M.I | M.S.E | PSNR | STD
Coif1 | 3.4624 | 1.6748 | 0.0110 | 67.6510 | 89.3098
Db2 | 3.3254 | 1.8118 | 0.0093 | 67.9810 | 89.9247
Haar | 3.5017 | 1.6355 | 0.0087 | 68.7356 | 90.2457
Table 3. Performance indices of fusion with registration for dataset 1

Wavelets | Output entropy | M.I | M.S.E | PSNR | STD
Coif1 | 3.6904 | 1.4468 | 0.0108 | 67.7965 | 92.7447
Db2 | 3.7924 | 1.3465 | 0.0084 | 68.8880 | 93.0524
Haar | 3.8140 | 1.3232 | 0.0070 | 69.6798 | 93.7143
From Table 3, i.e., fusion with registration, the output entropy is maximum, the MSE is minimum, the PSNR is maximum, the STD is maximum, and the M.I is minimum for the Haar wavelet. For the second dataset, MRI entropy = 6.4919 and PET entropy = 6.3934; MRI STD = 45.1057 and PET STD = 70.972.
Table 4. Performance indices for the fused image for dataset 2

Wavelets | Output entropy | M.I | M.S.E | PSNR | STD
Coif1 | 6.8512 | 5.98118 | 0.0078 | 69.3217 | 53.3037
Db2 | 6.8485 | 6.0368 | 0.0077 | 69.2659 | 53.9885
Haar | 6.8611 | 6.0242 | 0.0074 | 69.4384 | 55.6311
From Table 4, i.e., fusion without registration, the output entropy is maximum, the MSE is minimum, the PSNR is maximum, the STD is maximum, and the M.I is minimum for the Haar wavelet.

Table 5. Fusion with registration results on different wavelet filters

Wavelets | Output entropy | M.I | M.S.E | PSNR | STD
Coif1 | 7.1452 | 5.7401 | 0.0075 | 69.3801 | 51.3893
Db2 | 7.0906 | 5.7947 | 0.0076 | 69.3226 | 50.3267
Haar | 7.2833 | 5.6020 | 0.0069 | 69.7488 | 49.3889
From Table 5, i.e., fusion with registration, the output entropy is maximum, the MSE is minimum, the PSNR is maximum, and the M.I is minimum for the Haar wavelet.
7 Conclusion

After applying the proposed methodology to the acquired MRI and PET images and comparing performance evaluation factors such as the M.I, M.S.E, PSNR, and STD values, it is concluded that this methodology gives better results in terms of higher entropy, effective mutual information, lower mean square error, higher peak signal-to-noise ratio, and higher standard deviation. The resultant image proved to contain more information with less error; if implemented in real time, it would immensely help doctors in clinical diagnosis by analysing diseases more precisely.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Bhavana, V., Krishnappa, H.K.: Multi-modality medical image fusion using discrete wavelet transform. Procedia Comput. Sci. 70, 625–631 (2015). Open Access
2. Madhuri, G., Hima Bindu, C.H.: Performance evaluation of multi-focus image fusion techniques. In: 2015 International Conference on Computing and Network Communications (CoCoNet 2015), Trivandrum, India, 16–19 December 2015
3. Sekar, K., Duraisamy, V., Remimol, A.M.: An approach of image scaling using DWT and bicubic interpolation. In: Proceedings of the IEEE International Conference on Green Computing, Communication and Electrical Engineering, ICGCCEE 2014 (2014). https://doi.org/10.1109/ICGCCEE.2014.6922406
4. Haribabu, M., Hima Bindu, C.H.: Visibility based multi modal medical image fusion with DWT. In: IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI 2017) (2017)
5. Haribabu, M., Satya Prasad, K.: Multimodal medical image fusion of MRI and PET using wavelet transform. In: Proceedings of the 2012 International Conference on Advances in Mobile Networks, Communication and Its Applications, MN, pp. 127–130 (2012). https://doi.org/10.1109/MNCApps.2012.33
6. Jha, N., Saxena, A.K., Shrivastava, A., Manoria, M.: A review on various image fusion algorithms. In: International Conference on Recent Innovations in Signal Processing and Embedded Systems (RISE 2017), 27–29 October 2017
7. Shakir, H., Talha Ahsan, S., Faisal, N.: Multimodal medical image registration using discrete wavelet transform and Gaussian pyramids. In: 2015 IEEE International Conference on Imaging Systems and Techniques (IST) (2015)
8. El-Gamal, F.E.-Z.A., Elmogy, M., Atwan, A.: Current trends in medical image registration and fusion. Egypt. Inform. J. 17(1), 99–124 (2016)
9. Krishna Chaitanya, Ch., Sangamitra Reddy, G., Bhavana, V., Sai Chaitanya Varma, G.: PET and MRI medical image fusion using STDCT and STSVD. In: International Conference on Computer Communication and Informatics (ICCCI 2017), Coimbatore, India, 5–7 January 2017
10. Bhavana, V., Krishnappa, H.K.: Fusion of MRI and PET images using DWT and adaptive histogram equalization. In: 2016 International Conference on Communication and Signal Processing, ICCSP 2016, Melmaruvathur, Tamilnadu (2016)
11. Radhakrishna, A., Divya, M.: Survey of data fusion and tumor segmentation techniques using brain images. Int. J. Appl. Eng. Res. 10, 20559–20570 (2015)
12. Corso, J.J., Sharon, E., Yuille, A.: Multilevel segmentation and integrated bayesian model classification with an application to brain tumor segmentation. In: MICCAI 2006 (2006)
Capsule Network for Plant Disease and Plant Species Classification

R. Vimal Kurup, M. A. Anupama, R. Vinayakumar, V. Sowmya, and K. P. Soman

Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected], [email protected], [email protected], [email protected]
Abstract. From a deep learning perspective, the convolutional neural network (CNN) forms the backbone of image processing. To reduce its drawbacks and obtain better performance than a conventional neural network, a new CNN architecture known as capsulenet is implemented. In this paper, we analyse capsulenet on two plant-based datasets. In the modern world, many diseases spread due to a lack of hygienic food, and one of the main reasons is disease affecting crop species. The first model is therefore built for plant disease diagnosis using images of plant leaves; the corresponding dataset consists of 54,306 images of 14 plant species. The proposed capsulenet architecture gives an accuracy of around 94%. The second task is plant leaf classification, with a dataset of 2,997 images of 11 plants; the capsulenet prediction model gives an accuracy of around 85%. In recent years, the use of mobile phones has increased rapidly; for both models, the images of plant leaves are taken with mobile phone cameras, so the method can be extended to various plants and adopted at a large scale.
1 Introduction
In deep learning, one powerful tool for image processing applications is the convolutional neural network (CNN). Several CNN architectures have been developed to obtain better results, but conventional CNNs still have drawbacks: they need a large dataset for building a model, and the orientation and spatial relationships between the components of an image carry little weight in a CNN. As a solution to this problem, a new architecture called capsulenet was introduced [13]. Capsules form the capsulenet and extract information from images in a manner similar to the human brain; the squash non-linearity sketched below is central to how each capsule produces its output. In this work, two cases, namely plant disease diagnosis and plant leaf classification, are considered to prove the significance of capsulenet.
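As a point of reference (a sketch based on the capsule network formulation in [13], not code from this paper), a capsule's output vector is obtained by "squashing" its total input so that its length can be read as a probability:

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * (s / |s|): short vectors shrink
    # toward zero, long vectors approach unit length.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)
```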
One of the basic needs of humans is food. Modern farming techniques have the ability to produce food for the 7.6 billion people across the world. Although enough food is available, we still face many diseases, and one of the main reasons is problems associated with these food materials. Surveys show that more than 50% of plants are affected by pests and diseases [23]; even high-tech farming faces similar issues. Considering plants, there are many diseases associated with each plant, caused by factors such as climate change and a lack of organic fertilizers. The main challenge is to find the particular disease affecting a particular plant. The majority of farming is done by small-scale farmers, who are often unaware of the different types of plant diseases. Various methods have been adopted to detect plant diseases. In earlier days, rather than identifying the disease, small-scale farmers used pesticides and chemical fertilizers indiscriminately, causing severe after-effects; disease prediction has therefore gained much importance in recent years. Nowadays, the use of the internet and mobile phones has increased drastically, and this can be utilized to obtain a solution to our problem: smartphones have considerable computational power and high-resolution cameras. Here, we implement an image recognition system for plant disease diagnosis, proposing a deep learning model using capsulenet. The dataset under consideration for plant disease diagnosis is taken from the PlantVillage project (Hughes and Salathé 2015). It consists of 54,306 images of 14 plant species in 38 classes. The performance of the model is measured by the exact prediction of the crop–disease pair. Capsulenet also works well on small datasets, as shown by the second case, plant leaf classification, whose dataset is taken from Leafsnap and consists of 11 classes [14]. In the past few years, computer vision and image recognition have advanced tremendously, so a convolutional neural network is a natural choice for a task like plant disease diagnosis. A neural network consists of a number of layers that map input to output; the challenge is to learn functions for the layers that correctly map input to output, and the parameters of the network are effectively tuned during training to achieve high accuracy.
2 Related Works
The objective of the present work is to build a capsulenet model for two applications. Plant disease diagnosis can be done in several ways: several plants are affected by different types of diseases, and each disease has different symptoms. Plant disease diagnosis based on different symptoms is described in [1]. Nowadays, several approaches are available to predict plant disease. An image processing based technique, which is also deployed as a smartphone application, is discussed in [2]. Rather than a stand-alone application, cloud access to a large database can provide more accurate results.
In [3], diseases are determined based on micro-organisms. The authors implemented an image processing technique based on Convolutional Neural Networks (CNN); for the deep CNN training they used Caffe, a deep learning framework developed by the Berkeley Vision and Learning Center. Another approach, based on real-time polymerase chain reaction and DNA analysis, is described in [4]. For plant disease diagnosis there are certain non-destructive techniques, such as spectroscopic and image processing methods [5]. There is also a real-time polymerase chain reaction based approach, which performs molecular tests that facilitate rapid diagnostics [6]. Digital image processing techniques used to detect, quantify and classify plant diseases with neural networks are presented in [7]. Along with a Gabor filter, image segmentation is also used to feed the images as input to Convolutional Neural Networks to get better accuracy [8]. A deep learning based image processing technique for image diagnostics in corn only is used in [9], and the same approach applied to rice plants is followed in [10]. In [11], diseases are predicted across rice varieties and a user interface component is also provided. An approach has also been developed to predict diseases in fruits [12], where image processing techniques and a stereo microscope are used to measure size and shape variations in fruits and thereby predict the diseases. Apart from capsulenet, plant leaves classification has also been done in different ways. In [15], the Leafsnap dataset is analyzed using a support vector machine (SVM) and a CNN. Plant leaves can also be classified with an Artificial Neural Network (ANN) and a Euclidean classifier [16]. Arun Priya et al. proposed an approach consisting of phases such as pre-processing, grayscale transformation, boundary enhancement and feature extraction [17]. In [18], a probabilistic curve evolution method with particle filters is used to find the similarity between shapes during the matching process. Pooja et al. [20] performed X-ray classification using a support vector machine and Grand Unified Regularized Least Squares (GURLS) and obtained good results. Swapna et al. [21] performed cardiac arrhythmia diagnosis using a convolutional neural network, a recurrent neural network, LSTM and a hybrid CNN, and obtained good results, which shows the significance of neural networks.
3 Proposed Architecture
The proposed architecture is the same for both plant-related datasets. Further extension can be done by increasing the parameters, so that other problems related to plants can be solved. The proposed architecture is visualized in Fig. 1.
Fig. 1. Proposed architecture of capsule network for plant leaves classification and plant disease diagnosis [2].
4 Data Description
In this work, the analysis of capsulenet is done on two datasets: one for plant disease diagnosis and the other for plant leaf classification. In both cases, the images are downscaled to 28 × 28 before being applied to the network. The dataset for plant leaf prediction is taken from Leafsnap [14]. It consists of images of species found in the Northeastern United States; from this huge dataset, we analyze 11 classes containing 2,977 images. Table 3 shows the number of images of each class (80% train and 20% test data). In plant disease diagnosis, we analyze 54,306 images of 14 different crop species, divided into 38 classes comprising 26 disease and 12 healthy cases. Table 4 shows the number of images of each class (80% train and 20% test data); each label is a crop-disease pair. The data was taken from the PlantVillage project (Hughes and Salathé 2015). Before being applied to the network, various transformations are applied to the input images, including random cropping, flipping and normalizing. A very large dataset is needed for this type of neural network application, and until very recently such datasets were unavailable, which made the problem difficult to solve. So, the PlantVillage project (Hughes and Salathé 2015) collected thousands of images of disease-affected as well as healthy leaves and made them openly and freely available.
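To make the input pipeline concrete, the following is a minimal sketch of the downscaling and augmentation steps described above, written in TensorFlow. The exact augmentation parameters are not stated in the paper, so the padding margin and crop details here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the preprocessing described above (downscale to 28x28,
# random crop, flip, normalize). Padding margin (+4) is an assumption.
import tensorflow as tf

IMG_SIZE = 28  # images are downscaled to 28 x 28 as stated in the text

def preprocess(image, label, training=True):
    image = tf.image.convert_image_dtype(image, tf.float32)  # normalize to [0, 1]
    if training:
        # random cropping and flipping, as mentioned in the text
        image = tf.image.resize(image, [IMG_SIZE + 4, IMG_SIZE + 4])
        image = tf.image.random_crop(image, [IMG_SIZE, IMG_SIZE, 3])
        image = tf.image.random_flip_left_right(image)
    else:
        image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    return image, label
```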
5 Experimental Results and Observations
We evaluated the applicability of the capsule network on the PlantVillage dataset and the Leafsnap dataset. The model is executed on a GPU-enabled TensorFlow backend [22]. Based on the proposed architecture, the capsule network model is evaluated for both tasks: plant disease diagnosis as well as plant leaves classification. Tables 1 and 2 show the configuration details of capsulenet for plant disease diagnosis and plant leaves classification, respectively.
Table 1. Configuration details of capsulenet for plant disease diagnosis.

Layer (type) | Output shape | Param # | Connected to
input1 (InputLayer) | (None, 28, 28, 3) | 0 | -
conv1 (Conv2D) | (None, 20, 20, 256) | 62464 | input1[0][0]
primarycap_conv2d (Conv2D) | (None, 6, 6, 256) | 5308672 | conv1[0][0]
primarycap_reshape (Reshape) | (None, 1152, 8) | 0 | primarycap_conv2d[0][0]
primarycap_squash (Lambda) | (None, 1152, 8) | 0 | primarycap_reshape[0][0]
plantcaps (CapsuleLayer) | (None, 38, 16) | 5603328 | primarycap_squash[0][0]
input_2 (InputLayer) | (None, 38) | 0 | -
mask_1 (Mask) | (None, 608) | 0 | plantcaps[0][0], input_2[0][0]
capsnet (Length) | (None, 38) | 0 | plantcaps[0][0]
decoder (Sequential) | (None, 28, 28, 3) | 3247920 | mask_1[0][0]
Table 2. Configuration details of capsulenet for plant leaves classification.

Layer (type) | Output shape | Param # | Connected to
input1 (InputLayer) | (None, 28, 28, 3) | 0 | -
conv1 (Conv2D) | (None, 20, 20, 256) | 62464 | input1[0][0]
primarycap_conv2d (Conv2D) | (None, 6, 6, 256) | 5308672 | conv1[0][0]
primarycap_reshape (Reshape) | (None, 1152, 8) | 0 | primarycap_conv2d[0][0]
primarycap_squash (Lambda) | (None, 1152, 8) | 0 | primarycap_reshape[0][0]
plantcaps (CapsuleLayer) | (None, 11, 16) | 1622016 | primarycap_squash[0][0]
input_2 (InputLayer) | (None, 11) | 0 | -
mask_1 (Mask) | (None, 176) | 0 | plantcaps[0][0], input_2[0][0]
capsnet (Length) | (None, 11) | 0 | plantcaps[0][0]
decoder (Sequential) | (None, 28, 28, 3) | 3026736 | mask_1[0][0]
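To relate the tables to code, the following is a minimal Keras sketch of the front end of this network, whose output shapes and parameter counts match Tables 1 and 2 (a 9 × 9 kernel and stride 2 reproduce the 62,464 and 5,308,672 parameter counts). The CapsuleLayer, Mask and Length layers are custom layers from dynamic-routing capsule implementations in the spirit of [13] and are only indicated here, not reproduced; this is a sketch, not the authors' exact code.

```python
# Sketch of the capsulenet front end from Table 1 (38-class disease model).
# squash() follows the capsule non-linearity of Sabour et al. [13];
# CapsuleLayer/Mask/Length are custom layers and are not shown.
import tensorflow as tf
from tensorflow.keras import layers

def squash(s, axis=-1):
    # squash(s) = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    sq_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / tf.sqrt(sq_norm + 1e-8)

inputs = layers.Input(shape=(28, 28, 3), name="input1")
# conv1: 9x9 kernel, 256 filters -> (20, 20, 256), 62,464 params
x = layers.Conv2D(256, 9, activation="relu", name="conv1")(inputs)
# primary capsules: 9x9 kernel, stride 2 -> (6, 6, 256), 5,308,672 params
x = layers.Conv2D(256, 9, strides=2, name="primarycap_conv2d")(x)
# 6*6*256 = 9216 values reshaped into 1152 capsules of dimension 8
x = layers.Reshape((1152, 8), name="primarycap_reshape")(x)
primary_caps = layers.Lambda(squash, name="primarycap_squash")(x)
# plantcaps would map (1152, 8) -> (38, 16) via dynamic routing [13]
```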
In this work we analysed plant disease diagnosis and plant leaves classification using capsulenet. For both cases, the training set is chosen as 80% of the entire dataset and the validation accuracy is calculated to monitor the training. Then, based on the parameters obtained, the accuracy is calculated on the test data. Figure 2 shows the performance of the capsnet model on the training and validation datasets for plant disease diagnosis and plant leaves classification, respectively. The graphs indicate that plant disease diagnosis has less error compared to plant leaves classification, since the dataset is small for the plant leaves classification task. For plant disease diagnosis we obtained 94.5% accuracy, and for plant leaves classification we obtained 85% accuracy.
Fig. 2. Performance of capsnet model for training and validation datasets in plant leaves classification and plant disease diagnosis.

Table 3. Train-test split and accuracy assessment parameters obtained for plant leaves classification using the proposed work.

Sl No. | Classes | Train | Test | Precision | Recall | F1 score | AUC
1 | Acer negundo | 188 | 41 | 0.87 | 0.94 | 0.91 | 0.99
2 | Acer rubrum | 248 | 49 | 0.64 | 0.79 | 0.71 | 0.94
3 | Asimina triloba | 207 | 42 | 0.88 | 0.97 | 0.92 | 0.99
4 | Broussonetia papyrifera | 242 | 53 | 0.97 | 0.76 | 0.85 | 0.98
5 | Catalpa bignonioides | 180 | 37 | 0.92 | 0.92 | 0.92 | 0.99
6 | Cercis canadensis | 179 | 37 | 0.77 | 0.93 | 0.84 | 0.99
7 | Chionanthus virginicus | 180 | 37 | 0.89 | 0.78 | 0.83 | 0.99
8 | Ilex opaca | 206 | 39 | 0.63 | 0.73 | 0.68 | 0.96
9 | Liriodendron tulipifera | 200 | 38 | 0.72 | 0.76 | 0.74 | 0.96
10 | Maclura pomifera | 378 | 72 | 0.91 | 0.81 | 0.86 | 1.00
11 | Ulmus rubra | 275 | 54 | 0.76 | 0.76 | 0.76 | 0.96
Table 4. Train-test split and accuracy assessment parameters obtained for plant disease diagnosis using the proposed work.

Sl No. | Classes | Train | Test | Precision | Recall | F1 score | AUC
1 | Apple scab | 504 | 126 | 0.80 | 0.89 | 0.85 | 1.00
2 | Apple Black rot | 496 | 125 | 0.92 | 0.93 | 0.93 | 1.00
3 | Apple cedar | 220 | 55 | 0.55 | 0.91 | 0.68 | 0.98
4 | Apple healthy | 1316 | 329 | 0.91 | 0.92 | 0.91 | 0.99
5 | Blueberry healthy | 1202 | 300 | 0.98 | 0.92 | 0.95 | 1.00
6 | Cherry healthy | 684 | 170 | 0.99 | 0.97 | 0.98 | 1.00
7 | Cherry powdery mildew | 842 | 210 | 0.95 | 0.91 | 0.93 | 1.00
8 | Corn cercospora grey spot | 410 | 103 | 0.71 | 0.88 | 0.78 | 0.99
9 | Corn common rust | 953 | 239 | 1.00 | 1.00 | 1.00 | 1.00
10 | Corn healthy | 929 | 233 | 1.00 | 0.98 | 0.99 | 1.00
11 | Corn northern leaf blight | 788 | 197 | 0.91 | 0.89 | 0.90 | 1.00
12 | Grape black rot | 944 | 236 | 0.88 | 0.92 | 0.90 | 1.00
13 | Grape esca | 1107 | 276 | 0.93 | 0.91 | 0.92 | 1.00
14 | Grape healthy | 339 | 84 | 0.99 | 0.97 | 0.98 | 1.00
15 | Grape leaf blight | 861 | 215 | 0.97 | 0.96 | 0.97 | 1.00
16 | Orange haunglongbing | 4405 | 1102 | 0.98 | 0.97 | 0.98 | 1.00
17 | Peach bacterial spot | 1838 | 459 | 0.94 | 0.94 | 0.94 | 1.00
18 | Peach healthy | 288 | 72 | 0.96 | 0.91 | 0.93 | 1.00
19 | Pepper bell bacterial spot | 797 | 200 | 0.91 | 0.85 | 0.88 | 0.99
20 | Pepper bell healthy | 1183 | 295 | 0.96 | 0.92 | 0.94 | 1.00
21 | Potato early blight | 800 | 200 | 0.95 | 0.90 | 0.92 | 1.00
22 | Potato healthy | 121 | 31 | 0.61 | 0.83 | 0.70 | 0.99
23 | Potato late blight | 800 | 200 | 0.84 | 0.86 | 0.85 | 1.00
24 | Raspberry healthy | 297 | 74 | 0.92 | 0.86 | 0.89 | 1.00
25 | Soybean healthy | 4072 | 1018 | 0.98 | 0.99 | 0.98 | 1.00
26 | Squash powdery mildew | 1468 | 367 | 0.98 | 0.98 | 0.98 | 1.00
27 | Strawberry healthy | 364 | 92 | 0.95 | 0.95 | 0.95 | 1.00
28 | Strawberry leaf scorch | 887 | 222 | 0.94 | 0.96 | 0.95 | 1.00
29 | Tomato Bacterial spot | 1702 | 425 | 0.95 | 0.94 | 0.94 | 1.00
30 | Tomato Early blight | 800 | 200 | 0.71 | 0.80 | 0.76 | 0.99
31 | Tomato healthy | 1273 | 318 | 0.98 | 0.98 | 0.98 | 1.00
32 | Tomato Late blight | 1527 | 382 | 0.83 | 0.86 | 0.84 | 0.99
33 | Tomato Leaf Mold | 761 | 191 | 0.91 | 0.93 | 0.92 | 1.00
34 | Tomato Septoria leaf spot | 1417 | 354 | 0.90 | 0.92 | 0.91 | 1.00
35 | Tomato Two-spotted spider mite | 1341 | 335 | 0.92 | 0.92 | 0.92 | 1.00
36 | Tomato Target Spot | 1123 | 281 | 0.93 | 0.87 | 0.90 | 1.00
37 | Tomato mosaic virus | 299 | 74 | 0.89 | 0.92 | 0.90 | 1.00
38 | Tomato Yellow Leaf Curl Virus | 4286 | 1071 | 0.99 | 0.98 | 0.99 | 1.00
6 Conclusion
This paper analyzes the performance of capsulenet for two tasks related to plants. We chose capsulenet rather than conventional convolutional neural architectures, since CNNs have some drawbacks; the capsules present in capsulenet are similar to the human brain in capturing the relevant information. The first task considered in the present work is plant disease diagnosis, which resulted in an accuracy of around 94.5%. To evaluate the effectiveness of the capsulenet model, the same model was executed on one more task, plant leaves classification, which has a small dataset; here the proposed capsulenet architecture resulted in an accuracy of around 85%. Since large computational power is required, we conducted the experiments on this small number of images, but capsulenet is capable of giving better accuracies for small datasets as well. In the last few years, the use of mobile phones has drastically increased, and mobile phone images can be used to deploy such models in real time. So, in future, it is easy to increase the number of images in the dataset and re-train the same model.

Compliance with Ethical Standards. All authors declare that there is no conflict of interest. No humans/animals were involved in this research work. We have used our own data.
References
1. Riley, M.B., Williamson, M.R., Maloy, O.: Plant disease diagnosis. The Plant Health Instructor. https://doi.org/10.1094/PHI-I-2002-1021-01.2002
2. Georgakopoulou, K., Spathis, C., Petrellis, N., Birbas, A.: A capacitive to digital converter with automatic range adaptation. IEEE Trans. Instrum. Meas. 65(2), 336–345 (2011)
3. Sankaran, S., Mishra, A., Eshani, R., Davis, C.: A review of advanced techniques for detecting plant diseases. Comput. Electron. Agric. 72(1), 1–13 (2010)
4. Schaad, N.W., Frederick, R.D.: Real time PCR and its application for rapid plant disease diagnostics. Can. J. Plant Pathol. 24(3), 250–258 (2002)
5. Purcell, D.E., O'Shea, M.G., Johnson, R.A., Kokot, S.: Near infrared spectroscopy for the prediction of disease rating for Fiji leaf gall in sugarcane clones. Appl. Spectro. 63(4), 450–457 (2009)
6. Schaad, N.W., Frederick, R.D., Shaw, J., Schneider, W.L., Hickson, R., Petrillo, M.D., Luster, D.G.: Advances in molecular-based diagnostics in meeting crop biosecurity and phytosanitary issues. Ann. Rev. Phytopathol. 41, 305–324 (2003)
7. Barbedo, G.C.A.: Digital image processing techniques for detecting quantifying and classifying plant diseases. SpringerPlus 2, 660 (2013)
8. Kulkarni, A., Patil, A.: Applying image processing technique to detect plant diseases. Int. J. Mod. Eng. Res. 2(5), 3361–3364 (2012)
9. Lai, C., Ming, B., Li, S.K., Wang, K.R., Xie, R.Z., Gao, S.J.: An image-based diagnostic expert system for corn diseases. Agric. Sci. China 9(8), 1221–1229 (2010)
10. Sarma, S.K., Singh, K.R., Singh, A.: An expert system for diagnosis of diseases in rice plant. Int. J. Artif. Intell. 1(1), 26–31 (2010)
11. Abu-Naser, S.S., Kashkash, K.A., Fayyad, M.: Developing an expert system for plant disease diagnosis. J. Artif. Intell. 1(2), 78–85 (2008). Asian Network for Scientific Information
12. Mix, C., Picó, F.X., Ouborg, N.J.: A comparison of stereomicroscope and image analysis for quantifying fruit traits. Seed Technology, vol. 25, no. 1 (2003)
13. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. In: Neural Information Processing Systems (2017)
14. Kumar, N., Belhumeur, P.N., Biswas, A., Jacobs, D.W., John Kress, W., Lopez, I.C., Soares, J.V.B.: Leafsnap: a computer vision system for automatic plant species identification. In: Proceedings of the 12th European Conference on Computer Vision (2012)
15. Sulc, M., Matas, J.: Fine-grained recognition of plants from images. Plant Methods 13, 115 (2017)
16. Satti, V., Satya, A., Sharma, S.: An automatic leaf recognition system for plant identification using machine vision technology. Int. J. Eng. Sci. Technol. 5, 874 (2013)
17. Arun Priya, C., Balasaravanan, T., Selvadoss Thanamani, A.: An efficient leaf recognition algorithm for plant classification using support vector machine. In: Proceedings of the International Conference on Pattern Recognition, Informatics and Medical Engineering, pp. 428–432 (2012)
18. Valliammal, N., Geethalakshmi, S.N.: Automatic recognition system using preferential image segmentation for leaf and flower images. Comput. Sci. Eng.: Int. J. 1, 13–25 (2011)
19. Kekre, H.B., Mishra, D., Narula, S., Shah, V.: Color feature extraction for CBIR. Int. J. Eng. Sci. Technol. 3, 8357–8365 (2011)
20. Pooja, A., Mamtha, R., Sowmya, V., Soman, K.P.: X-ray image classification based on tumor using GURLS and LIBSVM, pp. 0521–0524 (2016). https://doi.org/10.1109/ICCSP.2016.7754192
21. Swapna, G., Soman, K.P., Vinayakumar, R.: Automated detection of cardiac arrhythmia using deep learning techniques. Procedia Comput. Sci. 132, 1192–1201 (2018). https://doi.org/10.1016/j.procs.2018.05.034
22. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M.: TensorFlow: a system for large-scale machine learning, vol. 16, pp. 265–283 (2016)
23. Harvey, C.A., Rakotobe, Z.L., Rao, N.S., Dave, R., Razafimahatratra, H., Rabarijohn, R.H., et al.: Extreme vulnerability of smallholder farmers to agricultural risks and climate change in Madagascar. Philos. Trans. R. Soc. Lond. B Biol. Sci. 369, 20130089 (2014). https://doi.org/10.1098/rstb.2013.008
RumorDetect: Detection of Rumors in Twitter Using Convolutional Deep Tweet Learning Approach

N. G. Bhuvaneswari Amma(1) and S. Selvakumar(1,2)

(1) National Institute of Technology, Tiruchirappalli 620 015, Tamil Nadu, India
[email protected]
(2) Indian Institute of Information Technology, Una, Himachal Pradesh, India
[email protected], [email protected]
Abstract. Nowadays social media is a common platform to exchange ideas, news, and opinions, as the usage of social media sites is increasing exponentially. Twitter is one such micro-blogging site, and most early-update tweets are unverified at the time of posting, leading to rumors. The spread of rumors in certain situations makes people panic. Therefore, early detection of rumors in Twitter is needed, and recently deep learning approaches have been used for rumor detection. The lacuna in the existing rumor detection systems is the curse-of-dimensionality problem in the extracted features of Twitter tweets, which leads to high detection time. In this paper, the issue of dimensionality is addressed and a solution is proposed to overcome the same. The detection time could be reduced if only the relevant features are considered for rumor detection. This is captured by the proposed approach, which extracts the features based on the tweet, reduces the dimension of tweet features using a convolutional neural network, and learns using a fully connected deep network. Experiments were conducted on events in the Twitter PHEME dataset, and it is evident that the proposed convolutional deep tweet learning approach yields promising results with less detection time compared to the conventional deep learning approach.

Keywords: Convolutional neural network · Deep learning · Feature extraction · Rumor detection · Social media · Twitter

1 Introduction
Online social media is a platform that generates a massive amount of data during breaking news [11]. Twitter is one of the social media platforms in which people post information about stock markets, natural hazards, general opinion, etc. But it is hard to trust the information being spread virally in social media, as users simply click Twitter retweet without verifying the information. This unverified information is termed a rumor; it originates from one or more sources and spreads virally from node to node in the social media network. The rumor may
be true, false, or remain unresolved. The detection of rumors is crucial in crisis situations, and therefore a rumor detection system is needed to stop the spread of rumors at the earliest [8]. Rumor detection systems discriminate rumor from non-rumor by treating it as a binary classification problem. Binary classification techniques are categorized into statistical and machine learning approaches. The statistical approach requires low computational cost, but it needs proper assumptions, and the inability to fix the assumptions leads to a high false positive rate. The machine learning approach requires no prior assumptions and detects rumors with high accuracy [1]. The recent buzzword in the literature is deep learning, a subset of machine learning. Deep learning approaches automatically find relationships in the data and learn the features at different levels of abstraction. The most extensively used deep approach is the Convolutional Neural Network (CNN). The CNN extracts the features and compresses them to make them suitable for learning. Generally, the CNN consists of a Convolution Layer (CL), a Pooling Layer (PL), a Dropout Layer (DOL), and a fully connected layer. The features extracted from the tweets play a major role in classification problems; therefore, feature vector formation is an important step in improving the performance of rumor detection systems [7]. The following propositions are the key contributions of this paper:
1. Extraction of salient features to detect rumors.
2. A Convolutional Deep Tweet Learning network structure for training the Twitter rumor detection system.
3. A RumorDetect approach to classify a tweet as rumor or non-rumor.
The rest of the paper is organized as follows: Sect. 2 discusses feature extraction in Twitter rumors and deep learning based rumor detection. Section 3 describes the proposed RumorDetect approach. Section 4 discusses the experimental results of the proposed approach. The conclusion, with directions for further research, is presented in Sect. 5.
2 Related Works
2.1 Feature Extraction in Twitter Rumors
Generally, a Twitter rumor is a collection of tweets about an event that spreads through Twitter; the tweets are similar assertions about the event [8]. Rumors consist of thousands of tweets with the same meaning but a different order of words in each tweet. The features play a major role in machine learning based classification problems; therefore, extracting the salient features from the Twitter tweets is an essential step in rumor detection [4]. It is observed from the literature that the features are categorized into content, multimedia, propagation, and topic based features [5].
Fig. 1. Architecture of proposed RumorDetect approach
2.2 Deep Learning Based Rumor Detection
Surveys of existing rumor detection systems in social media are given in [9,10]. One category is based on rumors known a priori: a set of pre-defined rumors is fed to the classifier, which classifies new tweets as either rumor or non-rumor. This approach is suitable for long-standing rumors but not for breaking news, as unseen rumors emerge and the keywords linked to the rumor are yet to be detected [12]. Another category is based on learning generalizable patterns that help to identify rumors during emerging events [8]. A deep learning based rumor detection system learns the tweet stream automatically by finding relationships in the tweets. Deep learning techniques such as autoencoders, CNNs, long short-term memory, etc., adapted to rapidly changing Twitter data streams, motivated us to propose a convolutional deep tweet learning classifier, which extracts the features at different levels of representation and automatically learns the extracted features to classify the Twitter data stream [2,6].
3 Proposed RumorDetect Approach
The architecture of the proposed RumorDetect approach is shown in Fig. 1. This approach consists of training and testing phases. The training phase comprises the Feature Vector Construction and Convolutional Deep Tweet Learning modules. The testing phase uses the Rumor Detection module to classify a given tweet as either rumor or non-rumor. The flows of training and testing tweets are represented using solid and dashed lines, respectively. The tweets in the dataset consist of a source tweet and reactions for each Twitter datum.

3.1 Feature Vector Construction
Fig. 2. Proposed convolutional feature generation network structure

The feature vector construction phase includes pre-processing and feature extraction sub-modules. The pre-processing sub-module comprises tokenization, parts-of-speech tagging, stop word removal, stemming, bag-of-words creation, and term frequency-inverse tweet frequency (tf-itf) calculation [9]. Tokenization is the process of reading the tweets and outputting tokenized words. Parts-of-speech tagging is the process of reading the tokenized words and outputting parts-of-speech tagged words. Stop word removal is the process of removing less important and meaningless words, such as articles and prepositions like a, the, is, etc. Stemming is the process of finding the root forms of the input words. Bag-of-words creation is the process of recording the words and their frequency of occurrence in the tweet, disregarding the semantic relationships in the tweet. The tf-itf calculation is the process of computing a weight for each token in the tweet to evaluate how important a word is to a tweet. Feature extraction plays a major role in rumor detection; the extracted features are represented as a vector for further learning. The features extracted from the tweets are as follows: Tweet count, Listed count, Retweet count, Favorite count, Account age, Followers count, Hashtag count, Friends count, URL count, Character count, Digit count, and Word count.
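As a concrete illustration of this pre-processing chain, the short sketch below uses scikit-learn's TfidfVectorizer as a stand-in for the tf-itf weighting described above, treating each tweet as a document; stemming and POS tagging would typically be added via NLTK. The sample tweets are hypothetical, and this is not the authors' exact pipeline.

```python
# Illustrative pre-processing sketch: tokenization, stop word removal and
# tf-idf weighting as a stand-in for the tf-itf calculation described above.
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = [
    "Breaking: explosion reported downtown",     # hypothetical tweets
    "Officials confirm no explosion occurred",
]
vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
X = vectorizer.fit_transform(tweets)  # sparse matrix: tweets x vocabulary
print(X.shape, vectorizer.get_feature_names_out()[:5])
```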
3.2 Convolutional Deep Tweet Learning
The convolutional deep tweet learning module comprises two sub-modules, viz., Convolutional Feature Generation (CFG) based pre-training and Fully Connected Deep Network (FCDN) based training. The proposed CFG network compresses and reduces the size of the feature vector. The structure of the CFG network is depicted in Fig. 2; it consists of CL, PL, and DOL. The normalized training tweets are given as input to the CL. The CL [3] produces the transformed convolution as follows:

$TrCon_{I_i} = XD_{I_i} \times F \quad (1)$

where $XD$ is the normalized tweet vector and $F$ is the filter. The reason for using two levels of CL before the PL is to generate high-level features that compress the generated feature vector so as to reduce its size.
Fig. 3. Structure of proposed fully connected deep network
The PL compresses and reduces the size of the feature vector so as to overcome the issue of overfitting. The proposed approach uses average pooling with a size of 2, using (2):

$TrPoo(TrCol_{I_{ij}}, TrCol_{I_{ik}}) = \mathrm{avg}(TrCol_{I_{ij}}, TrCol_{I_{ik}}) \quad (2)$
The subsequent CLs and the PL are computed using (1) and (2), respectively. The application of a filter to the feature vector creates co-dependency among the features, leading to overfitting. In order to overcome this issue, a DOL is incorporated in the CFG network. The nodes are dropped out with probability p, and the reduced network is passed to the FCDN for further training. The FCDN trains on the feature vector and produces learned weights for rumor detection. Figure 3 depicts the proposed FCDN with a q-9-7-6-1 structure; the intuition behind choosing this structure is to learn the feature vector at different abstractions. The computation in the hidden layer is performed as follows:

$H1_i = \sum_{j=1}^{p} In_{ij} \times W_{ij}^{H1} \quad (3)$

where $In_{ij}$ is the DOL output passed to the input layer of the FCDN and $W_{ij}^{H1}$ are the weights between the input and hidden layers. The Rectified Linear Unit (ReLU) and sigmoid functions are used in the hidden and output layers, respectively. The ReLU is computed as follows:

$f(H1_i) = \max(H1_i, 0) \quad (4)$

The sigmoid activation function is used in the output layer and is computed as follows:

$f(I_j^O) = 1 / (1 + e^{-I_j^O}) \quad (5)$
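The following Keras sketch puts the CFG and FCDN pieces together for a 1-D feature vector of length q = 12 (the twelve tweet features listed in Sect. 3.1). The filter count and kernel size are illustrative assumptions; the paper does not state them.

```python
# Minimal sketch of the CFG + FCDN pipeline of Figs. 2 and 3.
# Filter sizes are assumptions; the q-9-7-6-1 dense structure is from the text.
import tensorflow as tf
from tensorflow.keras import layers, models

q = 12
model = models.Sequential([
    layers.Input(shape=(q, 1)),
    layers.Conv1D(8, 3, padding="same", activation="relu"),  # CL, Eq. (1)
    layers.Conv1D(8, 3, padding="same", activation="relu"),  # second CL before PL
    layers.AveragePooling1D(pool_size=2),                    # PL, Eq. (2)
    layers.Dropout(0.5),                                     # DOL
    layers.Flatten(),
    layers.Dense(9, activation="relu"),                      # FCDN: q-9-7-6-1
    layers.Dense(7, activation="relu"),
    layers.Dense(6, activation="relu"),
    layers.Dense(1, activation="sigmoid"),                   # Eq. (5)
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
```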
Algorithm 1. RumorDetect Algorithm
Input: Test tweet, learned weights
Output: Rumor/Non-Rumor
1: for each record do
2:   Compute transformed convolution using (1)
3:   Compute transformed pooling using (2)
4:   Compute sum of product of DOL output and weights
5:   Compute ReLU activation function
6:   Compute output using (5)
7:   if output ≤ 0 then
8:     Return Non-Rumor
9:   else
10:    Return Rumor
11:  end if
12: end for

Table 1. Statistics of datasets

Breaking news | Training rumors | Training non-rumors | Testing rumors | Testing non-rumors
FU | 275 | 973 | 183 | 648
OS | 170 | 515 | 114 | 344
SS | 143 | 139 | 95 | 92
CHS | 282 | 252 | 188 | 168
GPC | 313 | 419 | 209 | 280
In order to train the model, the Root Mean Square Error (RMSE) is computed from the class label and the computed output of each tweet in the training dataset [3]. The tweet learning process continues till the RMSE reaches a pre-defined threshold suitable for rumor detection.
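A minimal sketch of this RMSE-based stopping rule is given below; the threshold value and the synthetic stand-in data are illustrative assumptions (the paper does not state them), and `model` refers to the Keras sketch above.

```python
# Sketch of the RMSE stopping rule described above.
import numpy as np

def rmse(y_true, y_pred):
    diff = np.asarray(y_true).ravel() - np.asarray(y_pred).ravel()
    return float(np.sqrt(np.mean(diff ** 2)))

THRESHOLD = 0.05  # assumed pre-defined threshold (not stated in the paper)
X_train = np.random.rand(64, 12, 1).astype("float32")      # synthetic stand-in data
y_train = np.random.randint(0, 2, size=(64, 1)).astype("float32")
for epoch in range(100):
    model.fit(X_train, y_train, epochs=1, verbose=0)        # `model` from the sketch above
    if rmse(y_train, model.predict(X_train, verbose=0)) <= THRESHOLD:
        break
```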
3.3 Rumor Detection
The Rumor Detection module is similar to the CFG pre-training and FCDN training, but without the RMSE computation. The learned weights obtained from tweet learning are used for the detection of rumors. Algorithm 1 depicts the RumorDetect algorithm. The test Twitter tweet is given as input to the Rumor Detection module to detect whether the given tweet is rumor or non-rumor.
4 Experimental Results
The proposed approach was implemented in Python on an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz processor with 16 GB RAM under Windows 10. The PHEME dataset was used for experimentation; it consists of Twitter tweets posted during breaking news [11].
Table 2. Overall performance of proposed RumorDetect approach

Breaking news | TN | FP | FN | TP | Precision | Recall | F-measure | Accuracy | ER
FU | 607 | 41 | 31 | 152 | 78.76 | 83.06 | 80.85 | 91.34 | 8.66
OS | 312 | 32 | 20 | 94 | 74.60 | 82.46 | 78.33 | 88.65 | 11.35
SS | 74 | 18 | 13 | 82 | 82.00 | 86.32 | 84.10 | 83.42 | 16.58
CHS | 140 | 28 | 25 | 163 | 85.34 | 86.70 | 86.02 | 85.11 | 14.89
GPC | 251 | 29 | 33 | 176 | 85.85 | 84.21 | 85.02 | 87.32 | 12.63
Fig. 4. Rumor detection approaches versus detection time
The following five events attracted the media with huge interest: Ferguson Unrest (FU), Ottawa Shooting (OS), Sydney Siege (SS), Charlie Hebdo Shooting (CHS), and Germanwings Plane Crash (GPC). Table 1 shows the training and testing events in the PHEME dataset. The proposed approach is evaluated using the True Negative, False Positive, False Negative, and True Positive performance measures. The performance metrics Precision, Recall, F-measure, Accuracy, and Error Rate (ER) are computed for the proposed approach and tabulated in Table 2. It is observed from the table that all the events in the PHEME dataset yield significant results. Figure 4 depicts the comparison of the detection time taken by the proposed approach and the conventional deep learning approach. The structure of the conventional deep learning classifier used for experimentation consists of two levels of CL and PL with an FCDN of q-9-7-6-1 structure. It is seen that the detection time of the proposed approach is less than that of the existing deep learning approach for all the events in the dataset, as the proposed approach uses a DOL after the CL and PL.
5 Conclusion
In this paper, the RumorDetect approach is proposed to detect rumors in Twitter tweets. The objective is to overcome the curse-of-dimensionality problem in the extracted features of Twitter tweets, thereby reducing the detection time of the proposed RumorDetect system. The features were extracted from the tweets to generate the feature vector. The network was trained using the convolutional deep tweet learning approach, and the RumorDetect system classifies an unseen tweet as either rumor or non-rumor. The performance of the proposed RumorDetect approach was evaluated using the benchmark Twitter PHEME dataset. The proposed approach achieves a significant improvement in performance compared to the conventional deep learning classifier. An extension of this work could be the online detection of rumors in Twitter, to further automate the system.

Compliance with Ethical Standards. All authors declare that there is no conflict of interest. No humans/animals were involved in this research work. We have used our own data.
References
1. Ahmed, F., Abulaish, M.: A generic statistical approach for spam detection in online social networks. Comput. Commun. 36(10–11), 1120–1129 (2013)
2. Ajao, O., Bhowmik, D., Zargari, S.: Fake news identification on Twitter with hybrid CNN and RNN models. In: Proceedings of the 9th International Conference on Social Media and Society, pp. 226–230. ACM (2018)
3. Amma, N.G.B., Subramanian, S.: VCDeepFL: vector convolutional deep feature learning approach for identification of known and unknown denial of service attacks. In: TENCON 2018-2018 IEEE Region 10 Conference, pp. 0640–0645. IEEE (2018)
4. Chen, C., Wang, Y., Zhang, J., Xiang, Y., Zhou, W., Min, G.: Statistical features-based real-time detection of drifted Twitter spam. IEEE Trans. Inf. Forensics Secur. 12(4), 914–925 (2017)
5. Liang, G., He, W., Xu, C., Chen, L., Zeng, J.: Rumor identification in microblogging systems based on users behavior. IEEE Trans. Comput. Soc. Syst. 2(3), 99–108 (2015)
6. Liu, Y., Jin, X., Shen, H.: Towards early identification of online rumors based on long short-term memory networks. Inf. Process. Manag. 56(4), 1457–1467 (2019)
7. Vishwarupe, V., Bedekar, M., Pande, M., Hiwale, A.: Intelligent Twitter spam detection: a hybrid approach. In: Smart Trends in Systems, Security and Sustainability, pp. 189–197. Springer (2018)
8. Vosoughi, S., Mohsenvand, M., Roy, D.: Rumor gauge: predicting the veracity of rumors on Twitter. ACM Trans. Knowl. Discov. Data (TKDD) 11(4), 50 (2017)
9. Wu, T., Wen, S., Xiang, Y., Zhou, W.: Twitter spam detection: survey of new approaches and comparative study. Comput. Secur. 76, 265–284 (2018)
10. Zubiaga, A., Aker, A., Bontcheva, K., Liakata, M., Procter, R.: Detection and resolution of rumours in social media: a survey. ACM Comput. Surv. (CSUR) 51(2), 32 (2018)
11. Zubiaga, A., Liakata, M., Procter, R.: Learning reporting dynamics during breaking news for rumour detection in social media. arXiv preprint arXiv:1610.07363 (2016)
12. Zubiaga, A., Spina, D., Martinez, R., Fresno, V.: Real-time classification of Twitter trends. J. Assoc. Inf. Sci. Technol. 66(3), 462–473 (2015)
Analyzing Underwater Videos for Fish Detection, Counting and Classification

G. Durga Lakshmi and K. Raghesh Krishnan

Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidhyapeetham, India
[email protected], [email protected]
Abstract. Underwater video processing is a valuable tool for analyzing the presence and behaviour of fishes underwater. Video based analysis of fishes finds its use in aquaculture, fisheries and the protection of fishes in oceans. This paper proposes a system to detect, count, track and classify the fishes in underwater videos. In the proposed approach, two systems are developed: a counting system, which uses a Gaussian mixture model (for foreground detection), morphological operations (for denoising), blob analysis (for counting) and Kalman filtering (for tracking); and a classification system, which uses the bag-of-features approach to classify the fishes. In the bag-of-features approach, SURF features are extracted from the images to obtain feature descriptors, and k-means clustering is applied on the feature descriptors to obtain a visual vocabulary. The test features are input to the MSVM classifier, which uses the visual vocabulary to classify the images. The proposed system achieves an average accuracy of 90% in counting and 88.9% in classification, respectively.

Keywords: Gaussian mixture model · Kalman filtering · Morphological operations · Blob analysis · Bag of features · Visual vocabulary · SURF features · K-means clustering · FAST (Features from Accelerated Segment Test)
1 Introduction

Counting and classification of fishes is an essential tool for fisheries and oceanographers to analyze the presence and behavior of fishes. In practice, oceanographers generally verify underwater conditions using nets (to collect and test the fishes) or human observation. Manual approaches are time consuming, expensive and less accurate in analyzing the state and behavior of fishes; the process also involves an enormous risk of killing or injuring the fishes and damaging the surroundings. In order to overcome these issues, oceanographers have started exploring alternate approaches to marine life observation. One such approach is to acquire underwater fish videos and examine them using computer vision techniques, which split the problem into fish counting and classification modules for observation. Most of the traditional computer vision based approaches to underwater fish analysis in the literature make use of manual counting. In order to increase the speed of the counting process and to
reduce human intervention, this work proposes an automatic approach to fish counting and classification.
2 Literature Survey

Boudhane and Nsiri devised a useful method to pre-process and detect fishes in underwater images [1]. Poisson-Gauss theory is used for de-noising, as it accurately removes the noise in the images. A mean-shift algorithm is used to segment the foreground, following which statistical estimation is applied to every segment to combine the segments into objects. The quality of the reconstructed image is measured using MSE and PSNR; the results, estimated using a log-likelihood ratio test, show a reduction in the number of false regions. Spampinato et al. propose a method for recognition of fish species in underwater videos using a sparse low-rank matrix decomposition to extract the foreground from the image [2]. The features of the fishes are extracted using a deep architecture, which consists of two convolution layers, a feature pooling layer and a non-linear layer; in these layers PCA, binary hashing and block-wise histograms are used, respectively. In order to collect information which does not vary with respect to large poses, a spatial pyramid pooling algorithm is used. The result obtained is given as input to a linear SVM classifier. The deep network model is trained for 65000 iterations to achieve an accuracy of 98.64%. Prabowo et al. propose a system to provide fish and video related information, such as the number of fishes present in the video, the quality of the video and the dominant color of the video [3]. The work uses a Gaussian mixture model followed by a moving average detection algorithm to detect the moving objects in the video. Statistical moments of the gray level histogram are used in the analysis of color and texture for fish detection, and the mean shift algorithm is used for fish tracking. The accuracies of detection and counting of fishes are reported as 84% and 85%, respectively. Palazzo et al. propose an approach to detect moving objects in underwater videos in [4]: the RGB video is converted into a gray scale video, and a normal distribution is applied to distinguish the background pixels from the objects. The average sensitivity obtained for two test videos is 83% and the precision value is 67.5%. Rathi et al. devise an algorithm to detect fish in marine videos [5]: a covariance-based model is used to combine color and texture features, and Kernel Density Estimation is used to improve the detection of fish. Mark R. Shortis et al. propose a new approach to classify the fishes in images in [6]. Otsu's thresholding, erosion and dilation are applied to pre-process the images, following which convolutional neural networks are used to classify the fish species; the work reports an accuracy of 96.29%. Jing Hu discusses an algorithm to classify 6 different species of fishes in images [7]. RGB and HSV color features and statistical texture features, such as the gray scale histogram, GLCM features and wavelet-based features, are used to differentiate the fish species. These features are used to train and test LIBMSVM and DAGMSVM, and it was observed that DAGMSVM took less time than LIBMSVM. Fabic et al. propose a systematic approach to count fishes in videos acquired using a dynamic camera [8].
Laplacian of the Gaussian (LoG) filtering is used to produce color histograms, which differentiate the background from fish. Following this, Zernike moments are extracted to analyse the shape of the fishes, and a blob detector and connected component labelling are used in the counting of fishes. Prabhakar and Kumar use global contrast correction to produce dual-intensity images for enhancing underwater images [9]. Contrast-enhanced images are produced from the dual-intensity images and processed to enhance the features; to improve the saturation and brightness, color correction is applied to the contrast-enhanced image. In [10], an automated fish detection system in a video is proposed, in which PCA is used to model the fish shape and a Haar classifier is used to detect the fishes in the video. Christopher Richard Wren et al. in [11] conducted experiments to detect the fishes in underwater images using histogram-based background subtraction, region growing and edge detection, which prove useful when the camera is static. Khanfar et al. in [12] propose an algorithm that differentiates background and foreground, which simplifies object detection in every frame. However, there are a few drawbacks in this approach: the algorithm fails in scenarios where the background is also in motion, or where the objects to be detected have the same color as the background. The second problem is fixed by incorporating texture and color features in the model. The algorithm is evaluated on underwater videos and is found to produce reliable performance. Fiona H. Evans et al. propose an approach based on the Expectation-Maximization (EM) algorithm for fish detection [13]. In this approach, a particular threshold intensity is chosen from the intensity values of the fish images, and the pixels whose intensity is higher than the chosen threshold are assumed to be fish pixels. The pixel values are defined in terms of x and y sets. The fish image is modelled as a Gaussian mixture model, considering the shape of each fish as a multivariate Gaussian. After this, the parameters of the Gaussian mixture model, along with the number of fishes, are calculated using the EM algorithm. Singhal, Kalaichelvi et al. propose an image classification algorithm using the bag-of-visual-words model [14]: the image features are extracted using the FAST algorithm and converted into feature descriptors, k-means clustering is applied on the feature descriptors to produce a bag of visual words, and these words are input to a linear SVM, reporting an accuracy of 90.8%. Inspired by the Gaussian mixture model, the Kalman filter and the bag-of-features approach, this paper attempts to implement an effective system to solve the problem of underwater fish counting and classification using videos.
3 Proposed Approach

This work proposes two systems, as shown in Figs. 1 and 2, to analyze underwater fish behavior. The first system, namely the fish counting system, is used to count the number of fishes present in underwater videos, and the second, the fish classification system, is used to classify the fish species. In the first system, the initial step is to convert the video into frames, and a Gaussian mixture model is applied for foreground detection. Denoising is performed using morphological operators to eliminate the unwanted noise, connected component labelling is used to detect the fishes, and finally a Kalman filter is applied to count and track the fishes.
The second system is used to classify the fish species in videos; the proposed architecture of the fish classification system is shown in Fig. 2. This system uses the bag-of-features approach, in which the dataset is segregated into training and testing sets for classification. SURF features are extracted from the training images and feature descriptors are framed. These feature descriptors are input to the k-means clustering algorithm, which outputs the visual vocabulary. The test images are given to the trained Multi-Class Support Vector Machine (MSVM) classifier, which uses the visual vocabulary to classify the test images.

Fig. 1. Proposed fish counting system

Fig. 2. Proposed classification system

3.1 Fish Counting System
3.1.1 Foreground Detection Using GMM
Background subtraction refers to the task of separating the foreground from the video for further processing. In the Gaussian mixture model, the background pixels are modelled as normal distributions, and the parameters of the distributions, such as the mean and variance, are determined. Low-probability pixels are treated as objects and high-probability pixels are considered background. In the GMM, each pixel in every frame is modelled as a Gaussian distribution. Initially, each frame is split in the RGB color space based on its intensity; therefore, the intensity distinction is assumed to have a uniform variance. The covariance matrix is calculated as:

$\Sigma_{i,t} = \sigma_{i,t}^{2} I \quad (1)$

If a Gaussian value is greater than the estimated threshold, it is considered background, and if it is less than the estimated threshold, it is considered foreground.
$B = \arg\min_{b} \left( \sum_{i=1}^{b} \omega_{i,t} > T \right) \quad (2)$
Whenever a picture element matches one of the K Gaussians, the values of $\omega$, $\mu$ and $\sigma$ are updated as shown below:

$\omega_{i,t+1} = (1 - \alpha)\,\omega_{i,t} + \alpha \quad (3)$

where

$\rho = \alpha \cdot g(X_t; \mu_i, \Sigma_i) \quad (4)$

In cases where the pixel does not match any of the K Gaussians, $\omega$ is updated as:

$\omega_{j,t+1} = (1 - \alpha)\,\omega_{j,t} \quad (5)$
3.1.2 Denoising
In this work, traditional morphological opening and closing are applied to remove the noise in the foreground.
Morphological opening. The opening operation performs erosion followed by dilation with the same structuring element; this removes noise from the frame. For the problem at hand, morphological opening removes the noise in the final step.
Morphological closing. The closing operation performs dilation followed by erosion with the same structuring element; this fills holes in objects, to avoid misclassification.
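The two operations map directly onto OpenCV calls, as the short sketch below shows; the 3 × 3 and 5 × 5 rectangular structuring elements are those mentioned in Sect. 4.1, and the mask here is a synthetic stand-in.

```python
# Morphological opening and closing as described above.
import cv2
import numpy as np

fg_mask = np.zeros((240, 320), dtype=np.uint8)  # stand-in foreground mask
open_k = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
close_k = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
denoised = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, open_k)     # remove specks
denoised = cv2.morphologyEx(denoised, cv2.MORPH_CLOSE, close_k)  # fill holes
```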
3.1.3 Blob Analysis
Connected components of the foreground pixels are treated as moving objects in the video. Blob analysis is applied to the denoised foreground to find the connected components and compute their properties, such as area, centroid and bounding box. Based on these properties, the objects are identified as blobs, which are used in the subsequent Kalman filtering step to track the moving objects.
3.1.4 Kalman Filter
Kalman filtering is a popular optimal state estimation algorithm. This work uses the Kalman filter for object tracking, namely tracking the fishes. It mainly focuses on three factors: predicting the future positions of objects, decreasing mis-detections due to noise, and providing multiple object tracks. The Kalman filter generates the mean and covariance of the object state with respect to time. Initially, an a priori estimate $\hat{x}_{k+1|k}$ of the state is built; when the state of the object changes, the a posteriori estimate $\hat{x}_{k|k}$ is calculated using the filter. In the discrete Kalman filter,
this step is performed for every specific time period. To obtain the general Kalman filter, a discrete linear system is considered, with process and measurement noise

$w_k \sim N(0, Q_k) \quad (6)$

$v_k \sim N(0, R_k) \quad (7)$

$E\left[\begin{pmatrix} w_k \\ v_k \end{pmatrix}\begin{pmatrix} w_k^T & v_k^T \end{pmatrix}\right] = \begin{pmatrix} Q_k & 0 \\ 0 & R_k \end{pmatrix} \quad (8)$

The initial values of the state and the estimation error covariance are given by:

$\hat{x}_0 = E[x_0] \quad (9)$

$P_0 = E[(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)^T] \quad (10)$

where $P_0$ is the initial value of the estimation error covariance $P_k$. The a priori estimate can be stated as:

$\hat{x}_{k+1|k} = A_k \hat{x}_{k|k} + B_k u_k \quad (11)$

$P_{k+1|k} = A_k P_{k|k} A_k^T + K_k Q_k K_k^T \quad (12)$

When a new measurement is available, the estimate is filtered:

$K_F = P_{k|k-1} C_k^T \left( C_k P_{k|k-1} C_k^T + R_k \right)^{-1} \quad (13)$

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_F \left( y_k - C_k \hat{x}_{k|k-1} \right) \quad (14)$

$P_{k|k} = \left[ I - K_F C_k \right] P_{k|k-1} \quad (15)$
The Kalman filter estimates the state $x_k$ accurately given the measurement $y_k$. Since the filter tolerates an approximate sampling interval without losing accuracy in calculating the state, it can estimate the state accurately even if the sample has missing or incorrect data.
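A minimal constant-velocity tracking sketch using OpenCV's KalmanFilter is given below; the state is [x, y, vx, vy], the measurement is a blob centroid from the blob-analysis step, and the matrix values (and the sample centroid) are illustrative assumptions, not the authors' tuning.

```python
# Sketch of a constant-velocity Kalman tracker for one fish centroid.
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)                         # 4 states, 2 measurements
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)   # A in Eq. (11)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)  # C in Eq. (14)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # Q (assumed)
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # R (assumed)

prediction = kf.predict()                            # a priori estimate
centroid = np.array([[120.0], [80.0]], np.float32)   # measured blob centroid
corrected = kf.correct(centroid)                     # filtered estimate
```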
3.2 Fish Classification System
3.2.1 Bag of Features Approach
The bag-of-features approach is inspired by the bag-of-words concept in natural language processing. As there is no concept of words in computer vision, the bag of features is generated from SURF key points, which are extracted using the SURF detector. The SURF feature descriptors are input to the k-means clustering algorithm, which generates the visual vocabulary. This visual vocabulary is used to train the Multi-Class Support Vector Machine (MSVM) classifier to classify the objects.
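The following is a minimal sketch of this pipeline. Note that SURF lives in the opencv-contrib xfeatures2d module (it is patented and may be absent from default OpenCV builds), and the vocabulary size of 200 is an illustrative assumption; the training data variables are hypothetical placeholders.

```python
# Bag-of-features sketch: SURF descriptors -> k-means vocabulary ->
# histogram encoding -> multi-class SVM.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # needs opencv-contrib

def describe(gray_img):
    _, desc = surf.detectAndCompute(gray_img, None)
    return desc  # (n_keypoints, 64) or None

def encode(desc, kmeans, k):
    # normalized histogram of visual-word assignments for one image
    hist = np.zeros(k)
    if desc is not None:
        for w in kmeans.predict(desc):
            hist[w] += 1
    return hist / max(hist.sum(), 1)

# train_imgs / labels are hypothetical lists of grayscale images and class ids:
# kmeans = KMeans(n_clusters=200).fit(np.vstack([describe(i) for i in train_imgs]))
# clf = SVC(kernel="linear").fit(
#     [encode(describe(i), kmeans, 200) for i in train_imgs], labels)
```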
4 Results

The video dataset used in this work was collected from the FishCLEF data centre [18]; it consists of 20 underwater videos with various backgrounds and climatic conditions. The image dataset contains 21,346 images of 11 different fish species.

4.1 Fish Counting
The proposed approach for counting fishes is applied on the dataset. In every video, the first 150 frames are used for training the Gaussian mixture model, 150 random frames are chosen from the testing video, and the fishes are detected in those frames using the trained GMM foreground detector. Foreground detection is achieved through background subtraction, which results in binary frames with the foreground isolated, as shown in Fig. 3(ii). Underwater videos are generally prone to several artifacts, which appear as noise in the background-subtracted videos and lead to errors in the detection of foreground and background pixels. This noise should be minimized to prevent its effect on the subsequent phase of fish detection in the object frames, in order to achieve accurate results. Noise removal is achieved by applying morphological opening and closing with 3 × 3 and 5 × 5 rectangular structuring elements, respectively; the results of noise removal are shown in Fig. 3(iii). Connected component analysis is performed on the denoised frames in order to detect the objects, as shown in Fig. 3(iii). Following this, blob analysis is performed to draw the detected fish, as shown in Fig. 3(iv). Subsequently, Kalman filtering is applied to track and predict the positions of the fishes; Fig. 3(v) shows the counting of fishes using the Kalman filter.
Fig. 3. (i) Video Frame, (ii) Foreground, (iii) Foreground after post-processing, (iv) Object Detection, (v) Object Tracking
$\text{Fish Detection Rate} = \dfrac{\text{Correctly Detected Fish Count}}{\text{Actual Fish Count}} = \dfrac{TP}{TP + FN}$

where:
True Positive (TP): number of fish objects accurately classified.
False Positive (FP): number of non-fish objects misclassified as fish objects.
True Negative (TN): number of non-fish objects accurately classified.
False Negative (FN): number of fish objects misclassified as non-fish objects.
Table 1. Fish detection results

Video | Actual count | Detected fishes | FN | FP | Fish detection rate
1 | 334 | 348 | 20 | 34 | 94%
2 | 329 | 381 | 26 | 78 | 92%
3 | 190 | 308 | 6 | 124 | 96.8%
4 | 154 | 180 | 9 | 35 | 94%
5 | 220 | 250 | 12 | 52 | 90%
Average = 93.36%

Table 1 summarizes the results of the fish detection system. The false positives in the table arise because the quality of the videos is low and some of them have moving backgrounds. The average fish detection rate is 93.36%.

$\text{Fish Counting Rate} = \dfrac{\text{Correctly Counted Fish Count in Video}}{\text{Actual Fish Count in Video}} = \dfrac{TP}{TP + FN}$
Table 2. Fish counting system results

Video | Actual count | Detected fish | Extra counted | Not detected as fish | Fish counting rate
1 | 28 | 33 | 6 | 1 | 96.4%
2 | 14 | 10 | 1 | 5 | 64.2%
3 | 22 | 28 | 6 | 0 | 100%
4 | 29 | 34 | 8 | 3 | 89.6%
5 | 26 | 32 | 8 | 2 | 92.3%
6 | 6 | 7 | 1 | 0 | 100%
7 | 8 | 11 | 4 | 1 | 87.5%
Average = 90%

Table 2 summarizes the results of the fish counting system. The false negatives are very low, which means that few fish objects are misclassified as non-fish objects. The average fish counting rate is 90%.
4.2 Fish Classification

$\text{Fish Classification Rate} = \dfrac{\text{Correctly Classified Fish Species Objects}}{\text{Actual Fish Species Count}} = \dfrac{TP}{TP + FN}$

Here, the TP and FN values are calculated from the confusion matrix, which is the output of the multi-class support vector machine classifier.

Table 3. Fish classification system results
No | Species name | Total count | Correctly classified | Incorrectly classified | Fish classification rate
1 | Abudefduf vaigiensis | 306 | 276 | 30 | 90%
2 | Acanthurus nigrofuscus | 2511 | 2310 | 201 | 92%
3 | Amphiprion clarkii | 2985 | 2687 | 298 | 90%
4 | Chaetodon lunulatus | 2494 | 2394 | 100 | 96%
5 | Chaetodon trifascialis | 24 | 22 | 2 | 92%
6 | Chromis chrysura | 3196 | 2940 | 256 | 92%
7 | Dascyllus aruanus | 904 | 877 | 27 | 97%
8 | Dascyllus reticulatus | 3196 | 2876 | 320 | 90%
9 | Myripristis kuntee | 3004 | 2673 | 331 | 89%
10 | Plectroglyphidodon dickii | 2456 | 2013 | 443 | 82%
11 | Zebrasoma scopas | 271 | 184 | 87 | 68%
Average | | 21,346 | 19,252 | 2094 | 88.9%
Table 3 summarizes the results of the fish classification system. The classification rate for the Zebrasoma scopas species is lower, as the texture of that species is similar to that of other species. The average fish classification rate is 88.9%.
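Since the per-class rates above come from the confusion matrix, the following short sketch shows how such rates can be derived; the label arrays are hypothetical stand-ins, not the paper's data.

```python
# Per-class classification rate (TP / (TP + FN)) from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 2, 2, 2]   # stand-in ground-truth species ids
y_pred = [0, 1, 1, 2, 2, 0]   # stand-in MSVM predictions
cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)              # correctly classified per species
fn = cm.sum(axis=1) - tp      # misclassified per species
print(tp / (tp + fn))         # per-class fish classification rate
```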
5 Conclusion and Future Work

This paper proposes an approach that achieves an average accuracy of 93.36% for fish detection, 90% for counting and 88.9% for classification. The non-linear motion of the fishes, coupled with a complex background, results in high false positives. Future work could focus on the reduction of false positives through video enhancement techniques and texture features. The most recent
AI methods, for example deep learning, could be used to increase the efficiency of the fish classification system.
Compliance with Ethical Standards. All authors declare that there is no conflict of interest. No humans/animals were involved in this research work. We have used our own data.
References
1. Boudhane, M., Nsiri, B.: Underwater image processing method for fish localization and detection in submarine environment. J. Vis. Commun. Image Represent. 39, 226–238 (2016)
2. Spampinato, C., et al.: Detecting, tracking and counting fish in low quality unconstrained underwater videos. VISAPP 2, 514–519 (2008)
3. Prabowo, M.R., Hudayani, N., et al.: A moving objects detection in underwater video using subtraction of the background model. In: 2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Yogyakarta, pp. 1–4 (2017). https://doi.org/10.1109/eecsi.2017.8239148
4. Palazzo, S., Kavasidis, I., Spampinato, C.: Covariance based modeling of underwater scenes for fish detection. In: 2013 IEEE International Conference on Image Processing. IEEE (2013)
5. Rathi, D., Jain, S., Indu, S.: Underwater fish species classification using convolutional neural network and deep learning. In: 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR). IEEE (2017)
6. Shortis, M., Ravanbakhsh, M., et al.: An application of shape-based level sets to fish detection in underwater images. In: GSR (2014)
7. Fabic, J.N., et al.: Fish population estimation and species classification from underwater video sequences using blob counting and shape analysis. In: 2013 IEEE International Underwater Technology Symposium (UT). IEEE (2013)
8. Ghani, A.S.A., Isa, N.A.M.: Enhancement of low quality underwater image through integrated global and local contrast correction. Appl. Soft Comput. 37, 332–344 (2015)
9. Benson, B., et al.: Field programmable gate array (FPGA) based fish detection using Haar classifiers. American Academy of Underwater Sciences (2009)
10. Prabhakar, P., Kumar, P.: Underwater image denoising using adaptive wavelet subband thresholding. In: Proceedings IEEE ICSIP 2010, Chennai, India, pp. 322–327 (2010)
11. Feifei, S., Xuemeng, Z., Guoyu, W.: An approach for underwater image denoising via wavelet decomposition and high-pass filter. In: Proceedings IEEE ICICTA 2011, Shenzhen, China, pp. 417–420 (2011)
12. Wren, C.R., Azarbayejani, A., Darrell, T., Pentland, A.P.: Pfinder: real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 780–785 (1997)
13. Evans, F.H.: Detecting fish in underwater video using the EM algorithm. In: Proceedings of the 2003 International Conference on Image Processing (ICIP), vol. 3, p. III-1029. IEEE (2003)
14. Khanfar, H., et al.: Automatic Fish Counting in Underwater Video (Cuenta Automática de Peces en Video Submarino / Comptage Automatique de Poisson dans la Vidéo Sous-Marine). Indian Journal of Science and Technology, pp. 1170–1175 (2014)
15. Singhal, N., Singhal, N., Kalaichelvi, V.: Image classification using bag of visual words model with FAST and FREAK. In: 2017 Second International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, pp. 1–5 (2017)
16. Aarthi, R., Arunkumar, C., RagheshKrishnan, K.: Automatic isolation and classification of vehicles in a traffic video. In: 2011 World Congress on Information and Communication Technologies, Mumbai, pp. 357–361 (2011)
17. Venkataraman, D., Mangayarkarasi, N.: Computer vision based feature extraction of leaves for identification of medicinal values of plants. In: 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC). IEEE (2016)
18. https://www.imageclef.org/lifeclef/2015/fish
Plant Leaf Disease Detection Using Modified Segmentation Process and Classification
K. Nirmalakumari1, Harikumar Rajaguru1, and P. Rajkumar2
1 Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India {nirmalakumarik,harikumarr}@bitsathy.ac.in
2 Robert Bosch Engineering and Business Solutions Pvt. Ltd, Coimbatore, Tamilnadu, India [email protected]
Abstract. Agriculture is essential to the Indian economy, and plant infections greatly affect the production rate in terms of both quality and quantity. Manual monitoring of diseased plants is exhausting, so there is a need to monitor plant growth every day using software techniques. The proposed method automatically detects and classifies plant diseases in an accurate way. The core processes involved in this method comprise image capturing, preprocessing, segmentation, extraction of informative features, and finally a classification stage that concludes whether the leaf is healthy or diseased. The proposed method utilizes a modified segmentation algorithm that improves the segmentation results, which in turn improves the classification accuracy of the Artificial Neural Network classifier.
Keywords: Image capturing · Preprocessing · Segmentation · K-means clustering · Feature extraction · Artificial neural network
1 Introduction
Agriculture is very important in India, and farmers must decide which crops to grow. One of the important issues in agriculture is plant disease, which affects the quantity, quality and development of plant growth. Locating and categorizing plant infections is a straightforward procedure that can improve plant yield and bring financial gains. Suitable pesticides have to be identified to suppress the disease and increase production. To raise production with good quality, the plant needs to be monitored regularly, since plant disease leads to loss of produce. For effective cultivation, one should observe both the health and the infection status of the plant. Diseases in plants are a major source of substantial loss of produce. India is the world's highest producer of pomegranate, exporting 54000 tons of pomegranates, which makes up 1.55% of the total export in the whole world. Pomegranate is a vital fruit crop because of its health benefits, yet 10–31% of production is lost because of diseases found on the leaves of the pomegranate plant. Hence, identification of the disease at an early stage is essential, enabling farmers to prevent damage to crop production and sustain yield growth. Plants suffer from leaf diseases due to bacterial, fungal and viral infections. The various spots and patterns on different parts such as the branch, leaf and fruit are noted, and these indications are considered for disease detection.
In earlier days, experts would manually monitor and analyze plant disease, which involved a huge amount of work and was also time consuming. Image processing is one procedure that can be employed to detect disease in plants. In some countries, farmers do not have adequate facilities to contact experts, since doing so is time consuming and costly. In such situations, image processing techniques come into the picture and are beneficial in monitoring the field. The symptoms on the leaves help in automatically detecting the disease, which is easy and also cheap. The image processing technique measures the affected area of infection, determines the texture, color and shape of the affected area, and also classifies the infection. The method proposed in this paper solves the above issues by means of image processing techniques. The second section describes the literature in this field, and the third section describes the proposed procedure with its block diagram. Section four discusses the results obtained by the proposed method, and finally section five provides the conclusion of this research.
2 Literature Review
Ghaiwat et al. [1] presented a review of diverse classification techniques that can be employed for disease sorting of plant leaves. For a particular test example, the k-nearest-neighbor method works well and is a simple choice for class prediction. A drawback of SVM is that when the training data are not linearly separable, it is hard to determine optimal parameters. The authors of [2] define four main steps for detecting leaf diseases. In the first step, the input color image with three planes, namely red, green and blue layers, is converted to HSI, a color transformation structure that can be used as a color descriptor. In step two, pixels containing green values are screened and eliminated by employing a suitable threshold level. In step three, the useful segments are obtained using a pre-computed threshold value, and segmentation is performed in the final step. Mrunalini et al. [3] offer a system to categorize and recognize diverse infections that affect plants. The recognition can be performed with machine learning, which reduces cost, computation and time. The method used for feature extraction is the color co-occurrence method, and neural networks are utilized for automated leaf disease detection. The projected method can considerably support accurate leaf disease detection and appears to be a vital approach in the event of root and stem diseases, requiring less computational effort. The paper [4] explains an identification procedure with four key steps: initially, in a color image, a color transformation structure is considered and, by applying an exact threshold, the green pixels are screened and removed; this is followed by a segmentation task, and, to obtain useful segments, the texture information is calculated. The final step involves a classifier that classifies the extracted features for disease diagnosis. The strength of the proposed algorithm is demonstrated by results for a total of 500 leaves of different plants. Kulkarni et al. developed a procedure for early and accurate plant disease detection with an Artificial Neural Network (ANN) and different image processing methods. As the projected technique uses the ANN classifier for categorization and the Gabor filter for feature extraction, it produces up to a 91% recognition rate. Thus the
ANN utilizes a combination of features to detect the disease [5]. The authors of [6] discussed disease recognition in Malus domestica through an effective procedure based on texture analysis, k-means clustering and color analysis. For classifying and distinguishing various agricultural plant leaves, it uses texture and color features that differ between normal and abnormal plant leaves. In future work, the classification task could be achieved by employing clustering, Bayes and principal component classifiers. Rastogi et al. [7] developed an approach for grading infections in plant leaves in an automatic way. It employs machine vision techniques and an Artificial Neural Network (ANN) for automatic detection of leaf disease with grading. This system is very helpful, as it is more efficient than the normal procedure. The Euclidean distance measure and clustering are used for segmentation of the leaf image, to identify foreground and background areas. Finally, the plant disease affected area is estimated and the leaves are graded into different classes. Islam et al. [8] presented leaf disease detection using two classifiers, namely GA and PNN, which provide accurate results with less computational work. The experimental results indicate that the classifier with GA performs better than the PNN. Tripathi et al. [9] proposed a system for plant leaf disease detection by processing images and applying NN techniques. The color and texture features are extracted and provided as inputs to train the NN classifier for subsequent detection of leaf diseases. Muthukannan et al. [10] employed three classifiers, namely the Feed Forward Neural Network (FFNN), Learning Vector Quantization (LVQ), and Radial Basis Function (RBF), for the classification of leaf diseases of two plant types, bitter gourd and bean, based on their evaluated performance measures. Paper [11] describes a review of feature extraction for plant leaf diseases, and paper [12] explains crop disease identification using image segmentation techniques.
Shortcomings of existing works:
• The applications still require improvement in accuracy for certain cases; additional effort to optimize the solutions is required.
• Data should be known in advance for the necessary segmentation.
• Database extension is needed for achieving higher accuracy.
• Only a few diseases are being addressed; many more diseases have to be covered.
• The possible causes of misclassification include multiple simultaneous infections in plants; optimization is required, and extra training trials are needed to predict the disease accurately in further cases.
3 Proposed Methodology
Leaves and their parts are acquired using high quality digital cameras, and those images are processed in an efficient way to obtain the essential features that can be used in the analysis.
Algorithm:
• The foremost process is to acquire the image of a plant with a digital camera.
• The input image requires preprocessing for quality improvement and for removing the noise present in the image. The image of the relevant part of the plant leaf is cropped to obtain the region of interest, and smoothening is then carried out by means of a smoothing filter. Image enhancement is done to improve the contrast, and the input RGB image is converted to HSV format.
• The green colored pixels are screened in the third step.
• A threshold value is initialized, and if the green component intensity of a pixel is lower than this threshold value, that pixel is removed.
• The worthy segments are acquired using morphological operations and K-means clustering.
• The features are estimated using the GLCM.
• The leaf disease is classified and its affected area computed using a neural network.
The steps involved in plant leaf disease detection are shown in Fig. 1. Images of various types of leaves are acquired using a digital camera, and preprocessing is performed to remove noise. The preprocessing enables extraction of the valuable features. Finally, a classifier is employed to identify and categorize the type of disease; a sketch of the early preprocessing steps is given below.
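The following Python sketch illustrates the preprocessing and green-pixel screening steps described above, using OpenCV. The green-intensity threshold of 100 and the Gaussian kernel size are illustrative assumptions, not values specified by the authors.

```python
import cv2
import numpy as np

def preprocess_and_mask(image_path, green_threshold=100):
    """Denoise an RGB leaf image, convert it to HSV, and screen out
    pixels whose green component is below a threshold (assumed value)."""
    bgr = cv2.imread(image_path)                      # OpenCV loads as BGR
    bgr = cv2.resize(bgr, (640, 480))                 # resize as in Sect. 3.1
    smoothed = cv2.GaussianBlur(bgr, (5, 5), 0)       # smoothing filter
    hsv = cv2.cvtColor(smoothed, cv2.COLOR_BGR2HSV)   # RGB -> HSV structure

    # Remove pixels whose green channel intensity is below the threshold,
    # as described in the algorithm's fourth step.
    green = smoothed[:, :, 1]
    mask = (green >= green_threshold).astype(np.uint8)
    masked = cv2.bitwise_and(smoothed, smoothed, mask=mask)
    return hsv, masked

hsv_img, masked_img = preprocess_and_mask("banana_leaf.jpg")
```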
Fig. 1. Block diagram for disease identification in plants: Input RGB Leaf Image → Preprocessing to Denoise the Image → Modified Image Segmentation Algorithm with k-means Clustering → GLCM Feature Extraction → Diseased Leaf Detection and Affected Area Calculation.
3.1 Image Acquisition
The blocks involved in detecting infections in plants are shown in Fig. 1. The leaf images are acquired using a webcam camera with a maximum image resolution of 5500 pixels in width and 3640 pixels in height. The images are preprocessed and resized to 640 * 480 pixels. The input image is in color and is converted to a gray image for further processing.
3.2 Image Pre-processing
This step helps in the elimination of noise related to lower frequency levels. The disturbances and noise present in the image are removed by means of image clipping, smoothing and enhancement steps. A color space transformation structure is formed that defines the conversion from RGB to the Hue Saturation Value (HSV) color space. Hue describes the pure color as perceived by a viewer, Saturation represents the quantity of white light added to the Hue, and the light's amplitude is described by Value. The Hue component is taken forward for further processing, as it provides more valuable information than the Saturation and Value components.
3.3 Modified Segmentation
The infected area in a leaf is obtained through segmentation. Based on the region of interest (ROI), the image is divided into significant regions. The proposed method applies morphological operations prior to k-means clustering, which enhances the segmentation result. The main morphological operations are erosion, dilation, opening and closing; here, closing is performed to fill holes and opening to remove small unwanted regions. A brief sketch of these operations appears below.
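A minimal sketch of the morphological cleanup with OpenCV; the 5×5 structuring element is an assumed choice, not one specified in the paper.

```python
import cv2
import numpy as np

def morphological_cleanup(binary_mask):
    """Apply opening (remove small specks) then closing (fill small holes)
    to a binary leaf mask before k-means clustering."""
    kernel = np.ones((5, 5), np.uint8)        # assumed structuring element
    opened = cv2.morphologyEx(binary_mask, cv2.MORPH_OPEN, kernel)
    closed = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
    return closed
```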
3.4 K-Means Clustering
The separated parts of the leaf are to be grouped into four dissimilar clusters; it is advantageous that the value of k is known in advance. The algorithm assigns the pixel information to the clusters, and one of the four clusters comprises the diseased portion of the leaf. A definite threshold is then computed for the segmented pixels, and most of the green pixels are screened out when the green pixel intensity is smaller than the threshold value. Further, the RGB components of these pixels are set to zero, because they characterize the healthiest parts of the leaf and do not contribute to the disease; this leads to a reduction in computation time. The zero-valued RGB components and the border pixels of the diseased cluster are eliminated entirely. A k-means sketch follows.
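A sketch of the k-means step using scikit-learn, clustering the pixels of the hue image into four groups as the section describes; selecting the "diseased" cluster by darkest mean intensity is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_leaf_pixels(hue_image, k=4):
    """Cluster pixels of a single-channel (hue) image into k groups and
    return the per-pixel labels reshaped to the image grid."""
    pixels = hue_image.reshape(-1, 1).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    labels = km.labels_.reshape(hue_image.shape)
    # Assumption for illustration: treat the cluster with the darkest
    # mean intensity as the candidate diseased region.
    diseased = int(np.argmin(km.cluster_centers_.ravel()))
    return labels, diseased
```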
3.5 GLCM Features
A GLCM is formed for each separate pixel map of the Hue and Saturation images of the diseased cluster. The graycomatrix function in MATLAB creates a GLCM by counting how frequently a pixel of intensity i occurs in a specified spatial relationship to a pixel of value j. The default spatial relationship in MATLAB is between the current pixel and the pixel immediately to its right. Multiple GLCMs can be produced using offsets that express pixel relationships of varying distance and direction; the horizontal, vertical and diagonal directions can be taken. The following GLCM features are considered.
Contrast: Measures the intensity contrast between a pixel and its neighboring pixels over the entire image. Range = [0, (size(GLCM, 1) − 1)^2]. The contrast is computed by the formula given below:
Contrast = \sum_{i,j=0}^{N-1} (i - j)^2 \, C(i,j) \qquad (1)
where the pixel information at location (i, j) is denoted by C(i, j).
Energy: Gives the sum of squared elements of the GLCM. The range is [0, 1], and the value equals one for a constant image. The energy is computed using the formula below:
Energy = \sum_{i,j=0}^{N-1} C(i,j)^2 \qquad (2)
Homogeneity: Measures the closeness of the distribution of GLCM elements to the GLCM diagonal. Range = [0, 1]; the value is one for a diagonal GLCM. Its formula is given below:
Homogeneity = \sum_{i,j=0}^{N-1} \frac{C(i,j)}{1 + (i - j)^2} \qquad (3)
Correlation: Measures how correlated a pixel is with its neighboring pixels over the complete image. Range = [−1, 1]:
Correlation = \sum_{i=0}^{G-1} \sum_{j=0}^{G-1} \frac{(i - \mu_x)(j - \mu_y)\, P(i,j)}{\sigma_x \sigma_y} \qquad (4)
The four statistics above convey information about image texture.
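In Python, an equivalent computation is available in scikit-image; the sketch below uses a one-pixel horizontal offset, matching the MATLAB default mentioned above, and extracts the same four statistics.

```python
from skimage.feature import graycomatrix, graycoprops  # spelled greyco* in skimage < 0.19

def glcm_features(gray_image):
    """Compute contrast, correlation, energy and homogeneity from a GLCM
    built with a one-pixel horizontal offset."""
    glcm = graycomatrix(gray_image, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    return {prop: float(graycoprops(glcm, prop)[0, 0])
            for prop in ("contrast", "correlation", "energy", "homogeneity")}
```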
3.6 Neural Network Based Classification
The extracted features are provided as input data to a pretrained NN for automatic disease categorization. The Artificial Neural Network (ANN) is an eminent classification tool and an effective classifier for numerous real applications. Training and validation are the vital steps in evolving an exact process model by means of NNs. The dataset, with its features, is partitioned into training and validation sets to build a proficient ANN model. The type of training determines how accurately the plant infections are classified. A sketch of such a classifier appears below.
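A minimal sketch of the ANN classification stage with scikit-learn; the network size, feature matrix and labels here are illustrative assumptions, since the paper does not specify the architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical feature matrix: one row per leaf image with the four GLCM
# statistics (contrast, correlation, energy, homogeneity); labels are
# illustrative (1 = diseased, 0 = healthy).
X = np.array([[3.58, 0.85, 0.44, 0.94],
              [2.84, 0.88, 0.45, 0.95],
              [1.31, 0.93, 0.61, 0.98],
              [0.74, 0.96, 0.56, 0.99],
              [0.70, 0.97, 0.58, 0.99],
              [3.10, 0.86, 0.43, 0.93]])
y = np.array([1, 1, 1, 0, 0, 1])

ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
ann.fit(X, y)
print(ann.predict([[1.2, 0.92, 0.60, 0.97]]))  # classify a new leaf
```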
4 Experimental Results and Observations
Four different banana leaves are considered to demonstrate the results of the proposed method. Out of the four, three are diseased leaves, namely Banana Anthracnose, Banana Leaf Streak, and Banana Spot Leaf, and one is a normal banana leaf. The input RGB image is converted to a gray image, denoising is performed to remove noise, and segmentation follows using morphological operations and k-means clustering. Finally, the type of disease and its affected area are computed using the ANN classifier. Figure 2 shows the banana anthracnose input color image and its grayscale conversion; Fig. 3 shows the modified segmentation algorithm results and the corresponding output on the color image.
Fig. 2. Banana Anthracnose & its grayscale [3] Fig. 3. Modified segmentation & its color
Fig. 4. Banana leaf streak and its grayscale [7]
Fig. 5. Modified segmentation & its color
The image of the banana leaf streak and its corresponding grayscale image are depicted in Fig. 4, and Fig. 5 shows the segmented output and its result on the color image. Another infected banana leaf, with spots, is shown in Fig. 6, and its corresponding segmentation outputs for the gray and color images are shown in Fig. 7.
Fig. 6. Banana spot leaf and its grayscale [9]
Fig. 7. Modified segmentation & its color
Figure 8 shows the normal banana leaf without infections, and the segmentation results are shown in Fig. 9. From the resulting images, it is clear that the modified segmentation approach aids in finding the infected area in an exact way, which improves the detection accuracy of the ANN classifier.
Table 1 shows the computation of the affected areas of the banana leaves along with their four GLCM features; the percentage calculation shows the spread of infection in the plant leaf. The normal banana leaf is classified with zero affected area. Hence the proposed algorithm is very robust in the detection of plant leaf diseases. A sketch of the affected-area computation follows the figures below.
Fig. 8. Normal banana leaf [11]
Fig. 9. Modified segmentation & its color
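The affected-area percentage of Table 1 is the ratio of diseased-cluster pixels to total leaf pixels. A minimal sketch, assuming the labels and diseased-cluster index produced by the k-means step above:

```python
import numpy as np

def affected_percentage(labels, diseased_cluster, leaf_mask):
    """Percentage of leaf area covered by the diseased cluster.
    labels: per-pixel k-means cluster indices; leaf_mask: boolean array
    marking leaf (non-background) pixels."""
    affected = np.sum((labels == diseased_cluster) & leaf_mask)
    total = np.sum(leaf_mask)
    return 100.0 * affected / total

# e.g. for Banana Anthracnose in Table 1: 100 * 74529 / 196282 ≈ 37.97
```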
Table 1. Percentage of affected area calculation for four types of banana leaves

S.No.  Type of leaf         GLCM features                                Affected area (sq mm)  Total area (sq mm)  Percentage of affected area
1.     Banana Anthracnose   Contrast = 3.5842, Correlation = 0.8504,     74529                  196282              37.9703
                            Energy = 0.4433, Homogeneity = 0.9360
2.     Banana Leaf streak   Contrast = 2.8351, Correlation = 0.8842,     83120                  191851              43.3251
                            Energy = 0.4458, Homogeneity = 0.9494
3.     Banana Spot leaf     Contrast = 1.3098, Correlation = 0.9257,     16608                  183560              9.04772
                            Energy = 0.6142, Homogeneity = 0.9766
4.     Normal banana leaf   Contrast = 0.7446, Correlation = 0.9645,     NIL                    196585              0
                            Energy = 0.5573, Homogeneity = 0.9867
From Table 2, it can be seen that the correlation feature plotted against the horizontal offset produces a non-linear curve for the infected leaves, while for the normal banana leaf the graph is approximately a straight line. Thus, using the GLCM features and an ANN, plant leaf disease can be detected in an accurate way.
Table 2. Texture correlation graph vs. offset for four banana leaves (the correlation-versus-offset graphs for Banana Anthracnose, Banana Leaf streak, Banana Spot leaf, and Normal banana leaf are not reproduced here).
5 Conclusion
The proposed method employs image processing to perform preprocessing, segmentation and extraction of useful features. Morphological operators are included in the segmentation stage, which improves the detection of the affected area to a great extent, and the classification of the different plant diseases is done using an ANN classifier. The proposed method is very significant for detecting leaf disease in a precise and simple way. K-means clustering, along with the morphological operators opening and closing, aids the segmentation. The GLCM features extracted after this step are the important features, and they are applied to the neural network classifier to identify the affected area and the type of plant disease. The proposed method can be utilized for all kinds of diseased plants; since the percentage of the affected area is also computed, the user can select the appropriate pesticides to control and prevent disease infection in plants. The future scope of this project is the application of different algorithms in the segmentation, feature extraction and classification stages.
Compliance with ethical standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Ghaiwat, S.N., Arora, P.: Detection and classification of plant leaf diseases using image processing techniques: a review. Int. J. Recent Adv. Eng. Technol. 2(3) (2014). ISSN 2347-2812
2. Dhaygude, S.B., Kumbhar, N.P.: Agricultural plant leaf disease detection using image processing. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2(1), 599–602 (2013)
3. Badnakhe, M.R., Deshmukh, P.R.: An application of K-means clustering and artificial intelligence in pattern recognition for crop diseases. In: International Conference on Advancements in Information Technology, vol. 20. IPCSIT (2011)
4. Arivazhagan, S., Newlin Shebiah, R., Ananthi, S., Vishnu, V.S.: Detection of unhealthy region of plant leaves and classification of plant leaf diseases using texture features. Agric. Eng. Int. CIGR 15(1), 211–217 (2013)
5. Kulkarni, A.H., Ashwin Patil, R.K.: Applying image processing technique to detect plant diseases. Int. J. Mod. Eng. Res. 2(5), 3661–3664 (2012)
6. Bashir, S., Sharma, N.: Remote area plant disease detection using image processing. IOSR J. Electron. Commun. Eng. 2(6), 31–34 (2012). ISSN 2278-2834
7. Rastogi, A., Arora, R., Sharma, S.: Leaf disease detection and grading using computer vision technology & fuzzy logic. In: 2nd International Conference on Signal Processing and Integrated Networks (SPIN), IEEE, pp. 500–505 (2015)
8. Islam, M.N., Kashem, M.A., Akter, M., Rahman, M.J.: An approach to evaluate classifiers for automatic disease detection and classification of plant leaf. In: International Conference on Electrical, Computer and Telecommunication Engineering, RUET, pp. 626–629 (2012)
9. Tripathi, G., Save, J.: An image processing and neural network based approach for detection and classification of plant leaf diseases. Int. J. Comput. Eng. Technol. (IJCET) 6(4), 14–20 (2015)
10. Muthukannan, K., Latha, P., Pon Selvi, R., Nisha, P.: Classification of diseased plant leaves using neural network algorithms. ARPN J. Eng. Appl. Sci. 10(4), 1913–1918 (2015)
11. Pandey, S., Singh, S.K.: A review on leaf disease detection using feature extraction. Int. Res. J. Eng. Technol. (IRJET) 6(1) (2019)
12. Kumar, K.V., Jayasankar, T.: An identification of crop disease using image segmentation. Int. J. Pharm. Sci. Res. 10(3), 1054–1064 (2019)
Neuro-Fuzzy Inference Approach for Detecting Stages of Lung Cancer
C. Rangaswamy1,2, G. T. Raju3, and G. Seshikala4
1 REVA University, Bangalore 560064, India [email protected]
2 ECE Department, Sambrama Institute of Technology, Bangalore 560097, India
3 CSE Department, RNSIT, Bangalore 560098, India [email protected]
4 ECE School, REVA University, Bangalore 560064, India [email protected]
Abstract. A hybridized diagnosis system called the Neuro-Fuzzy Inference System (NFIS) is proposed in this paper; it uses a neural network for the classification of lung nodules into benign or malignant, and then fuzzy logic for detecting the stage of lung cancer. Using fuzzy logic based algorithms, namely Enhanced Fuzzy C-Means (EFCM) and Enhanced Fuzzy Possibilistic C-Means (EFPCM), the lung CT scan image is segmented, and the required features are extracted using the GLCM and GLDM matrices. The features are then selected using the DRASS algorithm. These features are fed as input to a Radial Basis Function Neural Network (RBFNN) classifier with a k-means learning algorithm for detecting lung cancer. Once the cancerous lung nodules are detected, the result of the RBFNN is combined with a fuzzy inference system that determines the appropriate stage of the lung cancer. Experiments have been conducted on ILD lung image datasets with 104 cases. Results reveal that the proposed NFIS effectively classifies lung nodules into benign or malignant along with the appropriate stage, with considerable improvement in terms of Recall/Sensitivity 94.44%, Specificity 92%, Precision 96.22%, Accuracy 93.67%, and False Positive Rate 0.08.
Keywords: Fuzzification · Fuzzy Inference · Defuzzification · K-means · Neuro-Fuzzy
1 Introduction
Lung cancer causes more deaths than any other disease in men and women [2]. Identification of lung cancer at an early stage is essential for effective treatment and can raise the patient's chance of survival [1, 3]. Several scientific interventions have been employed to enhance early detection. Lately, frameworks powered by machine learning tools have been deployed to help medical personnel analyze CT images and detect the occurrence of lung disease, with significant success. Artificial Neural Networks (ANNs) have been used by researchers for the classification of lung nodules into benign or malignant. However, ANNs cannot state the appropriate stage at which the cells are affected [11]. Conversely, fuzzy logic is capable of determining the extent of the illness, though it is not adequate at learning and generalization [4].
In this paper, we propose a hybridized NFIS to achieve better performance in the classification and determination of the stage of cancer, avoiding the limitations of these individual methods. It uses a gray level matrix for the extraction of features.
2 Related Work
ANNs have been used to classify lung cells according to extracted characteristics. Taher et al. proposed a technique for early detection of lung cancer [1]; the Bayesian classification method, with histogram analysis, was used for preprocessing the image data. In [6], the diagnosis was carried out using an ANN and an SVM to classify cells. Solanki et al. suggested a pulmonary cancer detection and classification system using the curvelet transform and neural networks [2]. Hamad proposed a pulmonary cancer diagnostic system using fuzzy logic and neural networks [3]. Al-Amin [12] presented work using the GLCM and NNs to categorize cells as either malignant or benign, similar to earlier work [10]. Table 1 provides a summary of the contributions of various authors.
Table 1. Contributions summary

Authors/Reference   Classifier/Methodology                    Datasets      Features                                      Performance
Taher, et al. [1]   Bayesian with histogram analysis          NCI           geometric and chromatic                       accuracy 97%
Solanki [2]         Curvelet transform and neural networks    VIA/IELCAP    –                                             accuracy 90%
Hamad [3]           Fuzzy logic and neural networks           LIDC          GLCM                                          accuracy 89.6%
Al-Amin [12]        Neural networks                           NCI and ACS   contrast, correlation, energy, homogeneity    accuracy 96.67%
3 Neuro-Fuzzy Inference System
The fuzzy framework is special in that it can handle numerical data and linguistic knowledge simultaneously, much like the process of human inference [5]. It provides an inference methodology that enables approximate human reasoning capabilities to be applied to knowledge-based systems, and it also offers a mathematical means of capturing the uncertainties associated with human cognitive processes such as thinking and reasoning [8]. The fuzzy logic paradigm can basically be summarized as a set of linguistic rules connected through the concepts of fuzzy inclusion and compositional rules of inference. The aim is to have a model capable of mapping linguistic control rules, drawn from expert knowledge, to an automatic control output [9]. Fuzzy logic can be used to interpret the properties of neural networks and to provide a more precise description of their performance [7]. Figure 1 shows the architecture of the NFIS system. It consists of an RBFNN and a FIS. A FIS is characterized by fuzzy sets and fuzzy if-then rules of the form "If x is A then y is B", where A and B are fuzzy sets and x and y are members of those sets. The FIS contains four key parts: the fuzzifier, the rules, the inference engine and the defuzzifier. The FIS performs input
fuzzification and determines the degree to which the input belongs to each fuzzy set. Using the fuzzy operator, the results of the "if" parts of the rules are combined to find the degree of support for each rule, which is then applied to the output function. The degree of support for the whole rule is used to determine the fuzzy output set, and the "then" part of each fuzzy rule assigns the output to a fuzzy set. Aggregation is used to combine the outputs of all the rules into a single fuzzy set, from which the appropriate decisions are made.
Fig. 1. Architecture of NFIS [4]
The FIS thus has four algorithmic steps: fuzzification of the inputs, rule evaluation with the fuzzy operator, aggregation of the rule outputs, and defuzzification; a sketch of these steps is given below.
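A minimal plain-Python sketch of these four steps for a single input variable; the triangular membership functions, rule outputs and weighted-average defuzzification are illustrative assumptions, not the paper's actual parameters.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b (assumed shape)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fis(x, rules):
    """rules: list of (membership_fn, crisp_output) pairs.
    Steps: fuzzify -> evaluate rules -> aggregate -> defuzzify."""
    strengths = [(mf(x), out) for mf, out in rules]      # fuzzify + evaluate
    total = sum(s for s, _ in strengths)                  # aggregate
    if total == 0:
        return None
    return sum(s * out for s, out in strengths) / total  # defuzzify

# Illustrative rule base over a normalized input in [0, 1]
rules = [(lambda x: tri(x, -0.5, 0.0, 0.5), 1.0),
         (lambda x: tri(x, 0.0, 0.5, 1.0), 2.0),
         (lambda x: tri(x, 0.5, 1.0, 1.5), 3.0)]
print(fis(0.4, rules))
```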
4 Methodology
We have modeled a smart diagnostic scheme by means of neural networks and fuzzy logic. The image data sets are processed and passed to the RBFNN, which is trained with the k-means learning method to classify lung cells. The detected cancer cells are then passed to a FIS in order to decide the stage of lung cancer.
4.1 Data Collection
The ILD CT image data set contains 104 cases, out of which 79 cases have been used for training and 25 cases for testing.
4.2 Features of the System
A CT image from the data set undergoes several procedures in the training model, such as preprocessing, enhancement, segmentation and feature extraction. The outputs of these procedures are passed to the RBFNN as training data. In the diagnostic model, the extracted features from the training model are passed to the neural network for classification. To determine the stage of the cancer, the classified output is then passed to the fuzzy system.
4.3 Features Extraction
After the regions of interest are extracted based on properties such as colour, shape, size and intensity, the method checks whether the extracted areas are malignant or not. The nodules are examined to distinguish the true positive areas from the false ones. This phase is very significant, because it extracts important features that reduce the image processing complexity and help recognize small cancer nodules. Diameter, perimeter, area, eccentricity, entropy and intensity are the features extracted in this work.
4.4 Fuzzy Logic System
The fuzzy scheme consists of three conceptual components: the membership functions, the rule base containing the selected fuzzy rules, and the inference mechanism, which carries out the inference procedure on the rules in order to reach a reasonable conclusion. We have used the fuzzy system in our work to interpret the output of the neural network and to describe values precisely. The linguistic terms are defined by the following membership function over the tumor size T:

MF = \begin{cases} T1, & \text{if } 2.5\,\text{cm} < T \le 3\,\text{cm} \\ T2, & \text{if } 3\,\text{cm} < T \le 5\,\text{cm} \\ T3, & \text{if } 5\,\text{cm} < T \le 7\,\text{cm} \\ T4, & \text{if } T > 7\,\text{cm} \end{cases}

Staging involves assessing the size of a cancer and its invasion into neighboring tissues, along with the presence or absence of metastasis in lymph nodes or other organs. Stage one means that the cancerous growth remains restricted to the lung. If the cancer is still restricted to the lung but is near the lymph nodes, it belongs to stage 2. It is stage 3 if it is in both the lung and the lymph nodes, and if the cancer has penetrated other parts of the body, we say it is already in stage 4.
4.5 Fuzzy Rules
The FIS uses the fuzzy rule base to make decisions about the diagnosis of the disease, and the system's efficiency depends on the fuzzy rules. Rule-based systems are widely used for medical diagnostics. The knowledge base stores all the information on the characteristics and the disease in the form of rules, which are generated from data collected by a domain expert.
Rule 1: if T1 then Stage I
Rule 2: if T2 then Stage II
Rule 3: if T3 then Stage III
Rule 4: if T4 then Stage IV
4.6 Decision Making in NFIS
The neural network's output is combined with the fuzzy system to make the final decision on the stage of a lung cancer. If the neural network's output value does not exceed the threshold value, the system reports 'no sign of lung cancer'. Otherwise it reports 'lung cancer is predicted' and invokes the fuzzy system to produce the crisp value based on the matched rule. The NFIS makes the final decision based on this crisp value; an illustrative sketch of this logic follows.
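A minimal sketch of the decision logic, combining the stated membership-function thresholds and rules of Sects. 4.4–4.5; the NN score threshold of 0.5 is an assumed value, since the paper does not state it.

```python
def stage_from_size(t_cm):
    """Map tumor size (cm) to a stage per the rule base of Sect. 4.5."""
    if 2.5 < t_cm <= 3:
        return "Stage I"    # T1
    if 3 < t_cm <= 5:
        return "Stage II"   # T2
    if 5 < t_cm <= 7:
        return "Stage III"  # T3
    if t_cm > 7:
        return "Stage IV"   # T4
    return "None"

def nfis_decision(nn_score, tumor_size_cm, threshold=0.5):
    """threshold is an illustrative assumption for the RBFNN output."""
    if nn_score <= threshold:
        return "no sign of lung cancer"
    return "lung cancer is predicted: " + stage_from_size(tumor_size_cm)

print(nfis_decision(0.9, 2.7))  # -> lung cancer is predicted: Stage I
```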
5 Experimental Results
The outcomes of the proposed NFIS are shown in Table 3.

Table 2. Extracted features

Image   Diameter  Perimeter  Entropy  Intensity  Eccentricity
Img-1   2.7       598        4.0427   250        0
Img-2   3.1       84         4.7536   250        0
Img-3   3.7       69         4.1936   250        0
Img-4   5.6       64         4.7971   250        0
Img-5   3.5       252        4.7755   244.22     0
Img-6   6.7       79         4.7747   117.65     0
Img-7   7.4       175        4.5695   250        0
Img-8   2.2       180        4.7904   224.14     0
Img-9   8.2       349        4.7896   248.95     0
Table 3. Results of NFIS

Image   RBFNN classification  Fuzzy stage
Img-1   Abnormal              Stage I
Img-2   Abnormal              Stage II
Img-3   Abnormal              Stage II
Img-4   Abnormal              Stage III
Img-5   Abnormal              Stage II
Img-6   Abnormal              Stage III
Img-7   Abnormal              Stage IV
Img-8   Normal                None
Img-9   Abnormal              Stage IV
Fig. 2. Original Lung CT scan Images
Fig. 3. Preprocessed lung CT scan Images
Table 4. Performance of RBFNN classifier

Data      Actual     Predicted positive (present)  Predicted negative (absent)  Total
Training  Positive   TP - 51                       FN - 3                       79 (76%)
          Negative   FP - 2                        TN - 23
Testing   Positive   TP - 14                       FN - 3                       25 (24%)
          Negative   FP - 0                        TN - 8
Total                                                                           104
After the lung CT scan regions have been separated into positive and negative regions, the negative regions are rejected, the positive regions are retained, and feature extraction is performed. Features such as area, perimeter, eccentricity and diameter are extracted; Table 2 presents some of the extracted features, and the results of the proposed NFIS are shown in Table 3 (Figs. 2, 3, 4, 5, 6; Tables 4 and 5).
Fig. 4. Segmented lung CT scan Images
TP rate or Recall or Sensitivity = TPs/(TPs + FNs) = 94.44%
TN rate or Specificity = TNs/(TNs + FPs) = 92%
Positive Predictive Value or Precision = TPs/(TPs + FPs) = 96.22%
Accuracy = (TPs + TNs)/(TPs + TNs + FNs + FPs) = 93.67%
False Positive Rate (FPR) = FPs/(TNs + FPs) = 0.08
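These quantities follow directly from the training counts of Table 4 (TP = 51, FN = 3, FP = 2, TN = 23); a small sketch verifying them:

```python
def metrics(tp, fn, fp, tn):
    """Standard confusion-matrix statistics as defined above."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + tn + fn + fp),
        "fpr":         fp / (tn + fp),
    }

print(metrics(51, 3, 2, 23))  # sensitivity ≈ 0.9444, specificity = 0.92, ...
```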
Table 5. Results of RBFNN classifier

Actual diagnosis   RBFNN classification           Total
                   Normal   Benign   Malignant
Normal             23       2        –             25
Benign             2        24       –             26
Malignant          2        3        23            28
Total              27       29       23            79
Fig. 5. Classified lung CT scan Images – Results1
Fig. 6. Classified lung CT scan Images – Results2
6 Conclusion
A hybridized diagnostic system called the Neuro-Fuzzy Inference System (NFIS) has been presented in this paper; it uses neural networks and fuzzy logic to classify lung cells into cancerous and non-cancerous and to detect the stage of lung cancer. The proposed NFIS was able to categorize the lung CT scan images into benign or malignant, together with the appropriate stage, after the series of operations indicated in the methodology. The experimental results and a comparison with other existing methods indicate the superiority of the proposed method in terms of classification accuracy and computational speed. The proposed system can assist physicians in medical centers in assessing the risk of lung cancer in patients. The challenge in designing such a system lies in providing accurate results for detection of lung cancer in its early stages, in order to reduce the complexity of treating cancer in advanced stages. Future research could target a mobile Android platform for real-time diagnostics.
Acknowledgment. We acknowledge the support extended by VGST, Govt. of Karnataka, in sponsoring this research work vide ref. No. KSTePS/VGST/GRD-684/KFIST(L1)/2018. Dated 27/08/2018.
Compliance with ethical standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Taher, F., Werghi, N., Al-Ahmad, H.: Computer aided diagnosis system for early lung cancer detection. Algorithms 8(4), 1088–1110 (2015) 2. Solanki, E., Agrawal, M.A., Parmar, M.C.K.: Lung cancer detection and classification using curvelet transform and neural network. Int. J. Sci. Res. Dev. 3(3), 2668–2672 (2015) 3. Hamad, A.M.: Lung cancer diagnosis by using fuzzy logic. Int. J. Comput. Sci. Mobile Comput. 5(3), 32–41 (2016) 4. Viharos, Z.J., Kis, K.B.: Survey on neuro-fuzzy systems and their applications in technical diagnostics and measurement. Measurement 67, 126–136 (2015) 5. Suparta, W., Alhasa, K.M.: Modeling of Tropospheric Delays Using ANFIS. Springer International Publishing, Heidelberg (2016) 6. Nielsen, F.: Neural Networks algorithms and applications. Niels Brock Business College, pp. 1–19 (2001) 7. Rojas, R.: Neural Networks: A Systematic Introduction. Springer, Heidelberg (2013) 8. Nauck, D., Kruse, R.: Designing neuro-fuzzy systems through backpropagation. In: Pedrycz, W. (eds.) Fuzzy Modelling, vol. 7, pp. 203–228. Springer, Boston (1996) 9. Lee, C.-C.: Fuzzy logic in control systems: fuzzy logic controller. I. IEEE Trans. Syst. Man Cybern. 20(2), 404–418 (1990) 10. Jang, J.-S.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 23(3), 665–685 (1993) 11. Ansari, D., et al.: Artificial neural networks predict survival from pancreatic cancer after radical surgery. Am. J. Surg. 205(1), 1–7 (2013) 12. Al-Amin, M., Alam, M.B., Mia, M.R.: Detection of cancerous and non-cancerous skin by using GLCM matrix and neural network classifier. Int. J. Comput. Appl. 132(8), 44 (2015)
Detection of Lung Nodules Using Unsupervised Machine Learning Method
Raj Kishore1, Manoranjan Satpathy2, D. K. Parida3, Zohar Nussinov4, and Kisor K. Sahu5
1 Virtual and Augmented Reality Centre of Excellence, Indian Institute of Technology Bhubaneswar, Bhubaneswar 752050, India [email protected]
2 School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, Bhubaneswar 752050, India [email protected]
3 Department of Radiotherapy, All India Institute of Medical Sciences, Bhubaneswar 751019, India [email protected]
4 Department of Physics, Washington University in Saint Louis, St. Louis, MO 63130-4899, USA [email protected]
5 School of Minerals, Metallurgical and Materials Engineering, Indian Institute of Technology Bhubaneswar, Bhubaneswar 752050, India [email protected]
Abstract. Machine learning methods are now becoming a popular choice in many computer-aided bio-medical image analysis systems. They reduce the effort required of a medical expert and help in making correct decisions. One of the main applications of such systems is the early detection of cancerous lung nodules using Computed Tomography (CT) scan images. Here, we have used a new method for automated detection of cancerous/non-cancerous lung nodules: a modularity maximization based graph clustering method. The clustering is done based on the grayscale values of the different regions of the CT scan images. The clustering algorithm is capable of detecting nodules as small as 4 pixels in two dimensions (2D) or 9 voxels in three-dimensional (3D) data. The advantage of this nodule detection is that it can be used as an extra feature for many supervised learning algorithms, especially for those Convolutional Neural Network (CNN) based architectures where pixel-wise segmentation of data might be required.
Keywords: Machine learning · Lung cancer · Graph clustering · Modularity · CNN · Segmentation · CT scan
1 Introduction
In 2018, 17 million people were affected by cancer, and more than 50% of them (9.6 million) died [1, 2]. Most of these deaths (around 33%) were caused by smoking and the use of tobacco, which directly affect the lung. These statistics reveal that lung cancer
continues to be the major cause of death. The survival probability of a lung cancer patient is inversely related to the size of the cancerous nodule at the time of detection, so early detection of a cancerous nodule in the lung can help medical experts take timely therapeutic intervention and reduce mortality. Various algorithms have already been developed for image segmentation. For example, a fast-RCNN algorithm [3] is often used for region of interest (ROI) detection in CT scan images, and these ROIs can be used for cancer detection. Mourchid et al. [4] proposed a new perspective on image segmentation using community detection algorithms for better segmentation. Zhang et al. [5] presented the hisFCM method, an improved form of Fuzzy c-means (FCM) for image segmentation based on the histogram of the given image. Several other algorithms [6, 7] have been developed for segmenting images for object detection. In the present article, a new state-of-the-art technique is introduced for detecting cancerous/non-cancerous cells/nodules inside the lung. We have used an unsupervised machine learning method for nodule detection that was previously developed for the clustering of spatially embedded networks [8]. It clusters the given network based on the similarity between nodes. This similarity parameter can encode different node properties: for example, it can measure similarities between the degrees or intensities of individual nodes/pixels or, in a physical context, similarities of the stresses or forces acting on disparate nodes. In this article, we detect nodules by clustering the given CT scan images based on the grayscale values of their pixels. The detected nodules can further be classified as cancerous or non-cancerous by passing them through a trained CNN based architecture. The morphological information of the nodules detected by the clustering method can also be used as an extra feature during the training of these algorithms; we can pass it as a second channel in CNN based architectures for training. We expect that this type of superimposition of information from an unsupervised machine learning tool on top of a supervised learning algorithm will increase the accuracy of these algorithms, because it will help in identifying the boundaries of nodules. The mapped information of nodule locations in the image data will aid especially those CNN based algorithms which are based on pixel-wise segmentation data [9, 10]. In this article, we show that our graph clustering method detects the cancerous nodules (labelled by medical experts) along with other non-cancerous nodules.
2 Method
In this section, we discuss the unsupervised machine learning method used for detection of the cancerous/non-cancerous nodules in the CT scan lung data. The cancer database contains a fairly large number of training samples for lung cancer, which is truly valuable for supervised machine learning in cancer research. However, these machine learning schemes are not truly based on sound scientific physiological considerations; rather, they rely purely on the mechanistic understanding of computer vision. Therefore, before applying these schemes to the Indian ethnic population, one needs to ascertain the applicability of the training data. In the absence of such high quality, medically verifiable, large training data, this kind of extrapolation
will always be questionable. Therefore, using an unsupervised scheme in parallel with supervised learning might be useful to remove the ill effects of such training bias. We have used the NSCLC dataset [11], because many researchers use this data for the task of cancer segmentation. The flow chart of the proposed method is shown in Fig. 1. The first step converts the '.dcm' files from the raw data into a readable '.tiff' format, because the '.dcm' file is not an ideal file type for generating networks. The second step (explained in detail in Sect. 2.1) is the separation of the lung portion from the CT scan image data, because the scan contains many irrelevant body portions which unnecessarily increase the number of pixels to be studied; this lung segmentation process reduces the actual nodule detection time. The third step (explained in detail in Sect. 2.2) is network generation from the separated lung portion, considering each pixel/voxel as a node with edges between first-neighbouring pixels. The fourth step (explained in detail in Sect. 2.3) is the clustering of the network generated in the previous step using the unsupervised clustering method; the clustering assembles closely related pixels (in terms of their grayscale values) into one community. In the fifth step, we assign each community to a separate nodule, except for the single-pixel communities which belong to the black portion of the image. The sixth and seventh steps are for detecting 3D lung nodules: we first convert the grayscale CT image into a binary image by assigning 0 and 1 for nodule pixels and non-nodule pixels respectively, and after this binary conversion, we stack the 2D binary images sequentially (based on their Z location) to detect 3D lung nodules.
Fig. 1. Flow chart of lung nodule detection using unsupervised machine learning method
2.1 Lung Segmentation
A single slice in a CT scan consists of 262144 pixels, of which most do not belong to the lung portion; the slice includes the ribcage, stomach and other organs which are not important for lung cancer detection. Segmentation of the lung portion is therefore an important step in cancer detection. The pixel value corresponds to the electron density inside the body and varies from −1024 to +1000. All the nodules, organs and blood vessels have a high electron density compared to soft tissue. We have segmented the lung region from the scans by using thresholding and morphological operations such as dilation and erosion [12], because there is a considerable change in intensity across the boundary of the lungs, and pure thresholding would not segment the boundary regions properly. A watershed algorithm [13] with an asymmetric kernel is used to create a mask in which the regions with value +255 correspond to the lung region and
the remaining region is the non-lung portion. The lung portion varies as we move from top to bottom; thus segmentation of the lung portion from the original DICOM images reduces the average number of pixels/nodes per image from 262144 to approximately 16000, which is nearly 16 times smaller. Due to this reduction in node count, the overall clustering time is reduced without compromising accuracy (Fig. 2). A segmentation sketch follows Fig. 2.
Fig. 2. Segmentation of lung region from CT scan image. (a) The CT scan image and (b) The segmented lung region (white region).
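A minimal sketch of a threshold-plus-morphology lung mask in Python; the HU threshold of −320 and the iteration counts are assumed values for illustration, not the paper's exact settings.

```python
import numpy as np
from scipy import ndimage
from skimage import measure

def lung_mask(hu_slice, threshold=-320):
    """Rough lung segmentation of one CT slice in Hounsfield units:
    threshold air-like regions, discard components touching the image
    border (outside air), then dilate and fill holes to recover the
    lung boundary."""
    binary = hu_slice < threshold                 # air/lung vs. tissue
    binary = ndimage.binary_erosion(binary, iterations=2)
    labels = measure.label(binary)
    border_labels = np.unique(np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
    for b in border_labels:
        binary[labels == b] = False
    binary = ndimage.binary_dilation(binary, iterations=4)
    return ndimage.binary_fill_holes(binary)
```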
2.2 Network Generation
A network consists of nodes and edges between those nodes. We can represent the CT scan images as a network by taking the pixel (2D) or voxel (3D) centres as the node positions, with an edge between a node and its first nearest neighbouring pixels. We have converted the pixels of the segmented lung region into a network following this protocol. In 2D, a maximum of eight neighbours is possible, and in 3D, a maximum of twenty-six. The network generated from the lung pixels (Fig. 3(b)) is nearly an 8-regular network, because the degree of each node equals eight (except for the boundary nodes). The network is unweighted, as we use an edge weight of unity, so the node degree cannot be used as the clustering parameter (the clustering parameter helps decide the community membership of each pixel). Since the pixel greyscale values differ and have a large range, from −1024 to +1000, the greyscale value is used as the clustering parameter for such networks. A sketch of the network construction is given below.
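A small sketch of this 8-neighbour network construction with NumPy; it returns an edge list over the lung pixels only, under the conventions described above.

```python
import numpy as np

def build_pixel_network(mask):
    """Edges between each lung pixel (mask == True) and its 8 neighbours.
    Nodes are (row, col) pixel coordinates; each undirected edge is
    emitted once by using only 'forward' offsets."""
    edges = []
    rows, cols = np.nonzero(mask)
    lung = set(zip(rows.tolist(), cols.tolist()))
    offsets = [(0, 1), (1, -1), (1, 0), (1, 1)]
    for r, c in lung:
        for dr, dc in offsets:
            if (r + dr, c + dc) in lung:
                edges.append(((r, c), (r + dr, c + dc)))
    return edges
```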
2.3 Network Clustering
The network generated from the lung portion is clustered into "communities" by optimizing (i.e. maximizing) a modularity function developed by Kishore et al. [8] for spatially embedded networks. It was successfully applied to granular networks and was able to determine the "best" resolution scale at which the community structure has the highest association with the naturally identifiable structures. The modularity function is given below:
Fig. 3. Network generation from (a) the segmented lung portion. (b) The network formed by considering each pixel center as node and edge between first nearest neighbor pixels.
Q(\sigma) = \frac{1}{2m} \sum_{i,j} \left[ a_{ij} A_{ij} - \theta(\Delta x_{ij})\, b_{ij} J_{ij} \right] \left( 2\,\delta(\sigma_i, \sigma_j) - 1 \right) \qquad (1)
Here m is the total number of edges, A_{ij} is the adjacency matrix, J_{ij} = 1 − A_{ij}, \delta is the Kronecker delta function, and \sigma denotes the community membership. The terms a_{ij} and b_{ij} are the strengths of the connected and missing edges between nodes i and j respectively, given by the difference between the local average greyscale value (the clustering parameter) of nodes i and j and the global average greyscale value of the given network:

a_{ij} = b_{ij} = \frac{k_i + k_j}{2} - \langle k \rangle \qquad (2)

where \langle k \rangle = \frac{1}{N} \sum_{r=1}^{N} k_r, N is the total number of nodes, and k is the node degree.
Our method has two main distinctions over other existing methods of modularity calculation. First, it does not use any null model. Second, it considers the effect of inter-community edges while detecting the most modular structure. It also restricts the over-penalization of missing links between two distant nodes within the same community by using the Heaviside unit step function \theta(\Delta x_{ij}), which defines the neighborhood and penalizes only the missing links inside the neighborhood. Here \Delta x_{ij} = x_c - \left| \vec{r}_i - \vec{r}_j \right| is the difference between a predefined neighborhood cutoff distance x_c and the Euclidean distance between nodes i and j; x_c is chosen as x_c = 1.05(R_i + R_j) for the present study (where R is the distance between two touching pixels).

\theta(\Delta x_{ij}) = \begin{cases} 1, & \Delta x_{ij} > 0 \ \text{(within cutoff)} \\ 0, & \text{otherwise (outside cutoff)} \end{cases} \qquad (4)
The clustering algorithm used here tries to merge a node with any of its connected neighbors so as to increase the overall modularity of the network. This process continues until no more merging is possible, i.e., none of the possible merges increases the modularity of the network. A pictorial representation of the algorithm is given in Fig. 4.
Fig. 4. Pictorial representation of clustering algorithm. Q represents the modularity of given partition
The algorithm consists mainly of three steps, with a modularity sketch given in code below:
(i) Initialization: The connection (adjacency) matrix is calculated. Initially, the number of communities equals the number of nodes in the network (i.e., single-node communities); this type of initial community distribution is called a symmetric distribution. The initial partition has modularity Q0.
(ii) Node merging: In this step, each node is selected iteratively and checked sequentially for possible merging with each of its connected neighbor communities. The modularity is calculated after each merge (e.g. Q1, Q2, Q3, etc.; refer to "node merging" in Fig. 4). Finally, the merge with the highest positive change in modularity between the network after and before the merge is selected. This node merging continues until no further merge increases the modularity of the system.
(iii) Community merging: The node merging in step (ii) often converges to a local maximum of the modularity curve. To escape such a local-maximum trap, we merge any two connected communities; if this merge increases the modularity, we keep it, otherwise we reject it. The output of community merging is the final, most modular, clustered network.
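A minimal sketch of evaluating the modularity of Eq. (1) for a given partition of a 2D pixel network; the unit pixel radius is an assumed convention, and the sum runs over unordered pairs (an overall constant factor does not affect the maximization).

```python
import numpy as np
from itertools import combinations

def modularity(nodes, values, adjacency, membership, r=0.5):
    """Q of Eq. (1). nodes: list of (row, col); values: clustering
    parameter k per node; adjacency: set of frozenset index pairs;
    membership: community id per node; r: pixel 'radius' (assumed)."""
    k = np.asarray(values, dtype=float)
    k_avg = k.mean()
    m = len(adjacency)
    x_c = 1.05 * (r + r)                    # cutoff for touching pixels
    q = 0.0
    for i, j in combinations(range(len(nodes)), 2):
        a = b = (k[i] + k[j]) / 2.0 - k_avg                 # Eq. (2)
        A = 1.0 if frozenset((i, j)) in adjacency else 0.0
        dist = np.hypot(nodes[i][0] - nodes[j][0], nodes[i][1] - nodes[j][1])
        theta = 1.0 if (x_c - dist) > 0 else 0.0            # Eq. (4)
        same = 1.0 if membership[i] == membership[j] else -1.0  # 2*delta - 1
        q += (a * A - theta * b * (1.0 - A)) * same
    return q / (2.0 * m)
```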
3 Results
We have detected nodules in various 2D slices of CT scan data from different patients. The pixels of a given image are segregated into different communities/nodules based on their grayscale values using the network clustering algorithm discussed in Sect. 2.3. Three different 2D slices of a patient CT scan at different Z locations are shown in Fig. 5(a–c), and the communities/nodules detected by the given modularity function in these images are shown in Fig. 5(d–f). The red circled regions in all three CT scan images are the cancerous cells identified by medical experts after performing various diagnostic procedures, including biopsies. It is clear that in all three images, clustering
with the modularity function can detect the cancerous nodule as a single community along with other nodules. It has captured not only the big cancerous nodules but the small ones as well. These nodules are the potential regions that need to be investigated.
Fig. 5. Nodule detection in 2D slices of CT scan data. (a–c) The 2D slices at different Z position of the lung data. The red circled regions are cancerous cells (labelled by the medical expert) and (d–f) are the corresponding colored images which contained different size of nodules. The color assigned to each detected nodule and the background is randomly selected. The same cancerous region as in panel (a–c) are circled in red color (Data source from Luna data: https://luna16. grand-challenge.org/Data/)
We have converted these clustered 2D images into binary mapped images by assigning nodule pixels as black and non-nodule (background) pixels as white. All the 2D binary mapped images of a patient are then stacked sequentially to generate the 3D binary mapped image (Fig. 6). The 3D binary mapped image mainly represents the opaque objects inside the lung and can be used in various 3D CNN architectures as a second channel, alongside the original unclustered 3D CT scan images, for training purposes. It may increase the learning efficiency of these architectures by providing extra morphological information about the nodules. A stacking sketch is given after Fig. 6.
Fig. 6. Generation of 3D binary mapped lung data. (a) Stacking of 2D binary mapped slices (detected nodules are in black and background in white color). (b) The 3D binary mapped image. The red circled nodules are cancerous nodules (labelled by the medical expert)
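A minimal sketch of the binary mapping and Z-stacking step; the assumption that each slice carries its Z position follows the flow chart's description.

```python
import numpy as np

def stack_binary_volume(labeled_slices, nodule_label_sets, z_positions):
    """Build a 3D binary volume from clustered 2D slices.
    labeled_slices: list of 2D arrays of community ids;
    nodule_label_sets: per-slice set of ids that are nodules;
    z_positions: Z location of each slice (used for ordering)."""
    order = np.argsort(z_positions)
    binary = [np.isin(labeled_slices[i], list(nodule_label_sets[i]))
              for i in order]
    return np.stack(binary, axis=0)  # shape: (num_slices, H, W)
```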
4 Conclusions
We have used a new modularity-function-assisted clustering algorithm for cancerous/non-cancerous nodule detection in 2D CT scan images. It detects the cancerous pixels as separate nodules, and it detects nodules of both larger and smaller sizes. The clustered image is binary mapped (black for the nodule regions and white for the background) and can be used as an extra feature during the training of CNN based architectures that depend on pixel-wise image segmentation data. This mapping contains the morphological information of the nodules and thus might help reduce the training as well as the validation loss. However, more samples need to be studied to establish the model. In future, the detected nodules can be used as regions of interest (ROI), which are mostly computed using RCNN; the advantage of the present method over RCNN is that it does not require any labelled data, which is sometimes difficult to acquire.
Acknowledgement. This project is funded by Virtual and Augmented Reality Centre of Excellence (VARCoE), IIT Bhubaneswar, India.
References 1. Cancer Research UK. https://www.cancerresearchuk.org/health-professional/cancer-statistics/ worldwide-cancer. Accessed August 2019 2. Ardila, D., Kiraly, A.P., Bharadwaj, S., Choi, B., Reicher, J.J., Peng, L., Tse, D., Etemadi, M., Ye, W., Corrado, G., Naidich, D.P.: End-to-end lung cancer screening with threedimensional deep learning on low-dose chest computed tomography. Nat. Med. 25(6), 954 (2019)
3. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
4. Mourchid, Y., El Hassouni, M., Cherifi, H.: An image segmentation algorithm based on community detection. In: International Workshop on Complex Networks and their Applications, pp. 821–830. Springer, Cham, November 2016
5. Zhang, X., Zhang, C., Tang, W., Wei, Z.: Medical image segmentation using improved FCM. Sci. China Inf. Sci. 55(5), 1052–1061 (2012)
6. Hu, D., Ronhovde, P., Nussinov, Z.: Replica inference approach to unsupervised multiscale image segmentation. Phys. Rev. E 85(1), 016101 (2012)
7. Browet, A., Absil, P.A., Van Dooren, P.: Community detection for hierarchical image segmentation. In: International Workshop on Combinatorial Image Analysis, pp. 358–371. Springer, Heidelberg, May 2011
8. Kishore, R., Gogineni, A.K., Nussinov, Z., Sahu, K.K.: A nature inspired modularity function for unsupervised learning involving spatially embedded networks. Sci. Rep. 9(1), 2631 (2019)
9. Kamal, U., Rafi, A.M., Hoque, R., Hasan, M.: Lung cancer tumor region segmentation using recurrent 3D-denseunet. arXiv preprint arXiv:1812.01951 (2018)
10. Abraham, N., Khan, N.M.: A novel focal Tversky loss function with improved attention U-Net for lesion segmentation. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 683–687 (2019)
11. https://wiki.cancerimagingarchive.net/display/Public/NSCLC+Radiogenomics#39fc8e8414054aaaa88e56b88bb061f6
12. Sharma, D., Jindal, G.: Identifying lung cancer using image processing techniques. In: International Conference on Computational Techniques and Artificial Intelligence (ICCTAI), vol. 17, pp. 872–880 (2011)
13. Sankar, K., Prabhakaran, M.: An improved architecture for lung cancer cell identification using Gabor filter and intelligence system. Int. J. Eng. Sci. 2(4), 38–43 (2013)
Comparison of Dynamic Programming and Genetic Algorithm Approaches for Solving Subset Sum Problems Konjeti Harsha Saketh and G. Jeyakumar(B) Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India [email protected], [email protected]
Abstract. Although Evolutionary Algorithms (EAs) are prominent, proven tools for solving real-world optimization problems, appraising their suitability for a wide variety of mathematical problems, from simple to complex, continues to be an active research area in Computer Science. This paper evaluates the relevance of the Genetic Algorithm (GA) in addressing the Subset Sum Problem (SSP) of mathematics and provides empirical results with discussions. A GA with pertinent mutation and crossover operators is designed and implemented to solve the SSP. The design of the proposed algorithm is clarified in detail. The results obtained by the proposed GA are assessed across different instances with different initial populations, using the intermediary solutions obtained and the execution time. This study also adapted the traditional Dynamic Programming (DP) approach, pursuing a bottom-up strategy, to solve the SSP. The findings revealed that the GA approach would be less preferred on account of its longer execution time. Keywords: Genetic Algorithm · Dynamic Programming · Subset Sum Problem · Algorithm design · Mutation and crossover
1 Introduction Evolutionary Computation (EC) has a repository of algorithms, termed Evolutionary Algorithms (EAs), commonly applied in the pursuit of answers to global optimization problems. EAs are inspired by the biological evolutionary principle of Darwin's theory. They follow a generate-and-test (trial and error) methodology of searching for optimal solutions. In the exercise of EAs, an initial set of solutions (initial population) is generated to start the search; this set undergoes an evolutionary search process that generates transient solutions in the quest for the required optimal solution. Evolution of the population of potential solutions takes place at each consecutive generation. At each step, the less desired candidates in the population are stochastically removed, and promising candidates are retained. This emulates the natural selection theory of 'survival of the fittest'. Each generation of an EA follows a series of steps, termed: fitness evaluation,
parent selection, reproduction (mutation and crossover) and survivor selection. The survivors (surviving candidates) are identified based on their fitness, which is calculated using a fitness function relevant to the problem. The two main processes that create diversity in the population are mutation and crossover. The Subset Sum Problem (SSP) is defined as: "Given a set of non-negative integers and a value, sum, determine whether there is a subset of the given set whose aggregate is equal to the given sum". In the domain of Computer Science, the SSP is an important decision-making problem, which is NP-Complete. Using the SSP as a benchmarking problem to test the working of newly proposed algorithms is a common practice among the computer science research community. GA, an instance of EA, follows the general algorithmic framework of EAs. GA is popular for its wide applicability [1, 2], as it supports encoding with all types of representations, such as binary, integer, real, and permutation-based representations. This paper elucidates the design of an effective GA for solving popular mathematical problems. The problem chosen for this study is the Subset Sum Problem (SSP).
2 The Subset Sum Problem
The SSP can be mathematically stated as follows: Given a set of n positive integers, S = {x1, x2, ..., xn}, and a non-negative integer, sum, find a subset SS of S, with m integers, whose aggregate is equal to sum. For the given S = {x1, x2, ..., xn}, SS = {y1, y2, ..., ym}, with yi ∈ S for i = 1 ... m, is a valid subset iff Σ_{i=1}^{m} yi = sum [iff = if and only if].
To formulate the SSP as an optimization problem, the objective function and the constraints need to be defined. They are given below.
The objective function is: Minimize f(SS) = |sum − Σ_{i=1}^{m} yi|, subject to yi ∈ S for i = 1 ... m.
The optimal solution is an SS* with f(SS*) = 0.
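As a minimal illustration of this objective function, assuming a candidate subset is held as a Python list (function names are ours):

def f(subset, required_sum):
    # Objective of the SSP formulation: |sum - sum(SS)|; 0 means SS is optimal.
    return abs(required_sum - sum(subset))

# With the sample instance used later (S = {12, 17, 3, 24, 6}, sum = 21):
print(f([3, 6, 12], 21))  # 0 -> a valid subset SS*
print(f([17, 6], 21))     # 2 -> an infeasible candidate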
3 Related Works
This section presents a few recent investigations reported in the literature, explaining different approaches for solving the SSP. The time entailed in solving SSP instances can be reduced by restricting the number of iterations executed by the algorithm and by altering the process of solving the SSP with suitable changes in the algorithm. [3] suggests that the number of iterations can be restricted by the user, if required, and can find all subsets with the sum equivalent to the required sum. The approach used in that work for solving the SSP is as follows:
i. Arrange all the elements of the array in ascending order.
ii. Remove the elements greater than the required sum.
iii. Fill the bit map with zeros and find the first partial solution.
iv. If the partial solution's sum is equal to the required sum, stop the process; else continue to find the first 1-bit in the bit map until finding the partial solutions with the required sum.
Also, if there is a restriction on the number of iterations, the process might terminate without finding a solution. SSPs can be solved in pseudo-polynomial time by Dynamic Programming [4]. In this method, a Boolean 2-dimensional array, A, of size x × y is created, where x = required sum and y = size of the main array. The dynamic programming method can be used in two ways: either a bottom-up or a top-down approach. The value of an element A[i][j] is set to True if some subset of the first j elements of the main array sums exactly to i, where i is less than or equal to the given sum; otherwise it is set to False. Whether there exists a subset can be tested by checking the element A[required sum][size of main array]; i.e., if it is set to True, it can be inferred that there indeed exists a subset whose elements aggregate to the required sum. A work combining the L shortest paths algorithm and the finite-time convergent recurrent neural network, to form a new algorithm for the L smallest k-subsets sum problem, is developed and presented in [5]. This work formed the solution to the subset sum problem by combining the solutions from a set of sub-problems. [6] presented priority algorithm approximation ratios for solving the SSP; this probe focused on the power of revocable decisions, where accepted data items can be rejected later to maintain the feasibility of the solution. Besides the various approaches proposed to solve the SSP, the SSP is also used to study and analyze different aspects of optimization problems. For example, [7] analyzed the properties of the SSP; the authors highlighted the correlation between the dynamic parameters of the problem and the resulting movement in the global optimum. It is also shown that the degree of movement of the global optimum is relative to the problem dynamics. An article detailing the algorithms in FLSSS, an R package for solving various subset sum problems, can be found in [8]. [9] works on a series of assumptions. First, the given array is sorted in ascending order using Quick Sort. Next, a variable called the limit point is set such that elements after that point are greater than the required sum and hence need not be used by the algorithm. The initial population is generated on the basis of cumulative frequency. The limit point in the initial population is found, and candidates from the population are taken according to the genes of the assumed solutions. Then the evolutionary process is performed until a subset with the required sum is obtained. An improved GA for solving the SSP is presented in [10]. A work analyzing the effect of changing the parameters of GA on the results of solving the SSP is presented in [11]. A work that tried to find the approximate solution of the SSP using GA along with rejection of infeasible offspring is presented in [12]; it introduces a penalty function to discard the candidates not fit for the next population. In spite of the previously reported existence of many GA and non-GA approaches to solve the SSP, this paper intends to describe solving the SSP with a classical GA and a Dynamic Programming approach. Algorithms with both these approaches are implemented to solve the SSP, and a comparative study of their performance is presented in this paper.
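The pseudo-polynomial DP of [4] can be rendered as a minimal Python sketch, using the sum-as-row, element-as-column orientation described above, with backtracking to recover one subset (function names are ours):

def subset_sum_dp(arr, target):
    # dp[i][j] is True when some subset of the first j elements sums to i.
    n = len(arr)
    dp = [[False] * (n + 1) for _ in range(target + 1)]
    for j in range(n + 1):
        dp[0][j] = True                       # sum 0: the empty subset
    for i in range(1, target + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i][j - 1]           # skip the j-th element
            if arr[j - 1] <= i:
                dp[i][j] = dp[i][j] or dp[i - arr[j - 1]][j - 1]  # or take it
    return dp

def backtrack(dp, arr, target):
    # Walk back from dp[target][n] to recover one subset.
    subset, i, j = [], target, len(arr)
    while i > 0 and j > 0:
        if dp[i][j - 1]:                      # reachable without element j
            j -= 1
        else:                                 # element j must be included
            subset.append(arr[j - 1])
            i -= arr[j - 1]
            j -= 1
    return subset

dp = subset_sum_dp([12, 17, 3, 24, 6], 21)
print(dp[21][5], backtrack(dp, [12, 17, 3, 24, 6], 21))  # True [6, 3, 12]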
4 Design of Experiments The design of the experiment includes describing the parameters and setting their values in the chosen problem domain and the algorithm domain. The parameters to be set in the
problem domain of the SSP are: the size of the main array, the elements of the main array, and the required sum. Under the assumption that all the elements in the main array are distinct, a sample setup of SSP parameters is presented in Table 1.

Table 1. A sample parameter set-up of SSP.

Parameter | Meaning                    | Values
Size_Ma   | Size of the main array     | 5
Elements  | Elements in the main array | 12, 17, 3, 24, 6
R_Sum     | Required sum               | 21
In the algorithmic domain, the experimental setup of this paper includes solving the SSP by the classical dynamic programming methodology and by GA. In the dynamic programming method, the process of finding the subset is done as follows, with a Boolean array.
• A 2-dimensional Boolean array, dp[R_Sum+1][Size_Ma+1], is created. Every entry of the 0th row (sum 0) is initialized to True, and every remaining entry of the 0th column is initialized to False.
• A bottom-up approach is used to fill the remaining elements of the dp array.
• For each column j, a variable val is assigned array[j − 1]. dp[i][j] is assigned dp[i][j − 1] while i is less than val; the remaining values of the column are assigned as dp[i][j] = dp[i][j − 1] OR dp[i − val][j − 1].
• If the value of dp[R_Sum][Size_Ma] is True, then it can be inferred that there is a subset that sums to R_Sum; otherwise there is no such subset.
• If there is a subset, the values in the subset are printed using backtracking, in which we walk from dp[R_Sum][Size_Ma] back towards dp[0][0]. The compound condition for an element to be included in the subset is (i − val ≥ 0 && dp[i − val][j − 1]).
As GA follows the common template of an EA, the design parameters commonly used in any EA are to be set. The design of an EA to solve an optimization problem includes determination of the following parameters: population representation (with population size and size of each candidate), fitness evaluation, logic of selecting parents for reproduction, mutation operator, crossover operator, logic of selecting survivors, and termination criteria. The values set for the design of the GA in this study's experiment are explained below.
• The population size is set equal to Size_Ma. An initial population with Size_Ma candidates is initialized. In this population, the size of each candidate is fixed at random. The components of each candidate are taken at random from the main array such that no two elements within a candidate are equal.
• The fitness of a candidate is calculated as the absolute difference between the sum of the subset and the required sum. If the fitness of a candidate is equal to 0, then it is a solution for the SSP, whose sum of elements equals R_Sum.
• The two fittest candidates in the population are selected as parents for reproduction. Thus the parents are the candidates whose subset sums are closest to R_Sum, compared to the other candidates in the population.
• A K-point crossover is used during the reproduction process. The crossover point is randomly generated between zero and the smaller of the sizes of the fittest and second-fittest chromosomes in the population. The crossover is carried out in such a way that no two components within a candidate are identical, before or after the crossover.
• Mutation is done on a random basis. If a particular candidate is selected for mutation, then the mutation point is generated at random in the range 0 to the size of the fittest chromosome, and the component at that point is changed to a random value.
• During the survivor selection stage, the least fit candidate is replaced by the newly generated candidate.
• The above steps are repeated until a candidate is found with a fitness value equal to zero.
• The number of generations and the execution time consumed by the GA to arrive at a solution of the SSP are recorded for the comparative analysis.
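The loop described by these bullet points might look as follows in Python; the population initialization, crossover-point choice and the 0.3 mutation probability are our simplifying assumptions, not the authors' exact implementation:

import random

def fitness(cand, target):
    return abs(target - sum(cand))

def run_ga(S, target, max_gens=1000):
    # Population of Size_Ma random subsets with distinct components.
    pop = [random.sample(S, random.randint(1, len(S))) for _ in range(len(S))]
    for gen in range(max_gens):
        pop.sort(key=lambda c: fitness(c, target))
        if fitness(pop[0], target) == 0:
            return gen, pop[0]                 # solution subset found
        p1, p2 = pop[0], pop[1]                # two fittest parents
        cut = random.randint(1, min(len(p1), len(p2)))   # crossover point
        child = p1[:cut] + [g for g in p2[cut:] if g not in p1[:cut]]
        if random.random() < 0.3:              # random mutation
            child[random.randrange(len(child))] = random.choice(S)
            child = list(dict.fromkeys(child)) # keep components distinct
        pop[-1] = child                        # survivor selection
    return None, None

print(run_ga([12, 17, 3, 24, 6], 21))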
5 Results and Discussions
The results obtained in our attempt to solve the SSP by the dynamic programming methodology and by GA are presented in this section.
5.1 Dynamic Programming (DP) Approach
For the given input array and required sum, using the bottom-up strategy of the DP approach, a 2-dimensional Boolean array dp is populated with True or False, and the value of dp[R_Sum][Size_Ma] resulted in True. Next, using the method of backtracking, the elements of the subset are printed. The results obtained by the dynamic programming approach are presented in Table 2, and the content of the Boolean array used is shown in Fig. 1.
5.2 Classical GA Approach
The results obtained for the same parametric setup of the SSP by GA are presented in Table 3. The performance metrics used for the analysis are (1) Success Rate (SR), (2) Average Number of Generations (AvgNoG), and (3) Average Execution Time (AvgET).
• The SR is calculated using Eq. (1):

SR = (NoSR# / ToR#) × 100%   (1)

where NoSR# is the number of successful runs and ToR# is the total number of runs. A successful run is a run in which the GA was able to obtain a solution subset whose sum of elements equals R_Sum.
• The AvgNoG is calculated using Eq. (2):

AvgNoG = ( Σ_{R=1}^{MaxR} G_R ) / MaxR   (2)

where R is the run number, MaxR is the maximum number of runs (MaxR is set to 10), and G_R is the number of generations taken by the GA at run R.
• The AvgET is calculated using Eq. (3):

AvgET = ( Σ_{R=1}^{MaxR} ET_R ) / MaxR   (3)

where ET_R is the execution time taken by the GA at run R.
Table 2. Results for SSP by DP approach

Serial no. | Size_Ma | Elements         | R_Sum | Solution subset | Time (s)
1          | 5       | 12, 17, 3, 24, 6 | 21    | 3, 6, 12        | 1.07 × 10⁻³
Fig. 1. Output of the 2-dimensional Boolean array after computing using DP
The measured performance metrics for the GA at solving the SSP designed in the experiment are presented in Table 3. The values of AvgNoG and AvgET are measured only for the successful runs, and presented in Table 4. The results show that the SR in solving the SSP is 70%; the AvgNoG is 7.14 and the AvgET is 74.28 ms. The findings validate the applicability of GA in solving the SSP with a simple experimental setup. On comparing the performance of the GA approach versus the classical dynamic programming methodology (Tables 2 and 3) for solutions to the SSP, the downside of GA is found to be the increased execution time, caused by the stochastic processes at all stages. While the dynamic programming approach consumed about a millisecond to find the solution (1.07 ms, Table 2), the GA took around 75 ms. However, the performance of the GA can still be augmented to achieve a higher success rate with lower AvgNoG and AvgET, by careful selection of its parametric values.
Table 3. Results for SSP by GA approach

Run# | SSP solved (yes/no) | Number of generations | Time (s)  | Solution subset
1    | Yes                 | 19                    | 82 × 10⁻³ | 12, 6, 3
2    | No                  | –                     | –         | –
3    | No                  | –                     | –         | –
4    | Yes                 | 11                    | 69 × 10⁻³ | 6, 12, 3
5    | Yes                 | 0                     | 58 × 10⁻³ | 12, 6, 3
6    | Yes                 | 4                     | 60 × 10⁻³ | 3, 12, 6
7    | Yes                 | 1                     | 75 × 10⁻³ | 3, 12, 6
8    | No                  | –                     | –         | –
9    | Yes                 | 7                     | 83 × 10⁻³ | 6, 12, 3
10   | Yes                 | 8                     | 93 × 10⁻³ | 6, 3, 12
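The aggregates of Table 4 follow from the successful runs of Table 3 (per the text, AvgNoG and AvgET are averaged over successful runs only); a minimal check in Python:

# Each run of Table 3: (solved, generations, execution time in seconds).
runs = [(True, 19, 0.082), (False, None, None), (False, None, None),
        (True, 11, 0.069), (True, 0, 0.058), (True, 4, 0.060),
        (True, 1, 0.075), (False, None, None), (True, 7, 0.083),
        (True, 8, 0.093)]

ok = [r for r in runs if r[0]]
SR = 100.0 * len(ok) / len(runs)              # Eq. (1)
AvgNoG = sum(r[1] for r in ok) / len(ok)      # Eq. (2), successful runs only
AvgET = sum(r[2] for r in ok) / len(ok)       # Eq. (3), successful runs only
print(SR, round(AvgNoG, 2), round(AvgET, 5))  # 70.0 7.14 0.07429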
Table 4. The performance of GA at solving SSP

Performance metric | Value measured
SR                 | 70%
AvgNoG             | 7.14
AvgET              | 74.28 × 10⁻³ s
6 Conclusions The paper presented the details of the design and implementation of a GA framework to solve the SSP. The performance of the GA is appraised on three metrics, over different sample runs. As a comparative study, the SSP is also solved using the classical Dynamic Programming approach. The GA is able to solve the SSP with a 70% success rate. The experimental results comparing both approaches reveal that the execution time of the GA-based approach is much higher than that of the classical DP approach. This work can be extended further by analyzing the impact on performance of modifications to the GA approach, with varying values of its inherent parameters. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Janani, N., Shiva Jegan, R.D., Prakash, P.: Optimization of virtual machine placement in cloud environment using genetic algorithm. Res. J. Appl. Sci. Eng. Technol. 10(3), 274–287 (2015) 2. Raju, D.K.A., Shunmuga Velayutham, C.: A study on genetic algorithm based video abstraction system. In: Proceedings of World Congress on Nature & Biologically Inspired Computing (2009) 3. Isa, H.A., Oqeili, S., Bani-Ahmad, S.: The subset-sum problem: revisited with an improved approximated solution. Int. J. Comput. Appl. 114, 1–4 (2015) 4. Jain, E., Jain, A., Mankad, S.H.: A new approach to address subset sum problem. IEEE Explorer (2014) 5. Gu, S., Cui, R.: An efficient algorithm for the subset sum problem based on finite-time convergent recurrent neural network. Neurocomputing 149, 13–21 (2015) 6. Ye, Y., Borodin, A.: Priority algorithms for the subset-sum problem. J. Comb. Optim. 16(3), 198–228 (2008) 7. Rohlfshagen, P., Yao, X.: Dynamic combinatorial optimization problems: an analysis of the subset sum problem. Soft Comput. 15, 1723–1734 (2011) 8. Liu, C.W.: FLSSS: a novel algorithmic framework for combinatorial optimization problems in the subset sum family. J. Stat. Softw. VV(II), 1–22 (2018) 9. Bhasin, H., Singla, N.: Modified genetic algorithms based solution to subset sum problem. Int. J. Adv. Res. Artif. Intell. 1(1), 38–41 (2012) 10. Wang, R.L.: A genetic algorithm for subset sum problem. Neurocomputing 57, 463–468 (2004) 11. Oberoi, A., Gupta, J.: On the applicability of genetic algorithms in subset sum problem. Int. J. Comput. Appl. 145(9), 37–40 (2016) 12. Thada, V., Shrivastava, U.: Solution of subset sum problem using genetic algorithm with rejection of infeasible offspring method. Int. J. Emerg. Technol. Comput. Appl. Sci. 10(3), 259–262 (2014)
Design of Binary Neurons with Supervised Learning for Linearly Separable Boolean Operations Kiran S. Raj(&), M. Nishanth, and Gurusamy Jeyakumar Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India {cb.en.u4cse17430, cb.en.u4cse17441}@cb.amrita.students.edu, [email protected]
Abstract. Though the Artificial Neural Network is used as a potential tool to solve many real-world learning and adaptation problems, research articles revealing the simple facts of how to simulate an artificial neuron for the most popular tasks are very scarce. This paper has as its objective presenting the details of the design and implementation of artificial neurons for linearly separable Boolean functions. The simple Boolean functions, viz. AND and OR, are taken for the study. The paper initially presents the simulation details of artificial neurons for the AND and OR operations, where the required weight values are manually calculated. Next, the neurons are given learning capability via the perceptron learning algorithm, and the iterative adaptation of the weight values is also presented in the paper. Keywords: Neural networks · Supervised learning · Boolean operations · Binary neurons · Weight adaptation
1 Introduction In human bodies, the Biological Neural Network (BNN) integrates processing elements called neurons to form the nervous system. These processing elements process the information sent to them as external inputs of a dynamic nature. These neurons are interconnected through many synapses. The neural network in computer science tries to replicate the BNN: the Artificial Neural Network (ANN) analyzes data inputs to predict the accurate output. ANNs are designed to learn and to perform a task by analyzing the given examples while learning. ANNs are frameworks consisting of suitable learning algorithms to analyze complex data inputs to get the desired outputs. The applications of ANNs include character recognition, image processing, stock market prediction, video analytics, etc. In nature, neurons have several dendrites (inputs), a cell nucleus (processor) and an axon (output). Biological neurons can be modeled mathematically to design artificial neurons. These artificial neurons are elementary units in an artificial neural network, and they are designed to mimic aspects of their biological
counterparts. Neurons produce a desired output by applying an activation function to a given set of inputs and a set of weights. This paper presents the design and implementation details of artificial neurons for performing common Boolean operations. Initially, the neurons are implemented without the learning capability, and their performance is verified against the expected outcomes of the Boolean operations. Then, the neurons are given learning capability via the perceptron learning algorithm. The working of these learning neurons is also explained in the paper. The remaining part of the paper is organized into six sections. Section 2 explains the working of artificial neurons, Sect. 3 presents the works related to this paper, Sect. 4 discusses the design and implementation of the neurons without learning, Sect. 5 explains the working of the neurons with learning, and finally Sect. 6 concludes the paper.
2 Artificial Neurons The artificial neuron receives one or more inputs and sums them to produce an output. Usually each input is separately weighted, and the sum is passed through a non-linear function known as an activation function or transfer function. Specifically, in an ANN (Artificial Neural Network) we take the sum of the products of the inputs and their corresponding weights and apply an activation function to it to get the output of that layer, which is fed as input to the next layer (as shown in Fig. 1).
Fig. 1. The schematic diagram of an artificial neuron
The main purpose of an activation function in an ANN is to convert the input signal of a node into an output signal. The activation function makes the neural network more powerful, adding the ability to learn complex and complicated data sets and to represent non-linear, arbitrary functional mappings between inputs and outputs.
3 Related Works This section presents recent studies available in the literature related to our system. There has been great effort and significant progress in the field of neural networks recently, owing to their efficiency and accuracy. For example, a simple neural network design composed of neurons of two types to realize the kth basic Boolean (logic) operation is proposed in [1]. The above neural network performs the Boolean operation
based on a recursive definition of "basic Boolean operations". A work presenting the modelling of neural Boolean networks by non-linear digital equations is shown in [2], where new memory neural systems based on hyperincursive neurons (neurons with multiple output states for the same input) instead of synaptic weights are designed and implemented. Similarly, the work presented in [3] depicts the reduction of memory space and computation time using Boolean neural networks in comparison to the representation of Boolean functions by usual neural networks. Threshold functions and Artificial Neural Networks (ANNs) have been known for many years and have been thoroughly analyzed. An implementation of the basic logic gates AND and XOR by an artificial neural network, using perceptron and threshold elements as neuron output functions, is described in [4]. One of the various applications of neural networks is the implementation of logic gates with the help of neurons; an approach for the realization of such logic gates by applying the McCulloch-Pitts model has been presented in [5]. Deep neural networks (DNNs) have shown unprecedented performance on various intelligent tasks such as image recognition, speech recognition, natural language processing, etc. An analysis of a fully parallel RRAM synaptic array architecture that implements the fully connected layers in a convolutional neural network with (+1, −1) weights and (+1, 0) neurons has been made in [6], and a fully parallel BNN architecture (P-BNN) has been proposed in the same work. A research work aimed at implementing a model to generate binarized MNIST digits and experimentally compare the performance of different types of binary neurons, GAN objectives and network architectures is described in [7]. Neural networks are widely used in classification problems in order to segregate two classes of points with a straight line. In [8], a well-adapted learning algorithm for classification tasks called NetLines is presented; it generates small, compact feed-forward neural networks with one hidden layer of binary units and binary output units. Neural networks have led not only to the development of software but also to the development of hardware such as neural network emulators. A detailed implementation of a fully digital neural network emulator, mainly composed of 128 serial adders, that can emulate neural layers of any size, depending only on the amount of RAM, is shown in [9]; it can be viewed as a 128-processor specialized SIMD machine. Embedded computer vision systems which emulate human vision are used in a wide range of applications. These applications require high accuracy, low power, and high performance, where neural networks come into use. A case study in [10] proposes a neuron pruning technique which eliminates a large part of the weight memory; since the weight memory is then realized by an on-chip memory on the FPGA, high-speed memory access is achieved. Similar examples of research works based on FPGAs and the neuron pruning technique are [11] and [12]. Considering the wide applicability of the Artificial Neural Network as a tool to solve the most common learning problems around us [13–18], this paper presents the learning principle of artificial neurons in using them for solving the Boolean AND and OR operations. The details of the design and implementation of the AND and OR neurons, with and without learning, are explained in the forthcoming sections.
4 Binary Neurons Without Learning
Fig. 2. Binary threshold function
A binary neuron trained to execute basic Boolean functions such as bitwise AND, OR, NOR and NAND is implemented with the help of a binary threshold function (shown in Fig. 2). Following the linear way of manual classification, by providing the appropriate weights and the corresponding linear constants (depending upon the input), an equation is constructed to execute the task of performing the bitwise operations. The equation to determine the activation value of the neuron is

y = (W0 × X0) + (W1 × X1) + (W2 × X2)   (1)

Assuming the weights (chosen through an ad-hoc method) W0 = −1.5, W1 = 1 and W2 = 1, the code snippet for realizing the binary AND operation with the AND-neuron is shown in Fig. 3. The performance of the AND-neuron is verified with the truth table for all possible inputs of X1 and X2 and found to be correct. The verified results are presented in Table 1. Assuming the weights (chosen through an ad-hoc method) W0 = −0.5, W1 = 1 and W2 = 1, the code snippet for realizing the binary OR operation with the OR-neuron is shown in Fig. 4. The performance of the OR-neuron is verified with the truth table for all possible inputs of X1 and X2 and found to be correct. The verified results are presented in Table 2.
Fig. 3. Code Snippet for Binary AND neuron. Fig. 4. Code Snippet for Binary OR neuron.
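The snippets in Figs. 3 and 4 appear only as images in the original; the following is a minimal Python reconstruction of the thresholded neuron of Eq. (1) (the function name and the firing convention at y ≥ 0 are our assumptions):

def binary_neuron(x1, x2, w0, w1, w2):
    # Activation value of Eq. (1), with the constant bias input X0 = 1.
    y = w0 * 1 + w1 * x1 + w2 * x2
    # Binary threshold function of Fig. 2 (firing at y >= 0 is assumed).
    return 1 if y >= 0 else 0

# AND-neuron with the ad-hoc weights W0 = -1.5, W1 = 1, W2 = 1 (Table 1):
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, binary_neuron(x1, x2, -1.5, 1, 1))  # f = 0, 0, 0, 1
# The OR-neuron of Table 2 uses W0 = -0.5 with the same W1 = W2 = 1.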
Table 1. Verification of the results for AND-neuron

X0 | X1 | X2 | W0   | W1 | W2 | f
1  | 0  | 0  | −1.5 | 1  | 1  | 0
1  | 0  | 1  | −1.5 | 1  | 1  | 0
1  | 1  | 0  | −1.5 | 1  | 1  | 0
1  | 1  | 1  | −1.5 | 1  | 1  | 1

Table 2. Verification of the results for OR-neuron

X0 | X1 | X2 | W0   | W1 | W2 | f
1  | 0  | 0  | −0.5 | 1  | 1  | 0
1  | 0  | 1  | −0.5 | 1  | 1  | 1
1  | 1  | 0  | −0.5 | 1  | 1  | 1
1  | 1  | 1  | −0.5 | 1  | 1  | 1
5 Binary Neurons with Learning A distinction is made to segregate the expected inputs according to the expected outputs, falling into two categories corresponding to the values where the outputs would be '0' and '1' respectively. After segregation, a learning algorithm is applied which calculates the cost of the error at each iteration and correspondingly minimizes the deviation from the original requirement. The learning-rate coefficient is the factor by which the error of the function is reduced, and hence the neuron is trained to perform the particular task. The code snippet for the AND-neuron with learning is shown in Fig. 5.
Fig. 5. Code Snippet for Binary AND neuron with learning.
The adaptation of the weights using the learning algorithm is done using Eq. (2):

W_{i+1} = W_i ± η × X_i   (2)

where i is the iteration index and η is the learning rate; the sign of the update follows the sign of the output error, as illustrated in Tables 3 and 4.
An illustrative execution of the AND-neuron with the perceptron learning algorithm is shown in Table 3, and the code snippet is shown in Fig. 5. The initial values assigned to the weights are W0 = 11, W1 = −8, W2 = 7. The table shows the weight adaptation for the input values X1 = 1 and X2 = 0, as a sample. It is seen from the results that the AND-neuron is able to learn the required weight values within three iterations. The adapted weight values for the correct operation of the Boolean AND operation, in this case, are W0 = 9, W1 = −10 and W2 = 7.

Table 3. Verification of the results for AND-neuron with learning

Iteration # | X0 | X1 | X2 | W0 | W1  | W2 | Actual output | Desired output | Error | Error %
0           | 1  | 1  | 0  | 11 | −8  | 7  | 1             | 0              | −1    | 50
1           | 1  | 1  | 0  | 10 | −9  | 7  | 1             | 0              | −1    | 50
2           | 1  | 1  | 0  | 9  | −10 | 7  | 0             | 0              | 0     | 0
An illustrative execution of the OR-neuron with the perceptron learning algorithm is shown in Table 4, and the code snippet is shown in Fig. 6. The code shows that the learning happens iteratively, until the error becomes zero. The initial values assigned to the weights are W0 = 1, W1 = −8, W2 = 9. The table shows the weight adaptation for the input values X1 = 1 and X2 = 0, as a sample. It is seen from the results that the OR-neuron is able to learn the required weight values in five iterations. The adapted weight values for the correct operation of the Boolean OR operation, in this case, are W0 = 5, W1 = −4 and W2 = 9.
Fig. 6. Code Snippet for Binary OR neuron with learning.
Table 4. Verification of the results for OR-neuron with learning

Iteration # | X0 | X1 | X2 | W0 | W1 | W2 | Actual output | Desired output | Error | Error %
0           | 1  | 1  | 0  | 1  | −8 | 9  | 0             | 1              | 1     | 50
1           | 1  | 1  | 0  | 2  | −7 | 9  | 0             | 1              | 1     | 50
2           | 1  | 1  | 0  | 3  | −6 | 9  | 0             | 1              | 1     | 50
3           | 1  | 1  | 0  | 4  | −5 | 9  | 0             | 1              | 1     | 50
4           | 1  | 1  | 0  | 5  | −4 | 9  | 1             | 1              | 0     | 0
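The training loops of Figs. 5 and 6 are likewise shown only as images; a minimal Python sketch of the perceptron update of Eq. (2) with η = 1, which reproduces the traces of Tables 3 and 4, might read:

def train_perceptron(x, desired, w, eta=1.0, max_iter=100):
    # Perceptron learning for one training pattern, as in Tables 3 and 4:
    # each weight moves by eta * error * input; x includes the bias X0 = 1.
    for it in range(max_iter):
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        error = desired - y
        if error == 0:
            return w, it
        w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    return w, max_iter

# AND pattern (X1 = 1, X2 = 0, desired 0), initial weights 11, -8, 7:
print(train_perceptron([1, 1, 0], 0, [11, -8, 7]))  # ([9, -10, 7], 2)
# OR pattern (X1 = 1, X2 = 0, desired 1), initial weights 1, -8, 9:
print(train_perceptron([1, 1, 0], 1, [1, -8, 9]))   # ([5, -4, 9], 4)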
6 Conclusions This paper presented the details of simulating artificial neurons to implement Boolean functions. The implementation details for the OR and AND functions are shown in this paper. The paper demonstrated the use of the perceptron learning algorithm, enabling artificial neurons to adapt their weight vectors to perform the required task. Thus the paper showed the results obtained with different simulation setups: OR without learning, AND without learning, OR with learning and AND with learning. The details presented in this paper reveal that artificial neurons can be simulated for all types of Boolean operations. The work presented in this paper can be extended further to include simulation of all the Boolean functions, viz. NAND, XOR and XNOR, with and without learning.
References 1. Mandziuk, J., Macukow, B.: A neural network performing boolean logic operations. Opt. Mem. Neural Netw. 2(1), 17–35 (1993) 2. Dubois, D.M.: Hyperincursive McCulloch and Pitts neurons for designing a computing flip-flop memory. In: Computing Anticipatory Systems: CASYS 1998 - Second International Conference, pp. 3–21 (1999) 3. Kohut, R., Steinbach, B.: Boolean neural networks. In: Proceedings Kohut 2004 (2004) 4. Ele, S.I., Adesola, W.A.: Artificial neuron network implementation of boolean logic gates by perceptron and threshold element as neuron output function. Int. J. Sci. Res. 4(9), 637–641 (2013) 5. Srinivas Raju, J.S., Kumar, S., Sai Sneha, L.V.S.S.: Realization of logic gates using McCulloch-Pitts neuron model. Int. J. Eng. Trends Technol. 45(2), 52–56 (2017) 6. Sun, X., Peng, X., Chen, P.-Y., Liu, R., Seo, J., Yu, S.: Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons. In: 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 574–579 (2018) 7. Dong, H.-W., Yang, Y.-H.: Training generative adversarial networks with binary neurons by end-to-end back propagation. Comput. Res. Repository (2018). arXiv:1810.04714 8. Torres-Moreno, J.-M., Gordon, M.B.: Adaptive learning with binary neurons. arXiv:0904.4587 (2009) 9. Skubiszewski, M.: A hardware emulator for binary neural networks. In: International Neural Network Conference, Springer, Dordrecht (1990)
10. Fujii, T., Sato, S., Nakahara, H.: A threshold neuron pruning for a binarized deep neural network on an FPGA. IEICE Trans. Inf. Syst. 101(2), 376–386 (2018) 11. Kohut, R., Steinbach, B.: The structure of boolean neuron for the optimal mapping to FPGAs. In: Proceedings of the VIII-th International Conference CADSM 2005, pp. 469–473 (2005) 12. Training with states of matter search algorithm enables neuron model pruning. Technical report, Kanazawa University, November 2018 13. Bindu, K.R., Aakash, C., Orlando, B., Parameswaran, L.: An algorithm for text prediction using neural networks. In: Lecture Notes in Computational Vision and Biomechanics, vol. 28, pp. 186–192 (2018) 14. Kumar, P.N., Seshadri, G.R., Hariharan, A., Mohandas, V.P., Balasubramanian, P.: Financial market prediction using feed forward neural network. In: Communications in Computer and Information Science, pp. 77–84 (2011) 15. Suresh, A., Harish, K.V., Radhika, N.: Particle Swarm Optimization over back propagation neural network for length of stay prediction. Procedia Comput. Sci. 46, 268–275 (2015) 16. Senthil Kumar, T., Sivanandam, S.N.: An improved approach for detecting car in video using neural network model. J. Comput. Sci. 10, 1759–1768 (2012) 17. Padmavathi, S., Saipreethy, M.S., Valliammai, V.: Indian sign language character recognition using neural networks. Int. J. Comput. Appl. 1, 40–45 (2013) 18. Anil, R., Manjusha, K., Sachin Kumar, S., Soman, K.P.: Convolutional neural networks for the recognition of Malayalam characters. In: Advances in Intelligent Systems and Computing, vol. 328, pp. 493–500 (2015)
Image De-noising Method Using Median Type Filter, Fuzzy Logic and Genetic Algorithm S. R. Sannasi Chakravarthy(&) and Harikumar Rajaguru Department of ECE, Bannari Amman Institute of Technology, Sathyamangalam 638401, Tamil Nadu, India [email protected], [email protected]
Abstract. The goal of de-noising medical images is to get rid of the distortions that occur in noisy medical images. A new methodology is proposed to restore impulse-noise-affected mammogram images using a hybrid filter (HF), fuzzy logic (FL) and a genetic algorithm (GA). The said method is implemented in three steps. The primary step includes de-noising the noisy mammogram images using a median filter and an adaptive fuzzy median filter, respectively. The intermediate step computes the difference vector using the above two filters; it is then given to a fuzzy-logic-based system. The system utilizes a triangular membership function to generate the fuzzy rules from the computed difference vector values. The last step makes use of a genetic algorithm to select the optimal rules. A peak signal-to-noise ratio (PSNR) value is computed for each population, and for obtaining the best fitness value, a new population is formed repeatedly with the help of the genetic operators. The performance of the method is measured by calculating the PSNR value. The proposed implementation is tested over mammogram medical images taken from the Mammogram Image Analysis Society (MIAS) database. The experimental results are compared with different existing methods. Keywords: Mammogram · Impulse noise · Median · Fuzzy · Genetic algorithm
1 Introduction For any type of imaging modality, manual or automatic inspection is required for the final classification or decision making. But if the medical images are affected by noise, then the inspection is of no use, which leads to an increased risk to human lives. Generally, noise is unwanted information included in the image, and the noise present in the modality is either additive or multiplicative [1]. The process of restoring medical images assumes that the degradation model is known or can be estimated. During the process, we can consider that the noise does not depend on the spatial coordinates and is not correlated with the information in the image. Noises are categorized based on the distribution of pixel values or through histogram analysis [1]. Filters are commonly employed to remove undesirable noise information from the original image. This paper investigates mammogram images contaminated with salt-and-pepper (impulse) noise. The work involves the pre-processing of
medical images (mammograms), which is needed for successful classification or appropriate diagnosis of the disease. A mammogram image is obtained as an x-ray photograph of the breast, and it is used to diagnose breast cancer among females without any signs or symptoms [2]. The mammograms considered in the proposed work are taken from the MIAS database. If every rule created by the fuzzy logic system were used, processing would take more time; thus only the optimal rules are selected using the genetic algorithm. This reduces the execution time, improves the visual quality, and also increases the PSNR value. The flowchart of the system is given in Fig. 1.
2 Materials and Methods
2.1 Impulse Noise
Impulse noise is a kind of acoustic distortion that consists of near-instantaneous (impulse-like) unwanted sharp and sudden disturbances. Common causes for this category of noise are electromagnetic interference, unwanted scratches on recording films, explosions like gunfire, and bad synchronization in digital recording and transmission [3]. If the transmission channel is not good, then the noise will distort the image signal with very high negative (minimum) or very high positive (maximum) values. Thus the mammogram image will be distorted with some black (pepper) and white (salt) pixel spots [4]. This type of noise is referred to as impulse noise. The noise model for an image is mathematically defined as

S_{i,j} = X_{i,j} with probability 1 − p, Y_{i,j} with probability p   (1)
where X_{i,j} specifies the pixel values in the original (noiseless) image and Y_{i,j} denotes the distorted pixel values that replace the original noiseless pixel values in the mammogram image. The degree to which the mammogram image is distorted is specified by the parameter p. The fixed-valued model comprises corrupted pixels whose values are replaced by either the maximum (salt) or the minimum (pepper) of the allowable pixel range.
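A minimal NumPy sketch of this fixed-valued impulse model; the 0/255 extremes and the even salt/pepper split are assumptions for 8-bit images:

import numpy as np

def add_impulse_noise(img, p, seed=0):
    # Eq. (1): a fraction p of the pixels is replaced by an extreme of the
    # 8-bit range, split evenly between pepper (0) and salt (255); the
    # remaining pixels keep their original values X(i,j).
    rng = np.random.default_rng(seed)
    noisy = img.copy()
    corrupted = rng.random(img.shape) < p
    salt = rng.random(img.shape) < 0.5
    noisy[corrupted & salt] = 255
    noisy[corrupted & ~salt] = 0
    return noisy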
2.2 Median and Adaptive Fuzzy Median Filter
The median filter is a kind of non-linear technique commonly used in digital filtering to remove noise from any type of image. Such noise removal is a popular pre-processing stage that improves the results of further processing (like edge detection and classification on an image) [5]. Since the median filter preserves edges during noise removal, it is widely employed in digital and medical image processing. The filtering technique considers each pixel in the image and looks at its adjacent pixels to check whether it is representative of its environment or not. The technique calculates the middle value among the pixels and simply replaces the pixel with the computed median of those values [6]. The median value is determined by first arranging every
pixel from the surrounding neighborhood in order of its numerical value. The pixel being considered is then replaced with the middle (median) pixel value. When the neighborhood taken has an even number of pixels, the average of the two center pixel values is used [7]. The adaptive fuzzy median filtering method is intended to overcome the limitations of the above median filtering technique. Since the neighborhood size of a standard median filter is fixed, its performance degrades whenever the density of spatial noise increases [8]. In the adaptive fuzzy median type of filter, however, the neighborhood size is adapted (changed) during the removal process. This change depends on the median value of the pixels present in the current window; the size of the window is increased whenever the median value is an impulse [9]. If an image is heavily affected by noise, then all the pixels present in the filtering window may equal the extreme values. In that case, the window is grown up to a maximum predefined size (W_max) for noise-removal processing, after which I(x, y) is replaced by the calculated median value (determined using the filtering window). While determining the median value, if the pixel values are almost the same as the extreme values in the filtering window, then the pixel values in the filtering window are processed irrespective of the extreme values used to determine the median [8]. At this stage, a fuzzy corruption value is calculated and then used to check whether or not the filtering window is highly affected by impulse-like noise. Median filters also smoothen (blur) the input images; consequently, efficient noise suppression comes at the cost of blurred or distorted image characteristics [10]. Thus an appropriate way to avoid this problem is to integrate the necessary decision-making control into the filtering process, which leads to the adaptive fuzzy median filter.
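A simplified sketch of the window-growing behaviour described above (without the fuzzy corruption test), assuming an 8-bit grayscale NumPy array, with w_max playing the role of W_max:

import numpy as np

def adaptive_median(img, w_max=7):
    # For each pixel, grow the window (3x3, 5x5, ... up to Wmax) until the
    # window median is itself not an impulse; replace the pixel by that
    # median only when the pixel is an extreme value (edge preservation).
    out = img.copy()
    pad = w_max // 2
    padded = np.pad(img, pad, mode='reflect')
    for x in range(img.shape[0]):
        for y in range(img.shape[1]):
            for w in range(3, w_max + 1, 2):
                r = w // 2
                win = padded[x + pad - r: x + pad + r + 1,
                             y + pad - r: y + pad + r + 1]
                med = np.median(win)
                if 0 < med < 255:                 # median is not an impulse
                    if img[x, y] in (0, 255):     # replace only noisy pixels
                        out[x, y] = med
                    break                         # no need to grow further
    return out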
2.3 Fuzzy Logic and Genetic Algorithm
The word fuzzy denotes something which is not clear, or vague. In the practical world, we often come across situations that cannot be labelled as either true or false [11]. Fuzzy logic gives very valuable and flexible reasoning in these situations. The system generates rules for the available inputs based on the type of membership function used. Fuzzification and defuzzification denote the conversion of crisp values into fuzzy values and vice versa [12]. The genetic algorithm is one used for optimizing either constrained or unconstrained problems. This optimization is based on natural selection, the driver of biological evolution [13]. The algorithm changes the population of candidate solutions periodically. At each stage, the technique randomly chooses individuals from the present population as parents and uses these parents to generate the children of the subsequent generation. This process of generation 'evolves' towards an optimal solution. The algorithm is widely used for applications having discontinuous, non-linear, stochastic, or non-differentiable objective functions [14]. It can also solve mixed-integer programming problems, in which some components are constrained to be integer-valued.
3 Proposed Method
3.1 Calculation of Impulse Noise Difference Using Hybrid Filter
The paper includes the usage of two separate filters, median and adaptive fuzzy median, in the first stage for the better filtering of the input medical images (mammograms). The outputs of the median and adaptive fuzzy median filters are denoted by Md(x, y) and FMd(x, y). The differences between the input and the outputs of this hybrid filter are

d1 = I(x, y) − Md(x, y)   (1)

d2 = I(x, y) − FMd(x, y)   (2)

where I(x, y) is the input mammogram to the proposed method. These d1 and d2 are given as input to the fuzzy logic system for the generation of fuzzy rules.
3.2 Fuzzy Logic for the Generation of Rules
The FL system utilizes the Mamdani method along with max–min composition to get rid of unwanted impulse noise in the acquired mammograms. The process of fuzzification yields an F1 score, and using this score value, the decision is made whether the detected pixels are affected by impulse noise or not:

if F1_Score ≥ T, then it is impulse-noise distorted;
if F1_Score < T, then it is not distorted.   (3)

Using the above condition, the obtained score value is assessed for impulse noise detection in mammograms. The process of fuzzification is done using triangular membership functions, and the membership of a component X in a crisp set C is defined using a characteristic function as [16]

μ_C(X) = 1 (True) if X ∈ C, 0 (False) otherwise   (4)

3.3 Genetic Algorithm for Optimization
After defuzzification, the fuzzy output is passed to the genetic algorithm for optimization of the result. Many fuzzy rules are generated for the problem, and so the genetic algorithm is employed to optimize the set of fuzzy rules. The GA gives good removal of noise even with differing noise content (a large, multimodal state space) [17]. The process of optimization depends on the structure and behaviour of the chromosomes for different images. The steps of the GA are given as [18]:
1. The initial population (x) of the GA is taken as randomly selected generated fuzzy rules.
2. Compute the fitness score of the above population (x).
Fig. 1. Flowchart of the proposed system (mammogram inputs → adding of impulse noise → noisy mammograms → median filter and adaptive fuzzy median filter → fuzzy logic for noise detection → genetic algorithm for optimization → if noise is detected, adaptive fuzzy median filtering → impulse-noise-removed image)
3. Repeat until convergence:
   a. Select the parent rules from (x).
   b. Form the population (x + 1) through the cross-over process on the parent rules.
   c. Perform mutation on (x + 1).
   d. Test the fitness value of (x + 1) for the problem.
4. The best optimized fitness value is obtained.
After the generation of the initial population using the above steps, a fitness function is determined in order to test the proposed method. The problem considers the parameter PSNR as the fitness function [19]. The PSNR as a fitness function is given as [20]

PSNR = 10 log10( 255² / MSE ) dB,  MSE = (1/(M × N)) Σ_{x=1}^{M} Σ_{y=1}^{N} (u_{x,y} − I_{x,y})²   (5)

where u(i, j) and I(i, j) are the pixel elements of the output and the original input images, and M × N is the input size, respectively. The denominator term inside the logarithm is the Mean Square Error (MSE).
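A minimal sketch of Eq. (5) as the GA fitness, assuming 8-bit images held as NumPy arrays:

import numpy as np

def psnr(u, I):
    # Eq. (5): PSNR in dB between the de-noised output u and the original I;
    # the denominator inside the logarithm is the Mean Square Error (MSE).
    mse = np.mean((u.astype(np.float64) - I.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)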
4 Experimental Results
The results are taken from the GA and evaluated using the standard PSNR metric, whose calculation is based on Eq. (5). The results are calculated for various noise ratios, as in Table 1.

Table 1. PSNR values (dB) for the mdb063 image at different noise densities

Methods               | 10%   | 30%   | 50%   | 70%
Mean filter           | 20.36 | 17.55 | 11.84 | 7.87
Decision based filter | 29.52 | 25.51 | 22.63 | 19.94
Wavelet filter        | 31.53 | 29.01 | 27.52 | 23.48
Proposed method       | 58.76 | 56.94 | 56.31 | 55.46
From the above table, the proposed method performs better than the other noise removal systems. The results are plotted graphically in Fig. 2.
Fig. 2. PSNR value comparison of the proposed method (comparison of PSNR (dB) values for the mean, decision based, wavelet and proposed filters at 10%, 30%, 50% and 70% noise density)
5 Conclusion and Future Work A methodology for the removal of impulse noise using hybrid filters is proposed. The filters used are the median and adaptive fuzzy median filters. After taking the differences, fuzzy rules are generated using the fuzzy logic system; this step detects the impulse noise in the mammogram images. The generated fuzzy rules then need to be optimized, which is done by the genetic algorithm. The comparative simulations show that the efficiency of the proposed work is higher than that of the existing filters. Future work includes implementation of the proposed system for clinical images with different optimization algorithms. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Leipsic, J., LaBounty, T.M., Heilbron, B., Min, J.K., Mancini, G.J., Lin, F.Y., Earls, J.P.: Adaptive statistical iterative reconstruction: assessment of image noise and image quality in coronary CT angiography. Am. J. Roentgenol. 195(3), 649–654 (2010) 2. Abirami, C., Harikumar, R., Chakravarthy, S.S.: Performance analysis and detection of micro calcification in digital mammograms using wavelet features. In: 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 2327–2331. IEEE (2016)
3. Liu, L., Chen, C.P., Zhou, Y., You, X.: A new weighted mean filter with a two-phase detector for removing impulse noise. Inf. Sci. 315, 1–16 (2015) 4. Gupta, V., Chaurasia, V., Shandilya, M.: Random-valued impulse noise removal using adaptive dual threshold median filter. J. Vis. Commun. Image Represent. 26, 296–304 (2015) 5. Chakravarthy, S.S., Subhasakthe, S.A.: Adaptive median filtering with modified BDND algorithm for the removal of high-density impulse and random noise. Int. J. Comput. Sci. Mob. Comput. IV(2), 202–207 (2015) 6. Gan, S., Wang, S., Chen, Y., Chen, X., Xiang, K.: Separation of simultaneous sources using a structural-oriented median filter in the flattened dimension. Comput. Geosci. 86, 46–54 (2016) 7. Chen, Y.: Deblending using a space-varying median filter. Explor. Geophys. 46(4), 332–341 (2015) 8. Wang, Y., Wang, J., Song, X., Han, L.: An efficient adaptive fuzzy switching weighted mean filter for salt-and-pepper noise removal. IEEE Signal Process. Lett. 23(11), 1582–1586 (2016) 9. Chen, Y., Zhang, Y., Shu, H., Yang, J., Luo, L., Coatrieux, J.L., Feng, Q.: Structure-adaptive fuzzy estimation for random-valued impulse noise suppression. IEEE Trans. Circ. Syst. Video Technol. 28(2), 414–427 (2016) 10. Erkan, U., Gökrem, L., Enginoğlu, S.: Different applied median filter in salt and pepper noise. Comput. Electr. Eng. 70, 789–798 (2018) 11. Nguyen, H.T., Walker, C.L., Walker, E.A.: A First Course in Fuzzy Logic. CRC Press (2018) 12. Suganthi, L., Iniyan, S., Samuel, A.A.: Applications of fuzzy logic in renewable energy systems–a review. Renew. Sustain. Energy Rev. 48, 585–607 (2015) 13. Metawa, N., Hassan, M.K., Elhoseny, M.: Genetic algorithm based model for optimizing bank lending decisions. Expert Syst. Appl. 80, 75–82 (2017) 14. Gai, K., Qiu, M., Zhao, H.: Cost-aware multimedia data allocation for heterogeneous memory using genetic algorithm in cloud computing. IEEE Trans. Cloud Comput. 1, 1–1 (2016) 15. Zadeh, L.A.: Fuzzy logic—a personal perspective. Fuzzy Sets Syst. 281, 4–20 (2015) 16. Nayak, P., Devulapalli, A.: A fuzzy logic-based clustering algorithm for WSN to extend the network lifetime. IEEE Sens. J. 16(1), 137–144 (2015) 17. Li, X., Gao, L.: An effective hybrid genetic algorithm and tabu search for flexible job shop scheduling problem. Int. J. Prod. Econ. 174, 93–110 (2016) 18. Ding, Y., Fu, X.: Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm. Neurocomputing 188, 233–238 (2016) 19. Cheng, J.H., Sun, D.W., Pu, H.: Combining the genetic algorithm and successive projection algorithm for the selection of feature wavelengths to evaluate exudative characteristics in frozen–thawed fish muscle. Food Chem. 197, 855–863 (2016) 20. Fardo, F.A., Conforto, V.H., de Oliveira, F.C., Rodrigues, P.S.: A formal evaluation of PSNR as quality measurement parameter for image segmentation algorithms. arXiv preprint arXiv:1605.07116 (2016)
Autism Spectrum Disorder Prediction Using Machine Learning Algorithms Shanthi Selvaraj(&), Poonkodi Palanisamy, Summia Parveen, and Monisha Department of Computer Science and Engineering, SNS College of Technology, Saravanampatti, Coimbatore 641035, Tamilnadu, India [email protected], [email protected], [email protected], [email protected]
Abstract. The objective of this research is to predict Autism Spectrum Disorder (ASD) in toddlers with the help of machine learning algorithms. Of late, machine learning algorithms play a vital role in improving diagnostic timing and accuracy. This research work precisely compares and highlights the effectiveness of the feature selection algorithms, viz. Chi Square, Recursive Feature Elimination (RFE), Correlation Feature Selection (CFS) Subset Evaluation, Information Gain, Bagged Tree Feature Selector and k Nearest Neighbor (kNN), in improving the efficiency of the Random Tree classification algorithm while modelling ASD prediction in toddlers. The analysis results reveal that the Random Tree based on features selected by the Extra Tree algorithm outperformed the individual approaches. The outcomes have been assessed using performance measures such as Accuracy, Recall and Precision. We present the results and identify the attributes that contributed most to differentiating ASD in toddlers, as per the machine learning model used in this study. Keywords: Machine learning · Random Tree · Accuracy · Autism Spectrum Disorder · Feature selection · Classification
© Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 496–503, 2020. https://doi.org/10.1007/978-3-030-37218-7_56
In machine learning, classification [8, 9] refers to a statistical and mathematical method for predicting, for given data, one of a given number of class values. Feature selection [8] is a mathematical procedure that picks a subset of significant features from the existing feature set in order to improve the classifier performance and the time and space complexity. In this paper, we have used the ASD Toddlers Dataset to evaluate the performance of six feature relevance algorithms, viz. Chi Square, Recursive Feature Elimination (RFE), Correlation Feature Selection (CFS) Subset Evaluation, Information Gain, Bagged Tree Feature Selector and k Nearest Neighbor (kNN), together with the Random Tree classification algorithm. Existing research in the field of machine learning and ASD is succinctly presented in the following paragraphs. Recently, machine learning methods have been widely used to improve the diagnosis process of ASD [10–12]. [13] used different machine learning algorithms, like tree models (random forest, decision tree) and neural networks, in order to reduce the duration of the ASD diagnostic process. [14] piloted an experimental review comparing six major machine learning algorithms, viz. Categorical Lasso, Support Vector Machine, Random Forest, Decision Tree, Logistic Regression, and Linear Discriminant Analysis, in order to choose the best-fitting model for ASD and ADHD. [15] studied the common issues associated with psychiatric disorders, including external validity, small sample sizes, and machine learning algorithmic trials, without a strong emphasis on autism. The authors suggested that one of these classifiers, used individually or in combination with the previously generated behavioural classifiers clearly focused on discriminating autism from the non-autism category, could serve as a valuable triage tool and pre-clinical screening to evaluate the risk of autism and ADHD [14]. [16] used FURIA (Fuzzy Unordered Rule Induction Algorithm) on an ASD dataset for forecasting autistic symptoms of children aged 4–11 years. The results showed that FURIA fuzzy rules were able to identify ASD behaviours with 91.35% classification accuracy and a 91.40% sensitivity rate; these results were better than other rule induction and greedy techniques. [17] and [18] applied Naïve Bayes, SOM, K-Means, etc. on an ASD dataset of 100 instances collected using the CARS diagnostic tool. [19] used the Random Forest algorithm to predict ASD among 8-year-old children in multiple US sites. The results revealed that the machine learning approach predicted ASD with 86.5% accuracy. The above appraisals have concentrated on the comprehensive domain of mining clinical data, the trials encountered, and the results inferred from the analysis of clinical data.
2 Methodology
The proposed methodology for ASD prediction in toddlers is portrayed in Fig. 1. In this research, ASD patient data collected using the Quantitative Checklist for Autism in Toddlers (Q-CHAT), provided by the ASD Tests app, were used for investigation [3–6]. The dataset used to train the classifier and evaluate its performance was downloaded from Kaggle [20]. The dataset contains 1054 samples with 17 features
inclusive of the class variable. According to a WHO report, around 1% of the population has ASD, whereas about 69% of the toddlers in this study sample are ASD-positive; this is because the test parameters capture only qualitative properties of ASD.
[Fig. 1 flowchart: ASD data collected from Q-Chat → Data Preprocessing → ASD Training Data / ASD Test Data → Feature Selection Algorithms (Chi 2, CFS, RFE, Info_Gain, Tree, kNN) → Feature Subset → Random Tree Classifier → 10-Fold Cross Validation → Performance Metrics (ℜACC, ℜSEN, ℜSPE) → Predicted ASD Patterns]
Fig. 1. Proposed prediction model for ASD data
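To make the Fig. 1 pipeline concrete, a minimal sketch using scikit-learn is given below. The file name toddlers.csv, the label column "ASD", and the use of ExtraTreeClassifier as a stand-in for Weka's Random Tree are illustrative assumptions, not part of the original study; chi-square selection requires non-negative features, which holds for the binary Q-Chat items.

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import ExtraTreeClassifier  # stand-in for Weka's Random Tree

    df = pd.read_csv("toddlers.csv")                 # hypothetical file: 1054 rows, 17 features
    X, y = df.drop(columns=["ASD"]), df["ASD"]       # "ASD" label column is assumed

    X_sel = SelectKBest(chi2, k=10).fit_transform(X, y)   # one of the six selectors
    clf = ExtraTreeClassifier(random_state=0)
    scores = cross_val_score(clf, X_sel, y, cv=10)        # 10-fold cross validation
    print(scores.mean())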
In order to identify the most significant features for classification, six feature selection algorithms, viz. Chi Square Coefficient [21, 22], Information Gain [23–26] based on ranked features, Recursive Feature Elimination (RFE) [27, 28], CFS [23–26, 29] using the Best First search method, Tree Classifier [30] and kNN [31], were used. We utilized the feature subsets returned by the feature selection algorithms to estimate the performance of the Random Tree classifier [23–26] in ASD prediction. An in-depth evaluation revealed the relevance of the features and the predictive power of the classifier in ASD prediction. The confusion matrix [8] for ASD prediction is given in Table 1. For a binary classification problem (ASD_Positive, ASD_Negative), the following performance measures [4, 8] were utilized for classifier evaluation [8, 23–25]: Accuracy (ℜACC), Sensitivity (ℜSEN) and Specificity (ℜSPE). The measures are represented as follows:
Table 1. Confusion matrix of ASD class variables

Actual class     Predicted class
                 ASD_Positive          ASD_Negative
ASD_Positive     True Positive (TP)    False Negative (FN)
ASD_Negative     False Positive (FP)   True Negative (TN)
$\Re_{ACC} = \frac{|TP + TN|}{|TP + TN + FP + FN|}$   (1)
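For completeness, the three measures can be computed directly from the Table 1 counts, as sketched below; the sensitivity and specificity expressions are the standard definitions, stated here because the corresponding display equations are not reproduced in this excerpt, and the counts in the usage line are purely illustrative.

    def metrics(tp, fn, fp, tn):
        # Accuracy per equation (1); sensitivity and specificity per the standard definitions.
        acc = (tp + tn) / (tp + tn + fp + fn)
        sen = tp / (tp + fn)
        spe = tn / (tn + fp)
        return acc, sen, spe

    print(metrics(tp=650, fn=30, fp=25, tn=349))   # illustrative counts only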
Red, Green, and Blue components are extracted from the captured fire image. Since the most important component of fire is the R component, it must be compared with a threshold value RedThreshold. Fire-like aliases may cause false fire detection; to avoid such conditions, the saturation value of the fire must also be considered. Based on these conditions, three decision rules are deduced for extracting fire pixels from the video frames [5]:

rule1: $R > RedThreshold$
rule2: $R \ge G > B$
rule3: $S \ge (255 - R) \cdot \dfrac{SaturationThreshold}{RedThreshold}$

If (rule1) AND (rule2) AND (rule3) = TRUE, then "It is a fire pixel"; else "It is not a fire pixel".

Here SaturationThreshold denotes the saturation value of the fire pixel when its Red component equals RedThreshold. The saturation value degrades as the Red component of the pixel increases.
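A direct per-pixel transcription of the three rules is sketched below; the OpenCV HSV saturation channel stands in for the saturation value used in the paper, and the numeric thresholds are placeholders to be fitted experimentally, as the text notes.

    import numpy as np
    import cv2

    def fire_pixel_mask(frame_bgr, red_threshold=135, saturation_threshold=55):
        # Saturation channel of the 8-bit frame (proxy for the paper's saturation value).
        sat = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)[:, :, 1].astype(np.int32)
        b, g, r = (frame_bgr[:, :, c].astype(np.int32) for c in range(3))
        rule1 = r > red_threshold
        rule2 = (r >= g) & (g > b)
        rule3 = sat >= (255 - r) * saturation_threshold / red_threshold
        return rule1 & rule2 & rule3                 # boolean mask of fire pixels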
2.2 Dynamic Analysis of Flame
A few fire-like regions may be extracted as real fire pixels by the chromatic analysis method. This happens in two particular cases: substances which are not burning but possess a color similar to fire, and backgrounds illuminated by solar reflections, high-intensity lights, and similar fire-like light sources. In the former case, materials possessing a large amount of red and fire-like color may lead to false fire pixel extraction. In the latter case, a background illuminated by solar reflections or artificial fire-like lights is confounded with burning fires. Together, these cases influence fire pixel extraction, making true fire detection complex and unreliable.
Thus, in order to confirm whether a burning fire is real or fake, in addition to the chromatic analysis, the dynamics of the fire region (its disorder and movement) are used to reduce the detection of fire aliases. A natural fire does not have a constant shape and size; its dynamics include sudden movement of flames due to the air-flow direction, changing shapes, and a growth rate. To improve the reliability of fire identification, the proposed method combines the rate of growth of flames with the disorder characteristics of fire to check whether a region is a fake fire-like region or a real fire. The flame disorder value is measured from the number of differing fire pixels between two consecutive video frames. The final decision rule for this segment of the algorithm is [5]:

If $\dfrac{|FireDisorder_{t+1} - FireDisorder_t|}{FireDisorder_t} \ge FireDisorderThreshold$
then "Real fire" else "Flame alias".
Here, $FireDisorder_t = Fire_t(x, y) - Fire_{t-1}(x, y)$, where $Fire_t(x, y)$ and $Fire_{t-1}(x, y)$ denote the fire regions of the present and previous video frames, respectively. $FireDisorder_t$ and $FireDisorder_{t+1}$ denote the disorder of the present and next video frames of the fire region, respectively. FireDisorderThreshold is a predefined fire-disorder threshold that helps to discriminate fire from fire-like objects. If all the above decisions are satisfied, the flame is likely to be a real fire. To further increase the reliability of the detection, the disorder check is repeated d times. The variables d and FireDisorderThreshold rely entirely on statistical data from the conducted experiments.
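The disorder test can be sketched as follows, assuming boolean fire-pixel masks from the chromatic stage; the threshold value is a placeholder to be fixed from experiments.

    import numpy as np

    def fire_disorder(mask_prev, mask_curr):
        # Number of fire pixels that differ between two consecutive frames.
        return int(np.count_nonzero(mask_curr ^ mask_prev))

    def is_real_fire(disorder_t, disorder_t1, fire_disorder_threshold=0.25):
        # Relative change in disorder between successive frame pairs.
        return abs(disorder_t1 - disorder_t) / max(disorder_t, 1) >= fire_disorder_threshold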
2.3 The Rate of Growth of the Fire Pixels
The growth of a burning fire is dominated by the fuel type and air flow. In the initial burning phase, the flame size tends to increase. To identify this growth feature, the fire pixels in the video frame are counted at regular time intervals and compared. Let $k_i$ and $k_{i+1}$ denote the number of extracted fire pixels of the present frame and of the next frame, respectively. If $k_{i+1} > k_i$ holds more than p times at a repeated interval $t_F$ during the time period T, then there is a probable growth of the fire, which strengthens the authentication of a real fire. In the algorithm, the variables $t_F$, T and p rely entirely on statistical data from the conducted experiments (Fig. 2).
Fig. 1. (a) Initial burning of flame [3] (b) Extraction of Fire pixels (c) Fire spread
Fig. 2. Amount of fire pixels extracted during the burning of paper fuel
2.4 Smoke Detection
The burning of a fire causes smoke, which can therefore be used as a feature for early fire detection. Generally, smoke displays grayish colors at two levels, light-gray and dark-gray, which are well described by the intensity I of the HSI model. The gray levels vary from LightGrey1 to LightGrey2 for the light level and from DarkGrey1 to DarkGrey2 for the dark level:

$LightGrey_1 \le I \le LightGrey_2$
$DarkGrey_1 \le I \le DarkGrey_2$

The diffusion of particles and the airflow cause the shape of the smoke to change. Therefore, the dynamic features of smoke can be characterized by the growth and disorder of the smoke pixels extracted from the video frames. To evaluate the dynamic features of the smoke, the following rule is used [5]:

If $\dfrac{|SmokeDisorder_{t+1} - SmokeDisorder_t|}{SmokeDisorder_t} \ge SmokeDisorderThreshold$
then "Real smoke" else "Smoke alias".
Here, $SmokeDisorder_{t+1}$ denotes the disorder in smoke between the present image and the next image, and $SmokeDisorder_t$ the disorder between the present image and the previous image. SmokeDisorderThreshold denotes the smoke threshold value, calculated from the experimental results. Smoke growth depends mainly on the fuel type and air flows. Let $n_i$ signify the number of smoke pixels derived from the video frame at time $t_i$. The smoke is classified as real if $n_{i+1} > n_i$ holds h times, according to the experimental results.
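A minimal sketch of the grayish-intensity test is shown below; the intensity array is assumed to be the I channel of the HSI model, and the two gray ranges are placeholders for the experimentally chosen LightGrey and DarkGrey bounds.

    def smoke_pixel_mask(intensity, light=(160, 220), dark=(80, 150)):
        # True where the HSI intensity falls in either grayish-smoke band.
        in_light = (intensity >= light[0]) & (intensity <= light[1])
        in_dark = (intensity >= dark[0]) & (intensity <= dark[1])
        return in_light | in_dark   # the disorder test then mirrors the fire case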
3 Fire Alarm Raising
Raising false alarms is discouraging to humans. An earlier algorithm checks the fire pixels with an adaptive-threshold-based decision function, which is highly sensitive to change [10]. To overcome this problem, iterative checking of the fire pixels is done to ensure that an alarm is raised correctly. Even though the spread of fire is observed from various locations through cameras, the accuracy of establishing that the fire is growing is low. To avoid this problem,
Fig. 3. Flowchart for fire alarm raising
and improve reliability, iterative checking of the growth of the fire spread is performed. If the number of fire pixels extracted from the video frames increases with time, the flame can be said to be spreading at a higher rate. Let N denote the number of comparisons of $k_i$ and $k_{i+1}$ at repeated durations $t_R$ during the time period T, and let R denote the number of times $k_{i+1} > k_i$. Then:

$R/N \le 0.7$: fire is about to spread out
$R/N \ge 0.9$: fire spreads out
$0.7 < R/N < 0.9$: determined by fuel and airflow
This rule is suitable only for the spread of general burning. For violent fuels and rapid flames, another strategy is used: check whether $(k_{i+1} - k_i)/k_i > S$, where S denotes the growth rate of flames and $S \ge 2$.
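The alarm decision can be sketched as below; growth_flags records, for each compared frame pair, whether k_{i+1} > k_i, and the reading of the 0.7/0.9 cut-offs follows the rule above as reconstructed from the garbled source.

    def alarm_state(growth_flags):
        # growth_flags[i] is True when k_{i+1} > k_i for frame pair i; R/N is their ratio.
        r_over_n = sum(growth_flags) / max(len(growth_flags), 1)
        if r_over_n >= 0.9:
            return "fire spreads out"
        if r_over_n <= 0.7:
            return "fire is about to spread out"
        return "determined by fuel and airflow"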
4 Result
We tested our fire detection algorithm on various video clips obtained from the internet and captured in real environments. The details of the video clips and the results are tabulated below (Fig. 4).
Description                                                        Resolution   No. of frames   Accuracy
Burning of firewood during the night                               640 x 352    62              98.39%
Burning of dried shrubs on forest floor in the day time            400 x 256    101             97.03%
Burning of a tree in the forest during day time                    400 x 256    100             100%
Starting with no flame; fire gradually catches up in a dark room   426 x 240    1700            99.12%
Small forest fire in the daytime                                   400 x 240    101             49.50%
Street view during the night (no fire in this video)               720 x 1280   573             98.78%
Entry to a house in an illuminated environment                     720 x 1280   253             97.63%
Gradual burning of paper from no fire to total burning             176 x 144    1370            70.73%
Burning of a sofa cushion in a dark room                           426 x 172    94              75.53%
Burning of a wooden plank in the day time                          426 x 240    500             41.4%
Fire in the corner during daytime                                  640 x 360    501             99.60%

Fig. 4. RGB model – grayscale model of fire and smoke
5 Conclusion
The proposed methodology for fire detection is based on analyzing the chromatic and dynamic features of the video frames. The chromatic feature is adopted for fire-pixel extraction, based on three decision rules applied to the RGB color model and the saturation of the red component. Real flames and fire aliases are distinguished using the dynamic features of the extracted fire pixels: an aliased flame can be detected by checking the spread of the fire. With smoke detection, the presence of fire is further confirmed. To reduce false fire alarms, the number of extracted fire pixels is compared over specified intervals of time. Thus, using image processing through CCTV cameras, fire and its spread can be detected easily, and a fire alarm can be raised. The main challenge faced in this research was selecting the right thresholds for the various RGB color model constraints, since the color values of a flame depend on the type of fuel, the amount of oxygen present, and background illumination. The experimental results obtained from our algorithm show that an accuracy of 94% on average can be achieved. Testing was done on various kinds of flames with different backgrounds, times of day, and illuminations. Higher accuracy can be achieved by setting the thresholds according to the specific environment where the static cameras are mounted. A limitation of the proposed methodology is that the dynamic features are greatly affected by the random movements of regions within the flame; still, this cost-effective method can be implemented in forest and mountain areas where human interaction is very limited.
Datasets Used
1. https://www.nist.gov/video-category/fire
2. https://www.kaggle.com/csjcsj7477/firedetectionmodelkeras-for-video/version/1
3. http://signal.ee.bilkent.edu.tr/VisiFire/Demo/FireClips/
4. https://www.youtube.com/watch?v=MnlR-J1qX68
Acknowledgment. This work is part of a project supported by DST (DST/TWF Division/AFW for EM/C/2017/121), titled "A framework for event modeling and detection for Smart Buildings using Vision Systems", to Amrita Vishwa Vidyapeetham, Coimbatore.
References 1. Nandi, C.S.: Automated fire detection and controlling system (2015) 2. Chi, R., Lu, Z.-M., Ji, Q.-G.: Real-time multi-feature based fire flame detection in video. IET Image Process. 11, 31–37 (2016) 3. Lo, S.-W., Wu, J.-H., Lin, F.-P., Hsu, C.-H.: Cyber surveillance for flood disasters. Sensors 15, 2369–2387 (2015) 4. Gunay, O., Toreyin, B.U., Kose, K., Cetin, A.E.: Entropy functional based online adaptive decision fusion framework with application to wildfire detection in video. IEEE Trans. Image Process. 21, 344–347 (2012)
5. Chen, T.-H. (Chao-Ho), Wu, P.-H., Chiou, Y.-C.: An early fire-detection method based on image processing. In: ICIP, pp. 1708–1710 (2004) 6. Sowah, R.A., Ofoli, A.R., Krakani, S.N., Fiawoo, S.Y.: Hardware design and web-based communication modules of a real-time multisensor fire detection and notification system using fuzzy logic. IEEE Trans. Ind. Appl. 53(1), 559–566 (2017) 7. Celik, T.: Fast and efficient method for fire detection using image processing (2014) 8. Philips III, W., Shah, M., da Vitoria Lobo, N.: Flame recognition in video. Pattern Recogn. Lett. 23, 319–327 (2002) 9. Chen, S., Bao, H., Zeng, X., Yang, Y.: A fire detecting method based on multi-sensor data fusion. In: IEEE International Conference on Systems, Man and Cybernetics, October 2003, vol. 14, pp. 3775–3780 (2003) 10. Ho, C.-C., Kuo, T.-H.: Real-time video-based fire smoke detection system. In: IEEE/ASME International Conference on Advanced Intelligent Mechatronics 2009, AIM 2009, pp. 1845–1850 (2009) 11. Bondarenko, V.V., Vasyukov, V.V.: Software and hardware complex for detecting forest fire. In: 2012 11th International Conference on Actual Problems of Electronics Instrument Engineering, APEIE, pp. 138–142 (2012) 12. Rinsurongkawong, S., Ekpanyapong, M., Dailey, M.N.: Fire detection for early fire alarm based on optical flow video processing. In: 2012 9th International Conference on Electrical Engineering/Electronics Computer Telecommunications and Information Technology, ECTI-CON, pp. 1–4 (2012) 13. Chen, X.-H., Zhang, X.-Y., Zhang, Q.-X.: Fire alarm using multi-rules detection and texture features classification in video surveillance. In: 2014 7th International Conference on Intelligent Computation Technology and Automation, ICICTA, pp. 264–267 (2014) 14. Kumar, S., Saivenkateswaran, S.: Evaluation of video analytics for face detection and recognition. Int. J. Appl. Eng. Res. 10(9), 24003–24016 (2015) 15. Athira, S., Manjusha, R., Parameswaran, L.: Scene understanding in images. In: Corchado Rodriguez, J., Mitra, S., Thampi, S., El-Alfy, E.S. (eds.) Intelligent Systems Technologies and Applications 2016. ISTA 2016. Advances in Intelligent Systems and Computing, vol. 530. Springer, Cham (2016)
Smart Healthcare Data Acquisition System Tanvi Ghole(B) , Shruti Karande, Harshita Mondkar, and Sujata Kulkarni Department of Electronics and Telecommunication, Sardar Patel Institute of Technology, Mumbai, India {tanvi.ghole,shruti.karande,harshita.mondkar,sujata kulkarni}@spit.ac.in
Abstract. A general healthcare system monitors the parameters and simply displays the readings, and the records are maintained on paper. This paper aims to build an independent offline system for maintaining records in regions where network connectivity is an issue. After basic identification information is provided, the system automatically logs vital parameters of patients acquired from the sensors. The records are maintained digitally, and the system performs a personalized data analysis for each patient, as "normality" is subjective to every patient. The system is trained to classify such fine-grained details and create a cluster for each patient individually. Whenever a patient visits for a checkup, the data is added to his or her cluster and analysis is performed on it. Various suggestions are provided to the patient in order to avoid worsening the condition. It is a platform that provides the doctor with the patient's entire history when they visit a non-digitized area. Keywords: Personalized healthcare · Data analysis · Machine learning · Clustering of data · Classification algorithm
1 Introduction
A general health-care system monitors the parameters and simply displays the readings; mostly, the records are maintained on paper. In recent times, medical advancements have led to digitization and personalized medicine. But there is still a lack of basic health care services in villages, causing inefficiency in checking the vital signs needed to identify minor diseases. Even today, the population in rural areas has to travel large distances even for regular checkups; pregnant women, babies and aged people may not be able to travel that far. Lack of internet connectivity, absence of basic medical facilities and non-availability of the patient's medical history increase the demand for better medical solutions. Thus, the objective is to provide an intelligent solution in the form of a hassle-free offline health-care system. Cluster analysis (clustering) and predictive analysis can be used to provide intelligent solutions in the healthcare sector. In clustering, objects exhibiting similar properties are grouped together in a set forming a cluster.
The proposed system aims to take this a notch higher. The records of the patients are maintained digitally, and the system performs a personalized data analysis for each patient; as normality is subjective, a reading may be normal for one patient and not for another. The system also provides suggestions to patients on what to do and what to avoid, in order to prevent worsening of the condition.
2 Related Work
2.1 Literature Review
A government-initiated VSTF survey was conducted on 7th June 2018 in various villages in Sudhagad taluka by students and teachers from Sardar Patel Institute of Technology. While conducting the survey in Chive, one of the villages, we came face to face with the problems faced by the people living there. Some of the problems are listed below:
• Medical checkups in the anganwadi are arranged every 3 months for pregnant women, primary-school children and adolescent girls.
• The caretakers appointed by the government give medical suggestions based on previous experience rather than on proper diagnosis.
• Proper medical facilities (neither clinics nor hospitals) are not available in the villages.
• Villagers have to visit Pali for any medical help.
2.2 Existing Solutions on Vital Parameters
Health Care Monitoring System in Internet of Things (IoT) by Using RFID [1]: This paper describes a complete IoT- and RFID-based monitoring system. The sensors sense the parameters and send them to a mobile device using RFID technology. Limitation: the patient must be within RFID range for the parameters to be acquired [1]. Heart Rate Monitoring System Using Finger Tip and Processing Software [2]: This paper deals with measurement of heart rate from the fingertip using an optical methodology involving an IR transmitter and an infrared detector rather than a conventional monitoring system. Limitation: it is Arduino-based, so it can be used for study purposes only [2].
2.3 Existing Solutions on Zigbee Based System
A Zigbee-Based Wearable Physiological Parameter Monitoring System [3]: This system deals with blood pressure and heart rate, where the sensors and circuitry were placed in a wrist wrap and at the fingertip. Future scope: it can be further improved by implementing noise reduction techniques and using an oximeter [3].
Transmitting Patient Vitals over a Reliable ZigBee Mesh Network [4]: A system is designed that converts pulses acquired from the fingertip into beats per minute. ZigBee is then used to display the values dynamically on a remote system [4].
2.4 Existing Solutions on Clustering Analysis
Clustering Analysis of Vital Signs Measured During Kidney Dialysis [5]: Analysis of vital information from kidney dialysis is presented in this paper. Vital signs such as pulse rate and respiration rate are analyzed for eight patients belonging to different age groups. A hierarchical clustering method is used, and a multi-dimensional dynamic time warping distance is applied to observe similarities in the vital parameters. If any parameter deviates from the physiological rhythms, abnormal conditions can be determined according to the behaviour of the deviation [5].
3 Proposed System
3.1 Sensor Integration
The system comprises various modules which are integrated together. One module comprises heart rate and blood pressure, and the other temperature. The band is mounted on the wrist in order to take heart rate and blood pressure readings. Once the sensors acquire the data, it is sent to the ARM Cortex M0 Discovery board, and from there to a host system.
Fig. 1. Hardware block diagram for smart health-care data acquisition system
ARM Cortex. The Cortex-M0 is a highly energy-efficient processor with a very low gate count. Its 32-bit performance is combined with high code density through the Thumb instruction set. Power control of the system components is optimized, and integrated sleep modes are available for low power consumption. Longer sleep-mode time or a slower processor clock is achieved by
fast code execution. For time-critical applications, it is deterministic and has high-performance interrupt handling. The number of pins required for debugging is reduced by serial wire debug [6].
Temperature Sensor MCP9700. When the sensor contacts the body of the patient, it acquires the temperature in Fahrenheit. For precise readings, the temperature sensor must be held either in the palms or the armpits [7].
Features
• Tiny analog temperature sensor
• Wide temperature measurement range
  – Extended temperature: −40 ◦C to +125 ◦C
  – High temperature: −40 ◦C to +150 ◦C
• Accuracy: ±2 ◦C (max.) for 0 ◦C to +70 ◦C
• Low operating current, i.e. 6 µA [7]
3.2 Blood Pressure and Heart Rate Sensor
(1) Systolic blood pressure: the pressure experienced by the artery walls while blood flows when the heart beats. (2) Diastolic blood pressure: the pressure the arteries experience due to blood flow between two consecutive heart beats. (3) Pulse rate: the number of times the heart beats in a minute. The sensor used provides systolic and diastolic blood pressure as well as pulse rate. These values are then compared with the threshold values given in Fig. 2 and the data is analyzed [9].
Fig. 2. General values for blood pressure
3.3 Test Data Generation
Figure 3 shows the block diagram of the software module of the proposed system. A Java-based desktop application accepts data from the patient. An excel sheet containing the patient data is exported; this data sheet is then loaded as test data. The data is sampled and the model is trained with respect to the selected data. k-means and hierarchical clustering are used to analyze the data.
Fig. 3. Software flow
3.4 Data Analysis
Orange Data Mining Software: Orange is an open-source software package for data visualization, data mining and machine learning. It includes a graphical front-end for exploratory data analysis and interactive data visualization, and it can be used as a Python library. Bio-informatics libraries are used to cluster a person's data, and the software is used for hierarchical clustering of the data.
Algorithm: The data sheet is created by taking instances of the vital parameters of an individual patient from the web page, and the database is updated after every new entry. The data is stored in a .tab file, which is loaded and sub-sampled. The sub-sampled instances are fed to the k-means model [10]. The k-means model uses the Euclidean distance between data points: as the distance between two points decreases, the data instances are more similar to each other. A dendrogram (tree diagram) represents the discovered clusters and the distances between them, with annotations of the required parameter. A scatter plot presents the data clusters of informative projections, such as ECG v/s gender. A minimal clustering sketch is given below.
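The sketch uses scikit-learn and SciPy; the file name, column layout and the choice of three clusters are illustrative assumptions, while Euclidean k-means and hierarchical clustering match the analysis described above.

    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.cluster.hierarchy import linkage

    # Assumed layout of the exported sheet: one row per visit,
    # columns = heart rate, systolic BP, diastolic BP, temperature.
    vitals = np.loadtxt("patient_vitals.tab", skiprows=1)

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vitals)
    print(km.labels_)                     # cluster membership per visit (Euclidean distance)

    Z = linkage(vitals, method="ward")    # linkage matrix for the dendrogram (tree diagram)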
3.5 Serial Interface
Tera Term: To read the data transmitted on the serial port of the PC by the ARM board, we used Tera Term, an open-source tool. It supports SSH 1 and 2, telnet and serial port connections. It also has a built-in macro scripting language and a few other useful plugins. An equivalent programmatic read is sketched below.
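The sketch uses pyserial; the port name, 9600 baud and the comma-separated line format are assumptions about the board's output, not documented settings.

    import serial  # pyserial

    with serial.Serial("/dev/ttyUSB0", 9600, timeout=2) as port:
        line = port.readline().decode("ascii", errors="ignore").strip()
        heart_rate, systolic, diastolic = (int(v) for v in line.split(","))
        print(heart_rate, systolic, diastolic)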
Fig. 4. Heart rate, systolic and diastolic blood pressure display
Fig. 5. Heart rate threshold
4 Experimental Results and Analysis
4.1 Hardware
The vital parameters were measured and displayed successfully. Figure 4 displays the heart rate, systolic blood pressure and diastolic blood pressure on the Tera Term VT terminal. Data is received serially at the port and the separated values are displayed. Each new set of sensed data is displayed on a new line below the previously existing values. These results lie within the range specified in Fig. 5. Figure 6 displays the body temperature of a person; this measured value is compared with the general temperature threshold given in Fig. 7.
Fig. 6. Temperature display
Fig. 7. Temperature threshold
4.2 Software
Data Acquisition: Fig. 8 shows a desktop application, created using Java 8, that accepts data from the patient and keeps individual records of each patient. It accepts the patient's name, date of birth, gender, weight and height; heart rate, systolic and diastolic blood pressure, and temperature are measured directly from the sensors. These details are stored in a database and an excel sheet is generated, as depicted in Fig. 9.
Fig. 8. Desktop application
Fig. 9. Exported excel-sheet
Data Analysis: Data analysis based on the hierarchical clustering model of the selected data is projected using visualization models, i.e. a scatter plot and a tree diagram. Figure 10 represents a dendrogram of ECG. The annotations can be selected depending on the area of focus. Here the dendrogram is generated on the basis of ECG annotations, and the instances in the clustered data are labeled depending on the stage they belong to, i.e. normal, left vent hypertrophy or ST-T abnormal. The selected cluster in the dendrogram indicates instances belonging to the same category; they are highlighted in the scatter plot given in Fig. 11. Figure 11 is a scatter plot of ECG v/s gender. The scatter plot is used to display informative projections. The size, shape and colour of the symbols can be
Fig. 10. Dendrogram(ECG)
Fig. 11. Scatter plot (ECG v/s gender)
customized. There are six clusters, formed from five categories: two on the Y-axis (gender), namely male and female, and three on the X-axis (ECG stages), namely normal, left vent hypertrophy and ST-T abnormal. Blue points fall under the normal category, red points or bubbles indicate the critical situation of left vent hypertrophy, and green points indicate the ST-T abnormal category. The size of a bubble indicates the age group falling under the category, and the opaque bubbles indicate the selected instances of the cluster.
5 Conclusions
An offline healthcare system is a boon to areas where internet connectivity and basic healthcare are an issue. The doctors visit once every 2 to 3 months, which implies basic healthcare is not readily available. Records maintained on paper can be lost or damaged, which means the entire patient history may not be available in one place. The aforementioned issues were resolved by providing a basic vital-parameter checkup and by storing the patient's details in an offline database. A clustering model is also used to provide the doctor with the patient's entire history when they visit such an area. The readings observed from the proposed system were approximately equal to those taken by an actual professional.
6 Future Scope
In areas where internet connectivity is available, a centralized database could be maintained so that the patient's history is readily available at a remote location. In order to make the system portable, an offline mobile application could be implemented, provided the device has sufficient memory and wireless connectivity or Li-Fi technology. Tele-medicine technology can be used to provide expert medical solutions and direct connectivity to doctors.
Compliance with Ethical Standards. All authors declare that there is no conflict of interest. No humans/animals were involved in this research work. We have used our own data.
References
1. Khan, S.F.: Health care monitoring system in Internet of Things (IoT) by using RFID (2017) 2. Pawar, P.A.: Heart rate monitoring system using IR base sensor and Arduino Uno. In: 2014 Conference on IT in Business, Industry and Government (CSIBIG) (2014) 3. Malhi, K., Mukhopadhyay, S., Schnepper, J., Haefke, M., Ewald, H.: A zigbee-based wearable physiological parameters monitoring system. IEEE Sens. J. 12(3), 423–430 (2012) 4. Filsoof, R., Bodine, A., Gill, B., Makonin, S., Nicholson, R.: Transmitting patient vitals over a reliable zigbee mesh network. In: 2014 IEEE Canada International Humanitarian Technology Conference (IHTC) (2014) 5. Yamamoto, K., Watanobe, Y., Chen, W.: Clustering analysis of vital signs measured during kidney dialysis (2007) 6. Infocenter.arm.com (2018). http://infocenter.arm.com/help/topic/com.arm.doc.ddi0432c/DDI0432Ccortex-m-r0p0-trm.pdf. Accessed 10 Oct 2018 7. CVA, K.: J. Orthop. Bone Disord. 1(7) (2017). https://www.medwinpublishers.com/JOBD/JOBD16000139.pdf
8. American Diagnostic Corporation - Core Medical Device Manufacturer. Stethoscopes, Blood Pressure, Thermometry, and EENT. Adctoday.com (2019). https://www.adctoday.com/learning-center/about-thermometers/how-take-temperature. Accessed 28 Jan 2019 9. Blood Pressure Sensor - Serial output [1437]: Sunrom Electronics/Technologies. Sunrom.com (2018). https://www.sunrom.com/p/blood-pressure-sensor-serial-output. Accessed 9 Oct 2018 10. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24, 881–892 (2002) 11. Disabled World. Blood Pressure Chart: Low Normal High Reading by Age Table. Disabled World, 28 Jul 2018. http://www.disabled-world.com/calculators-charts/bloodpressurechart.php. Accessed 8 Oct 2018 12. Filsoof, R., Bodine, A., Gill, B., Makonin, S., Nicholson, R.: Transmitting patient vitals over a reliable ZigBee mesh network. In: IEEE Canada International Humanitarian Technology Conference (IHTC) (2014)
A Sparse Representation Based Learning Algorithm for Denoising in Images Senthil Kumar Thangavel(&) and Sudipta Rudra Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India [email protected], [email protected]
Abstract. Over the past few decades, denoising has been a major component of imaging analysis, and it is still an open research problem. An attempt is made to denoise an image based on pursuit methods. This work proposes a new denoising methodology built on the K-SVD algorithm. Noise information is extracted using the proposed approach: image patches are learnt using a dictionary, which is then updated with reference to a heap sort. Experimentation shows that the introduced approach reduces noise on the test datasets, and the proposed approach is found to be comparatively better than existing works. Keywords: Image denoising · Dictionary · K-SVD · Image patches · Sparse learning · Representation
1 Introduction
Denoising plays an important role in current and advanced developments in modern digital imaging analysis. Noise elimination and information recovery, while preserving the corner and edge details of the image, is an elementary issue in image processing [1]. Image noise is related to the variation of pixels in an image; this variation is related to artifacts and to changes in the spatial and frequency domains, and it differs between images and video. Noise can be removed by suitable algorithms. Noise in an image is unwanted and its effect on the output is unexpected. An image is a digital representation of pixels, i.e. of intensity values; noise in an image cannot be avoided, but it can be minimised. A noisy observation can be written as y = x + n, where x represents the original intensity value and n refers to the noise. Noise can be identified based on prior information about the intensity of pixels; on averaging similar pixels, the probability of obtaining a clean patch is maximal. Noise sometimes corrupts useful information. In real-time applications like face detection, high noise may cause problems, and in low-resolution images a poor signal-to-noise ratio makes the image unsuitable for processing. Illumination problems and blurring also result in noise. The elimination of noise from an image is called denoising, and there are many ways
for denoising an image. Applying filters is one of the methods for denoising; some of the filters that can be applied are the mean, median and Wiener filters. The noisy image is again modelled as

y = x + n

where x is the intensity value and n is the noise. Periodic noise arises through electromagnetic interference; it appears as discrete spikes and can be reduced using a notch filter. If noise appears along borders, a low-pass filter is a remedy, as it reduces the noise levels. Even the surrounding temperature can have a noticeable effect on noise. Some noise is caused by the quantization of pixels and can be removed using a Gaussian filter. Noise with random dot patterns can fuse with pictures and results in "snow", a form of electronic noise. Noise also depends on the shutter speed: an abnormal shutter speed can increase noise. Noise in images can be cleared using both spatial- and frequency-domain filters, and some noise is signal dependent. The sensors that are part of a camera also affect the image characteristics; they are influenced by heat and fill factor, and sensor data can be fused to understand the image characteristics. The uncertainty of noise is the major challenge, and it is especially challenging in video, where the noise location varies from frame to frame. Learning the noise parameters and modelling them can help denoising; the generated model should be able to reconstruct the error in the image set. Patch-based approaches divide the image into regions; each region is processed by a set of tasks for learning the noise features, which are then used to classify the type of noise. Architectures such as KrispNet are suitable for noise reduction, and the noise in the audio content of moving pictures can be removed using a codec. Noise can also be reduced by improving the preprocessing at the input layers. Deep learning can be considered when several thousands of images are available; the uncertainty of the image pixels contributing to noise is a motivation for using deep learning, and fully convolutional layers can handle the error in the image. Image noise is part of acquisition and affects the brightness variation in a random manner. Denoising is the basic pre-processing technique for removing the noise in an image or video; prior to denoising, estimating the noise and its type is important. In most cases, Gaussian noise occurs in images during transmission. Many denoising methods are used, such as the unbiased risk estimator, weighted averaging, edge preserving, and segmentation for specific applications. The distribution function can be identified for each type of noise, and its value is considered a feature vector of the specific noise in the image.
Gaussian noise can be handled using a Gaussian filter, and camera noise can be handled with suitable cameras; such processing can be used for image enhancement and reconstruction, with results that vary with the type of noise. Classification associates an image with a class label, which can be single-class or multi-class; training with too few images leads to overfitting. Noise is one of the major factors affecting performance when analysing an image or video; it might be due to changes in the environment, defective lenses, motion, and so on, and an image can be analysed successfully only if the noise can be removed successfully. Several filters have been proposed for noise removal. Existing algorithms focus on reducing the largest noise values: a region or patch whose noise value exceeds a threshold is minimized. The proposed approach aims to remove noise from image patches and recover the original image from a noise-filled image [2, 3]. Researchers have come up with various image denoising formulations with the objective of safeguarding image quality [1], because noise can have a harsh effect on various applications [4–8], including signal processing and video processing [9]. Michal Aharon et al. [10] proposed a technique for dictionary learning and updating of grayscale image regions. Over the past few decades, the K-SVD algorithm has been used for dictionary learning for sparse image representation [11], and sparse representation has proved an extremely successful technique for Gaussian noise removal. In this paper, we propose a novel denoising method whose main aim is to combine the heap sorting technique [12, 13] with the dictionary learning method [10, 12] in order to remove noise. In this section we discuss denoising applications and the techniques used to denoise an image. There are state-of-the-art methods that focus on spatial-level denoising [14–16]. Michal Aharon et al. [10] came up with an image representation using the K-Means clustering technique to handle the denoising problem. Dabov et al. [17] proposed an exhaustive window matching technique for noise removal. Dabov et al. [18] proposed a filter that monitors shape regions and uses the PCA technique for denoising, at the cost of very high computational complexity. Zhang et al. [19] proposed an image-denoising process based on PCA in which neighbourhood pixels are grouped: a block matching method groups pixels with similar local structures, and each group of pixels is transformed through locally learned PCA. He et al. [20] proposed a technique using SVD for learning local bases: in the first stage, dictionary initialization provides the overlapping data patches; in the second stage, denoising is applied; and in the final stage the dictionaries are updated. L. Ganesh et al. [21] fused two images for noise removal: images are acquired by sensors and other capturing devices, non-linear filters are applied to remove noisy pixels, and image fusion techniques are then applied to the filtered images, with impulse-noise densities ranging from 10 to 50%. J. Harikiran et al. [22] fused images to remove salt-and-pepper noise using statistical filters; fuzzy filters can also be used for noise reduction [23], and all the above filters reduce impulse noise. Rong Zhu et al.
[24] came up with an algorithm that reduces noise and produces a noise matrix: the grey value of each affected pixel is replaced by the median of its neighbourhood, and a local histogram is used to preserve the original pixel values from noise, since the histogram gives the impulse-noise pixel values. It is observed that the approach works with better
precision than existing methods. J. Aalavandan et al. [25] came up with a switched median filter in which noise removal is done in two stages: in stage 1 the noisy regions are identified, and in stage 2 the noise that surpasses the threshold is removed. Video data poses challenges due to motion: moving pixels are hard to classify because of the nature of video, but it is possible to model the spatial motion of the pixels for a given problem [43]. PSNR can be used as a performance metric and is part of every video codec design. Another way is to understand the spatio-temporal nature of video and model it as blocks using a Laplacian; motion characteristics can be classified as slow or fast based on the problem characteristics, and the estimation error can be assessed using the peak signal-to-noise ratio [44]. Approaches have been developed that consider the image structure and categorize the type of noise; as part of such an approach, noise with PSNR between 20 and 50 dB is added and its estimation by the algorithm is checked [45], with average PSNR used to estimate performance. Certain works treat noise handling as a minimization problem, considering Poisson and Gaussian noise, and compare the approach with the VBM3D, VBM4D and RPCA algorithms [46]. Night vision presents more challenges, as the noise variations are hard to find; approaches with a bilateral filter have also been proposed [47]. Ashish Raj et al. [30] handled the parallel-imaging problem with Gaussian noise, treating it as a quadratic optimization problem to be minimized. Huanjing Yue et al. [31] used variance maps based on the Bayer pattern. Meisam Rakhshanfar et al. [32] followed an approach that detects damaged edges and recovers them using soft thresholding. Philippe Martin-Gonthier et al. [33] analysed the noise using the power spectral density, starting the analysis in the frequency domain. Zhu Lin [34] used the NL-means algorithm for removing noise. Singular Value Decomposition (SVD) is a matrix decomposition whose diagonal values are called singular values; it decomposes a matrix and provides a way of achieving dimensionality reduction, and both Gaussian and impulse noise can be removed using such an approach. Decomposing pixels into regions can reduce image quality, as affected pixels further affect their neighbours. Based on the NL-means algorithm, an adaptive means algorithm has also been proposed that can remove mixed noise in a single step. Section 2 presents the proposed denoising approach in detail with its formulation, Sect. 3 discusses the experimental results, and the findings are presented in Sect. 4.
2 Proposed Methods
In this paper, we address state-of-the-art image denoising as a fundamental problem of image analysis, where the observation Y is the clean image X corrupted by additive zero-mean Gaussian noise with standard deviation σ. Thus the representation becomes

$Y = X + \eta, \quad \eta \sim \mathcal{N}(0, \sigma^2)$
Motivated by K-SVD [10], we propose a denoising strategy with stages for dictionary learning, patch extraction and dictionary updating; the final stage performs the denoising itself. The basic K-SVD formulation for image denoising can be written as

$\min_{D, X} \|Y - DX\|_F^2$ subject to $\forall i$, $x_i = e_k$ for some k, with $\|y\| = 1$;

and, for the update step,

$\min_{X} \|Y - DX\|_F^2$ subject to $\forall i$, $\|x_i\|_0 \le n_p$,

where D is held fixed and the weighted approximation error is

$E = \frac{1}{2} \sum_i w_i \|x_i - D y_i\|_2^2$.

In this study, the proposed algorithm fills the values in the dictionary regions after the training phase [21], and updating happens continuously.
2.1 Stage-1
Among all the methods presented in the literature, the K-SVD algorithm of Michal Aharon et al. [10] has become one of the strongest candidates. This approach aims to find the best coefficient vector x while the dictionary U is provided; at this stage the coefficient matrix is calculated. Using the dictionary update rule, K-SVD extracts the noisy elements. During the dictionary update stage [22], the objective function of the K-SVD algorithm is expanded as follows:

$\|Z - DX\|_F^2 = \left\| Z - \sum_{j=1}^{M} d_j x_j^t \right\|_F^2 = \left\| \left( Z - \sum_{j \ne k} d_j x_j^t \right) - d_k x_k^t \right\|_F^2 = \|E_K - d_k x_k^t\|_F^2$

Here $x_j^t$ denotes the jth row of X (written this way to avoid notational ambiguity) and $d_k$ is the kth column of dictionary D. The matrix $E_K$ is the error matrix obtained when the kth atom is removed. The index set of atom $d_k$ is defined as

$\omega_k = \{\, i \mid 1 \le i \le K,\ x_k^t(i) \ne 0 \,\}$.

From the above process, we infer that the dictionary is updated after the training process; as the training set is used iteratively, the dictionary learnt for the patches gradually improves. A sketch of this atom-update step is given below.
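The atom update admits a compact NumPy sketch; it follows the standard K-SVD rank-1 update of [10] restricted to the patches that use atom k, and the variable names are ours, not the authors'.

    import numpy as np

    def update_atom(Y, D, X, k):
        # Patches whose sparse code actually uses atom k (the set omega_k above).
        omega = np.nonzero(X[k, :])[0]
        if omega.size == 0:
            return
        # Error matrix with the contribution of atom k removed, restricted to omega.
        E_k = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
        U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                  # new atom = first left singular vector
        X[k, omega] = S[0] * Vt[0, :]      # updated coefficients for those patches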
Let r be the noisy pixels and let χ be the characteristic matrix defined as

$\chi = \begin{cases} 1 & \text{if } (i, j) \in \Omega \\ 0 & \text{otherwise} \end{cases}$

Then the above equations are formulated as follows:

$\left\| \chi \left( E_K d_x - \sum_{N \ne M} X d_x \right) + X E^2 + \sum_J \sum_I r_{ij} \right\|_F^2$
In the above equation, the noise matrix and the characteristic matrix are multiplied; $E^2$ is a fidelity operator covering all the patch-based atoms.

Algorithm 1: Dictionary Initialisation
Input: noisy image.
Output: unordered image regions.
Step 1: Let (i, j) be the pixels of the input data.
Step 2: Set the region size (a, b), the step size, and the number of loops.
Step 3: Collect the patches as column vectors.
Step 4: Let K be the number of iterations.
Step 5: Extract unordered pixels in the given proportion.
Step 6: Stop.
From the algorithm depicted above, a window of size a × b is taken as the reference patch from the input image, with its function denoted f(a, b). After continuous iteration, the algorithm extracts the overlapped patches. A patch-extraction sketch is given below.
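This is a minimal NumPy rendering of the overlapping-patch collection; the patch and step sizes shown are illustrative defaults, not the authors' settings.

    import numpy as np

    def extract_patches(img, patch=7, step=4):
        # Collect overlapping patch x patch blocks as column vectors (Algorithm 1).
        h, w = img.shape
        cols = []
        for i in range(0, h - patch + 1, step):
            for j in range(0, w - patch + 1, step):
                cols.append(img[i:i + patch, j:j + patch].reshape(-1))
        return np.stack(cols, axis=1)   # one patch per column, ready for sparse coding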
2.2 Stage-2
The overlapped and unordered image regions are given as input to the sequential algorithm of stage 2, and the output of the algorithm is the set of patches used for image denoising.

Algorithm 2: Linear Algorithm
Input: overlapped image regions.
Output: identified regions for image-noise minimization.
Step 1: Initialise the noise region f( ) = b and the other regions; set the loop count K = 1.
Step 2: Select a region and fit the maximum criterion; add it to the residual patches and update the process.
Step 3: Project onto the linear space.
Step 4: If K < 1, end; else increment K by 1 and move to step 2.
Step 5: Exit the algorithm.
The noise pixels are initialized and the residual patches are selected; the iteration starts with this process. The selected patches are then made to fit the criterion mentioned in step 2. Let X denote the support vector of the residual r, and let K(X) denote the column vectors of K selected by X.
$M = \max_{k \in T \setminus K(X)} \left\| K(X)^{\top} \left( K(X)^{\top} K(X) \right)^{-1} K(X)^{\top} \right\|_2^2$

The stopping condition is satisfied when K < 1. Given a set of atoms $x = x_1, x_2, x_3, \ldots, x_n, x_{n+1}$, the sequential algorithm aims at detecting a set of dictionary vectors, represented as $D = d_1, d_2, d_3, \ldots, d_n, d_{n+1}$, for which each of the N atoms can be well represented by a combination of $\{d_j\}_{j=1,\ldots,n}$, such that $y = \sum_{l=1}^{m} a_l b_l$ with coefficients $a_l$ zero or close to zero. The algorithm can be formulated with respect to the patch area, and the denoising problem is represented as

$\min_{D, c_i} \sum_{i=1}^{p_{max}} \frac{3}{4} \|b_i - D c_i\|^2 - s \|c_i\|_2$ subject to $\|d_i\| = 1.0$, $1.0 \le i \le n$, and $\min_{X} \|X - DY\|_2^2$ subject to $\forall i$, $N_Y^I \le n_p$.

Here the max-heap property is utilized [24] and is given as

$S[\mathrm{Parent}(i)] \ge S(i)$,

which means the value of a parent node is higher than that of its children; thus, in the proposed heap structure, the maximum value is stored at the root.

$Z = \begin{cases} 1 & \text{if } \|X - DY\|_2^2 > N_Y^I \\ 0 & \text{if } \|X - DY\|_2^2 = N_Y^I \\ -1 & \text{if } \|X - DY\|_2^2 < N_Y^I \end{cases}$
We know that $N_X^I \le n_p$, as represented above. The optimal performance for the dictionary D is then given by

$\max_{E_K, z/Y} \left\| E_K d_y - Z \sum_{N \ne M} d_Y + (X E^2) + \sum_J \sum_I r_{ij} \right\|_F^2$
Here $E^2 > n_p$, so that the near-optimal node selection is simplified and, therefore, the time complexity of denoising is minimized. The dictionary search is done with the formula given below:

$\max_{E_K, z/Y} \dfrac{\left\| E_K d_y - Z \sum_{N \ne M} d_y \right\|_F^2 + (X E^2) + \sum_J \sum_I r_{ij}}{\|x_i - D c_i\|^2 / W}$
In the above equation, W represents the patch atoms provided. The max-heap property is expressed as

$\left( 1 + \dfrac{\max_{E_K, z/Y} \left\| E_K d_y - Z \sum_{N \ne M} d_y \right\|_F^2 + (X E^2) + \sum_J \sum_I r_{ij}}{\|x_i - D c_i\|^2 / W} \right)^{1/N}$
After the atoms are selected, the sorting begins. To compute the sparse matrix, a minimization technique is followed; the sparse matrix is found and, finally, with the constraint W ≫ 1, the sorting proceeds in ascending order (a heap-based sketch follows the formula below).
$\min_{D, c_i, p_{max}} \sum_{i=1}^{p_{max}} \frac{3}{4} \left\| x - D c_i - Z \sum_{M \ne N} d_y \right\|_F^2 - s \|c_i\|_2 + u \|c_i\|_2$
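Since Python's heapq is a min-heap, the max-heap ordering used above can be obtained by negation, as in the sketch below; the patch scores and their count are illustrative.

    import heapq

    def top_patches(scores, n=10):
        # Max-heap ordering via negation: the root then holds the maximum score.
        heap = [-s for s in scores]
        heapq.heapify(heap)
        return [-heapq.heappop(heap) for _ in range(min(n, len(heap)))]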
With the induction of this process into K-SVD [10], we obtain better results with respect to PSNR. The proposed approach may learn with less accuracy from the training sets, but the noise removal is observed to be comparatively good [10]. This methodology also removes impulse noise in images.
2.3 Stage-3
The next phase is the dictionary updating process, which takes all the patches from the above step as input. In this phase, all the $n_{ij}$ are fixed and, for each atom $d_l$ (l = 1, 2, 3, … in dictionary D), the region $\omega_l$ used by these atoms is selected: $\omega_l = \{(n, m) \mid n_{ij}(l) \ne 0\}$. For each patch $(a, b) \in \omega_l$, the residual is computed using the formula below:

$e_{ij}^l = d_{ij} - D n_{ij} + d_l\, n_{ij}(l)$

where $x_{ij}^l = D_{ij} x$ represents the candidate pixels of a small image region of size $\sqrt{n} \times \sqrt{n}$ at location (a, b) of the image.
Now set $E_l = (e_{ab}^l)_{(a, b) \in \omega_l}$ for the image window, as stated below in Algorithm 3.

Algorithm 3: Updating Process
Input: deteriorated image.
Output: recovered image.
Step 1: Set the input image segments (a, b).
Step 2: Initialise the U, S, V vectors.
Step 3: Across all the patches, compute [U, S, V] = svd($E_l$), with the entries $s_1$, $v_1$, $d_{ij}$ of the image.
Step 4: Replace the coefficients of the atom.
Step 5: Verify the stopping condition; otherwise switch to Algorithm 2 (for heap sort).
Step 6: Output the recovered image.
Step 7: Stop.
3 Experiments and Results
We present our results on four grayscale test images of size X × Y: Barbara (512 × 512), Boat (512 × 512), House (512 × 512) and Lena (512 × 512), as these grayscale test images are generally used to validate state-of-the-art noise removal techniques. The noise deviation (σ) is varied from 10 to 50, and the intensity of each pixel ranges from 0 to 255. The proposed approach is tested with the grayscale images shown in Fig. 1.
Fig. 1. Grayscale images used to test the proposed methods: (a) Lena, (b) Barbara, (c) House, (d) Boat
(a) Evaluation measures: The peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [26, 27] are adopted to evaluate the experimental results of the denoising methods.
Peak Signal-to-Noise Ratio (PSNR): PSNR is one of the most widely used image quality measures in imaging analysis. For 8-bit images it is given by

$PSNR = 20 \log_{10}(255) - 10 \log_{10}\left( \frac{1}{N} \|X - \hat{X}\|_2^2 \right)$

where $\hat{X}$ is the restored (reconstructed) image and X is the original image. In all of the experiments the dictionary size is 64 × 256 and the image patch size is 7 × 7, which is enough to visualize the convergence. In this study, we have taken the window size as 10 and the step size as 20.
Structural Similarity Index Measure (SSIM): The similarity between two images is computed based on the recovered image:

$SSIM(X, \hat{X}) = \frac{1}{k} \sum_{J=1}^{k} SSIM_J(X, \hat{X})$, where $SSIM_J = \dfrac{2\sigma_1 \sigma_2}{\sigma_1^2 + \sigma_2^2} \cdot \dfrac{\sigma_{12}}{\sigma_1 \sigma_2} = \dfrac{2\sigma_{12}}{\sigma_1^2 + \sigma_2^2}$

Herein, $\hat{X}$ is the recovered image and X is the original input data; the corresponding similarity values are stored in Table 1. Both measures can be computed as sketched below.
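The PSNR function follows the formulation above for 8-bit images; for SSIM, a library implementation is the pragmatic route.

    import numpy as np

    def psnr(x, x_hat):
        # PSNR in dB for 8-bit images, per the formulation above.
        diff = np.asarray(x, dtype=np.float64) - np.asarray(x_hat, dtype=np.float64)
        mse = np.mean(diff ** 2)
        return 20 * np.log10(255.0) - 10 * np.log10(mse)

    # For SSIM, e.g.: from skimage.metrics import structural_similarity (scikit-image).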
3.1 Results
The performance of the proposed approach is compared with the methods discussed above [21]. The proposed approach is implemented in Python, and the visualizations are done using MATLAB. As shown in Figs. 2, 3 and 4, the proposed approach is tested on various images such as Lena, Barbara and Boat, with the scores listed in Table 1. The trained dictionaries are shown in Fig. 5, with atoms of size 7 × 7 pixels. The proposed approach is executed for 1000 iterations on datasets of 1000 images each, as shown in Fig. 5. Figures 6, 7, 8, 9, 10 and 11 show the variation in PSNR and SSIM for the proposed approach.
Table 1. PSNR (dB) and SSIM values of different denoising methods on test images at different noise levels (σ)

Input    σ    K-SVD           LPG-PCA         BM3D-SAPCA      SAIST           Proposed
              PSNR   SSIM     PSNR   SSIM     PSNR   SSIM     PSNR   SSIM     PSNR   SSIM
Barbara  10   28.85  0.913    36.84  0.913    24.74  0.931    38.56  0.924    52.93  0.932
         20   23.20  0.896    35.58  0.896    23.72  0.929    36.77  0.907    46.65  0.927
         30   21.50  0.844    33.67  0.844    22.90  0.896    33.54  0.875    40.36  0.915
         40   19.39  0.819    31.89  0.819    22.29  0.873    31.56  0.843    33.37  0.909
         50   18.54  0.796    26.24  0.796    21.84  0.851    28.92  0.819    31.74  0.887
Boat     10   32.68  0.898    34.58  0.927    34.58  0.916    39.32  0.923    47.64  0.926
         20   27.48  0.856    31.56  0.925    31.56  0.905    37.77  0.917    45.28  0.918
         30   26.39  0.832    30.55  0.916    30.55  0.887    34.45  0.892    43.44  0.906
         40   25.48  0.814    29.67  0.812    29.67  0.856    30.26  0.873    42.67  0.814
         50   23.99  0.805    28.35  0.809    28.35  0.843    28.75  0.849    38.79  0.807
House    10   30.65  0.928    35.46  0.936    31.31  0.927    43.56  0.917    51.62  0.935
         20   25.70  0.916    33.25  0.929    29.67  0.917    40.75  0.879    49.77  0.931
         30   24.23  0.883    31.79  0.921    28.49  0.897    37.89  0.854    48.15  0.927
         40   21.53  0.854    29.32  0.810    27.23  0.876    35.14  0.837    38.55  0.920
         50   20.36  0.813    28.99  0.801    27.15  0.854    31.82  0.825    30.72  0.917
Lena     10   29.11  0.897    34.12  0.927    27.63  0.925    41.58  0.929    52.90  0.931
         20   23.95  0.786    30.24  0.861    26.70  0.897    39.24  0.917    47.90  0.925
         30   22.73  0.774    28.35  0.823    25.43  0.862    36.87  0.885    44.10  0.919
         40   20.18  0.759    26.85  0.810    25.38  0.847    31.98  0.864    32.56  0.906
         50   19.13  0.735    25.72  0.806    25.01  0.813    27.36  0.843    25.36  0.899
Avg.          24.253 0.840    31.151 0.864    27.21  0.885    35.305 0.879    42.225 0.9076
Fig. 2. (a) Original image. (b) Noisy Image (c) K- SVD (d) LPG –PCA (e) BM3D- SAPCA (f) SAIST (g) Proposed method.
820
S. K. Thangavel and S. Rudra
Fig. 3. Visual test results on Barbara image computed by various algorithms. (a) Original image. (b) Noisy Image (c) K- SVD (d) LPG –PCA (e) BM3D- SAPCA (f) SAIST (g) Proposed method.
Fig. 4. Boat image computed – Existing vs Enhancement
Fig. 5. Trained dictionaries computed by the proposed method for the test images: (a) Lena, (b) Barbara, (c) House, (d) Boat.
Fig. 6. Variation in PSNR and SSIM for the proposed approach.
Fig. 7. Proposed approach vs existing approaches on Lena and Barbara
Fig. 8. Proposed approach vs existing approaches on House and Boat.
Fig. 9. Proposed approach vs existing approaches on Lena
Fig. 10. Proposed approach vs existing approaches on Barbara
Fig. 11. Proposed approach vs existing approaches on Lena and Barbara
Fig. 12. Proposed approach vs existing approaches on Boat.
4 Conclusions

From this research we infer that the formulated approach yields better PSNR and SSIM than the other approaches, owing to its max-heap sorting strategy. The proposed approach is a hybrid of dictionary learning and the heap-sort technique. The results presented in Sect. 3 show how the proposed approach reconstructs deteriorated images. The quantitative and qualitative results on various test images show that the proposed algorithm can be used to denoise an image effectively.
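The paper does not reproduce the heap routine itself; as a rough, hypothetical illustration of how a max-heap ordering could be combined with dictionary learning (prioritizing the patches with the largest reconstruction residuals for the next update), one might write the following. Python's heapq is a min-heap, so residuals are negated, and patches, dictionary and codes are assumed to be NumPy arrays produced by the sparse-coding stage.

```python
import heapq
import numpy as np

def patches_by_residual(patches, dictionary, codes):
    """Yield patch indices ordered by reconstruction error, worst first,
    so dictionary updates can concentrate on poorly fitted patches."""
    heap = []
    for idx, (p, c) in enumerate(zip(patches, codes)):
        residual = np.linalg.norm(p - dictionary @ c)
        heapq.heappush(heap, (-residual, idx))  # negate for max-heap behavior
    while heap:
        neg_res, idx = heapq.heappop(heap)
        yield idx, -neg_res
```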
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data.
References

1. Wang, G., Wang, Z., Liu, J.: A new image denoising method based on adaptive multiscale morphological edge detection. Math. Probl. Eng. 2017, 11 (2017). https://doi.org/10.1155/2017/4065306. Article ID 4065306
2. Palhano Xavier de Fontes, F., Andrade Barroso, G., Coupé, P., et al.: Real time ultrasound image denoising. J. Real-Time Image Process., 15–22 (2011). https://doi.org/10.1007/s11554-010-0158-5
3. Guo, Q.: An efficient SVD-based method for image denoising. IEEE Trans. Circ. Syst. Video Technol. 26(5), 868–880 (2016)
4. Han, Y., Xu, C., Baciu, G., Li, M.: Lightness biased cartoon-and-texture decomposition for textile image segmentation. Neurocomputing 168, 575–587 (2015). https://doi.org/10.1016/j.neucom.2015.05.069
5. Liang, H., Weller, D.S.: Comparison-based image quality assessment for selecting image restoration parameters. IEEE Trans. Image Process. 25(11), 5118–5130 (2016)
6. Hao-Tian, W., et al.: A reversible data hiding method with contrast enhancement for medical images. J. Vis. Commun. Image Represent. 31, 146–153 (2015)
7. Gu, K., Tao, D., Qiao, J.F., Lin, W.: Learning a no-reference quality assessment model of enhanced images with big data. IEEE Trans. Neural Netw. Learn. Syst. 99, 1–13 (2017)
8. Hung, K.W., Siu, W.C.: Robust soft-decision interpolation using weighted least squares. IEEE Trans. Image Process. 21(3), 1061–1069 (2012)
9. Mairal, J., Sapiro, G., Elad, M.: Learning multiscale sparse representations for image and video restoration. Multiscale Model. Simul. 7(1), 214–241 (2008)
10. Aharon, M., et al.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)
11. Yang, J., et al.: A new approach to sparse image representation using MMV and K-SVD. In: Blanc-Talon, J., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2009. LNCS, vol. 5807. Springer, Heidelberg (2009)
12. Hwang, H., Haddad, R.: Adaptive median filters: new algorithms and results. IEEE Trans. Image Process. 4, 499–502 (1995)
13. Ko, S.J., Lee, Y.H.: Center weighted median filters and their applications to image enhancement. IEEE Trans. Circuits Syst. 38, 984–993 (1991)
14. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: IEEE International Conference on Computer Vision, ICCV 1998, pp. 839–846 (1998)
15. Dai, T., Lu, W., et al.: Entropy-based bilateral filtering with a new range kernel. J. Signal Process. 137, 223–234 (2017)
16. Buades, A., Coll, B., et al.: A non-local algorithm for image denoising. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2005, pp. 60–65 (2005)
17. Dabov, K., et al.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
18. Dabov, K., et al.: BM3D image denoising with shape-adaptive principal component analysis. In: Proceedings of the Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS), Saint-Malo, France, April 2009. HAL Id: inria-00369582
19. Zhang, L., et al.: Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recogn. 43(4), 1531–1549 (2010)
20. Gan, T., Chen, W., et al.: Adaptive denoising by singular value decomposition. IEEE Signal Process. Lett. 18(4), 215–218 (2011)
21. Ganesh, L., Chaitanya, S.P.K., Rao, J.D., Kumar, M.N.V.S.S.: Development of image fusion algorithm for impulse noise removal in digital images using the quality assessment in spatial domain. IJERA 1(3), 786–792 (2014)
22. Harikiran, J., Saichandana, B., Divakar, B.: Impulse noise removal in digital images. IJCA 10(8), 39–42 (2010)
23. Chauhan, J.: A comparative study of classical and fuzzy filters for impulse noise reduction. IJARCS 3(1), 416–419 (2012)
24. Zhu, R., Wang, Y.: Application of improved median filter on image processing. J. Comput. 7(4), 838–841 (2012)
25. Aalavandan, J., Santhosh Baboo, S.: Enhanced switching median filter for de-noising ultrasound. IJARCE 3(2), 363–367 (2012)
26. RatnaBabu, K., Arun Rahul, L., Vineet Souri, P., Suneetha, A.: Image denoising in the presence of high level salt and pepper noise using modified median filter. IJCST 2(1), 180–183 (2011)
27. Sreedevi, M., Vijay Kumar, G., Pavan Kumar, N.V.S.: Removing impulse noise in gray scale images using min max and mid point filters. IJARCS 2(6), 377–379 (2011)
28. Huai, X., Lee, K., Kim, C.: Image noise reduction by dynamic thresholding of correlated wavelet intensity and anisotropy. In: 2010 International SoC Design Conference, Seoul, pp. 28–33 (2010)
29. Chen, H.: A kind of effective method of removing compound noise in image. In: 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 157–161. IEEE, October 2016
30. Raj, A., Wang, Y., Zabih, R.: A maximum likelihood approach to parallel imaging with coil sensitivity noise. IEEE Trans. Med. Imaging 26(8), 1046–1057 (2007)
31. Yue, H., Liu, J., Yang, J., Nguyen, T., Hou, C.: Image noise estimation and removal considering the Bayer pattern of noise variance. In: 2017 IEEE International Conference on Image Processing (ICIP), Beijing, pp. 2976–2980 (2017)
32. Rakhshanfar, M., Amer, M.A.: Low-frequency image noise removal using white noise filter. In: 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, pp. 3948–3952 (2018)
33. Martin-Gonthier, P., Magnan, P.: CMOS image sensor noise analysis through noise power spectral density including undersampling effect due to readout sequence. IEEE Trans. Electron Devices 61(8), 2834–2842 (2014)
34. Lin, Z.: A nonlocal means based adaptive denoising framework for mixed image noise removal. In: 2013 IEEE International Conference on Image Processing, Melbourne, VIC, pp. 454–458 (2013)
35. Skretting, K., et al.: Recursive least squares dictionary learning algorithm. IEEE Trans. Signal Process. 58(4), 2121–2130 (2010)
36. Elad, M., Sapiro, G., et al.: Sparse representation for color image restoration. IEEE Trans. Image Process. 17, 53–69 (2008)
37. Carlsson, S., et al.: Heaps with bits. Theor. Comput. Sci. 164, 1–12 (1996)
38. Gao, L., et al.: Image denoising based on edge detection and pre-thresholding Wiener filtering of multiwavelets fusion. Int. J. Wavelets Multiresolut. Inf. Process. 13(5), 15 (2015). Article ID 1550031
39. Wang, Z., et al.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
40. Larkin, K.G.: Structural Similarity Index Simplified. Nontrivialzeros Research, pp. 1–4, May 2015
41. Gautam, K.S., Senthil Kumar, T.: Video analytics-based intelligent surveillance system for smart buildings. Springer Soft Comput. 23, 2813–2837 (2019)
42. Gautam, K.S., Senthil Kumar, T.: Video analytics-based facial emotion recognition system for smart buildings. Int. J. Comput. Appl. 1–10 (2019)
43. Roy, S., Sengupta, S.: An improved video encoder with in-the-loop de-noising filter for impulse noise reduction. In: 2006 International Conference on Image Processing, Atlanta, GA, pp. 2605–2608 (2006)
44. Ghazal, M., Amer, A., Ghrayeb, A.: A real-time technique for spatio-temporal video noise estimation. IEEE Trans. Circuits Syst. Video Technol. 17(12), 1690–1699 (2007)
45. Amer, A., Dubois, E.: Fast and reliable structure-oriented video noise estimation. IEEE Trans. Circuits Syst. Video Technol. 15(1), 113–118 (2005)
46. Li, W., Yin, X., Liu, Y., Zhang, M.: Robust video denoising for mixed Poisson, Gaussian and impulse noise. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, pp. 459–462 (2017)
47. Miyamae, K., Gohshi, S.: Noise level detection in general video. In: 2018 International Workshop on Advanced Image Technology (IWAIT), Chiang Mai, pp. 1–4 (2018)
A Collaborative Mobile Based Interactive Framework for Improving Tribal Empowerment

Senthilkumar Thangavel1(✉), T. V. Rajeevan2, S. Rajendrakumar3, P. Subramaniam4, Udhaya Kumar4, and B. Meenakshi1

1 Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected], [email protected]
2 Department of Social Work, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected]
3 Centre for Sustainable Future, Department of Chemical Engineering and Materials Science, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected]
4 Tribal Research Centre, Department of Tribal Welfare, Government of Tamil Nadu, Udhagamandalam, India
[email protected], [email protected]
Abstract. The proposed framework handles user interactions and improves system responsiveness through a collaborative recommendation system. The application presents an interactive form that collects the user's details. On valid access, the user is offered a gallery with the options Location-wise Statistics, Tribal Culture, Tribal Requests and Tribal Activities. Tribal users can present their food products within the system, and tourist guests can view the food products and place orders. Recommendations can be provided to the guest based on past experience. The application also supports geo-tagging of users, including tribes, to improve their participation. The application will be developed on Android with a cloud backend, and user statistics will be presented region-wise in a visual manner. Smart phones play a vital role in everyone's life; they host many applications, and these applications need to be tested for functionality, usability and consistency. Mobile application testing can be manual or automated; in this study an automated approach is discussed. Android-based mobile testing needs native test applications, which are used for testing a single platform.

Keywords: Client-server model and Android · Product details · Data visualization · Geo-tagging · Recommender systems · User profile · Application framework · Application testing
1 Introduction

Mobile applications provide possibilities that did not exist before; they serve as tools for native tribes to understand and compete at a global level. This creates an opportunity for economic development and the potential to join mainstream society. Even a simple application can improve the quality of life of native tribes by introducing them to things that make their daily lives easier, which in turn brings comfort and substantial growth in production within that social group. The primary goal of the undertaken project is to interact with users and to bring sustainable economic growth to the tribal people.

The application will have a user registration form to capture user details. After validating the user's details, it leads to a page with information about the tribal community, which also lists indigenous foods so that the user can place an order. The application additionally provides a form for customer reviews. Links can be shared on social media to attract more users, and geo-tagging can be used to share locations and photos taken in tribal areas. A page for selling tribal products can also be added to improve the community's economic condition.

Tribes in India have been deprived of opportunities due to various factors, a cardinal one being the lack of appropriate infrastructure for development initiatives to reach them. It is broadly accepted that technologies have the potential to play a central role in the betterment of society (Vinay Kumar, Abhishek Bansal). A mobile phone today is really a little machine in the hands of the farmer; building on this, the phone can be turned into a device that provides fast information about various entities (Augmented Reality in Agriculture). An effective user interface is needed between the conventional and emerging information systems when introducing technologies in tribal areas, all the more because tribal people are used to acquiring information by word of mouth from a known and trusted source. Bringing this social group to digital, text-based information is an ambitious undertaking.

1.1 Area of Study
The Nilgiris is one of the smallest districts of Tamil Nadu (Shodh Ganga). Also called the 'Blue Mountains', it lies to the southwest of Coimbatore district, with hills averaging around 6,500 ft and bordering the states of Kerala and Karnataka. For administrative purposes the region is divided into six taluks: Udhagamandalam, Gudalur, Pandalur, Coonoor, Kotagiri and Kundah (Connie Smith, 2007). Tribes such as the Todas, Kotas, Kurumbas, Irulas, Paniyans and Kattunayakans are found in this district; all are indigenous groups of the Nilgiris (TRC and HADP, 2007). The tribal share of the population was 4.32% and grew to 4.46% in 2011 (TRC, 2011). According to the 2011 census, the total population of the Nilgiris district is 7,35,394, a negative growth compared to the previous census. Scheduled Castes number around 2,35,878 and Scheduled Tribes 32,813, representing 4.46% of the total population. The tribal population is not evenly distributed across the six taluks: 3,015 live in Udhagamandalam, 944 in Coonoor, 1,707 in Kotagiri and 3,015 in Gudalur.
The population of Scheduled Tribes (STs) in the country, as per Census 2011, is 10.45 crore. STs constitute 8.6% of the country's total population and 11.3% of its total rural population. The ST male population is 5.25 crore and the ST female population is 5.20 crore.
2 Related Work

The Government of India has introduced many websites and mobile applications for uplifting tribal communities and developing their social and economic status. These mobile applications span agriculture, health, education, social welfare, e-governance and e-commerce. TRIFED works both as a service provider and a market maker for tribal products. It markets tribal products through the network of its 'TRIBES INDIA' outlets across the country. As a capacity-building measure, it also provides training to ST artisans and MFP gatherers. TRIFED purchased tribal products worth Rs. 846.96 lakhs as on 31.12.2017 and has 1,213 empaneled individuals/self-help groups/organizations as suppliers, connected to over 69,247 tribal families. Data for the previous years are:

| Sl # | Year | Purchase of tribal products (Rs. in lakhs) |
|---|---|---|
| 1 | 2008–09 | 681.78 |
| 2 | 2009–10 | 609.34 |
| 3 | 2010–11 | 656.35 |
| 4 | 2011–12 | 719.58 |
| 5 | 2012–13 | 880.55 (as on 28.2.2013) |
TRIFED has entered into agreements with e-commerce platforms such as Snapdeal and Amazon, which offer their customers assorted tribal products and produce through their portals www.snapdeal.com and www.amazon.com to facilitate online sale. The Ministry of Commerce has also made provision for the sale of tribal products through TRIFED on www.gem.gov.in.

2.1 TRIFED Method of Wholesale Merchandising
TRIFED's intent is to improve the livelihood of tribal communities by building a sustainable market and creating business opportunities for them based on their cultural knowledge and traditional skills, while ensuring fair and equitable payment. This involves identifying selling opportunities for tribal products on a sustainable basis, building the brand and rendering other services.

2.2 Source of Tribal Products
TRIFED has a network of 13 regional agencies across the country that identify the sources of tribal products for marketing through its wholesale network of TRIBES INDIA outlets. TRIFED has been sourcing handicrafts,
hand-looms and natural & food products through its empaneled providers across the country. The providers include individual tribal artisans, tribal SHGs, and organizations/agencies/NGOs working with tribal people, empaneled with TRIFED as per its guidelines for the empanelment of suppliers.

2.3 Marketing of Tribal Products
TRIFED has been marketing tribal products through its wholesale outlets located across the country and also through exhibitions. It has established a chain of 35 own showrooms and 8 consignment showrooms in association with state-level organizations to promote their products.

2.4 Exhibitions
Exhibitions of products made by tribal people are organized by TRIFED at the national and international level under the names AADILSHIP, AADICHITRA and OCTAVE. TRIFED conducts the National Tribal Expo 'Aadilship', to which tribal artisans/groups/organizations are invited to take part and showcase their rich tribal traditions. The main objective of holding these events is to give tribal artisans an opportunity to exhibit their cultural crafts and to interact directly with art lovers, learning about their tastes and preferences; this helps them adapt their product designs and creations accordingly. The events also feature tribal dance performances at times, presenting tribal art and culture in an authentic manner that has been well received by visitors. TRIFED also organizes 'Aadichitra', an exhibition of tribal paintings, in which tribal paintings are exclusively displayed and sold; tribal artists are invited to demonstrate their art at these exhibitions. Encouraged by the outcome, TRIFED has been conducting Aadichitra at several venues across the country. TRIFED also takes part in OCTAVE, the dance festival of the North Eastern Region organized by the Government of India, and has been associated with the event from 2008–09 onward, enabling artisans from the North Eastern Region to showcase and sell their products. TRIFED participates in global exhibitions/trade fairs through the Export Promotion Council for Handicrafts (EPCH) and the India Trade Promotion Organisation (ITPO) in different countries for displaying and selling products sourced from artisans.

TRIFED is involved in the procurement and selling of tribal art and craft items through its chain of 40 marketing sales outlets called TRIBES INDIA, which offers a range of tribal products, including:

• Metal craft
• Tribal textiles
• Jewelry
• Tribal paintings
• Cane & bamboo
• Pottery
• Gifts and novelties
• Organic and natural products

The Ministry of Tribal Affairs has developed an Android-based application called Tribal Diaries for internal monitoring and for coordinating with the agencies and authorities concerned with implementing schemes/programs for tribal development. The application is for licensed users. It provides an opportunity for visual reporting in the form of pictures and video, for transferring reports of official tours/inspections, and for sharing best practices. The application is largely used to get an overview of the Ekalavya Model Residential Schools (EMRSs) funded by the Ministry of Tribal Affairs. The principals of the EMRSs are encouraged to use the application and to create projects highlighting the physical infrastructure of the schools and special achievements of the students, and to share success stories. As far as grievances are concerned, the Centralized Public Grievance Redress and Monitoring System (CPGRAMS), an online web-enabled system, is the platform that aims to enable the submission of grievances by aggrieved citizens from anywhere and at any time. Grievances are also physically received in the Ministry, which scrutinizes them and takes action for speedy redressal, tracking their disposal through a dedicated division. (This information was provided by Shri Jaswantsinh Bhabhor in a written reply in the Lok Sabha on 19 March 2018; PIB, GoI, Ministry of Tribal Affairs.)

Non-timber forest products (NTFPs) play a pivotal role in the development and livelihood of tribal people across the world. Many people live in forests and depend on NTFPs for subsistence income and livelihood. NTFPs are considered important for sustaining rural livelihoods, reducing rural poverty, conserving biodiversity and raising rural economic growth. When we examine the setting in which tribal youth operate, a few marks differentiate them from non-tribal youth: tribal youth come from an egalitarian society bound by traditional norms, with a low technological level and little urban influence. Tribal regions were often isolated and lacked proper means of communication, which again helped them follow their own culture and maintain its distinctive marks (Sachindra). In constructing the India of the 21st century, the capability of tribal youth, their diligence, agility, intelligence and perseverance, will prove quite effective. With their dynamic mental make-up and continuous efforts to make a mark at the national level in all walks of life (social, political, economic), tribal youth have the potential to rejuvenate their own group as well as the whole nation.

2.5 Application for the Tribal Products
The culture of any community has been closely associated and assimilated with its history since time immemorial; indeed, the existence of culture is the base of history. Understanding the way of life of any community therefore requires a deep study and analysis of the cultural history of that tribe. If one thinks of the cultural history of primitive tribes,
one must turn towards South Odisha, the hub of tribal life; this Koraput region, the domain of tribal communities, has become a center of study and research. GappaGoshti is a mobile-based platform designed particularly for rural India in the native language. All of its functionality can be accessed via standard keys (left, right, up, down, select, back) and a few numeric keys on a phone, since writing text in either English or a native language using a phone keypad is not easy for an uneducated or semi-educated person from a rural background. GappaGoshti offers several services useful to the ordinary person in rural India, including:

• Social networking or micro-blogging
• Local weather forecasting
• News in the user's native language
• Yellow Pages
• Krushi Dyankosh, an agricultural knowledge base
3 Literature Review

The following section reviews the literature on improving tribal welfare through Android applications.

X. Qi proposes a vector-based model for user profiling based on the Rocchio algorithm (illustrated in the sketch after this section's reviews). The user profile stores the user's personal information and is also used to analyze feedback. The process involves three steps: data collection, profile initialization and profile updating.

X. Wu et al. and D. Kit et al. present a high-fidelity geo-tagging approach for identifying geographical locations. Automated geo-tagging makes it easy to find the current location. Since a huge amount of data is posted, handling it becomes challenging; this problem is overcome using clustering techniques, which give promising results when large numbers of photos or media are shared with their locations. A limitation of the approach is that locations in remote areas cannot be tracked.

Y. Liang and E. Kremic et al. suggest that an Android-based mobile application provides a user-friendly environment. It helps users log in from smart phones running Android, an open-source platform. A client-server architecture is used to authenticate the user profile and to maintain the large volume of data; authentication and access control also serve security purposes. Whenever the user logs in, the client-server architecture verifies the user's information.

G. Hirakawa et al. and P. Pilloni et al. note that a recommender system filters information from vast amounts of data using user profiles. Here, the recommender system filters information about users based on their feedback on tribal food and cultural products. This customer feedback is displayed on social media platforms such as Facebook and Twitter, which helps improve tourist interest in a particular location. The disadvantage of this approach arises when the user's preferred social platform changes.
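For reference, a minimal sketch of a Rocchio-style profile update of the kind described above is shown below; the vector space and the alpha, beta, gamma weights are the textbook defaults, not values taken from X. Qi's paper.

```python
import numpy as np

def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the user-profile vector toward item vectors with positive
    feedback and away from item vectors with negative feedback."""
    updated = alpha * profile
    if len(relevant):
        updated = updated + beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        updated = updated - gamma * np.mean(non_relevant, axis=0)
    return updated
```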
Wang et al. use data cubes to aid decision making in on-line analytical processing (OLAP) systems. To illustrate the data elements, their interactive visualization presents a one-dimensional cuboid hierarchical tree structure and represents the data cubes with two-dimensional graphical icons; users can interactively explore the two-dimensional data at hierarchical levels.

Daniel Pineo et al. evaluate and optimize visualizations automatically using a computational model of human vision. The methodology relies on neural-network modeling of early perceptual processing in the retina and primary visual cortex. The resulting neural activity is evaluated to produce a measure of visualization effectiveness, and a hill-climbing algorithm applies this measure to optimize the visualization. Two visualization parameterizations are used for 2D flow: streaklet-based and pixel-based.

Yusuke Itakura observes that the edible products shown in stores are arranged on the basis of employee judgment. An investigation of departmental stores revealed the leading drawbacks: head-office guidance tells employees to display edible products based on distribution, and the data available in the store is often unfit for answering customer enquiries. The proposed system sorts sales data according to calendar and time zone and gathers data on the considerations employers mention when displaying food; when the weather changes, food sales drop.

Anbunathan R et al. propose an event-based framework for designing complex test cases on Android mobiles, using Android APKs for the test cases and a test scheduler. The framework builds tests from a scheduler algorithm and Android components, and the tests execute on the Android platform. Anbunathan R et al. also propose an automated test generator based on a data-driven architecture framework: a tool named Virtual Tool Engineer (VTE) generates XML files, and test cases are captured in the form of sequence diagrams. An important goal is to reduce test effort by separating the inputs when a scenario involves multiple test cases.

Zhi-fang Liu et al. address mobile application software with a Service Oriented Architecture (SOA) framework for mobile application testing; a test platform built on the SOA framework is used to solve problems in mobile application testing.

N. Puspika et al. propose sequence testing, which aims to cover all flows of an application. Its limitation is that all possible input flows must be known, so the authors model the application flow with a Colored Petri-net (CPN) model, which makes it easy to analyze the application's sequences and to generate test cases automatically.

S. Cha et al. propose a middleware framework that provides effective disconnection-tolerant network services. The various layers of the OSI model are handled through mobility management; to maintain mobile services effectively during network disconnections, mobility devices and a mobile intelligent server are required.
P. Chu et al. propose a plug-in framework as a novel approach to enforcing security policies. Since existing enforcement systems produce poor results, the authors propose a new enforcement security policy for Android smart phone applications. Tian Zheng et al. propose a framework for the safety assessment of mobile applications; the security assessment covers both the Android and iOS platforms, and the proposed security testing platform improves security monitoring capabilities.
4 Methodology Proposed

The website and application will be a user-friendly platform for tribal communities and other users such as the general public and government officials. It will work as a tribal repository that includes details of tribal vegetation, social and demographic details of the tribes, and tribal products, which people will be able to buy on the platform itself. Promotion of the products is important, so digital promotion can be added to the website and app. Sales of tribal products will improve the economic condition of the tribes, and collecting feedback from customers will help further development by addressing the negatives. Such devices are commonly used by both rural and urban people.
We develop a mobile application for purchasing and marketing tribal food products. The different parts of the framework are discussed below.

User Profiling
In this mobile-based framework there are two types of users. First, the administrator has access to maintain the data, and the administrator alone can access the user-level and product-level statistics. The administrator login can be used by groups such as SHGs (self-help groups), NGOs, etc. The administrator can view user profiles and tribal product details. Users are those who hold registered accounts in the application, mainly tourists who visit tribal areas. Already-registered users go through authentication and access control to validate their user name and password; new users fill in their personal information in the login form, and after validation their account is created. Users can view, edit and update their profile at any time.

Product Details
Users can purchase food products as well as cultural products such as pottery, paintings and crafted materials using the mobile application. Payment can be made in cash or online by credit or debit card. The user is then taken to a customer feedback form to review the products. This in turn helps improve the economic status of the tribal people and spreads information about them globally. A privileged user can approach the administrator for a discount on a purchase, and the administrator can review the user's participation before granting one.

Client-Server Architecture and Android
We use a client-server architecture in this project: the clients are tourists, and the server is run by the administrator of the organization working for the welfare of the tribal community. Client data is maintained in large databases. The mobile OS used is Android, preferred because it gives a user-friendly environment.

Geo-tagging
Geo-tagging is used to tag the location when users share their photos and videos. Geo-tags are used on social media such as Facebook and Twitter to share media and update statuses, making the tribal people and their location visible globally and supporting their growth; in this way many other people around the world can be attracted.

Recommender System
The recommender system shares customer reviews via online shopping sites such as Amazon and Snapdeal using a filtering method. The administrator alone can view the user-level and product-level statistics calculated by the recommender system.
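The paper does not fix a particular filtering algorithm; as one plausible instantiation, the sketch below implements simple user-based collaborative filtering over a user × product rating matrix using cosine similarity. All variable names are illustrative, not taken from the system itself.

```python
import numpy as np

def recommend(ratings, user_idx, top_n=3):
    """ratings: float array of shape (users, products); 0 = not rated.
    Scores unrated products by similarity-weighted ratings of other users."""
    norms = np.linalg.norm(ratings, axis=1)
    sims = (ratings @ ratings[user_idx]) / (norms * norms[user_idx] + 1e-9)
    sims[user_idx] = 0.0                      # exclude the user themselves
    scores = sims @ ratings                   # similarity-weighted ratings
    scores[ratings[user_idx] > 0] = -np.inf   # hide already-rated products
    return np.argsort(scores)[::-1][:top_n]
```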
5 Service Oriented Architecture Based Mobile Application Framework

To develop a mobile application, certain features are needed to fulfill the client's requirements. The features of a mobile application framework are discussed below.

1. Simplicity for the user: a helpful user interface lets the client use the application without complexity, performing any activity effortlessly and without unintended changes to the application. This is how the mobile device reaches the customer.
2. Social media integration: sharing information in a clear and profitable way lets more people learn about the platform. Social media is the main channel for spreading the word, and this integration makes the mobile device a tool for fulfilling user needs.
3. Feedback: customer suggestions are needed to develop a mobile application, so a feedback feature is very useful for meeting user needs.
4. Off-line work: activities such as shopping and payment need an Internet connection, whereas applications such as Notepad, MS Office and Memo do not, and can be used off-line. This capability is an added advantage in satisfying user needs.
5. Security: mobile devices may hold personal information about the user, so the application must build in privacy protection for that information.
6. Personalization options: this feature lets users set their own text formatting styles and adaptable settings, letting the application look the way they like. It is another way to meet customer needs.
6 Service Oriented Architecture (SOA) Based Testing Technique

Z. Liu et al. describe how SOA-based mobile application software running on phones with operating systems such as Android, Windows, BlackBerry or iOS should be tested. The testing strategy is as follows.

A. Test Design:
• Service-oriented testers should understand the whole scope of the mobile application.
• The application needs to be partitioned into separate services.
• The mobile application structure needs to be reorganized into three components: data, services and the front-end application.
• Each part needs to be examined, and the business scenarios should be defined clearly.
• The business requirements need to be categorized as common requirements and mobile-application-specific requirements.
• A traceability matrix needs to be prepared, and every test needs to be traced to a business requirement.

B. Test Performance Approach: every service component needs to be tested. Service-component testing falls into three kinds:
Integration Testing: service components are tested to validate the data flow, checking data integrity across the services.
System Testing: service components are tested to validate the data flow between the front-end application and the data store.
Performance Testing: service components are tested for fine tuning so that performance is optimal.

C. SOA Testing Methods: two methods are applied here.
1. Service-level testing: each service is tested separately for functionality, usability, security, performance and interoperability (a sketch follows after the tools list below).
2. Regression testing: performed on the mobile application whenever there are multiple releases, to enhance the stability and availability of the system. The test suite covers the services that form an essential part of the application and can be reused across multiple releases.

D. SOA-based Automated Testing Tools: a number of SOA testing tools exist; they are chosen based on their execution and the accuracy of their results.
SoapUI: a free tool used for web-service testing, supporting functional, performance and load testing.
Apache JMeter: an open-source tool capable of measuring the performance of SOAP invocations.
JProfiler: a profiling tool used to detect memory leaks and performance bottlenecks.
HP Service Test: a functional testing tool that also supports UI testing and shared-service testing.
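As an illustration of service-level testing, the snippet below checks one service in isolation; the endpoint URL and the expected response fields are hypothetical, and in practice tools such as SoapUI or JMeter would drive these checks.

```python
import unittest
import requests

class ProductServiceTest(unittest.TestCase):
    BASE = "http://example.org/api"  # hypothetical service endpoint

    def test_list_products_returns_ok(self):
        resp = requests.get(f"{self.BASE}/products", timeout=5)
        self.assertEqual(resp.status_code, 200)
        # Each product record is assumed to expose these fields.
        for item in resp.json():
            self.assertIn("id", item)
            self.assertIn("price", item)

if __name__ == "__main__":
    unittest.main()
```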
7 Components and Structural Based SOA Framework

The service-oriented architecture has the following components, discussed below.

1. Adapter: a software module for an application or system that allows access to its capabilities through a standards-compliant service interface.
2. Business process modeling: mapping out the business process in terms of what the human participants have to do and what the various applications underlying the business process are expected to do.
3. Enterprise service bus: the hub through which all services in a service-oriented architecture communicate. It tends to provide connectivity to various kinds of middleware, repositories of metadata definitions, registries and interfaces of every kind.
4. Service broker: software in a service-oriented architecture that brings together components according to the associated rules.
5. Service registry: similar to a referential data service; it contains information about services, service definition interfaces and parameters.
6. Service repository: a database for all service-oriented architecture software and components, keeping everything in one place for configuration management. It is the record of all policies, attributes and processes.
8 Mobile Requirements

A. Operating system: operating systems fall into many categories, such as Android, iOS, Windows, Symbian and BlackBerry. In this case study, the Android OS is used. Android plays a vital role in day-to-day life: it is an open-source platform on which many mobile applications are developed, and off-line mobile applications can be created for it.
B. Mobile devices: devices fall into smart phones and basic mobile phones; this study concerns smart phones. Using smart phones with an Internet connection, users can pay online, shop online and order food.
9 Conclusion

The world is entering a technological era: the majority of the population uses websites or mobile applications, either directly or with the help of others. Tribal development using these ICTs is a modern change that can holistically improve people's living standards. Tribal empowerment using technology can be a paradigm shift in the development of tribal lives: uplifting economic conditions, developing education, improving health and medical conditions, and preserving indigenous cultures. It also provides a path to the holistic and sustainable development of the tribal community. Such community development brings democratic procedures, voluntary cooperation, self-development and participation of the members of the community. Technological advancement in our country can play a crucial role in community development, and the same technology can be fully exploited for developing the tribal community and its standard of living. Innovative ideas based on scientific
principles and technological applications can transform the tribal community by improving economic conditions and preserving the social and cultural values of the indigenous community.

Acknowledgments. We are thankful to the people who contributed their time and shared their knowledge for this research.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data.
References

1. Vinay Kumar, A.B.: Information and communication technology for improving livelihoods of tribal community in India. Int. J. Comput. Eng. Sci. 3(5), 13–21 (2013)
2. Amber, G., Young, S.M.: Cultural Identity Restoration and Purposive Website Design: A Hermeneutic Study of the Chickasaw and Klamath Tribes. IEEE (2014)
3. Andhika, O.A.: Vege Application! Using Mobile Application to Promote Vegetarian Food. IEEE (2018)
4. Annamalai Narayanan, C.S.: apk2vec: semi-supervised multi-view representation learning for profiling Android. In: International Conference on Data Mining. IEEE (2018)
5. Apurba Saikia, M.P.: Non-timber forest products (NTFPs) and their role in livelihood economy of the tribal people in upper Brahmaputra valley, Assam, India. Res. Rev. J. Bot. Sci. (2017)
6. Apurv Nigam, P.K.: Augmented Reality in Agriculture. IEEE (2011)
7. Dang, B.S.: Technology strategy for tribal development. Indian Anthropol. Assoc. 10(2), 115–124 (1980). https://www.jstor.org/stable/41919402
8. GoI: Census of India 2011 (2011)
9. Jason Henderson, F.D.: Internet and e-commerce adoption by agricultural input firms. Rev. Agric. Econ. 26(4), 505–520 (2004). https://www.jstor.org/stable/3700794
10. Kushal Gore, S.L.: GappaGoshti™: Digital Inclusion for Rural Mass. IEEE (2012)
11. Liu Yezheng, D.F.: A novel apps recommendation algorithm based on apps popularity and user behaviours. In: First International Conference on Data Science in Cyberspace, Changsha, China. IEEE (2016)
12. Billinghurst, M.: Augmented Reality in Education. Seattle, USA (2002). http://www.newhorizons.org
13. Manisha Bhende, M.M.: Digital Market: E-Commerce Application. IEEE (2018)
14. Murtaza Ashraf, G.A.: Personalized news recommendation based on multi-agent framework using social media preferences. In: International Conference on Smart Computing and Electronic Enterprise. IEEE (2018)
15. Nor Aniza Noor Amran, N.Z.: User Profile Based Product Recommendation on Android Platform. IEEE, Kuala Lumpur, Malaysia (2014). https://doi.org/10.1109/icias.2014.6869557
16. Radhakrishna, M.: Starvation among primitive tribal groups. Econ. Polit. Wkly 44(18), 13–16 (2009). https://www.jstor.org/stable/40278961
17. Ramneek Kalra, K.K.: Smart market: a step towards digital India. In: International Conference on Computing and Communication Technologies for Smart Nation (IC3TSN). IEEE, Gurgaon, India (2017)
18. Sharanyaa, S., Aldo, M.S.: Explore places you travel using Android. In: International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT). IEEE (2016)
19. Lien, S.F., Wang, C.C., Su, J.P., Chen, H.M., Wu, C.H.: Android platform based smartphones for a logistical remote association repair framework. Sensors 14(7), 11278–11292 (2014)
20. Takumi Ichimura, I.T.: Affective Recommendation System for Tourists by Using Emotion Generating Calculations. IEEE, Hiroshima (2014)
21. Twyman, C.: Livelihood opportunity and diversity in Kalahari wildlife management areas, Botswana: rethinking community resource management. J. South. Afr. Stud. 26(4), 783–806 (2000). https://www.jstor.org/stable/2637571
22. Liang, Y.: Algorithm and implementation of education platform client/server architecture based on android system. In: 2018 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Xiamen, pp. 251–253 (2018). https://doi.org/10.1109/icitbs.2018.00071
23. Has, M., Kaplan, A.B., Dizdaroğlu, B.: Medical image segmentation with active contour model: smartphone application based on client-server communication. In: 2015 Medical Technologies National Conference (TIPTEKNO), Bodrum, pp. 1–4 (2015). https://doi.org/10.1109/tiptekno.2015.7374546
24. Kremic, E., Subasi, A., Hajdarevic, K.: Face recognition implementation for client server mobile application using PCA. In: Proceedings of the ITI 2012 34th International Conference on Information Technology Interfaces, Cavtat, pp. 435–440 (2012). https://doi.org/10.2498/iti.2012.0455
25. Kordopatis-Zilos, G., Papadopoulos, S., Kompatsiaris, I.: Geotagging text content with language models and feature mining. Proc. IEEE 105(10), 1971–1986 (2017). https://doi.org/10.1109/JPROC.2017.2688799
26. Wu, X., Huang, Z., Peng, X., Chen, Y., Liu, Y.: Building a spatially-embedded network of tourism hotspots from geotagged social media data. IEEE Access 6, 21945–21955 (2018). https://doi.org/10.1109/ACCESS.2018.2828032
27. Kit, D., Kong, Y., Fu, Y.: Efficient image geotagging using large databases. IEEE Trans. Big Data 2(4), 325–338 (2016). https://doi.org/10.1109/tbdata.2016.2600564
28. Pineo, D., Ware, C.: Data visualization optimization via computational modeling of perception. IEEE Trans. Vis. Comput. Graph. 18(2), 309–320 (2012). https://doi.org/10.1109/TVCG.2011.52
29. Wang, X., Yi, B.: The application of data cubes in business data visualization. Comput. Sci. Eng. 14(6), 44–50 (2012). https://doi.org/10.1109/mcse.2012.17
30. Hirakawa, G., Satoh, G., Hisazumi, K., Shibata, Y.: Data gathering system for recommender system in tourism. In: 2015 18th International Conference on Network-Based Information Systems, Taipei, pp. 521–525 (2015). https://doi.org/10.1109/nbis.2015.78
31. Yuan, N.J., Zheng, Y., Zhang, L., Xie, X.: T-finder: a recommender system for finding passengers and vacant taxis. IEEE Trans. Knowl. Data Eng. 25(10), 2390–2403 (2013). https://doi.org/10.1109/TKDE.2012.153
32. Pilloni, P., Piras, L., Carta, S., Fenu, G., Mulas, F., Boratto, L.: Recommender system lets coaches identify and help athletes who begin losing motivation. Computer 51(3), 36–42 (2018). https://doi.org/10.1109/MC.2018.1731060
33. Hijikata, Y., Okubo, K., Nishida, S.: Displaying user profiles to elicit user awareness in recommender systems. In: 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Singapore, pp. 353–356 (2015). https://doi.org/10.1109/wi-iat.2015.83
34. Leung, K.W., Lee, D.L.: Deriving concept-based user profiles from search engine logs. IEEE Trans. Knowl. Data Eng. 22(7), 969–982 (2010). https://doi.org/10.1109/TKDE.2009.144
35. Xie, C., Cai, H., Yang, Y., Jiang, L., Yang, P.: User profiling in elderly healthcare services in China: scalper detection. IEEE J. Biomed. Health Inf. 22(6), 1796–1806 (2018). https://doi.org/10.1109/JBHI.2018.2852495
36. Liang, H., Mu, R.: Research on humanization design based on product details. In: 2008 9th International Conference on Computer-Aided Industrial Design and Conceptual Design, Kunming, pp. 32–34 (2008). https://doi.org/10.1109/caidcd.2008.4730513
37. Itakura, Y., Minazuki, A.: A study of the support system for displaying food products in convenience stores. In: 2010 IEEE/ACIS 9th International Conference on Computer and Information Science, Yamagata, pp. 421–426 (2010). https://doi.org/10.1109/icis.2010.15
38. Anbunathan, R., Basu, A.: Automation framework for test script generation for Android mobile. In: 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, pp. 1914–1918 (2017)
39. Nidagundi, P., Novickis, L.: New method for mobile application testing using lean canvas to improving the test strategy. In: 2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Lviv, pp. 171–174 (2017)
40. Pandey, A., Khan, R., Srivastava, A.K.: Challenges in automation of test cases for mobile payment apps. In: 2018 4th International Conference on Computational Intelligence and Communication Technology (CICT), Ghaziabad, pp. 1–4 (2018)
41. Murugesan, L., Balasubramanian, P.: Cloud based mobile application testing. In: 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS), Taiyuan, pp. 287–289 (2014)
42. Gao, J., Bai, X., Tsai, W., Uehara, T.: Mobile application testing: a tutorial. Computer 47(2), 46–55 (2014)
43. Unhelkar, B., Murugesan, S.: The enterprise mobile applications development framework. IT Professional 12(3), 33–39 (2010)
44. Cha, S., Du, W., Kurz, B.J.: Middleware framework for disconnection tolerant mobile application services. In: 2010 8th Annual Communication Networks and Services Research Conference, Montreal, QC, pp. 334–340 (2010)
45. Bernaschina, C., Fedorov, R., Frajberg, D., Fraternali, P.: A framework for regression testing of outdoor mobile applications. In: 2017 IEEE/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), Buenos Aires, pp. 179–181 (2017)
46. Chu, P., Lu, W., Lin, J., Wu, Y.: Enforcing enterprise mobile application security policy with plugin framework. In: 2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC), Taipei, Taiwan, pp. 263–268 (2018)
47. Samuel, O.O.: MobiNET: a framework for supporting Java mobile application developers through contextual inquiry. In: 2009 2nd International Conference on Adaptive Science and Technology (ICAST), Accra, pp. 64–67 (2009)
48. Reed, J.M., Abdallah, A.S., Thompson, M.S., MacKenzie, A.B., DaSilva, L.A.: The FINS framework: design and implementation of the flexible internetwork stack (FINS) framework. IEEE Trans. Mob. Comput. 15(2), 489–502 (2016)
49. Zheng, T., Jianwei, T., Hong, Q., Xi, L., Hongyu, Z., Wenhui, Q.: Design of automated security assessment framework for mobile applications. In: 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, pp. 778–781 (2017)
50. Qin, X., Luo, Y., Tang, N., Li, G.: Deep eye: automatic big data visualization framework. Big Data Min. Anal. 1(1), 75–82 (2018). https://doi.org/10.26599/BDMA.2018.9020007
51. Gautam, K.S., Senthil Kumar, T.: Video analytics-based intelligent surveillance system for smart buildings. In: Proceedings of the International Conference on Soft Computing Systems, pp. 89–103. Springer, New Delhi (2016)
52. Gautam, K.S., Senthil Kumar, T.: Discrimination and detection of face and non-face using multi-layer feed forward perceptron. In: Proceedings of the International Conference on Soft Computing Systems (Scopus indexed). AISC, vol. 397, pp. 89–103. Springer. ISSN 2194-5357
IoT Cloud Based Waste Management System

A. Sheryl Oliver1(✉), M. Anuradha1, Anuradha Krishnarathinam1, S. Nivetha1, and N. Maheswari2

1 Department of CSE, St. Joseph's College of Engineering, Chennai, India
[email protected]
2 SCSE, VIT University, Chennai, India
Abstract. An IoT-based waste management system collects waste in a smarter way using modern technologies. Whenever waste is dumped into the smart bin, infrared sensors sense the amount of waste inside it. This is achieved using a programmable board called the Arduino UNO, which can be plugged into any computer interface. The bin is partitioned into three levels, with an infrared transmitter and receiver placed at each level, so the percentage of the bin that is filled is tracked whenever waste is dumped inside; the volume is determined using ultrasonic sensors. An infrared transmitter and receiver are positioned near the base to check whether the system is functioning properly. If the waste reaches level 1, 30% of the bin is filled; if it reaches level 2, 60% is filled. To track the amount of waste collected, the infrared sensors are monitored by the Arduino UNO, and the results are output to an IoT cloud platform called ThingSpeak. Once the smart bin is 90% full, a message is sent to the municipal corporation by transmitting information through a web application programming interface embedded in the IoT cloud platform. The smart bins have an Ethernet module attached to the Arduino, which gives the system network access. The entire system is programmed so that whenever the volume of collected waste reaches a certain level, the municipal corporation is notified.

Keywords: Ultrasonic sensors · Arduino · Ethernet module · IoT cloud platform · Waste management system · Sustainability · Smart bins
1 Introduction

Waste is defined as unwanted or useless material that is unfit for consumption or use. Waste management is a field that demands attention in our society. In a country like India, where the population grows by 3–3.5% every year, managing waste poses a real problem: as the population increases, the amount of waste generated rises considerably. The inevitable truth is that any kind of waste is hazardous to society. The surroundings are affected if waste is not segregated and disposed of properly; natural elements such as air and water are especially at risk. Harmful gases released by waste pollute the air, while industrial waste dumped into rivers affects water bodies proportionately. The most common problem we notice in daily life is the lack of proper waste management. If not handled
properly, waste may spoil the ambience of the environment. A lack of proper waste collection also means that dumped waste tends to overflow the bins; it not only spreads disease but also serves as a breeding ground for mosquitoes. The act of collecting waste effectively has provoked thought for ages, and lessening the impact of waste disposal is the need of the hour. This project helps keep the environment clean.

The concept of smart cities is still in a budding stage, although our Prime Minister initiated the idea a few years back, insisting on building smart cities all over India. With an increasing number of cities, the corresponding responsibilities also increase. Cleanliness can be achieved by disposing of waste properly, and this can be ensured by implementing an IoT-based waste collection system. The major issue faced by the current waste management system is the inability to report a reliable fill status for bins; implementing this project mitigates that drawback.

With the Internet gaining momentum, human interaction with real-life objects and machines makes the world even more interesting. As a result of the high demand for computing, research moved beyond connecting computers, and this interest led to the concept of the Internet of Things (IoT), in which machine-to-machine and human-to-machine interaction take place with ease. Despite being conceived years back, the field is still developing as far as technology is concerned; industries such as home automation and transportation are growing rapidly with IoT. High-speed, active Internet connectivity is essential for IoT, which can be defined as the connection between humans and devices: every event and piece of equipment in our daily life can be tracked using it. Most processes are carried out by sensor-equipped systems in which signals are converted into data and sent to a centralized device connected to the network. The main advantage of the Internet of Things is that events take place automatically, eliminating the need for manual intervention; IoT is thus revolutionizing science and technology today. By employing this concept, the project aims to achieve objectives such as cost savings and efficiency.

Our country has a problem disposing of waste because of the voluminous deposits: millions of tons of waste are disposed of every day, originating from industries, workplaces and houses. The strategies used are:

• House-to-house collection
• Drive-in service for collection
• Group-based collection and colony bins

Table 1. Types of waste collection along with their sustainability indicators

| No | Sustainability indicator | Door to door | Curb site | Block collection | Community bins |
|---|---|---|---|---|---|
| 1 | Area improvement | ✓ | ✓ | ✓ | X |
| 2 | People convenience | ✓ | X | X | X |
| 3 | Staff convenience | X | ✓ | ✓ | X |
| 4 | Handling extra waste from festivals | X | ✓ | ✓ | ✓ |
| 5 | Frequency and reliability | X | ✓ | X | ✓ |
Among these, colony bins have proved to be the most common and effective way of collecting waste. However, this way of organizing waste is far from ideal: most often the waste overwhelms the bins (Table 1). This situation raises problems such as:
• How can the waste be disposed of safely?
• How can the recycling factor be improved?
This work provides an effective way of addressing the problems stated above, while also:
• Cutting down on fuel costs.
• Sending automated alerts to the central municipal server.
2 Related Work
Our cloud-based garbage alert system is implemented using the concept of the Internet of Things, and the solution is programmed on an Arduino UNO. Ultrasonic sensors are utilized to track how full each bin is. Using this input, conditions are framed such that when a bin is almost full, the municipal web server is notified. Once the alert is sent, the cleaning process takes place after verification; Radio Frequency Identification (RFID) tags are used to confirm the status of the cleaning process. To optimize the routes used by the garbage collectors, Dijkstra's algorithm is applied, which shortens the routes taken and subsequently cuts down on fuel costs; a small sketch of this idea is given below.

Smart bin: the Smart Waste Management System is a wireless sensor network-based system used for data monitoring and delivery. Here, the duty cycle of the bins is optimized along with other operational units, both to reduce the amount of power consumed and to maximize the operational time.

The Solid Waste Monitoring System integrates technologies such as the Geographical Information System (GIS), Radio Frequency Identification (RFID), General Packet Radio Service (GPRS), and the Global Positioning System (GPS). Its main objective is to respond efficiently to customer enquiries, especially in emergency situations. RFID tags embedded in the trucks are used for acknowledging the waste after collection, and the trucks are tracked via a GPS module that reports their latitude and longitude. The bin levels are monitored continuously, and once a bin reaches a certain threshold, a message is sent to the Municipal Corporation's web server, which is in turn mapped into the GIS. The system also includes a camera to visualize the type of waste dumped in the bins.
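The cited system does not publish its routing code; as a minimal, hedged sketch of how Dijkstra's algorithm can pick the cheapest collection route over a road graph, the following could serve (the graph, node names, and distances are invented for illustration):

    import heapq

    def dijkstra(graph, source):
        # Shortest travel cost from source to every node.
        # graph: {node: [(neighbour, distance_km), ...]}
        dist = {node: float("inf") for node in graph}
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue  # stale queue entry
            for neighbour, weight in graph[node]:
                nd = d + weight
                if nd < dist[neighbour]:
                    dist[neighbour] = nd
                    heapq.heappush(heap, (nd, neighbour))
        return dist

    # Hypothetical road graph: a depot, two bins and a dump yard.
    roads = {
        "depot": [("bin_A", 2.0), ("bin_B", 5.0)],
        "bin_A": [("depot", 2.0), ("bin_B", 1.5), ("dump", 6.0)],
        "bin_B": [("depot", 5.0), ("bin_A", 1.5), ("dump", 3.0)],
        "dump": [("bin_A", 6.0), ("bin_B", 3.0)],
    }
    print(dijkstra(roads, "depot"))  # cheapest cost from the depot to each node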
3 Problem Definition
Fig. 1. A municipal garbage bin turning into a wasteland
The following are some of the problems faced by our community (Fig. 1):
• When waste is not collected, it tends to overflow the drains. This not only causes flooding but also spreads a variety of diseases.
• Trash bins are generally open all the time. This may attract cattle in need of food, and products like milk obtained from such cattle can have a deleterious effect on human beings.
• Insects like mosquitoes and flies thrive in environments where waste accumulates, causing many diseases.
• Waste-filled environments may shelter rats, which spread diseases and additionally gnaw on electric cables, causing inconvenience.
• Burning solid waste in the open air leads to air pollution and worsens the already rapid depletion of the ozone layer.
• Uncollected waste makes the environment unclean and reduces its aesthetic value.
• Greenhouse gases like methane and carbon dioxide are emitted when waste is left unattended for a long time.
4 System Architecture
Smart bins use infrared sensors to detect the presence of waste dumped inside and thereby estimate the volume of waste filled. Technically speaking, the infrared sensors sense each time waste is dumped inside the bin, acting as obstacle detectors. The infrared sensors are placed at various levels of the bin; the sensors at the bottom ensure the system is functioning as intended, based on the fact that waste settles at the bottom first. Depending on the percentage of waste filled and the conditions specified, the information is shared with the concerned department (Fig. 2).
Fig. 2. The automated waste collection system.
4.1 Design of System Architecture
Fig. 3. Design of System Architecture
In this system, we have used both infrared and ultrasonic sensors for two reasons (Fig. 3). Commonality: both sensor types detect waste thrown inside the bin. The ultrasonic sensor measures the fill of the entire bin as a distance, whereas the IR sensors output binary values (true or false). Though both serve the same purpose, the ultrasonic sensor is added to address reliability issues: to make the output more accurate, the IR and ultrasonic readings cross-verify each other. If, say, one of the IR sensors is damaged, the ultrasonic readings reveal the deviation, stopping the system from malfunctioning. The infrared sensors measure the waste level by level; typically they fix the bin levels at 0%, 30%, 60%, and 90%. The levels are determined by the bin height, divided evenly into three level spacings. For example, if the height of the bin is (say) 18 cm, then 18/3 = 6 cm is the spacing between levels: the level-1 IR sensor sits 6 cm from the base of the bin, the level-2 sensor 12 cm from the base, and the level-3 sensor 18 cm from the base. These height values are for illustration purposes only; the dimensions of the bin are not part of this research and can take any value. The IR sensor measuring 0% is on the right side in the figure and shows a true value when no waste is present.
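As a minimal sketch of this level arithmetic (the bin height and sensor flags are illustrative values, not part of the paper's hardware):

    def level_heights(bin_height_cm, levels=3):
        # Evenly spaced IR-sensor mounting heights, e.g. 18 cm -> [6, 12, 18].
        spacing = bin_height_cm / levels
        return [spacing * (i + 1) for i in range(levels)]

    def fill_percentage(ir_level_flags):
        # Map IR flags (lowest level first) to the coarse fill levels 30/60/90%.
        thresholds = [30, 60, 90]
        percent = 0
        for flag, value in zip(ir_level_flags, thresholds):
            if flag:  # beam blocked means waste has reached this level
                percent = value
        return percent

    print(level_heights(18))                     # [6.0, 12.0, 18.0]
    print(fill_percentage([True, True, False]))  # 60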
The 30%, 60%, and 90% levels determine the urgency of notification; if the fill is between 60% and 90%, the municipality can plan accordingly to collect the waste. The ultrasonic sensor measures the fill depth of the waste inside the bin and confirms its presence. The Arduino UNO microcontroller is programmed to interface the sensors with the ThingSpeak cloud platform, and power is supplied by connecting a laptop to the Arduino board.
• Arduino UNO: programming kit that stores and runs the program.
• Power supply: powers the programming kit so that it can function without interruption.
• IoT web server: sends the sensor data values via an application programming interface to the cloud platform, acting as the medium between the hardware and the cloud platform (Fig. 4).
4.2 Design of Garbage Bin
See Fig. 4
Fig. 4. Design of trash bin
5 Top View
Fig. 5. Representing front view design of the bin using 3D Builder
One wall of the bin is partitioned into three levels, and a pair of infrared sensors (a transmitter and a receiver) is present at each level. These sensors are placed at fixed heights so that accurate measurements can be made. One pair of infrared sensors is placed at the base level to ensure that no malfunction goes undetected. The bin should be constructed in such a way that it can withstand harsh conditions like rain. In a large-scale implementation, it is suggested that a bolted wall be raised so that the electronic circuitry can be concealed (Figs. 5, 6, 7, 8).
5.1 Front View
See Fig. 6
Fig. 6. Representing top view of the bin using 3D Builder
5.2 Bottom View
See Fig. 7
Fig. 7. Representing bottom view of the bin using 3D Builder
5.3 Side View
See Fig. 8
Fig. 8. Representing side view of the bin using 3D Builder.
6 Module Design
The module design is explained through the flowchart shown in Fig. 9, and the components used for designing the smart bin are given below. The module design consists of three phases:
Connection phase: to make the hardware interact with the cloud platform, the Arduino kit is connected to the Internet via an Ethernet port that obtains its address over DHCP. This helps in dynamically connecting to all IoT smart bins available as neighbors in the network.
Collection phase: when waste is thrown inside the trash bin, the IR sensors record readings to observe whether waste was thrown or not. If it was, the sensor values at the different levels are considered; otherwise the IR sensor at the base level (0%) shows a true value, indicating that the bin is empty.
Verification phase: to address reliability issues, the values recorded by the ultrasonic sensor are used for verification, ensuring that the IR sensor values are in sync with the ultrasonic values and that there is no malfunction in the system. For example, if the IR sensor at 0% shows true (meaning the bin is empty) but the ultrasonic sensor reports 10 cm (meaning the bin is filled to some level), the outputs are contradictory; we can then flag a system malfunction and stop the program from disrupting the automated system. Once the verification phase succeeds, a notification is sent to the municipal corporation.
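A minimal sketch of this verification logic and the cloud notification, written in Python for readability rather than the Arduino C used on the device; the channel key and bin height are placeholders, while the ThingSpeak "update" endpoint with api_key/field parameters is that platform's standard REST interface:

    import requests

    BIN_HEIGHT_CM = 18                       # illustrative bin height
    THINGSPEAK_KEY = "YOUR_WRITE_API_KEY"    # placeholder

    def readings_consistent(ir_empty_flag, ultrasonic_depth_cm):
        # Cross-check: an "empty" IR flag contradicts a shallow ultrasonic echo.
        waste_height = BIN_HEIGHT_CM - ultrasonic_depth_cm
        if ir_empty_flag and waste_height > 1.0:
            return False                     # bin "empty" but waste detected
        return True

    def notify_cloud(fill_percent, depth_cm):
        # Push one sample to a ThingSpeak channel (field1, field2).
        requests.get("https://api.thingspeak.com/update",
                     params={"api_key": THINGSPEAK_KEY,
                             "field1": fill_percent,
                             "field2": depth_cm},
                     timeout=10)

    if not readings_consistent(ir_empty_flag=True, ultrasonic_depth_cm=8):
        print("Sensor mismatch - possible malfunction, halting updates")
    else:
        notify_cloud(fill_percent=60, depth_cm=8)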
Fig. 9. Flowchart of the module design
6.1 Ethernet Module ENC28J60
The ENC28J60 stand-alone Ethernet controller IC integrates the functionality needed to handle most network protocol requirements. This Ethernet LAN module connects directly to almost all microcontrollers.
6.2 Arduino Uno
The Arduino/Genuino Uno is a microcontroller board based on the ATmega328P. It has 14 digital input/output pins (of which 6 can be used as PWM outputs), 6 analog inputs, a 16 MHz quartz crystal, a USB connection, a power jack, an ICSP header, and a reset button.
6.3 Ultrasonic Sensor
The HC-SR04 ultrasonic sensor uses SONAR to determine the distance to an object, much as bats do. It offers excellent non-contact range detection with high accuracy and stable readings in an easy-to-use package. Its range extends from 2 cm to 400 cm.
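The sensor reports distance indirectly, as the round-trip time of an ultrasonic pulse. A one-line sketch of the conversion used later in the pseudocode (29.1 µs per centimetre is the approximate speed of sound in air, and the echo time is halved for the round trip):

    def echo_to_distance_cm(duration_us):
        # Convert an HC-SR04 echo pulse width (microseconds) to centimetres.
        return (duration_us / 2) / 29.1

    print(echo_to_distance_cm(1164))  # ~20 cm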
6.4 Infrared Sensor
The IR transmitter employs an LED to produce light at infrared wavelengths, and the receiver uses a matched light sensor to detect light in the infrared spectrum. When an object is close to the sensor, the light waves bounce off the object into the light sensor. This results in a large jump in intensity, which can be detected using a threshold.
7 Implementation
7.1 Pseudo Code
(1) Declare the pin numbers.
(2) Initialize the pins to the corresponding input and output channels of the board.
(3) Initialize the Ethernet connection.
(4) Get the MAC address and check it against the Ethernet buffer.
(5) If the MAC address matches the buffer, then set up a DHCP connection.
(6) Initialize the echo pin as input and the trigger pin as output.
(7) Read the pulse of the echo pin and store it in a variable "duration".
(8) Compute distance = (duration / 2) / 29.1 to obtain the distance in centimetres (29.1 µs per cm approximates the speed of sound; the echo time is halved for the round trip).
(9) Write the ultrasonic sensor values to the stash using the web API key.
(10) Read the digital pins as input for the infrared sensors:
     If IRSensor1 == TRUE then print "Obstacle from sensor 1. Level 1 is full"
     If IRSensor2 == TRUE then print "Obstacle from sensor 2. Level 2 is full"
     If IRSensor3 == TRUE then print "Obstacle from sensor 3. Level 3 is full"
(11) Write the infrared sensor values to the stash using the web API key (Figs. 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20).
Fig. 10. Connecting Arduino pins to Ethernet module (ENC28J60)
Fig. 11. Connecting ultrasonic sensor with arduino board
Fig. 12. Connecting IR sensor with Arduino UNO
8 Experimental Results
8.1 Ethernet Module
See Fig. 13
Fig. 13. Output displayed by serial monitor enabling internet access.
Fig. 14. Output displayed in IoT cloud platform “ThingSpeak”
Fig. 15. Distance displayed by serial monitor using ultrasonic sensor values.
Fig. 16. Graph of sensor values displayed in “ThingSpeak”
Fig. 17. Infrared sensor values in “ThingSpeak”
9 Working Model See Figs. 18, 19, 20
Fig. 18. Behind the bin
Fig. 19. Top view
Fig. 20. Front view
10 Conclusion and Future Enhancements
This work describes an automated system model that reduces manual labour. Unlike the traditional smart bins manufactured by GreenTech Life, it requires minimal effort to keep the environment clean. Even where garbage bins are present, a tidy environment cannot be guaranteed without proper waste management; issues like proper waste management and segregation are what our project aims to overcome. The smart bins located across the city can be monitored remotely by the municipal office, and necessary action can be taken whenever required. Hence, we conclude that employing modern technologies for collecting waste will not only benefit the public but also prove
economical in the long run. In the current scenario, nearly one lakh rupees is invested in setting up a single garbage bin, whereas the smart bin setup does not even amount to one-tenth of that cost. In a large-scale implementation, the waste in the bins may emit many harmful gases; the use of gas sensors to detect such gases, and of solar panels for the power supply, forms part of the future enhancements of this work.
Compliance with ethical standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data (Figs. 21, 22, 23, 24, 25, 26).
Appendix Sample Screens See Figs. 21, 22, 23, 24, 25, 26
Fig. 21. Home page of the website
Fig. 22. About page of the website
Fig. 23. Check status page of the website
Fig. 24. Check status page showing Ethernet connection values
Fig. 25. Status page of website showing values of IR sensors
Fig. 26. Contact form page of the website
Vision Connect: A Smartphone Based Object Detection for Visually Impaired People
Devakunchari Ramalingam, Swapnil Tiwari, and Harsh Seth
Department of Computer Science Engineering, SRMIST, Chennai, Tamil Nadu, India
[email protected], [email protected], [email protected]
Abstract. Visual impairments have long afflicted people, and visual aids have likewise been around for a long time to help blind people overcome the challenges and difficulties they face in everyday life; the invention of the Braille script signifies the result of incessant effort to help them lead normal, self-dependent lives. Much research has since been done, and money spent, on techniques to improve these existing technologies. The proposed technique is a system for visually impaired people that not only detects obstacles in real time but also helps them find objects they need around them. It integrates a smartphone application with an object recognition system; the object recognition module is implemented using TensorFlow. The smartphone camera is used to detect objects, and feedback is given as speech output. The accuracy of the model is 85%, which is higher than that of an existing smartphone-based system using YOLO.
Keywords: Object detection · Visually impaired · Smartphone · Speech assisted
1 Introduction
The Blind People's Association, India, released a report that put the fraction of Indian people who are visually impaired at nearly 1.9% [1]. Millions of people in this world live with some form of visual affliction that hinders them from leading normal, fruitful, or even independent lives. Visual challenges induce social awkwardness and low self-confidence, and the affected are often alienated from their peers at a young age. Consequently, such people require constant care and help when going out and about, and often with daily routines. With time, some people learn to walk with sticks, but even then only to make sure that they are not stepping on or colliding with something. The past few years have witnessed the emergence of new technologies [2] to help visually impaired people, but most of the devices launched are expensive; the economically weak have fewer options, and fewer still are reliable. Recent advances in Information Technology (IT), and mobile technology especially, have made it possible for people with disabilities like visual impairment to lead better, more comfortable lives using IT-based assistive technologies. Technology has,
in some measure, and can, significantly, help make individuals more capable of participating in social activities and leading normal, independent lives. The domain of IT-based assistive technologies, and the range of assistance they can possibly provide, is immense. Of late, ubiquitous computing, a model of human-computer interaction designed to suit the natural human environment, has seen a rising trend; with truly ubiquitous computing, a user needn't even be aware that they are using Information Technology for everyday tasks. Mobile assistive technologies are the result of numerous efforts to deliver assistance to disabled people through intelligently applied ubiquitous computing. Mobile phones are cheap, portable, lightweight, and easy to operate; aids delivered via phones can easily permeate the general population, being free of the stigma attached to more traditional aids. The proposed model is built upon this advantage with the aim of helping the visually impaired: the application informs the user about the environment, indoor or outdoor, without requiring any extra sensor or chip.
2 Related Work
After the innovation of The vOICe in 1974 [3], major efforts were directed towards helping visually impaired people become more self-dependent, so that they need not rely on a companion for seemingly simple tasks like crossing the road or finding an object. This problem has attracted a lot of attention ever since, both in the public and scientific spheres, and the technological advancements made to alleviate the effects of visual impairment have all sought to make the afflicted more capable of perceiving their immediate environment. A real-time detection system [4] used a camera to detect text written or printed on objects, extracting the most stable features with MSER and localizing them using OCR. The smartphone-based system of [5] used a smartphone to take input in the form of images and give output in the form of audio, with the processing done on a server.
(a) The vOICe. The vOICe was developed at the University of Birmingham by Leslie Kay in 1974. It is an image-to-audio device which captures an image from a webcam; the capital letters in vOICe stand for 'Oh, I see!'.
(b) The Vibe. The Vibe [6] is a device similar to the vOICe. It is also a visuo-auditory device, employing a wide-angle camera placed in front of the user.
(c) Martinez et al.'s approach. Recently, an approach similar to the vOICe was developed by Martinez et al. [7]. It is also an object recognition device which converts images to speech information, but with a different image processing method from the vOICe: the image is scanned from the center to the left and then to the right. Figure 1 shows an example of an image being scanned by Martinez et al.'s approach.
Fig. 1. Martinez et al.‘s approach
(d) Prosthesis Substituting Vision by Audition (PSVA). The PSVA [8] combines a model of the human retina with a model of the human cochlea, using an inverse representation of the cochlea; the model has pixelated representations of the retina and the fovea.
(e) Feature extraction. SIFT [9-11] was used to match localized key points of the same objects captured in different scenes. SURF [12] is faster than SIFT at localizing key points while still preserving the image properties.
3 Proposed Model – Vision Connect
The proposed model uses a smartphone-based application to help the visually impaired find day-to-day objects and avoid obstacles in real time without depending on someone else. The application uses the back camera of the smartphone to capture images of the objects in front of it; these images are passed to the image recognition module, which performs the detection, and the final output is given to the user as speech feedback. The results indicate that people with visual impairments can gauge the direction to an object in their path as well as know what the object is, helping them gain a more comprehensive awareness of their immediate environment. A visually impaired person might have a very hard time finding his way around, looking for some object, or even avoiding obstacles. Keeping this in mind, we focused on day-to-day object detection in real time: whether the user is new to a particular environment or is trying to find something, the application's assistance can be put to good use.
3.1 Smartphone Application
The app was developed for Android devices, as Android is one of the leading smartphone operating systems today. Since most affordable smartphones run Android, we started the development with Android for now. The mobile
app was written in the Java language using Android Studio. The application handles the various permissions and settings required for it to work on the smartphone. A TensorFlow-trained model was used for object recognition, with the processing done through Google's TensorFlow API.
Fig. 2. Layout of the Vision Connect speech-assisted object detection model
The layout of the developed Android application is depicted in Fig. 2. On launch, the app performs a hardware check, requests permissions such as camera and storage, generates logs, selects the back-facing camera lens, and keeps the screen orientation in check. When the user opens the application, it accesses the back-facing lens of the mobile to capture images; the objects in front of the lens are captured continuously and processed. Upon recognition of an object, speech feedback is given as output, conveying the name of the object or obstacle followed by its position relative to the camera lens. Along with the type of the object, this gives the visually impaired person an idea of the detected object's location. Moreover, the bounding box around the detected object also carries the name of the object with its probability.
3.2 Image Recognition
The aim of the app is to detect objects and obstacles in the environment around visually impaired and blind people, helping them in their day-to-day activities without depending on anyone else or relying on touch sensations, which are neither accurate nor safe and can even turn out to be dangerous. Object detection by image recognition is the heart of the system, with the objective of assisting the targeted user base. Image recognition is the process of mapping the set of pixels from an image to desired objects for which ground truths have been provided. This pixel mapping is done by feeding the pixel sets of the image into a neural network and then performing convolutions on those pixels, which yield the pixels relevant for detecting the object. We perform image recognition in our model on the
image data, which is then converted into speech data. The application turns on the back camera of the smartphone and captures images in real time as the mobile moves; the system then works on these images to identify the object or obstacle in the user's environment.
Fig. 3. The Object Recognition framework
The image recognition framework, shown in Fig. 3, comprises an object detector, a conditional probability computation, and an object position detection module. The conditional probabilities are thresholded at 0.5 to support the system's decision making, so that the user receives an output only when an object is confidently recognized. This processing is done on the smartphone itself rather than on a server; there is therefore no burden of maintaining a constant internet connection, and the system does not fail in places where the user faces network issues. The image recognition is done with the help of Google's TensorFlow API, an ecosystem of software that lets developers build deep learning models.
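The paper does not list its inference code; as a hedged sketch of how an on-device TensorFlow Lite detection model is typically driven (shown in Python rather than the app's Java, with the model file name, output ordering, and 300x300 input size as assumptions; only the 0.5 threshold comes from the text):

    import numpy as np
    import tensorflow as tf

    # Hypothetical detector exported to TF Lite; 300x300 RGB input is assumed.
    interpreter = tf.lite.Interpreter(model_path="detector.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    outs = interpreter.get_output_details()

    # Stand-in camera frame; dtype/shape must match the model's input.
    frame = np.zeros((1, 300, 300, 3), dtype=np.uint8)
    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()

    boxes = interpreter.get_tensor(outs[0]["index"])    # typical SSD output
    classes = interpreter.get_tensor(outs[1]["index"])  # order: boxes,
    scores = interpreter.get_tensor(outs[2]["index"])   # classes, scores

    for box, cls, score in zip(boxes[0], classes[0], scores[0]):
        if score >= 0.5:  # the paper's confidence threshold
            print(f"class {int(cls)} at {box} with p={score:.2f}")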
3.3 Auditory Presentation
Speech is the most important medium of communication, especially for differently abled people with vision impairment. Advances in artificial intelligence and language processing provide tools to recognize speech data and produce speech feedback. The final output regarding the detected image is given to the user as audio carrying the object's name and its position, giving the user an idea of what the object is and where it lies relative to the mobile camera. This was implemented using the TextToSpeech class provided by Android, which also offers useful features such as setting the pitch of the output speech and the rate at which the speech is produced.
4 Results and Discussions
The aim was to build a recognition system that could use the live feed from the smartphone camera to carry out object detection and recognition, insensitive to the scale and orientation of the phone. TensorFlow was applied for this module. It is an open-source software library used across multiple disciplines including neural networks, one of the most powerful open-source frameworks developed by Google, the base on which Google Cloud Vision and AlphaGo are built, and it supports multiple programming languages. An added advantage is that TensorFlow Lite brings model execution to a variety of devices, including IoT and mobile devices, providing more than a 3x boost in inference speed over original TensorFlow; this paves the way to run machine learning on either a Raspberry Pi or a phone. The application was tested on various day-to-day objects, and Fig. 4 depicts some of the objects recognized by it.
Fig. 4. Object detection by the application
Figure 5 shows a precision-recall curve for the Person class, evaluating the prediction skill of the model; the curves are plotted for various threshold values.
Fig. 5. Precision-Recall curve for detection of individual
One of the most widely accepted metrics for testing the accuracy of object detection models like R-CNN, Darknet, etc. is mAP (mean average precision). We therefore used mAP as the evaluation metric for our model at different IoU (intersection over union) thresholds to depict the precision-recall tradeoffs across IoUs. As recall increases, precision values become dynamic and change drastically, so it is necessary to find the mean average precision by calculating the area under the precision-recall curve at various thresholds. Our result demonstrates an mAP value of 56.38, well over the 50 required for a model to yield useful true positives. Considering an IoU threshold of 0.7, Precision = 0.83 and Recall = 0.95. The F1 score and accuracy then follow as

F1 = (2 × Precision × Recall) / (Precision + Recall) ≈ 0.89

Accuracy = (TP + TN) / (TP + TN + FP + FN) = 0.85
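As a quick check of the F1 figure, a few lines of Python reproduce the arithmetic from the stated precision and recall; since F1 is a harmonic mean, it is bounded above by 1 (the accuracy value would likewise follow from the confusion-matrix counts, which the paper does not list):

    def f1_score(precision, recall):
        # Harmonic mean of precision and recall; always <= 1 by construction.
        return 2 * precision * recall / (precision + recall)

    print(round(f1_score(0.83, 0.95), 2))  # 0.89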
5 Conclusion
The Vision Connect model implements an Android application that helps the visually impaired find objects around them and detect obstacles in their paths in real time. The model was successfully deployed in the Android application and incorporates a TensorFlow-trained model for real-time image recognition. The application was tested on many day-to-day objects that a visually impaired person may have in his/her environment. It is an audio-visual aid, so each detected object is output in the form of audio feedback. Compared with the traditional means used by the blind and the visually impaired to detect objects and obstacles in their environments, the effectiveness of our application is good: without the extra expense of a visuo-auditory device, and without having to rely on one's own self, this application can make the person's life much easier. It eliminates the feeling of being dependent on someone else all the time and gives the user a sense of freedom. Tests prove that the application meets the basic requirements, and the audio output can familiarize users with the environment around them, making them feel confident about finding their way around. In the future, the application can be improved in terms of accuracy along with the scope of objects it can detect. In addition, integrating an interactive assistant into the app would make it more reliable, accurate, and effective; this could allow the user to look for a particular object in real time. The assistant could also remind the user of important events or information, such as an appointment or meeting to attend, and its interactivity could let the user ask the assistant to remember where a particular object was kept.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Blind People’s Association India, https://www.bpaindia.org/ 2. Oh, Y., Kao, W.L., Min, B.C.: Indoor navigation aid system using no positioning technique for visually impaired people. In: Proceedings of 19th International Conference on HumanComputer Interaction (HCI 2017), pp. 390–397, Canada (2017) 3. Auvray, M., Hanneton, S., O’Regan, J.K.: Learning to perceive with a visuo - auditory substitution system: localisation and object recognition with ‘The vOICe’. Perception 36, 416–430 (2007) 4. Deshpande, S, Revati, S.: Real time text detection and recognition on hand held objects to assist blind people. In: Proceedings of IEEE International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT) (2016) 5. Lin, B.S., Lee, C.C., Chiang, P.Y.: Simple smartphone-based guiding system for visually impaired people. Sensors 17(6), 13–71 (2017). MDPI 6. Durette, B., Louveton, N., Alleysson, D., Herault, J.: Visuo-auditory sensory substitution for mobility assistance: testing the VIBE. In: Workshop on Computer Vision Applications for the Visually Impaired, Marseille, France (2008) 7. Martnez, B.D.C., Vergara-Villegas, O.O., Snchez, V.G.C., Domnguez, H.J.O., Maynez, L. O.: Visual perception substitution by the auditory sense. In: LNCS, vol. 6783, pp. 522– 533, Springer (2011) 8. Renier, L., Laloyaux, C., Collignon, O., Tranduy, D., Vanlierde, A., Bruyer, R., De Volder, A.G.: The ponzo illusion with auditory substitution of vision in sighted and early-blind subjects. Perception 34(7), 857–867 (2005) 9. Mohane, V., Gode, C.: Object recognition for blind people using portable camera. In: Proceedings of 2016 IEEE World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave) (2016) 10. Lowe, G.D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004) 11. Jabnoun, H., Benzarti, F., Amiri, H.: Visual substitution system for blind people based on sift description. In: Proceedings of International Conference of Soft Computing and Pattern Recognition, pp. 300–305 (2014) 12. Chincha, R., Tian, Y.: Finding objects for blind people based on SURF features. In: Proceedings of 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops, pp. 526–527, Atlanta (2011)
Two-Phase Machine Learning Approach for Extractive Single Document Summarization
A. R. Manju Priya and Deepa Gupta
Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India
[email protected], [email protected]
Abstract. Summarization is the process of condensing text with minimal loss of information, and its importance has grown with the increased availability of information. This paper explores a two-phase machine learning approach for single document summarization in which Phase I performs sentence selection using a Bayesian network and Phase II generates a coherent summary using Affinity Propagation. The Phase I model explores a variety of features, including incidence features and scoring features with psycholinguistic features. The proposed model is built on the benchmark DUC 2001 dataset and tested on the DUC 2002 dataset using ROUGE scores. The proposed two-phase machine learning approach gives significantly improved results compared to many existing baseline models.
Keywords: Extractive text summarization · Affinity Propagation · ROUGE · Psycholinguistic features · Bayesian network · DUC 2001 · DUC 2002
1 Introduction
With the vast increase in unstructured textual data not organized into traditional databases, summarization has become an important area of research. To handle the ever-increasing mass of information, we need automatic text summarization that condenses text without losing the relevant information. Summarization condenses a document by discarding the information least relevant to its concepts; summaries thus provide a sneak peek into the document without requiring it to be read in full. Summarization requires understanding the document, selecting the relevant sentences, reformulating them, and assembling them in a coherent way [1]. This is a difficult cognitive task even for humans, which shows the difficulty of automating summarization [2]. The benefits of summarization include reduced reading time, easy document selection for researchers, improved effectiveness of document indexing and keyword extraction, and less bias than a human summarizer. Even after six decades of research, automatic summarizers have yet to reach the perfection of a human summarizer. Summarization can be categorized into different types [3] based on the method of generation (Extractive, Abstractive, Sentence Compression), number of documents used to create the summary (Single Document,
Multi Document), function of the summary [4] (informative, indicative), document genre (News, Literary, Social Network), context (generic, query-guided, update), and target audience (Profile, Non-Profile). This paper focuses on generating a generic, extractive, informative single document summary with enhanced content coverage using a two-phase machine learning approach. In this work, around eighty-eight features, including features based on psycholinguistics, were explored to generate the summary. The rest of the paper is structured as follows: Sect. 2 describes related work in the field, Sect. 3 describes the proposed two-phase machine learning method in detail, Sect. 4 explains the experimental setup, Sect. 5 discusses the experimental results, and Sect. 6 provides the conclusion and future work.
2 Related Work
This section summarizes papers related to extractive summarization. Kupiec et al. [5] built a Naive Bayes classifier to identify summary sentences using fixed phrases, a sentence length cut-off, an upper-case feature, a paragraph feature, and thematic words as features. Aone et al. built on the Kupiec approach and used the presence of signature words to exploit the linguistic dimension of the corpus. Lin modelled the problem with a decision tree and concluded that for query-guided summarization certain parameters are independent of others, and that no isolated characteristic or individual parameter is sufficient. Other researchers showed that Artificial Neural Networks (ANN) and Hidden Markov Models (HMM) provide high-performing extraction strategies. Although machine learning approaches provide good extraction strategies, the selected sentences were not cohesive outside their context, and the parameters tuned during the learning phase may not carry over well to a different type of document. Methods to improve the strategy of selecting important sentences, which improves the content and context of the summary, have been proposed in the literature [6]. The non-negative matrix factorization method has been used to select more meaningful sentences [7]. Identifying the sentence features that enable efficient text summarization is discussed in many papers [8, 9]; topic heading and sentence position have emerged as dominant features. Genetic algorithms have been applied to text summarization [10, 11], and the efficiency of various supervised machine learning algorithms on text summarization is studied in a few papers [12, 13]. Yeh et al. used LSA for semantic analysis in text summarization through Text Relational Mapping [14]. Mathai et al. proposed iterative LSA for semantics-based concept extraction in the legal domain [15]. Graph-based algorithms were applied to extract word relations and identify subsuming relationships between sentences to improve the saliency of the summary [16]. Representing the document as a bipartite graph was implemented to jointly optimize content importance, coherence, and non-redundancy in the summary [17]. Semi-graph concepts were explored to generate extractive summaries [18]. Chan proposed the use of textual continuity as a shallow linguistic feature for sentence extraction [19]. The discourse tree structure of the document was studied to improve summary quality [20]. Keyword extraction through lexical chains and keyword extraction based on topic centrality were implemented to improve keyword-based
summarization [21]. Chidambaram et al. proposed combining summarization algorithms with Latent Dirichlet Allocation (LDA), a topic modeling method [22]. Remya et al. explored sparse coding techniques and dictionary learning for document summarization [23]. This paper focuses on improving the selection of summary sentences so that the major concepts of the document are covered, using machine learning methods that explore various features of each sentence. In this work, we compare the performance of the two-phase approach with certain standard algorithms and validate the contribution of the individual phases to summary generation.
3 Proposed Methodology
Figure 1 shows the general workflow of the proposed two-phase machine learning approach for summary generation, which consists of four main modules, as discussed below.
3.1 Pre-processing
The raw text is subjected to pre-processing before feature extraction. First, the text is split into sentences, which are then tokenized into words. This is followed by POS tagging, removal of stop words and punctuation, and lemmatization. The POS tagging was done using the Stanford POS Tagger. A minimal sketch of such a pipeline follows.
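This is a hedged illustration using NLTK's tagger in place of the Stanford tagger the authors used; the sample sentence is invented:

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    # Requires: nltk.download("punkt"); nltk.download("stopwords")
    #           nltk.download("averaged_perceptron_tagger"); nltk.download("wordnet")

    def preprocess(text):
        stops = set(stopwords.words("english"))
        lemmatizer = WordNetLemmatizer()
        processed = []
        for sentence in nltk.sent_tokenize(text):      # sentence splitting
            tokens = nltk.word_tokenize(sentence)      # tokenization
            tagged = nltk.pos_tag(tokens)              # POS tagging
            kept = [lemmatizer.lemmatize(w.lower())    # lemmatization
                    for w, tag in tagged
                    if w.isalpha() and w.lower() not in stops]
            processed.append((tagged, kept))
        return processed

    print(preprocess("The summary was short. It still covered every concept."))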
3.2 Feature Extraction
Three groups of features are extracted, namely Incidence Features, Psycholinguistic Features, and Scoring Features, as explained in the following.
Incidence Features: the importance of a sentence can be ascertained from the presence of certain grammatical structures. Hence, the counts of POS tags present in the sentence for adjectives, nouns, pronouns, verbs, etc. are used to generate the incidence features.
Psycholinguistic Features: the psycholinguistic features extracted from the MRC Psycholinguistic database [24] are shown in Table 1. To extract a score from the psycholinguistic database, the Stanford POS tags were mapped to the syntactic category of the word in the MRC database.
Scoring Features: these are divided into six groups based on the source of information, namely the Frequency, Similarity, Wordnet, POS Tag, Position, and Length groups. The idea of scoring features is drawn from multiple references [13, 18, 25, 26]; a small sketch of representative scoring computations follows the feature list below.
1. Frequency Group: features under this group depend predominantly on the frequency of occurrence of words in the document. Term Score is the average normalized term frequency of the words in the sentence, as shown in Eq. (1); the normalized term frequency of a word is computed using Eq. (2).
Fig. 1. General workflow of the proposed two-phase approach for extractive single document summarization
Term Score = (Σ Normalized Term Frequency of the words in the sentence) / (Number of words in the sentence)    (1)

Normalized Term Frequency[x] = (Count of occurrences of word x in the document) / (Total number of words in the document)    (2)
where x is any word in the document. The TF-IDF of a word is calculated using the standard method, and the sum of the tf-idf scores of the nouns in the sentence forms the tf-idf score of the sentence. TF-IDF Greater than Average is computed as shown in Eq. (3):

tf_idf_greater_than_average = 0 if tfidf_s < tfidf_d, 1 otherwise    (3)

where tfidf_s is the tf-idf score of the sentence and tfidf_d is the average tf-idf score of the document.
Table 1. Psycholinguistic features and their descriptions

| S. No | Psycholinguistic feature | Psycholinguistic feature description |
|-------|--------------------------|--------------------------------------|
| 1 | Concreteness | Colerado, Gilhooly-Logie and Pavio norms are merged to generate concreteness |
| 2 | Kucera-Francis written frequency | Francis and Kucera norms are merged to create the word's occurrence frequency |
| 3 | Kucera-Francis number of categories | Francis and Kucera norms are merged to create the word's occurrence frequency |
| 4 | Kucera-Francis number of samples | Francis and Kucera norms are merged to create the word's occurrence frequency |
| 5 | Thorndike-Lorge written frequency | Lorge and Thorndike's L Count is used to get the occurrence frequency |
| 6 | Brown verbal frequency | Brown's London-Lund Corpus of English Conversation is used to derive the occurrence frequency in verbal language |
| 7 | Familiarity | Familiarity of the word in print |
| 8 | Imageability | Imageability |
| 9 | Meaningfulness - Colorado Norms | Meaningfulness ratings from Toglia and Battig |
| 10 | Meaningfulness - Pavio Norms | Meaningfulness from the norms of Pavio |
| 11 | Age of acquisition | Logie and Gilhooly norms are used to determine the age of acquisition |
Keywords is the sum of the term frequencies of the top k words of the document that occur in the given sentence.
2. Similarity Group: Title Similarity and Average Cosine Similarity are the two features extracted under this group. Title Similarity measures the similarity of the sentence with the first sentence of the document; Average Cosine Similarity is the average similarity of the sentence with every other sentence in the document.
3. Wordnet Group: Polysemy and Hypernymy features are extracted from WordNet.
4. POS Tag Group: Proper Nouns Count and Has Numeral are computed from the number of proper nouns and the presence of a cardinal tag in the sentence, respectively. Centrality, the fraction of the nouns present in the sentence, is computed using Eq. (4):

Centrality = (Count of noun words in the sentence that also occur in other sentences) / (Number of unique noun words in the document)    (4)
5. Position Group: the Position feature is the location of the sentence in the document, and the Position Value feature, which represents the importance of the sentence, is computed using Eq. (5).
position value = …    (5)
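The paper gives only the formulas above; as a hedged, self-contained sketch of how the frequency-group and centrality scores could be computed (the tokenized example words are invented, and the noun test is simplified to a supplied set):

    from collections import Counter

    def normalized_tf(doc_words):
        # Eq. (2): occurrences of each word over total words in the document.
        total = len(doc_words)
        return {w: c / total for w, c in Counter(doc_words).items()}

    def term_score(sentence_words, ntf):
        # Eq. (1): average normalized term frequency over the sentence.
        return sum(ntf.get(w, 0.0) for w in sentence_words) / len(sentence_words)

    def tf_idf_greater_than_average(tfidf_sentence, tfidf_doc_average):
        # Eq. (3): binary indicator feature.
        return 0 if tfidf_sentence < tfidf_doc_average else 1

    def centrality(sentence_nouns, nouns_in_other_sentences, doc_nouns):
        # Eq. (4): shared nouns over unique nouns in the document.
        shared = [n for n in sentence_nouns if n in nouns_in_other_sentences]
        return len(shared) / len(set(doc_nouns))

    doc = ["summary", "sentence", "document", "summary", "feature", "sentence", "score"]
    sent = ["summary", "sentence", "score"]
    ntf = normalized_tf(doc)
    print(round(term_score(sent, ntf), 3))                       # 0.238
    print(tf_idf_greater_than_average(0.42, 0.30))               # 1
    print(centrality(["summary", "sentence"], {"summary"}, doc)) # 0.2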
    # While all three RGB channel differences between the two screenshots
    # exceed the threshold n, walk down the column and mark the pixels blue.
    while ((pix[x, y][0] - pix1[k, l][0]) > n and
           (pix[x, y][1] - pix1[k, l][1]) > n and
           (pix[x, y][2] - pix1[k, l][2]) > n):
        if l < im.size[1] - 1:
            l += 1
        pixels[k, l] = (0, 0, 255)  # highlight the changed pixel in blue
3.2.4 Image Compression
Joint Photographic Experts Group (JPEG) lossy compression is used to reduce the size of the image, since the motive is only to indicate where the change has occurred. The image is compressed to 5% of the original quality, which is sufficient for viewing the changed areas. This is required because many images per website will be stored, which
would consume a lot of space if stored at the original high quality. Therefore, to utilize space efficiently, each image is compressed and stored in a queue, as sketched below.
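A minimal sketch of this step with Pillow, the library used in the comparison code above; the file names and the in-memory queue are illustrative, while quality=5 mirrors the 5% setting described:

    from collections import deque
    from PIL import Image

    snapshot_queue = deque(maxlen=50)  # bounded history of compressed shots

    def compress_and_store(path, out_path):
        im = Image.open(path).convert("RGB")
        im.save(out_path, "JPEG", quality=5)  # heavy lossy compression
        snapshot_queue.append(out_path)

    compress_and_store("screenshot_1000.png", "screenshot_1000_small.jpg")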
3.3 Experimental Results
Figure 3 shows the screenshot of a news webpage taken at a particular time instant, say 10:00 am. After a predefined time interval, say 2 h, the cron job is executed again and the next screenshot (Fig. 4) is saved with its timestamp. The algorithm compares this version (12:00 pm) with the previous version (10:00 am) of the webpage to check whether the content has changed. If a change in content is detected, it is highlighted in blue and a rectangular box is drawn enclosing the modified portion, as can be seen in Fig. 5. The URL of the webpage, along with the image highlighting the modified content, is appended to the queue.
Fig. 3. Screenshot taken at a particular time instant [3]
Fig. 4. Screenshot taken after the predefined time interval [3]
Fig. 5. Changes highlighted in blue color and enclosed in green rectangular box by ImageComparison algorithm
4 Conclusion and Future Scope
Thus, in this paper we describe a novel approach to monitor website changes and notify the user about them. Image comparison techniques offer a resource-efficient way to monitor such changes, and the approach can be used to track time-crucial events, such as discounts, offers, or the arrival of new products, and for competitor intelligence. Future scope for this approach includes predicting how frequently webpages should be checked or detecting which areas to monitor; text could also be extracted from the screenshot using optical character recognition in order to provide the user with the changed content.
Compliance with Ethical Standards
All authors state that there is no conflict of interest. We used our own data. Animals/humans are not involved in this work.
References 1. Shobhna, M.C.: A survey on web page change detection system using different approaches. Int. J. Comput. Sci. Mobile Comput. 2(6), 294–299 (2013) 2. Mallawaarachchi, V., Meegahapola, L., Alwis, R., Nimalarathna, E., Meedeniya, D., Jayarathna, S.: Change Detection and Notification of Webpages: A Survey 3. Aignesberger, M.: Watcher Product (2006). http://www.aignes.com 4. Wysigot. http://www.wysigot.com 5. WebCQProduct (2006). http://www.cc.gatech.edu/projects/disl/WebCQ 6. Copernic Technologies: Copernic Tracker Product (2006). http://www.copernic.com/en/ products/tracker/tracker-features.html 7. Douglis, F., Ball, T.: Tracking and viewing changes on the web. In: 1996 USENIX Technical Conference 8. Chakravarthy, S., Subramanian, C.: Automating change detection and notification of web pages. In: Proceedings of 17th International Conference on DEXA. IEEE (2006) 9. Hosting Facts. https://hostingfacts.com/website-monitoring-services/ 10. Jain, S.D., Khandagale, H.P.: A web page change detection system for selected zone using tree comparison technique. Int. J. Comput. Appl. Technol. Res. 3(4), 254–262 (2014). ISSN: 2319–8656 11. Kuppusamy, K.S., Aghila, G.: CaSePer: an efficient model for personalized web page change detection based on segmentation. J. King Saud Univ. Comput. Inf. Sci. 26, 19–27 (2014) 12. Woorank. https://www.woorank.com/en/blog/how-will-changing-ip-address-impact-seo
Vehicle Detection Using Point Cloud and 3D LIDAR Sensor to Draw 3D Bounding Box
H. S. Gagana(1), N. R. Sunitha(1), and K. N. Nishanth(2)
(1) Department of CSE, SIT, Tumkur, India
[email protected], [email protected]
(2) Solution Architect, KPIT Technologies, Bengaluru, India
[email protected]
Abstract. In recent times, autonomous driving functionality is being developed by car manufacturers and is revolutionizing the automotive industry. Hybrid cars are equipped with a wide range of sensors, such as ultrasound, LiDAR, camera, and radar, whose outputs are fused in order to avoid collisions. To achieve accurate results, a highly structured point cloud of the surroundings can be used to estimate scale and position. A point cloud is a set of data points representing 3D space in the X, Y, and Z directions. The point cloud is divided into clusters that are processed in a pipeline, and these clusters are collected to create a training set for object detection. In this paper, clusters of vehicle objects and other objects are extracted, and a supervised neural network is trained on them for binary classification into vehicles versus other objects. By learning global and local features, the vehicle objects represented in the point cloud are detected, and each detected vehicle is fitted with a 3D bounding box to represent it as a car object.
Keywords: Autonomous drive · 3D LiDAR · Vehicle detection · 3D bounding box · Point cloud
1 Introduction
Vehicle detection and classification are critical for traffic management. Various sensing techniques are used for vehicle detection, including loop detectors and video detectors. Almost all of the vehicle detection and classification tools in conventional practice are based on fixed sensors mounted in or along the right-of-way [13, 14]. This research examines the possibility of mounting the sensors on vehicles instead of using fixed detectors, thereby enabling alternative monitoring techniques such as the moving observer [4, 5] that were previously too labor-intensive to be viable. In autonomous driving, 3D LiDAR scanners play a vital role by creating depth data of the environment. Generating huge point cloud datasets with point-level labels involves a substantial amount of manual annotation, which hinders the effective development of supervised deep learning algorithms, which are frequently data-hungry. The data we have collected within this framework supports both autonomous driving scenes and user-configured scenes.
LiDAR is one of the critical technologies in the development and deployment of autonomous vehicles as shown in Fig. 1.
Fig. 1. LiDAR [3]
It is also part of the development mix in other hot topics like border security, drone mapping, and much more. In autonomous driving, a LiDAR comprises a number of spinning laser beams; it is usually mounted on the roof of a car, spins 360° around its axis, and measures light intensity and distance at a rate of 50k to 200k pulses per second (Hz). LiDAR data can also be collected in different forms: airborne, ground, and satellite. In this paper, LiDAR (Light Detection and Ranging) is used to monitor the ambient traffic around a probe vehicle. The LiDAR does not by itself distinguish between vehicles and non-vehicles; as such, much of the work is focused on segmenting the vehicles from the background. A vehicle classification method based on the length and height of detected vehicles is developed to classify the vehicles into six categories. The classification accuracy is over 91% on the ground truth data set.
2 Related Work
In recent years, vehicle detection has been extensively examined by many researchers. In [1, 3], the use of convolutional neural networks to train a machine to identify objects and draw bounding boxes around them is explained. Also, in [3] the authors made use of RGB-depth images and drew 3D bounding boxes on the camera image as well as on the LiDAR image. In [2], a methodology is shown for identifying moving objects. In [7], the authors built a mobile mapping system combining camera and LiDAR and provided features such as 3D image-based modeling and visualization of street views, along with synchronization between camera and LiDAR images; they also describe automated tools to extract 3D road objects. In [10], the working of the LiDAR system and the creation of Digital Elevation Models are discussed. In [8], the authors present a case study on object recognition in public places.
In [11], the authors made use of the KITTI dataset and compared camera images with LiDAR images. In [6], 3D detection of pedestrians is carried out; similar work is shown in [3]. The authors of [6] split the pedestrian silhouette into two parts, namely trunk and legs, to ensure that other objects are not mistaken for pedestrians and to obtain a very accurate human profile. In [9], the authors work on buildings, visualizing LiDAR and 3D RGB-D images together. Also, in [9, 10] they establish a Region of Interest within which all detection takes place, removing the time spent on unwanted detections. In [3], detection using 2D bounding information is processed and converted to a 3D projection by quantizing point clouds into images before applying a convolutional neural network; however, this affects the natural 3D variation and accuracy of the point cloud.
3 Approach and Model Architecture
With the help of a feature descriptor, 3D features are extracted and machine learning techniques are applied to the point cloud data of the input image to observe dissimilar forms of object classes. The same procedure is then applied to categorize different images captured as LiDAR data. Figure 2 shows the proposed architecture. Here, the point cloud datasets are first filtered to remove unwanted points through noise removal, ground removal, etc.; this process is known as segmentation. The segmented and filtered data are sent as input to the neural network to obtain the point cloud mapping output. We then detect objects such as cars, pedestrians and roads using two datasets, namely the Cityscapes dataset [12] shown in Fig. 4 and the KITTI dataset [11] shown in Fig. 5. These two datasets are used for training purposes.
Fig. 2. System architecture
Fig. 3. Image of point cloud [4]
986
H. S. Gagana et al.
This paper deals with the display of LiDAR datasets. The data are read from point clouds and displayed as a 3D mesh. This is achieved in two parts: the first is the conversion of the data to a suitable format; the second is displaying the data as a 3D mesh using applications such as VTK/Mayavi. A point cloud is a set of data points in space, usually produced by 3D scanners; it signifies a group of multi-dimensional points (usually in X, Y, and Z) representing 3D information about the world. Each 3D point can carry further information such as RGB colour, distance, and intensity. Point clouds are acquired from hardware sensors such as time-of-flight cameras, 3D scanners, etc. The LiDAR measures the distance of the objects surrounding the test vehicle using the point cloud, as shown in Fig. 3. A convolutional neural network is a class of deep learning model used to analyze visual imagery in computer vision applications; here we use a 3D convolutional neural network for visualization, animation, rendering, and mass-customization applications. The proposed approach is implemented in the following steps, explained in detail in Experiments and Results (Fig. 6):
• Visualizing the point cloud data (a small sketch follows below)
• Identifying and classifying objects of interest: vehicles and other objects
• Storing the identified data in an efficient manner
• Detecting the object
• Drawing the 3D bounding box
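As a minimal sketch of the first step, one can load and view a cloud with the open-source Open3D library (a stand-in for the VTK/Mayavi viewers named above); the file name is a hypothetical placeholder:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")  # also reads .ply, .xyz, etc.; path is illustrative
print(pcd)                                 # reports the number of points loaded
o3d.visualization.draw_geometries([pcd])   # opens an interactive 3D viewer
```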
Fig. 4. Cityscapes dataset image [9]
Fig. 5. KITTI dataset image [5]
Fig. 6. Workflow of proposed approach
Implementation Details
We are using the Cityscapes dataset and the KITTI dataset as training data. The input point cloud is passed to a segmentation process in order to extract clusters. The clusters are extracted after noise filtering and ground removal. Noise filtering is done by the statistical outlier removal process using the PCL library, which removes noisy data represented by outliers. Ground removal plays a
significant role in the segmentation and extraction of objects. To extract the ground from the point cloud before segmentation, we focus on the objects of interest rather than the ground segments; for this, the model-fitting method RANSAC was used for ground removal. PCL includes functionality for this method and is used in the implementation. The resulting point clouds are passed to the PointNet network model as a training set for binary classification. The trained model is then evaluated on unseen data, and bounding boxes are drawn to represent the identified objects.
4 Experiments and Results
To analyze the vehicle detection, the input point cloud dataset is segmented into many clusters to create a training set for the PointNet classification. This involves the following steps: noise filtering, ground removal, and clustering.
4.1 Noise Filtering
LiDAR scans the surrounding environment and creates point cloud datasets. The density of the point cloud depends on the nature of the surroundings. Outlier points far from the LiDAR source tend to be corrupted by measurement error, which makes it difficult to estimate local surface properties such as curvature or surface normals; this is solved by statistical analysis. For each input point, the distances to its neighbours are considered, and points falling outside a threshold defined by the global mean distance and standard deviation are trimmed from the dataset as errors. Figure 7a and b shows the point cloud image before and after filtering; Fig. 7c shows the filtered-out noisy data.
Fig. 7. a: Before filtering image [10] b: After filtering image c: Noisy data filtered
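A minimal sketch of this statistical outlier-removal step is shown below, using Open3D's filter as a stand-in for the PCL statistical outlier removal described above; the neighbour count, deviation ratio and file name are illustrative assumptions, not the paper's values:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")  # hypothetical input scan

# Keep points whose mean distance to their 20 nearest neighbours lies within
# 2 standard deviations of the global mean; everything else is treated as noise.
filtered, inlier_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
noise = pcd.select_by_index(inlier_idx, invert=True)  # the trimmed noisy points
```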
4.2 Ground Removal
To extract the ground from the point cloud before segmentation, we focus on the objects of interest rather than the ground segments. For this ground removal, the model-fitting method RANSAC was used, with the PCL implementation.
RANSAC Algorithm. For the given fitting problem, estimate the parameters.
Assumptions:
1. The parameters can be estimated from N data items.
2. There are M data items in total.
The algorithm is then:
1. Select N data items at random.
2. Estimate the parameters X.
3. Using the parameters X, find how many of the M data items fit the model within a given tolerance, and call this count K. The required K depends on the number of structures in the image and on the percentage of the data that belongs to the structure being fit. In the case of multiple structures, remove the fitted data and rerun RANSAC after each successful fit.
Figure 8a and b shows the results before and after ground removal.
Fig. 8. a: Before ground removal [10] b: After ground removal
Fig. 9. a: Outlier image b: Inlier image
The fit is repeated for a set number of iterations; each time, the model is either rejected because too few points are classified as inliers, or refined against the measurement error. To decide whether a point is an inlier or an outlier, a threshold to the ground plane is set, which specifies how close a point must be to count as an inlier, as shown in Fig. 9a and b. In our instance, a 30 cm threshold was most effective, though it can also cut away lower parts of clusters such as pedestrians.
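The sketch below illustrates this RANSAC ground-fitting step using Open3D's plane segmentation as a stand-in for the PCL functionality mentioned above; the 0.30 m distance threshold mirrors the 30 cm value reported, while the iteration count and file name are assumptions:

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("scan.pcd")  # hypothetical input scan

# RANSAC plane fit: inliers within 30 cm of the fitted plane are taken as ground.
plane, inliers = pcd.segment_plane(distance_threshold=0.30,
                                   ransac_n=3,
                                   num_iterations=1000)
a, b, c, d = plane                                    # plane ax + by + cz + d = 0
ground = pcd.select_by_index(inliers)                 # inlier points  -> ground
objects = pcd.select_by_index(inliers, invert=True)   # outlier points -> obstacles
```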
4.3 Clustering
Figure 10a and c shows the results of clustered car objects and Fig. 10b shows the result of a clustered tree object.
Fig. 10. a: Cluster 1 b: Cluster 2 c: Cluster 3
Assuming a kd-tree structure is used to find nearest neighbours, the algorithm steps are:
1. Create a kd-tree representation of the input point cloud P.
2. For every point p_i in P, add p_i to a queue Q; for each point in Q, search for its neighbours within a given radius and add any unprocessed neighbours to Q. When all points in Q have been processed, add Q to the list of clusters C and reset Q to an empty list.
3. When all points of the set P have been processed, they all belong to clusters in C and the algorithm terminates.
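A small sketch of this kd-tree-based Euclidean clustering, written here with SciPy's cKDTree rather than PCL; the search radius and minimum cluster size are illustrative assumptions:

```python
import numpy as np
from collections import deque
from scipy.spatial import cKDTree

def euclidean_clusters(points, radius=0.5, min_size=10):
    """Region growing over a kd-tree, following the steps listed above."""
    tree = cKDTree(points)                 # step 1: kd-tree over the input cloud P
    unseen = set(range(len(points)))
    clusters = []
    while unseen:
        queue = deque([unseen.pop()])      # seed a new candidate cluster Q
        cluster = []
        while queue:
            i = queue.popleft()
            cluster.append(i)
            for j in tree.query_ball_point(points[i], radius):
                if j in unseen:            # enqueue unprocessed neighbours of p_i
                    unseen.remove(j)
                    queue.append(j)
        if len(cluster) >= min_size:       # add Q to the list of clusters C
            clusters.append(np.array(cluster))
    return clusters                        # step 3: terminate when P is exhausted
```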
4.4 Training Data
After the clustering process, the clusters containing car objects are stored in one folder as cars, and the remaining clusters are stored in another as others. The same object can occur multiple times in the data but from different viewpoints; all such occurrences are collected as different objects. The total number of clusters extracted exceeds 2000 car objects.
4.5 PointNet
A point cloud is a significant kind of geometric data structure. Due to its irregular arrangement, many researchers transform such data into 3D voxel grids or collections of images, even though this rendering of the data is voluminous and problematic. PointNet provides a unified architecture for applications ranging from part segmentation and semantic scene parsing to object classification.
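The following is a compressed sketch of the PointNet idea for the binary car-vs-other classification described above: a shared per-point MLP followed by a symmetric max-pooling, written in PyTorch. It omits the input and feature transform networks of the full architecture, and all layer sizes are illustrative, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Shared per-point MLP + order-invariant max-pool, the core PointNet idea."""
    def __init__(self, num_classes=2):          # cars vs. others, as above
        super().__init__()
        self.mlp = nn.Sequential(                # 1x1 convs = same MLP on every point
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes))

    def forward(self, x):                        # x: (batch, 3, num_points)
        feat = self.mlp(x)                       # per-point features
        feat = torch.max(feat, dim=2).values     # symmetric global feature
        return self.head(feat)                   # class logits

logits = TinyPointNet()(torch.randn(4, 3, 1024))  # e.g. 4 clusters of 1024 points
```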
4.6 Training and Validation
For implementing machine learning in real-time applications, one needs to evaluate all the necessary parameters so as to obtain an optimized application. Due to over-fitting, performance on the training set is not a reliable indicator of performance on unseen objects, so care must be taken that the model is evaluated on unseen data before the detected results are validated.
4.7 3D Bounding Box
The 3D bounding boxes are a method for feature extraction. First, the covariance matrix of the point cloud is calculated and its eigenvalues and eigenvectors are extracted. The resulting eigenvectors are normalized and form a right-handed coordinate system, with the major eigenvector representing the x-axis and the minor eigenvector representing the z-axis, as shown in Fig. 11. On each iteration the major eigenvector is rotated; the rotation order is always the same and is performed around the other eigenvectors, which provides invariance to rotation of the point cloud. The rotated major-axis vector is treated as the current axis. For every current axis, the moment of inertia is calculated, and the current axis is also used for calculating eccentricity: the current vector is treated as the normal vector of a plane, the input point cloud is projected onto it, and this axis is used as a reference to obtain eccentricity and projection.
Fig. 11. 3D Bounding box
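A minimal NumPy sketch of this eigenvector-based box construction (without the iterative moment-of-inertia and eccentricity computation described above) might look as follows; it simply builds a right-handed frame from the covariance eigenvectors and measures the cloud's extents in that frame:

```python
import numpy as np

def oriented_bbox(points):
    """PCA box: covariance eigenvectors give the box axes, as described above."""
    centre = points.mean(axis=0)
    cov = np.cov((points - centre).T)        # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axes = eigvecs[:, ::-1]                  # major axis (x) first, minor (z) last
    if np.linalg.det(axes) < 0:              # enforce a right-handed frame
        axes[:, 2] *= -1
    local = (points - centre) @ axes         # project points into the box frame
    return centre, axes, local.min(axis=0), local.max(axis=0)
```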
4.8 Object Detection
For training and validation, the data is split into 2000 and 1000 samples respectively. The model is trained with the training set containing cars and other objects; the remaining data gives the efficiency of the testing process. Figure 12a and b shows the detected object visualized in front and top view.
Fig. 12. a: Detected object (Front view) b: Detected object (Top view)
Because of over-fitting, performance on the training set is not a reliable indication of performance on hidden data; consequently, the trained model was evaluated on unobserved data and obtained 91% accuracy.
5 Conclusion and Future Scope
In autonomous driving, object detection plays a major role in collision avoidance. LiDAR actively creates hypotheses and detects objects, while a vision-based classifier is used for object validation or classification. Processing the point cloud for the extraction of objects plays a prominent role in classification. PointNet, which takes a raw point cloud as input without voxelization and absorbs both global and local point features, is an effective, simple and efficient method that classifies the point cloud with high accuracy and helps in detecting the object of interest. With the proposed framework, objects are detected accurately, which helps to prevent collisions with obstacles. The results are capable of showing the observer a 3D simulated environment that lets the observer see in all directions around the vehicle; it also helps the observer to identify obstacles such as cars and to plot bounding boxes around them. It cannot yet identify other obstacles such as pedestrians and cyclists; in future work, a dataset can be built that stores all possible obstacles, names them, and classifies them all by providing 3D bounding boxes.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Asvadi, A., Garrote, L., et al.: DepthCN: vehicle detection using 3D-LIDAR. In: IEEE 20th International Conference on Intelligent Transportation Systems (ITSC) (2017) 2. Azim, A., Aycard, O.: Detection, classification and tracking of moving objects. In: Intelligent Vehicles Symposium, 3 July 2012 3. Qi, C.R., Liu, W., Wu, C., et al.: Frustum PointNets for 3D Object Detection from RGB-D Data, arXiv, 13 April 2018 4. Charlesworth, G., Wardrop, J.: A method of estimating speed and flow of traffic from a moving vehicle. In: Proceedings Institute of Civil Engineers, Part II, vol. 3, 158–171 (1954) 5. Wright, C.: A theoretical analysis of the moving observer method. Transp. Res. 7(3), 293– 311 (1973) 6. Wolcott, R.W., Eustice, R.M.: Visual Localization within LIDAR Maps for Automated Urban Driving 7. Wang, R.: 3D building modeling using images and LiDAR: a review, Canada, 05 July 2013 8. MMS Inc, A White Paper on LIDAR Mapping (2001) 9. Li, B.: 3D fully convolutional network for vehicle detection in point cloud. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017
10. Choi, M., Ulbrich, S., Lichte, B., Maurer, M.: Multi-target tracking using a 3D-Lidar Sensor. In: 16th International IEEE Annual Conference on Intelligent Transportation Systems, The Hague, Netherlands, 6 October 2013 11. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. 32(11) (2013) 12. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Stefan, R., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition (2016) 13. Harlow, S., Peng, C.: Automatic vehicle classification system with range sensors. Transp. Res. Part C, 231–247 (2001) 14. Gelenbe, E., Abdelbaki, H.M., Hussain, K.: A laser intensity image based automatic vehicle classification system. In: Proceedings of ITSC 2001, vol. 2, pp. 277–287, August 2001
An Experiment with Random Walks and GrabCut in One Cut Interactive Image Segmentation Techniques on MRI Images
Anuja Deshpande1, Pradeep Dahikar1, and Pankaj Agrawal2
1 Department of Electronics, Kamla Nehru Mahavidyalaya, Nagpur, Maharashtra, India
[email protected], [email protected]
2 Department of Electronics and Communication Engineering, G.H. Raisoni Academy of Engineering and Technology, Nagpur, Maharashtra, India
[email protected]
Abstract. This research work applies the Random Walks and GrabCut in One Cut interactive image segmentation techniques to MRI images, particularly those posing segmentation challenges in terms of complexity in texture, indistinct and/or noisy object boundaries, lower contrast, etc. We have computed accuracy measures such as Jaccard Index (JI), Dice Coefficient (DC) and Hausdorff Distance (HD), besides visual assessment, to understand and assess the segmentation accuracy of these techniques. Comparison of the ground truth with the segmented image reveals that Random Walks can detect edges/boundaries quite well, especially when they are noisy; however, it has a tendency to latch onto stronger edges near the desired object boundary. GrabCut in One Cut, on the other hand, sometimes needs more scribbles to achieve acceptable segmentation.
Keywords: Accuracy · GrabCut in One Cut · Graph cuts · Hybrid segmentation · Random Walks
1 Introduction
Segmenting an image has been a stiff challenge for many decades now. The objectives of image segmentation being different for different needs, the techniques developed were oriented towards solving specific problems and did not generalize to all applications. The complexity of images made the task even more challenging, and researchers had to take different approaches to solve specific segmentation problems. This has resulted in a multitude of image segmentation techniques [1], falling under the manual, automatic and semi-automatic categories. Over the past several decades, hundreds of image segmentation techniques belonging to various categories [2] have come into existence, and each one solves a specific problem. A general-purpose segmentation technique that can segment all types of images is still in development. Automatic segmentation in particular is a challenging task that is not yet perfected, although there have been many attempts towards it over the last few decades. In image segmentation, the accuracy of the segmentation process decides if
the extracted object is good enough to be fed to the next application; if not, pre- and/or post-processing becomes critical to ensure that the segmentation results are acceptable.
2 Literature Survey
As put forth in [3], intensity images can be segmented using mainly four approaches [4, 5]: those based on thresholds, boundaries, regions, and hybrids (which employ boundary and region criteria together). Thresholding-based methods [6] discard spatial information and do not perform effectively in the presence of noise, or if the foreground object has a blurred boundary or textural complexities. Thresholding methods assume that adjacent pixels belong to either the foreground or the background; the assignment is based on grey levels for greyscale images, or colour values for RGB images, falling within a specific range. In boundary-based methods [3], a Sobel or Roberts filter [3, 7] is employed as a gradient operator, since pixel values change swiftly at the boundary between two regions [8]. Complementary to boundary-based methods [4] are region-based methods [9], in which pixels belonging to one region demonstrate similar attributes. This category has given rise to a new family, i.e. region growing; the split-and-merge technique [3, 10] is one of the best-known techniques in the category. In region-growing methods, homogeneity is evaluated by comparing one pixel with its neighbours based on its connectivity (4 or 8); however, these methods are influenced by noise, and even moderate success is critically dependent on the homogeneity criterion [3, 11–13]. The hybrid methods [3] combine boundary and region criteria and employ morphological watershed segmentation [14] and variable-order surface fitting [4]. With the watershed algorithm it is certain that closed boundaries will be formed along the ridges: region boundaries prevent water from mixing when the topology is flooded, and this yields the segmentation [15]. Random Walks falls under this hybrid category. The Random Walks interactive segmentation technique, as described in [16], formulates the problem on a graph: the user identifies a few pixels as foreground and background, called seeds, and for each remaining pixel one imagines releasing a random walker and computes the probability that it reaches each seed. In graph partitioning methods, hard constraints are imposed by marking pixels belonging to the foreground (the object to be extracted) and the background to be discarded. In [17] hard and soft constraints are imposed to segment the images; through min-cut/max-flow, the global optimum of the soft constraints is achieved while satisfying the hard constraints, with a cost function incorporating region and boundary properties defined as the soft constraint. For graph cuts [17], an algorithm [18] has been developed to facilitate interactive segmentation based on user input; it generally requires more seeds to be placed to overcome the small-cut problem. There have been quite a few extensions to overcome the shortcomings of graph cuts, such as multi-level banded graph cuts [19], lazy snapping [20], GMMRF [21] and GrabCut [22]; however, the latter needs more
computational power, and its box interface is not always feasible. Random walks had earlier been proposed by [23] for texture discrimination, and different aspects of random walks have been explored in [24, 25]; automatic clustering using random walks has also been proposed in [26, 27]. In the GrabCut in One Cut method [28] the authors propose a new energy term that can be globally optimized through one graph cut, and demonstrate that the NP-hard [29] segmentation functional can be avoided using this approach; the technique quite effectively replaces iterative optimization techniques based on block coordinate descent. This paper discusses experiments conducted on MRI images using the Random Walks [16] and GrabCut in One Cut [28] image segmentation techniques. We have organized this paper as follows: accuracy measures are discussed in Sect. 3; Sect. 4 illustrates our approach towards this experiment, with the experiment details in Sect. 5; the results and findings are described in Sect. 6, followed by the observations of this experiment in Sect. 7, supported by a critical discussion in Sect. 8. The concluding comments are given in Sect. 9.
3 Accuracy Measures
Without a scientific evaluation method to assess the correctness and completeness of an image segmentation technique, both from a coverage and a spatial-overlap perspective, the accuracy of the technique cannot be ascertained; and without establishing accuracy, we cannot ascertain its effectiveness either. Various evaluation methods have evolved over time to ascertain the sensitivity and specificity of image segmentation techniques, viz. F1 score, SIR (Similarity Ratio), TPR (True Positive Ratio), FPR (False Positive Ratio), etc. The worst case needs to be additionally worked out for these methods towards sensitivity and specificity; for that, methods like AHE (Average Hausdorff Error), AME (Average Mean Error), etc. are used. Similarly, owing to its squaring function, MSE (Mean Squared Error) weighs outliers heavily. The ROC curve [30, 31] is another method, but it also requires computation of TPR and FPR. While these methods have significance in certain contexts, they require additional methods for conclusive results. Hence, it is necessary to choose evaluation methods which are conclusive; to ensure the same, in this experiment we have employed the accuracy measures Jaccard Index (JI), Dice Coefficient (DC) and Hausdorff Distance (HD) for the assessment of segmentation accuracy. These methods are computationally inexpensive, fast, and conclusive, and are convincing as against the other evaluation methods mentioned above.
3.1 Jaccard Index (JI)
The Jaccard Index compares the similarity as well as diversity between two data sets A and B and is expressed in (1) below.
J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}    (1)
By subtracting the Jaccard Index from 1 we get the Jaccard Distance, expressed as

d_J(A,B) = 1 - J(A,B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}    (2)

3.2 Dice Coefficient (DC)
The Dice Coefficient is used for determining the similarity (presence or absence) of two data sets A and B and is expressed in (3) below.

QS = \frac{2|A \cap B|}{|A| + |B|}    (3)

where QS stands for the quotient of similarity.

3.3 Hausdorff Distance (HD)
The Hausdorff distance (spatial distance) indicates how different two data sets are from each other and is expressed in (4) as

d_H(X,Y) = \inf\{\varepsilon \ge 0 \;;\; X \subseteq Y_\varepsilon \text{ and } Y \subseteq X_\varepsilon\}    (4)

where

X_\varepsilon := \bigcup_{x \in X} \{z \in M \;;\; d(z,x) \le \varepsilon\}    (5)
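Assuming the segmented image and ground truth are available as binary NumPy masks, the three measures can be computed along the lines of the following sketch; SciPy's directed Hausdorff routine is used for (4):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def ji_dc(seg, gt):
    """Jaccard Index (1) and Dice Coefficient (3) for two binary masks."""
    inter = np.logical_and(seg, gt).sum()
    union = np.logical_or(seg, gt).sum()
    return inter / union, 2 * inter / (seg.sum() + gt.sum())

def hausdorff(seg, gt):
    """Symmetric Hausdorff distance (4) between the foreground pixel sets."""
    a, b = np.argwhere(seg), np.argwhere(gt)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
```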
4 Our Approach
In this experiment, instead of executing the algorithm once as is usually practiced, we ran the Random Walks technique iteratively, specifically for two iterations. The first iteration filtered out most of the background, and in the second we achieved the final segmentation. We performed multiple runs of the segmentation process with varying parametric values (beta, threshold, etc.), marked different foreground and background seeds on the image during each run, and selected only the visually best results for further assessment. For the GrabCut in One Cut technique, we performed multiple iterations with different parametric values (number of colour bins per channel and colour slope) to get the best result; additional scribbles were marked where needed.
We have studied the Random Walks and GrabCut in One Cut image segmentation techniques and conducted this experiment using the implementation code of [32] and [33] respectively. We have ascertained the accuracy of these segmentation techniques through:
1. Visual assessment
2. Accuracy measures
   a. Jaccard Index
   b. Dice Coefficient
   c. Hausdorff Distance
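A hedged sketch of the two segmentation runs is shown below. scikit-image provides a Random Walker implementation in the spirit of [16, 32], and OpenCV provides the classic iterated GrabCut; note that the One Cut variant actually used in this experiment is the separate implementation cited as [33], so the OpenCV call is only a stand-in. The seed positions, scribble regions and file name are hypothetical:

```python
import numpy as np
import cv2
from skimage.segmentation import random_walker

image_bgr = cv2.imread("slice_a001.png")            # hypothetical exported MRI slice
image_gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(float)

# Random Walks: one foreground and one background seed, as in the runs above
# (1 = foreground, 2 = background, 0 = unlabelled); beta is the tuned parameter.
labels = np.zeros(image_gray.shape, dtype=np.int32)
labels[120, 150] = 1                                # hypothetical seed positions
labels[10, 10] = 2
rw_mask = random_walker(image_gray, labels, beta=130) == 1

# Classic iterated GrabCut, initialised from scribble masks rather than a box.
mask = np.full(image_bgr.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)
mask[110:130, 140:160] = cv2.GC_FGD                 # hypothetical scribbles
mask[0:20, 0:20] = cv2.GC_BGD
bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
cv2.grabCut(image_bgr, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
gc_mask = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```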
5 Experiment Setup
The image dataset [34] we have used for this experiment, consisting of MRI images, is from the Left Atrial Segmentation Challenge 2013 (LASC'13). The dataset includes original images in RAW format as well as ground truth constructed by human subjects. We have used the open-source Insight Segmentation and Registration Toolkit (ITK) [35] to extract image snapshots from the RAW MRI images for the purpose of this experiment. We performed the steps below for image acquisition and pre-processing:
1. Download the LASC'13 dataset consisting of 10 MRI images.
2. Improve the contrast of each data set using the ITK toolkit.
3. Using the ITK toolkit, export the RAW image slices to PNG format.
4. Perform Lucy-Richardson deconvolution on the exported images to enhance contrast and image edges.
In the Lucy-Richardson deconvolution filtering step, we implemented a disc-based point spread function with a radius of two; this enhanced the image contrast and edges significantly. During the segmentation process, we visually compared the segmented image with the ground truth and, if it was found acceptable, completed the segmentation. After completing all the image segmentations, we then computed the accuracy measures to establish the accuracy statistically. We also accepted the best achievable segmentation output in cases where the algorithm was unable to segment the image correctly even after varying the requisite parameters (beta and threshold for Random Walks; number of colour bins and colour slope for GrabCut in One Cut) and performing numerous segmentation runs for each of the changed values. For all segmented images, we assessed accuracy by computing the measures Jaccard Index, Dice Coefficient and Hausdorff Distance. We performed the segmentation process as shown below (Fig. 1).
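This pre-processing step can be sketched with scikit-image as below; the disc PSF of radius two follows the text, while the iteration count is an assumption (the keyword is num_iter in recent scikit-image releases, iterations in older ones):

```python
from skimage import io, img_as_float
from skimage.morphology import disk
from skimage.restoration import richardson_lucy

image = img_as_float(io.imread("slice_a001.png", as_gray=True))  # hypothetical slice

psf = disk(2).astype(float)   # disc-shaped point spread function of radius two
psf /= psf.sum()              # normalise so total energy is preserved

deblurred = richardson_lucy(image, psf, num_iter=30)  # iteration count assumed
```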
Fig. 1. Segmentation process followed in this experiment
6 Experiment Results
Below are the segmentation results of our experiment on MRI images using Random Walks and GrabCut in One Cut (Tables 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10).
Table 1. Segmentation set A001 (Foreground with noisy boundary). [Original image, ground truth, and both segmented results appear as images in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.857340720    0.870056497
DC        0.923191648    0.930513595
HD        3.464101615    2.236067977
Table 2. Segmentation Set A002 (Multi-object foreground with overlap and indistinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.380698638    0.819523269
DC        0.551457976    0.900810979
HD        9.165151390    2.645751311
Table 3. Segmentation Set A003 (Foreground with overlap but sharper boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.802176697    0.933717579
DC        0.890230906    0.965722802
HD        5.744562647    2.0
Table 4. Segmentation Set A004 (foreground with overlap and indistinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.679456907    0.900738310
DC        0.809138840    0.947777298
HD        5.744562647    2.828427125
Table 5. Segmentation Set A005 (foreground with overlap and distinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.790398918    0.878469617
DC        0.882930514    0.935303514
HD        5.477225575    2.449489743
Table 6. Segmentation Set A006 (Multi-object foreground with partial overlap and mostly distinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.667215815    0.852422907
DC        0.800395257    0.920332937
HD        6.0            3.605551275
Table 7. Segmentation Set A007 (Blurry foreground with indistinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.875173370    0.873622337
DC        0.933431953    0.932549020
HD        4.582575695    2.828427125
Table 8. Segmentation Set A008 (Foreground with distinct sharp object boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.946229913    0.949658173
DC        0.972372182    0.974179152
HD        3.872983346    1.732050808
Table 9. Segmentation Set A009 (foreground with blurry but mostly distinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.879618594    0.958133971
DC        0.935954344    0.978619426
HD        3.872983346    2.00
Table 10. Segmentation Set A010 (foreground with blurry but mostly distinct boundary). [Images shown in the printed table.]

Measure   Random Walks   GrabCut in One Cut
JI        0.895893028    0.926530612
DC        0.945088161    0.961864407
HD        4.242640687    2.236067977
7 Observations
During this experiment, we observed that the Random Walks algorithm, in its current implementation form, does a remarkable job of segmenting the image with just one foreground and one background seed marked. Most other interactive image segmentation techniques, including GrabCut in One Cut, require various scribbles to be marked as foreground and background to get the desired results, at times requiring manual tracing along a noisy or blurred boundary. Random Walks performed very well in detecting noisy object boundaries in most of the cases. Since GrabCut in One Cut allows more interaction to mark additional pixels as foreground and background, we could achieve the desired segmentation in most of the cases.

For segmentation set A001, both techniques could achieve only around 90% accuracy, with GrabCut in One Cut performing slightly better. The blurry edge at the bottom of the image was detected successfully by both, but it is not part of the foreground, as can be ascertained from the ground truth.

Segmentation sets A002 and A006 are classic cases of multi-object segmentation. Random Walks, although capable in principle, is not suitable for such demands in its current implementation form and could not detect the objects, as the findings suggest. GrabCut in One Cut, owing to the ability to mark additional scribbles, could detect them and achieve better output; however, as the accuracy measures suggest, the output from both techniques is unacceptable for most applications.

GrabCut in One Cut could segment dataset A003 much better, with accuracy around 95%, using additional scribbles; the edge is practically non-existent along the left side of the object, leading to the failure of Random Walks. Segmentation sets A004 and A005 are further cases where the object boundary is not distinct, particularly towards the lower-left bottom edge; GrabCut in One Cut could achieve about 90% accuracy owing to more scribbles, while for Random Walks it was lower. Segmentation set A007 not only has blurry edges but also lacks a distinct boundary separating the foreground and background; both techniques could achieve only around 90% accuracy.
Table 11. Comparison of Random Walks and GrabCut in One Cut

Feature                                Random Walks                                     GrabCut in One Cut
Orientation                            Medical images                                   Natural images
Relative speed of execution            Slow                                             Fast
Noisy edge detection                   Excellent                                        Weak
Multi-object segmentation capability   Capable, but not in current implementation form  Yes
Foreground and background markers      One pixel each                                   Many scribbles
Both techniques were successful and yielded around 95% accuracy on segmentation set A008, where the object boundary is quite distinct, although a bit blurry. On segmentation set A009, GrabCut in One Cut was quite successful, with around 95% accuracy, although we needed more scribbles at the top of the foreground to handle the blurry area; Random Walks, on the other hand, could only achieve around 90% accuracy, since, while most of the foreground has a distinct boundary, the blurry area at the top caused this outcome. In segmentation set A010, the foreground object has a distinct but entirely blurred boundary; GrabCut in One Cut performed better, with about 93% accuracy, compared to Random Walks, which was barely touching 90%.
8 Discussion
Image segmentation being a very vast subject, we restrict this discussion to supervised and/or graph-based methods. Other techniques such as Intelligent Scissors [36] also treat the image as a graph problem and impose a connectivity structure: the user marks points along the boundary, and the shortest path is computed using Dijkstra's algorithm [37]. The Intelligent Scissors algorithm is quite fast and its implementation is quite simple; however, it requires intuitive handling when the foreground object has noisy or low-contrast boundaries. Active Contours [38] and Level Sets [39], which have numerous implementations, expect the user to mark initial seeds near the object boundary; the algorithms then evolve the object boundary to a local energy minimum by employing different energy functionals and domain knowledge. In the graph-theory category, among the first approaches are those of [40, 41]; however, the normalized cuts introduced by [42] drew a lot of interest and are an excellent work to follow. Many of the algorithms which came later focused on the spectral properties of the graph, as explained in [43, 44]; the isoperimetric algorithm proposed by [45] and the Swendsen-Wang algorithm [46] are notable exceptions. Various approaches to improving results using graph-cut-based methods [47, 48] have been proposed in the last decade. These largely include work on shape priors, additional constraints, etc., for varied applications ranging from biomedical images [49–53], industrial images [54], and natural images [55–58] to motion or video segmentation [59, 60], each of them trying to overcome some limitation or other.
We, however, continue to study the vast field of image segmentation, pursuing newer techniques with the aim of developing a general algorithm that can segment all images.
9 Conclusion
During the course of this experiment, neither technique yielded acceptable segmentation at first, and we had to introduce a pre-processing step, which significantly improved the segmentation results: contrast enhancement and de-blurring using the Lucy-Richardson algorithm, which had a significant impact on the overall segmentation outcome. Random Walks could detect weak or noisy boundaries very well in most of the cases, while GrabCut in One Cut required more scribbles for the same. As we witnessed during this experiment, Random Walks has a shortcoming: when a strong edge is in the proximity of the desired object boundary, the algorithm tends to latch onto it instead of picking up the desired boundary. As the findings suggest, multi-object segmentation could not be achieved using Random Walks in its current implementation form, although the method is capable in principle; GrabCut in One Cut, on the other hand, achieved it, although with low accuracy. GrabCut in One Cut is oriented towards segmenting natural images; however, this experiment suggests that it can also be used on MRI images, with a few more scribbles or some extra effort. Table 11 above summarizes the comparative findings for quick reference.

Overall, GrabCut in One Cut segmented the images comparatively better than Random Walks, largely due to the additional scribbles applied to delineate the foreground, sometimes involving manual tracing along the object boundary. While the strength of the algorithms is certainly critical for segmentation success, equally important is subject knowledge: overlapping organs pose a very difficult segmentation challenge, and ascertaining the object of interest itself can then be difficult. We found that pre-processing of the images is required to achieve acceptable segmentation using both techniques. We believe this paper will help fellow researchers to understand these techniques and their applications, with facts and scientifically assessed accuracy. We intend to continue studying various image segmentation algorithms and techniques by conducting similar experiments, to understand their effectiveness and accuracy on various natural as well as MRI images, using the same accuracy measures to maintain consistency and provide a mathematical foundation for the assessment. During this study and experimentation, we will observe and assess which algorithms are more suitable or effective for specific segmentation needs. We expect this research to add value and assist fellow researchers, since the recommendations are based on scientifically conducted experiments.
Acknowledgement. Anuja Deshpande thanks Leo Grady, the author of Random Walks interactive image segmentation technique, for his excellent work on this algorithm and making publicly available Matlab code, which enabled us to perform this experiment. His comments and suggestions shared with us during the experiment were pivotal to the successful completion of the experiment. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Kaur, D., Kaur, Y.: Various image segmentation techniques: a review. Int. J. Comput. Sci. Mobile Comput. 3, 809–814 (2014) 2. Adams, R., Bischof, L.: Seeded region growing. IEEE Trans. Pattern Anal. Mach. Intell. 16 (6), 641–647 (1994) 3. Besl, P.J., Jain, R.C.: Segmentation through variable-order surface fitting. IEEE Trans. Pattern Anal. Mach. Intell. 10(2), 167–192 (1988) 4. Haralick, R., Shapiro, L.: Image segmentation techniques. Comput. Vis. Graph. Image Process. 29(1), 100–132 (1985) 5. Sahoo, P., Soltani, S., Wong, A.: A survey of thresholding techniques. Comput. Vis. Graph. Image Process. 41, 233–260 (1988) 6. Ballard, D., Brown, C.: Computer Vision. Prentice-Hall, Englewood Cliffs (1982) 7. Davis, L.: A survey of edge detection techniques. Comput. Vis. Graph. Image Process. 4, 248–270 (1975) 8. Zucker, S.: Region growing: childhood and adolescence. Comput. Vis. Graph. Image Process. 5, 382–399 (1976) 9. Horowitz, S., Pavlidis, T.: Picture segmentation by a directed split-and-merge procedure. In: Proceedings of 2nd International Joint Conference on Pattern Recognition, Copenhagen, Denmark, pp. 424–433 (1974) 10. Cheevasuvit, F., Maitre, H., Vidal-Madjar, D.: A robust method for picture segmentation based on split-and-merge procedure. Comput. Vis. Graph. Image Process. 34(3), 268–281 (1986) 11. Chen, S., Lin, W., Chen, C.: Split-and-merge image segmentation based on localized feature analysis and statistical tests. CVGIP: Graph. Models Image Process. 53(5), 457–475 (1991) 12. Pavlidis, T., Liow, Y.: Integrating region growing and edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 12(3), 225–233 (1990) 13. Meyer, F., Beucher, S.: Morphological segmentation. J. Vis. Commun. Image Represent. 1, 21–46 (1990) 14. Vincent, L., Soille, P.: Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 13(6), 583–598 (1991) 15. Grady, L.: Random walks for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(11), 1768–1783 (2006) 16. Wallace, R., Ong, P.-W., Schwartz, E.: Space variant image processing. Int. J. Comput. Vis. 13(1), 71–90 (1994)
17. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary & amp; region segmentation of objects in N-D images. In: Proceedings of Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, vol. 1, pp. 105–112 (2001) 18. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26(9), 1124–1137 (2004) 19. Lombaert, H., Sun, Y., Grady, L., Xu, C.: A multilevel banded graph cuts method for fast image segmentation. In: Proceedings Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, vol. 1, pp. 259–265 (2005) 20. Li, Y., Sun, J., Tang, C., Shum, H.: Lazy snapping. In: Proceedings of ACM SIGGRAPH, pp. 303–308 (2004) 21. Blake, A., Rother, C., Brown, M., Perez, P., Torr, P.: Interactive image segmentation using an adaptive GMMRF model. In: Proceedings of ECCV, pp. 428–441 (2004) 22. Kolmogorov, V., Blake, A.: GrabCut—Interactive foreground extraction using iterated graph cuts. In: ACM Transactions on Graphics, Proceedings of ACM SIGGRAPH 2004, vol. 23 (3), pp. 309–314 (2004) 23. Wechsler, H., Kidode, M.: A random walk procedure for texture discrimination. IEEE Trans. Pattern Anal. Mach. Intell. PAMI 3, 272–280 (1979) 24. Gorelick, L., Galun, M., Sharon, E., Basri, R., Brandt, A.: Shape representation and classification using the poisson equation. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 1991–2005 (2006) 25. Grady, L., Schwartz, E.L.: Isoperimetric partitioning: a new algorithm for graph partitioning. SIAM J. Sci. Comput. 27(6), 1844–1866 (2006) 26. Harel, D., Koren, Y.: On clustering using random walks. In: Proceedings of 21st Conference on Foundations of Software Technology and Theoretical Computer Science, London, U.K, vol. 2245, pp. 18–41 (2001) 27. Yen, L., Vanvyve, D., Wouters, F., et.al.: Clustering using a random-walk based distance measure. In: Proceedings of the 13th Symposium on Artificial Neural Networks, pp. 317– 324 (2005) 28. Tang, M., Gorelick, L., Veksler, O., Boykov, Y.: GrabCut in One Cut. In: 2013 IEEE International Conference on Computer Vision, Sydney, NSW, pp. 1769–1776 (2013) 29. Vicente, S., Kolmogorov, V., Rother, C.: Joint optimization of segmentation and appearance models. In: 2009 IEEE 12th International Conference on Computer Vision, Kyoto, pp. 755– 762 (2009) 30. Powers, D.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011) 31. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006) 32. Grady, L.: Random Walker image segmentation algorithm. http://leogrady.net/wp-content/ uploads/2017/01/random_walker_matlab_code.zip. Accessed 20 July 2018 33. Gorelick, L.: OneCut with user seeds. http://www.csd.uwo.ca/*ygorelic/downloads.html. Accessed 06 Apr 2018 34. Tobon-Gomez, C.: Left Atrial Segmentation Challenge 2013: MRI training, 05 August 2015. https://figshare.com/articles/Left_Atrial_Segmentation_Challenge_2013_MRI_training/ 1492978/1. Accessed 20 July 2018 35. The Insight Segmentation and Registration Toolkit (ITK) toolkit. https://itk.org/ITK/ resources/software.html. Accessed 20 July 2018 36. Mortensen, E., Barrett, W.: Interactive segmentation with intelligent scissors. Graph. Models Image Process. 60(5), 349–384 (1998) 37. Dijkstra, E.: A note on two problems in connexion with graphs. Numer. Math. 1, 269–271 (1959)
38. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1 (4), 321–331 (1987) 39. Sethian, J.: Level set methods and fast marching methods. J. Comput. Inf. Technol. 11(1), 1– 2 (1999) 40. Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 20(1), 68–86 (1971) 41. Wu, Z., Leahy, R.: An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1101– 1113 (1993) 42. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000) 43. Sarkar, S., Soundararajan, P.: Supervised learning of large perceptual organization: graph spectral partitioning and learning automata. IEEE Trans. Pattern Anal. Mach. Intell. 22(5), 504–525 (2000) 44. Perona, P., Freeman, W.: A factorization approach to grouping. In: Proceedings of 5th European Conference on Computer Vision, ECCV 1998, vol. 1, pp. 655–670. Freiburg, Germany (1998) 45. Grady, L., Schwartz, E.L.: Isoperimetric graph partitioning for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 469–475 (2006) 46. Barbu, A., Zhu, S.: Graph partition by Swendsen-Wang cuts. In: Proceedings Ninth IEEE International Conference on Computer Vision, Nice, France, vol. 1, pp. 320–327 (2003) 47. Vu, N., Manjunath, B.S.: Shape prior segmentation of multiple objects with graph cuts. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2008, Anchorage, AK, pp. 1–8 (2008) 48. Wang, H., Zhang, H: Adaptive shape prior in graph cut segmentation. In: Proceedings of 2010 IEEE International Conference on Image Processing, Hong Kong, pp. 3029–3032 (2010) 49. Yu, Z., Xu, M., Gao, Z.: Biomedical image segmentation via constrained graph cuts and presegmentation. In: Proceedings of 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, pp. 5714–5717 (2011) 50. Wu, X., Xu, W., Li, L., Shao, G., Zhang, J.: An interactive segmentation method using graph cuts for mammographic masses. In: Proceedings of 2011 5th International Conference on Bioinformatics and Biomedical Engineering, Wuhan, pp. 1–4 (2011) 51. Freedman, D., Zhang, T.: Interactive graph cut based segmentation with shape priors. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, vol. 1, pp. 755–762 (2005) 52. Furnstahl, P., Fuchs, T., Schweizer, A., Nagy, L., Szekely, G., Harders, M.: Automatic and robust forearm segmentation using graph cuts. In: 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Paris, pp. 77–80 (2008) 53. Ali, M., Farag, A.A., El-Baz, A.S.: Graph cuts framework for kidney segmentation with prior shape constraints. In: MICCAI, vol. 4791, pp. 384–392 (2007) 54. Zhou, J., Ye, M., Zhang, X.: Graph cut segmentation with automatic editing for Industrial images. In: Proceedings of International Conference on Intelligent Control and Information Processing, Dalian, pp. 633–637 (2010) 55. Wu, X., Wang, Y.: Interactive foreground/background segmentation based on graph cut. In: Proceedings of Congress on Image and Signal Processing, Sanya, Hainan, pp. 692–696 (2008) 56. Lempitsky, V., Kohli, P., Rother, C., Sharp, T.: Image segmentation with a bounding box prior. In: Proceedings of IEEE 12th International Conference on Computer Vision, Kyoto, pp. 277–284 (2009)
57. Lang, X., Zhu, F., Hao, Y., Wu, Q.: Automatic image segmentation incorporating shape priors via graph cuts. In: Proceedings of International Conference on Information and Automation, Zhuhai, Macau, pp. 192–195 (2009) 58. Jeleň, V., Janáček, J., Tomori, Z.: Mobility tracking by interactive graph-cut segmentation with Bi-elliptical shape prior. In: Proceedings of IEEE 8th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herlany, pp. 225–230 (2010) 59. Hou, Y., Guo, B., Pan, J.: The application and study of graph cut in motion segmentation. In: Proceedings of Fifth International Conference on Information Assurance and Security, Xi’an, pp. 265–268 (2009) 60. Chang, L., Hsu, W.H.: Foreground segmentation for static video via multi-core and multimodal graph cut. In: Proceedings of IEEE International Conference on Multimedia and Expo, New York, NY, pp. 1362–1365 (2009)
A Comprehensive Survey on Web Recommendations Systems with Special Focus on Filtering Techniques and Usage of Machine Learning
K. N. Asha and R. Rajkumar
School of Computing Science and Engineering, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
{cuashin,vitrajkumar}@gmail.com
Abstract. In the present scenario, the use of technological innovations like recommender systems is very prominent in improving the consumer's buying/purchasing experience. This small yet powerful utility analyzes the buying pattern of a consumer and suggests which items to buy or use. Recommender systems can be applied to varied fields of consumer interest like online shopping, ticket booking and other online content. Most e-commerce sites attract huge numbers of potential customers by providing useful suggestions regarding buying a product or service. This technique is mainly based on the use of machine learning, which enables the system to make decisions efficiently. Earlier, these recommendations relied mainly on filtering processes such as collaborative, content-based, knowledge-based, demographic and hybrid filtering. These filtering techniques, which constitute the recommender system, are discussed in detail in this study. The survey describes the various means and methods of recommendation to consumers in real time, presents a comparative study among different types of recommender systems based on various parameters and filtering schemes, and shows a significant improvement in recommender systems through machine-learning-based approaches.
Keywords: Web recommendation · E-commerce · Filtering techniques · Machine learning
1 Introduction
Due to large-scale data digitization and economic means of data processing, various companies and establishments have opted for customized production/services based on individuals' needs rather than the past trend of mass production [1]. The deep penetration of the internet over time has filled the World Wide Web with overflowing information and web content. E-commerce platforms have become the need of the hour, as they benefit companies, which can offer customized products and services, and consumers, who can choose services or products from multiple available options. This process requires the collection of a large amount of data regarding various aspects of consumers' choices, needs, buying patterns, etc. This bulk data is then processed and
analyzed to provide fruitful output, which enables different companies to make decisions regarding the customization and improvement of their offerings to consumers. The major function of a recommender system is to create a list of products or services and, along with this, to suggest to an individual consumer that the recommended products or services best suit his/her liking or requirements [2]. Hence, the recommender system can also be termed a personalized agent, which caters to the requirements of the individual consumer by suggesting accurate recommendations that are most likely to be picked by the consumer; the response of a recommender system is customized and personalized for each individual. As a general point, recommender systems are based on different filtering techniques, viz. content-based, collaborative, knowledge-based and demographic; these techniques and their merits and demerits are studied in detail in this paper. A recommender system that combines these techniques in harmony so as to obtain optimal results is called a hybrid recommender system. A recommender system's ability to provide unique suggestions to every individual depends heavily on the information received from the knowledge source, which can have different origins and is extracted by processing a large amount of data.
1.1 Classification of Recommender System
The figure below [3] depicts the working of the four types of filtering techniques; these techniques are discussed briefly (Fig. 1):
Fig. 1. Recommendation techniques and their knowledge sources
• Collaborative: As the name suggests, this technique sorts the similar responses and choices of different consumers in order to provide suggestions accordingly [4].
• Content based: This technique operates on the basis of a product's specifications and qualities and the rating of that product by an individual. This content-based information system makes recommendations to the consumer based on the individual's preferences [5] (a small sketch of this idea follows Table 1 below).
• Demographic: This technique uses the demographic profile of an individual or of a group of individuals with similar demographic profiles, and provides suggestions to consumers belonging to a similar demographic profile [6].
• Knowledge based: This recommender technique is based on knowledge sourced from an individual's requirements and choices. Based on this knowledge, the recommender system finds the product that best suits the individual's choice [7].
• Hybrid: This technique combines useful techniques to obtain optimal results. The majority of e-commerce giants rely on this method; the collaborative and content-based techniques are a popular combination under hybrid recommendation [11]. The hybrid technique is more efficient at eliminating the limitations of the traditional techniques (Table 1).

Table 1. Comparative analysis of filtering techniques: merits and demerits

Collaborative
  Merits: least requirement of domain knowledge; produces more accurate results even without user and product features.
  Demerits: presence of the cold-start problem, i.e. absence of product ratings; standardized products are required.

Content based
  Merits: usage data not required; the cold-start problem is absent; ability to recommend new and rarely used products; caters to the unique requirements of individuals.
  Demerits: incapable of observing consumer choices in rich content, e.g. pictures, music, movies, etc.; hard to determine item or service value; shallow: only a very narrow analysis of certain types of content can be provided; overspecialization: unable to suggest products or services outside the consumer's interests.

Hybrid technique
  Merits: combines two or more techniques, eliminating the limitations of each technique.
  Demerits: suggestions provided are less accurate and cover a broad spectrum of users.

Demographic
  Merits: independent of the ratings provided by consumers.
  Demerits: a chance of inaccurate classification cannot be ruled out.

Knowledge based
  Merits: based on explicit information, without the need for consumer ratings.
  Demerits: probability of data overload is evident, as it is challenging to store consumer profiles on a large scale.
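As a toy illustration of the content-based idea (not drawn from any system surveyed here), items can be encoded as binary attribute vectors and ranked by cosine similarity against a user profile built from previously liked items; all names and vectors below are invented:

```python
import numpy as np

# Items as binary genre vectors; titles and genres are purely illustrative.
items = {"Movie A": [1, 1, 0], "Movie B": [1, 0, 1], "Movie C": [0, 1, 1]}
profile = np.array([1, 1, 0], dtype=float)   # aggregated from the user's liked items

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

ranked = sorted(items, key=lambda k: cosine(profile, np.array(items[k], float)),
                reverse=True)
print(ranked)   # items most similar to the user profile come first
```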
1.2 Relevance of Recommender Method
Fig. 2. Application of recommender system
The recommender system is most relevant in processes and services which require an instant response, as in online shopping platforms. Through Fig. 2, one gets to know the uses of the recommender system: it shows a categorization of the different applications of recommender systems that have been introduced, which are utilized by the entertainment world along with e-learning, e-business, etc.
1.3 Barriers Faced by the Web-Based Recommender System
Automatic recommender systems aim at offering recommendations of items and services that will be appreciated by the user, by considering their past interests and choices based on ratings and purchase history. These collaborative filtering techniques fail when they are transplanted into decentralized settings, and arranging recommender systems on the Semantic Web involves miscellaneous and complicated issues. The major issues seen in Semantic Web recommender systems are:
1.3.1 Ontological Commitment
The Semantic Web recommender system is characterized by machine-readable content dispersed across the Web. To assure that agents can acquire and understand this information, semantic interoperability via ontologies must be established [8].
1.3.2 Interaction Facilities
Early on, decentralized recommender systems were the subject of multi-agent research projects, which enable agents to communicate with their peers by making synchronous message exchange feasible. The Semantic Web, however, comprises
an intrinsically data-centric environment model; message exchanges are therefore restricted to asynchronous communication.
1.3.3 Security and Credibility
Closed communities normally possess efficient and competent means for controlling user identity and punishing malicious behavior. Decentralized frameworks, among them peer-to-peer systems, open marketplaces and the Semantic Web, cannot evade deception and dishonesty; spoofing and identity forging thus become simple to accomplish [9].
1.3.4 Computational Complexity and Scalability
Centralized methods and frameworks can anticipate and restrict the size of the network and may thus tailor their filtering frameworks to guarantee scalability. In an open decentralized setting, computing similarity measures for every one of these "individuals" eventually becomes infeasible; therefore, scalability must be guaranteed by limiting such computations to sufficiently small neighborhoods.
1.3.5 Low Profile Overlap
Centralized profiles are normally characterized by vectors indicating the customer's areas of interest and their preferences for different products. In order to reduce dimensionality and ensure profile overlap, recommenders [10] operate in domains where the product sets are moderately small. New, efficient methods therefore need to be incorporated to solve the profile-overlap issue and to make profile similarity measures meaningful.
2 Related Work

2.1 Content-Based Filtering Techniques
Gemmis et al. [16] discuss that content-based recommender systems (CBRSs) rely on item and user descriptions to build item representations and user profiles, which are then efficiently utilized to recommend products and services similar to those items that people liked most in the past or to those that held a great clientele base. The chapter introduces techniques that can be efficient and competent in offering better suggestions through CB methods, helping businesses become more prosperous and fruitful (Fig. 3).
Bocanegra et al. [17] state that health- and fitness-related videos are quite popular on online channels, but their quality is a matter of concern: they are generally of low quality, yet still widely followed. One of the best methods for improving quality is generating impressive, educational health content that offers information on how to gain fitness and maintain health. Their paper emphasizes investigating the possibility of framing a content-based recommendation method that links consumers to reputable health information websites, such as MedlinePlus, for a given health video from YouTube.
Fig. 3. Application of recommender system through implementing different methods
Bianchini et al. [20] suggest that a food recommender system is apt for providing customers with modified, personalized and healthy menus, while considering the consumer's short- and long-term preferences and medical prescriptions. The chief novel contribution of the projected scheme is the use of recommendation types linked with the customer's profile, which reflects their medical condition and thus helps in selecting food according to their behavior as well (Table 2).

Table 2. Comparative analysis of content-based filtering techniques

Article | Involvement | Performance evaluation | Dataset | Application region | Algorithm name
Bocanegra et al. [17] | Video recommendation | Mean accuracy | Videos on YouTube | Videos related to health | HealthRec-Sys
Bianchini et al. [20] | Recommendation on the basis of foodstuff selection | Average response time and excellence of recommendation | Dataset acquired through BigOven.com | Food prescription and formation of the menu | PREFer
Bagher et al. [21] | Consumer behavior | Accuracy, recall and F1 evaluation | Data derived from Twitter pages, New York Times and BBC articles | News recommendation | Evolutionary recommender system
Reddy et al. [26] | Provides movie recommendation based on the movie's genres | Euclidean distances | MovieLens dataset | Movie recommendation | Content-based filtering using genre correlation
In the words of Bagher et al. [21], the study portrays the notion of trend for calculating users' interests when choosing items among different groups that offer similar items or services. The trend-based user (TBU) approach is
developed by taking the consumer profile into consideration and by a new extension of the distance-dependent Chinese Restaurant Process (dd-CRP). The dd-CRP is a Bayesian nonparametric model capable of capturing the dynamics of customers' areas of interest.
Reddy et al. [26] propose a recommendation system that provides movie suggestions to viewers based on their past likings and preferences. These suggestions are presented based on one or more factors, such as the actors in a movie, the movie genre, the music or the director. They present a content-based recommender system that provides accurate suggestions based entirely on the genre of the film.
2.2 Collaborative Filtering Recommender System

Nilashi et al. [24] observe that, for recommendations to be provided efficiently and instantly, the trade-off between computation time and accuracy must be taken into account. They suggest a new hybrid recommendation technique based on collaborative filtering (CF). The technique employs a dimensionality reduction technique (DRT) and an ontology method, which together provide an effective remedy for two major limitations, i.e., sparsity and scalability (Table 3).

Table 3. Comparative analysis of collaborative filtering techniques

Article | Contribution | Performance measurement | Dataset | Application area | Algorithm name
Nilashi et al. [24] | Dimension reduction, SVD, clustering and ontology | MAE, precision, recall, F1-score | MovieLens | Online movie recommendation | User-and-item-based + SVD + EM + Ontology
Parvin et al. [29] | Rank, rating, trust and ACO optimization | MAE, RMSE, rate coverage | FilmTrust, Epinions and Ciao | Online trust and opinion mining | TCFACO
Linda et al. [33] | Fuzzy logic and context-aware model | MAE, coverage | LDOS-CoMoDa, Restaurant-Customer | Context-aware RSs (CARSs): movie and restaurant recommendation | (not reported)
Xiong et al. [34] | Combination of collaborative filtering and textual content along with deep learning | Precision, recall, normalized discounted cumulative gain, F1-score | Private real-world dataset | Web service recommendation | Deep hybrid collaborative filtering approach for service recommendation (DHSR)
Parvin et al. [29] propose a fresh CF-based approach that correctly forecasts the absent ratings, termed TCFACO (trust-aware collaborative filtering method based on ant colony optimization). It performs three functions: first, it segregates consumers based on ratings and social trust; next, values are allocated to consumers in order to find resemblance with the target consumer; finally, a cluster of users with similar preferences is selected to determine the unknown ratings.
Linda et al. [33] note that, despite the extensive use of CF, it faces the issue of data insufficiency. To address the issue they introduce fuzzy trust into CARSs, which solves the insufficiency problem without compromising the quality of output. It is a two-step process: they first introduce fuzzy trust and apply it to a context-aware CF technique for enhanced suggestions, and later introduce fuzzy trust propagation to mitigate the insufficiency problem and provide accurate suggestions.
Xiong et al. [34] state that machine learning finds extensive use in matrix-factorization-based CF recommender systems for selecting a web service. It is difficult to capture the complex relations between web applications and web components, which results in inferior suggestions. To solve this problem, a fresh deep learning hybrid model is developed that unites CF with textual content, and a deep neural network is utilized to identify the complex relations between web applications and web components.
2.3 Hybrid Filtering Techniques
Kouki et al. [18] present a system termed HyPER (HYbrid Probabilistic Extensible Recommender), which builds upon and reasons over a wide spread of data sources. Such data sources comprise various user–user and item–item similarities, web content and social details. HyPER is designed to harmonize the various information signals to intelligently make recommendations, ensuring enhanced and better recommendations by employing a graphical representation termed hinge-loss Markov random fields.
According to Tarus et al. [19], various recommender systems have proved successful yet suffer from insufficient correctness, particularly in the case of online education, because of dissimilarity in students' understanding levels, grasping power and learning sequences. To address this issue, they suggest a knowledge-based hybrid recommender system that uses ontology and sequential pattern mining (SPM) to suggest online education resources to students. The method explores learners' chronological learning patterns, and the hybrid technique can also tackle cold-start and data-shortage issues.
Vashisth et al. [23] discuss the use of fuzzy logic to accommodate multiplicity and vagueness in consumers' preferences and concerns, which ensures more accurate suggestions under dissimilar preferences. They devise a fuzzy-logic-based hybrid recommender system with a fresh use of type-2 fuzzy sets to generate consumer models competent enough to capture the natural vagueness of human actions in terms of the different choices of different consumers. The system enhances the correctness of the recommendations by minimizing the room for error (Table 4).
Table 4. Comparative analysis of hybrid filtering techniques

Article | Contribution | Performance measurement | Dataset | Application area | Algorithm name
Kouki et al. [18] | Probabilistic soft logic approach for prediction | RMSE and MAE | Yelp academic dataset and the Last.fm dataset | Online movie recommendation | HyPER
Tarus et al. [19] | Ontology creation, similarity computation for prediction, generating top learning items and final recommendation | MAE, precision, recall | Learning management system (LMS) of a public university | e-learning | CF + Onto + SPM
Vashisth et al. [23] | Interval type-2 fuzzy rule set | Recall, confusion matrix, RMSE | Book purchase and shopping of women's apparel | Product recommendation system | Fuzzy feature combination hybridization
Neidhardt et al. [35] | Seven factors considered based on user profile | Distance similarity | Private data | Tourism recommendation | PixMeAway GUI
In their research, Wan et al. [30] present a hybrid filtering (HF) recommendation system that integrates a Learner Influence Model (LIM) to collect interpersonal data, a self-organizing based (SOB) recommendation strategy to suggest the best possible learner set for active learners, and sequential pattern mining (SPM) to suggest learning objects (LOs) to students.
Neidhardt et al. [35] propose that travel and tourism services pose a complex test for a recommender system. In their study, they present a novel way of using pictorial depiction to fully extract visitors' preferences for tourism services and products. The model is created using seven attributes; the traveler selects the pictures of his or her choice, and each attribute is scored on that basis.
2.4 Application Based Recommender Systems
Chau et al. [22] present Content Wizard, a concept-based RS for proposing learning and reading materials that match the requirements and educational objectives of an instructor designing an online programming course. In this technique, instructors are required to offer a set of code examples that collectively represent the learning objectives associated with each course unit. The Content Wizard tool helps the instructor minimize the time spent on task management, making it time-efficient.
Porcel et al. [25] offer a scheme called SharingNotes, an educational and academic social network that helps people develop personalized recommendations for enhancing the quality of teaching and learning practices. To accomplish this
idea, it integrates a hybrid RS that utilizes an ontology to describe and maintain trust among users in the network. Along with this, it adopts fuzzy linguistic modeling to improve the representation of information (Table 5).

Table 5. Comparative analysis of application based recommender systems

Article | Application | Algorithm | Dataset | Performance
Chau et al. [22] | Programming course instruction | Content Wizard | Java course data from the University of Pittsburgh | Precision, recall and F1-score
Porcel et al. [25] | Teaching and learning process | SharingNotes | Epinions dataset | Precision
Sridevi et al. [28] | Hospital selection | Finding right doctors and hospitals | Private dataset | Weightage of votes and ratings
Sridevi et al. [28] present personalized data concerning healthcare providers and hospitals in an easy, customized and well-organized way, assisting users in managing their health issues. Under this approach, the user is offered the opportunity of creating his own profile with input ratings, after which a list of popular hospitals and doctors is provided through collaborative filtering by the recommender system. A combined module is projected that considers the available demographic data, similar users' rating forecasts, and the total number of reviews and ratings supported by average ratings, which helps solve the cold-start new-user problem.
2.5 Machine Learning Based Approaches
Meza et al. [27] note that the revenue sources of different national and local government bodies are all based on taxation, and escalating revenue collection from taxpayers has always been a major challenge for government institutions. In the present study, Meza et al. establish a fuzzy-based RS approach, with preliminary results on a dataset from the Municipality of Quito, that offers suggestions to the local people about their payment behavior. The projected approach enhances people's awareness of tax payments and thus helps increase governmental institutions' financial profits.
According to Sattar et al. [31], one of the main challenges in this field is making useful recommendations from the available sources and from sets of millions of items with sparse ratings. Various types of approaches have been projected for enhancing quality and accuracy, but they have all somewhat overlooked possible barriers that can hinder the RS process, such as the sparsity and cold-start problems. Sattar et al. [31] suggest a new hybrid recommendation approach that unites content-based filtering with collaborative filtering to overcome the above-mentioned issues.
Subramaniyaswamy et al. [32] point out the issues associated with the recommender system approach, like scalability, sparsity and cold start. They design a knowledge-based, domain-specific ontology for developing individualized recommendations, and establish two new and dissimilar approaches: an ontology-based predictive model and a prominent representation model. The prediction model is driven by data mining algorithms that correlate the interests of the user. They also introduce a novel variant of the KNN algorithm for the collaborative-filtering-based recommender system, known as Adaptive KNN.

Table 6. Comparative analysis of machine learning based recommender systems

Article | Application | Algorithm | Dataset | Performance
Meza et al. [27] | RS for tax payments | Fuzzy logic | Municipality of Quito (payment behavior) | NA (new discount methods suggested)
Sattar et al. [31] | Movie recommendation | KNN, SVM, Naïve Bayes, decision tree | IMDB movie dataset | MAE (Mean Absolute Error)
Subramaniyaswamy et al. [32] | Movie recommendation | Adaptive KNN | MovieLens | Precision, recall and accuracy
2.6 Merits and Demerits of Content-Based Filtering Methods
The problems faced by the collaborative filtering method are solved in the content-based (CB) filtering technique. CB methods are independent of product/service ratings; as a result, one can get suggestions without compromising the correctness of the recommendations. If a consumer changes his choices, the CB filtering method easily adapts to the changes and quickly provides fresh suggestions. It provides suggestions without revealing the identity of consumers, thus maintaining confidentiality [12]. CB filtering does, however, face several issues, as mentioned in [13]. Metadata is of great importance for CB filtering, since it needs detailed information about products and well-maintained consumer profiles; this limitation is termed limited content analysis. Overspecialized content [14] also challenges the technique, since the products suggested to consumers are similar to the products already present in their profiles. A minimal sketch of the CB idea is given below.
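As an illustration of the CB idea described above, the following Python sketch builds TF-IDF item representations from item descriptions and ranks items by cosine similarity to a user profile. It is a minimal sketch, assuming scikit-learn is available; the item descriptions and the user profile are hypothetical, not data from any paper surveyed here.

```python
# Minimal content-based filtering sketch (hypothetical data).
# Items are represented by TF-IDF vectors of their descriptions;
# the user profile is the mean vector of the items the user liked.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = {
    "m1": "action thriller with car chases",
    "m2": "romantic comedy set in paris",
    "m3": "science fiction action adventure in space",
    "m4": "documentary about ocean wildlife",
}
liked = ["m1", "m3"]  # items the user rated highly

names = list(items)
tfidf = TfidfVectorizer().fit_transform(items.values())  # item matrix

# Build the user profile as the average of the liked-item vectors.
liked_idx = [names.index(n) for n in liked]
profile = np.asarray(tfidf[liked_idx].mean(axis=0))

# Score unseen items by cosine similarity to the profile.
scores = cosine_similarity(profile, tfidf).ravel()
for name, score in sorted(zip(names, scores), key=lambda p: -p[1]):
    if name not in liked:
        print(f"{name}: {score:.3f}")
```

Note how the top-ranked unseen item resembles the liked items, which is exactly the overspecialization problem mentioned above.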
2.7 Merits and Demerits of Collaborative Filtering Methods

Collaborative filtering methods have a benefit over CB filtering in that they easily apply in areas where there is limited information about the products and where the content is hard for a computer to analyze. Although CF techniques are widely used, they suffer from some probable challenges, discussed below.

2.7.1 Cold-Start Problem
In the absence of relevant information about consumers as well as products, the recommender system cannot produce the desired results [2], which affects the correctness of the suggestions. Since new consumers have not provided any personal details nor rated any product, their profiles remain incomplete, making it impossible for the recommender system to know their preferences.
2.7.2 Data Sparsity Problem
This issue refers to a shortage of required data, i.e., when a consumer has reviewed only a few products in comparison to other consumers who have rated many [15]. This results in a sparse consumer–product matrix, i.e., an inability to find suitable neighbors and, ultimately, the production of inaccurate suggestions.

2.7.3 Scalability
This refers to the linear growth in the consumer and product counts, which poses challenges in providing suggestions [15]. It is therefore vital to develop recommender methods that can scale up with the growth of item and user data. Dimensionality reduction techniques, like Singular Value Decomposition (SVD), are incorporated to address the issue and generate more accurate suggestions; a small sketch is given below.

2.7.4 Synonymy
When similar products have multiple names, a situation of synonymy is created. To handle this problem, several approaches are used, such as automatic term expansion, thesaurus construction, and SVD, particularly Latent Semantic Indexing. These approaches also suffer from a limitation where the additional terms may have meanings different from what is intended, thus lowering the quality of suggestions.
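To make the SVD remark above concrete, here is a minimal sketch of truncated SVD applied to a small user–item rating matrix; the matrix values and the rank k are illustrative assumptions, not data from any paper surveyed here.

```python
# Truncated-SVD sketch for collaborative filtering (hypothetical ratings).
# Rows are users, columns are items; 0 marks an unrated item.
import numpy as np

R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

# Factorize and keep the top-k latent dimensions.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # low-rank reconstruction

# Predicted scores for unrated cells can now be read off R_hat.
user, item = 1, 2  # user 1 has not rated item 2
print(f"predicted rating: {R_hat[user, item]:.2f}")
```

In practice the missing entries are usually mean-imputed or handled by matrix-completion variants rather than treated as true zeros, but the low-rank reconstruction shown here is the core of the scalability argument.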
3 Conclusion

In this paper, we have presented a detailed survey of various recommender systems, studying how data analytics and multiple filtering techniques are applied in real-time applications to generate the most desirable suggestions. While discussing filtering techniques such as collaborative, content-based and hybrid filtering, it was found that hybrid filtering has proven to be the most suitable technique for producing the most accurate suggestions for consumers. In the present era, social networking sites and e-commerce/e-learning platforms have penetrated the web worldwide and are growing exponentially. As a result, a huge amount of social network information is available on the web. This vital information can be utilized by recommendation systems to generate the most accurate recommendations for their users. In order to maintain the secrecy of users' information, it is suggested that a novel recommender system addressing privacy issues be developed in the near future.
References

1. Aggarwal, C.C.: Recommender Systems, pp. 1–28. Springer, Cham (2016)
2. Burke, R.: Hybrid web recommender systems. In: The Adaptive Web, pp. 377–408. Springer, Heidelberg (2007)
3. Burke, R.: Hybrid recommender systems: survey and experiments. UMUAI 12(4), 331–370 (2002)
4. Ekstrand, M.D., Riedl, J.T., Konstan, J.A.: Collaborative filtering recommender systems. Found. Trends Hum. Comput. Interact. 4(2), 81–173 (2011)
5. Wang, D., Liang, Y., Xu, D., Feng, X., Guan, R.: A content-based recommender system for computer science publications. Knowl.-Based Syst. 157, 1–9 (2018)
6. Al-Shamri, M.Y.H.: User profiling approaches for demographic recommender systems. Knowl.-Based Syst. 100, 175–187 (2016)
7. Carrer-Neto, W., Hernández-Alcaraz, M.L., Valencia-García, R., García-Sánchez, F.: Social knowledge-based recommender system. Application to the movies domain. Expert Syst. Appl. 39(12), 10990–11000 (2012)
8. Golbeck, J., Parsia, B., Hendler, J.: Trust networks on the semantic web. In: Proceedings of Cooperative Intelligent Agents, Helsinki (2003)
9. Ziegler, C.N., Lausen, G.: Analyzing correlation between trust and user similarity in online communities. In: Jensen, C., Poslad, S., Dimitrakos, T. (eds.) Proceedings of the 2nd International Conference on Trust Management. LNCS, vol. 2995, pp. 251–265. Springer, Oxford (2004)
10. Miller, B., Albert, I., Lam, S., Konstan, J., Riedl, J.: MovieLens unplugged: experiences with an occasionally connected recommender system. In: Proceedings of the ACM 2003 Conference on Intelligent User Interfaces. ACM, Chapel Hill (2003)
11. Smyth, B.: Case-based recommendation. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web, pp. 342–376. Springer, Heidelberg (2007)
12. Shyong, K., Frankowski, D., Riedl, J.: Do you trust your recommendations? An exploration of security and privacy issues in recommender systems. In: Emerging Trends in Information and Communication Security, pp. 14–29. Springer, Heidelberg (2006)
13. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
14. Zhang, T., Vijay, S.I.: Recommender systems using linear classifiers. J. Mach. Learn. Res. 2, 313–334 (2002)
15. Park, D.H., Kim, H.K., Choi, I.Y., Kim, J.K.: A literature review and classification of recommender systems research. Expert Syst. Appl. 39(11), 10059–10072 (2012)
16. De Gemmis, M., Lops, P., Musto, C., Narducci, F., Semeraro, G.: Semantics-aware content-based recommender systems. In: Recommender Systems Handbook, pp. 119–159. Springer, Boston (2015)
17. Bocanegra, C.L.S., Ramos, J.L.S., Rizo, C., Civit, A., Fernandez-Luque, L.: HealthRecSys: a semantic content-based recommender system to complement health videos. BMC Med. Inform. Decis. Making 17(1), 63 (2017)
18. Kouki, P., Fakhraei, S., Foulds, J., Eirinaki, M., Getoor, L.: HyPER: a flexible and extensible probabilistic framework for hybrid recommender systems. In: Proceedings of the 9th ACM Conference on Recommender Systems, pp. 99–106. ACM (2015)
19. Tarus, J.K., Niu, Z., Yousif, A.: A hybrid knowledge-based recommender system for e-learning based on ontology and sequential pattern mining. Future Gener. Comput. Syst. 72, 37–48 (2017)
20. Bianchini, D., De Antonellis, V., De Franceschi, N., Melchiori, M.: PREFer: a prescription-based food recommender system. Comput. Stand. Interfaces 54, 64–75 (2017)
21. Bagher, R.C., Hassanpour, H., Mashayekhi, H.: User trends modeling for a content-based recommender system. Expert Syst. Appl. 87, 209–219 (2017)
22. Chau, H., Barria-Pineda, J., Brusilovsky, P.: Content Wizard: concept-based recommender system for instructors of programming courses. In: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 135–140. ACM (2017)
23. Vashisth, P., Khurana, P., Bedi, P.: A fuzzy hybrid recommender system. J. Intell. Fuzzy Syst. 32(6), 3945–3960 (2017)
24. Nilashi, M., Ibrahim, O., Bagherifard, K.: A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Syst. Appl. 92, 507–520 (2018)
25. Porcel, C., Ching-López, A., Lefranc, G., Loia, V., Herrera-Viedma, E.: Sharing notes: an academic social network based on a personalized fuzzy linguistic recommender system. Eng. Appl. Artif. Intell. 75, 1–10 (2018)
26. Reddy, S.R.S., Nalluri, S., Kunisetti, S., Ashok, S., Venkatesh, B.: Content-based movie recommendation system using genre correlation. In: Smart Intelligent Computing and Applications, pp. 391–397. Springer, Singapore (2019)
27. Meza, J., Terán, L., Tomalá, M.: A fuzzy-based discounts recommender system for public tax payment. In: Applying Fuzzy Logic for the Digital Economy and Society, pp. 47–72. Springer, Cham (2019)
28. Sridevi, M., Rao, R.R.: Finding right doctors and hospitals: a personalized health recommender. In: Information and Communication Technology for Competitive Strategies, pp. 709–719. Springer, Singapore (2019)
29. Parvin, H., Moradi, P., Esmaeili, S.: TCFACO: trust-aware collaborative filtering method based on ant colony optimization. Expert Syst. Appl. 118, 152–168 (2019)
30. Wan, S., Niu, Z.: A hybrid e-learning recommendation approach based on learners' influence propagation. IEEE Trans. Knowl. Data Eng. (2019)
31. Sattar, A., Ghazanfar, M.A., Iqbal, M.: Building accurate and practical recommender system algorithms using machine learning classifier and collaborative filtering. Arab. J. Sci. Eng. 42(8), 3229–3247 (2017)
32. Subramaniyaswamy, V., Logesh, R.: Adaptive KNN based recommender system through mining of user preferences. Wirel. Pers. Commun. 97(2), 2229–2247 (2017)
33. Linda, S., Bharadwaj, K.K.: A fuzzy trust enhanced collaborative filtering for effective context-aware recommender systems. In: Proceedings of the First International Conference on Information and Communication Technology for Intelligent Systems, vol. 2, pp. 227–237. Springer, Cham (2016)
34. Xiong, R., Wang, J., Zhang, N., Ma, Y.: Deep hybrid collaborative filtering for web service recommendation. Expert Syst. Appl. 110, 191–205 (2018)
35. Neidhardt, J., Seyfang, L., Schuster, R., Werthner, H.: A picture-based approach to recommender systems. Inf. Technol. Tourism 15(1), 49–69 (2015)
A Review on Business Intelligence Systems Using Artificial Intelligence

Aastha Jain, Dhruvesh Shah, and Prathamesh Churi

Mukesh Patel School of Technology Management and Engineering, Vile Parle (West), Mumbai 400056, India
[email protected], [email protected], [email protected]
Abstract. Artificial Intelligence (AI) is paving its way into the corporate world, focusing on the fundamentals of business systems related to manufacturing, entertainment, medicine, marketing, engineering, finance and other services. Businesses today have a lot of data which can be used in the field of artificial intelligence in the form of data training, learning from live data and historic data. Dealing with complex and perplexing situations can be done immediately with these automated systems, thus saving a lot of time. This paper puts forward a framework with different methods that make use of artificial intelligence in business systems today. These methods include swarm intelligence and port intelligence, which can even exceed human intelligence and be of great help to the corporate world. They help in constructing frameworks and algorithms that have helped business systems achieve optimum results.

Keywords: Artificial intelligence · Business intelligence · Swarm intelligence · Port intelligence · Data mining
1 Introduction

Artificial intelligence has the fundamental purpose of creating intelligent machines and computer programs that are capable of understanding and learning, sometimes exceeding the power of human intelligence [1, 9]. Presently AI is used in much equipment, games, automation software, etc. [8, 11]. It depends heavily on data training, historic data and live data to learn (Fig. 1).
Emergence of AI in business: after the arrival of web-enabled infrastructure and the quick strides made by the AI development community, AI has been increasingly used in business applications [11]. Performing mechanical computation using programmed rules is very easy for computers, and such simple monotonous tasks are solved more efficiently and robustly than by humans. But unlike humans, computers may have difficulty understanding specific scenarios and situations, which is what artificial intelligence tackles. Thus, much of the research is based on understanding intelligent human behaviour, which involves
Fig. 1. Learning journey of AI [2]
problem-solving skills, abstract thought, high-level reasoning and recognition of patterns [3]. Artificial intelligence helps us comprehend this procedure by creating scenarios and recreating it, potentially helping us push beyond our abilities. It includes the following:

• Attributes of intelligent behaviour
• Reasoning and thinking
• Reasoning to solve problems
• Repeating and understanding from experience
• Applying the acquired knowledge
• Exhibiting creativity and imagination
• Dealing with complex and perplexing situations
• Responding rapidly and efficiently to new situations
• Handling ambiguous and random information

1.1 Components of Business Intelligence
Businesses today have access to a lot of data. Business intelligence applies a series of methods to this data to make more profit. The major components of business intelligence are (Fig. 2):
1. OLAP (Online Analytical Processing). This component of BI allows administrators to categorize and choose combinations of data for their monitoring. The foremost purpose of OLAP is to summarize information from the large database for the decision-making process [7]. The report is then generated using OLAP and can be delivered in a format as per the user's requirements [10]. It has many advantages, including quick response to queries, flexible working with data, and helping the user create views using spreadsheets.
2. Data mining. Data mining is the process of examining data from various sources and summarizing it so that it can be used to increase profits and decrease costs. Its chief purpose is to find associations or patterns among many fields in huge databases. A small OLAP-style aggregation sketch is given below.
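To illustrate the OLAP-style summarization described above, here is a minimal sketch using pandas; the sales table and its columns are hypothetical, not from any system discussed in this paper.

```python
# OLAP-style roll-up sketch over a hypothetical sales table.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "quarter": ["Q1", "Q1", "Q1", "Q2", "Q2"],
    "revenue": [120.0, 80.0, 200.0, 150.0, 90.0],
})

# Pivot: summarize revenue by region and quarter (a roll-up),
# the kind of aggregate view an OLAP cube would serve instantly.
cube = sales.pivot_table(index="region", columns="quarter",
                         values="revenue", aggfunc="sum", fill_value=0)
print(cube)
```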
Fig. 2. Components of business intelligence [5]
3. CRM marketing. To successfully manage customer relationships, the organization must serve its customers in a particular manner, understanding their buying behavior and keeping their needs in mind. Customer satisfaction across various markets, products and services should be kept in mind too.
4. GIS. The collective use of geographical information systems and business intelligence today helps businesses function, increase productivity and decrease costs. Applications allow companies to track employees, machinery and assets, and also to collect and interpret data in real time to make knowledgeable decisions.
5. Knowledge management. Knowledge management helps the business gain awareness and understanding from its own experience. Detailed knowledge management activities emphasize obtaining, storing and exploiting knowledge for active learning, problem solving, tactical planning and decision making.
2 Related Works

2.1 Swarm Intelligence
Swarm intelligence is a recent and growing paradigm in intelligent computing, inspired by biology. It is used in real-world applications for building adaptive systems. Swarm intelligence uses search algorithms inspired by the collective behavior of animals that display decentralized, self-organized patterns in the foraging process [12].
1. Genetic Algorithm (GA). The genetic algorithm discovers improved solutions using mechanisms inspired by evolutionary biology, such as reproduction, selection, mutation and recombination. We find approximate solutions using this search technique. Each iteration is called a generation; runs of 50 to 500 or more generations are typical for any GA. A run is a complete set of generations, leaving one or more highly fit chromosomes in the population after the final generation. Researchers frequently average report statistics over multiple runs of this algorithm on the exact same problem.
2. Honeybee Foraging Mechanism. Bees have a well-organized way of communicating food sources, and their foraging activities are inspiring. Employed bees collect food for the hive, visiting different flowers. If there is a sufficient amount of food, a bee makes many journeys between its hive and the flowers. Information about the food is then conveyed to onlooker bees through the "waggle dance", an efficient way of sharing information among the bees. Onlooker bees observe the dance and thereby learn the position of the food and its amount. This mechanism helps the colony gather food rapidly and efficiently.
3. Artificial Bee Colony (ABC) Algorithm. Employed bees go to find food, and their number is directly proportional to the amount of food. A scout bee searches for food sources beyond those of the employed bees. The search space is explored by the bees, and the information is shared with the onlooker bees, which then choose a food source in the neighborhood. In this iterative algorithm, the total number of bees, onlooker or employed, is the same as the number of solutions. In each iteration, every employed bee discovers a food source in the neighborhood of its present source and evaluates the amount of nectar it has, i.e., its fitness with respect to the objective. The steps are:
1. Initialize all values.
2. Employed bees evaluate the amount of nectar, i.e., fitness with respect to the objective, by moving from one source to another.
3. Onlooker bees are positioned based on the quantity of nectar obtained by the employed bees.
4. Scout bees discover new food sources.
5. Good sources of food are memorized.
6. Go to step 2 if the termination criterion is not fulfilled; otherwise stop the procedure and report the best source of food found so far (Fig. 3).
This algorithm first normalizes the database into homogeneous data once the database is prepared. Analysis is then done with the metaheuristic algorithms, where GA and ABC are both implemented simultaneously and evaluated to give a performance analysis. The proposed artificial bee optimization is then implemented in a comparative study with GA and backpropagation to find the optimal solution. A compact sketch of the ABC loop is given below.
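The following Python sketch implements the ABC loop outlined in the steps above for a generic minimization problem. It is a simplified illustration: the objective function, colony size, limit and iteration count are hypothetical choices, not parameters from the reviewed paper.

```python
# Simplified Artificial Bee Colony (ABC) sketch for minimizing f on [-5, 5]^d.
import numpy as np

rng = np.random.default_rng(0)

def f(x):                      # hypothetical objective: sphere function
    return float(np.sum(x ** 2))

d, n_sources, limit, iters = 2, 10, 20, 100
lo, hi = -5.0, 5.0
X = rng.uniform(lo, hi, (n_sources, d))        # food sources (solutions)
fit = np.array([f(x) for x in X])
trials = np.zeros(n_sources)                   # stagnation counters

def neighbor(i):
    """Employed/onlooker move: perturb one dimension toward a random peer."""
    k = rng.integers(n_sources)
    j = rng.integers(d)
    v = X[i].copy()
    v[j] += rng.uniform(-1, 1) * (X[i, j] - X[k, j])
    return np.clip(v, lo, hi)

for _ in range(iters):
    # Employed phase, then onlooker phase (onlookers favor good sources).
    probs = 1 / (1 + fit)
    probs /= probs.sum()
    for i in list(range(n_sources)) + list(rng.choice(n_sources, n_sources, p=probs)):
        v = neighbor(i)
        fv = f(v)
        if fv < fit[i]:
            X[i], fit[i], trials[i] = v, fv, 0  # greedy selection, memorize source
        else:
            trials[i] += 1
    # Scout phase: abandon sources that stopped improving.
    for i in np.where(trials > limit)[0]:
        X[i] = rng.uniform(lo, hi, d)
        fit[i] = f(X[i])
        trials[i] = 0

best = X[np.argmin(fit)]
print("best solution:", best, "value:", fit.min())
```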
Fig. 3. Artificial bee optimization algorithm [4]
2.2 Port Business Intelligence Framework Design
This framework design consists of four layers, namely the data extraction layer, data warehouse layer, data mining layer and front layer:
1. Data extraction layer: the port production system takes its data from huge Oracle databases and stores it. Data extraction simply selects the source of the data, cleans the empty and duplicate records, clears any errors that exist, and makes the data reliable and more accurate.
2. Data warehouse layer: data mining of the stored data is carried out along with the cleaning of data. A warehouse model is built which consists of fact and dimension models.
3. Data mining layer: this is integrated with the data warehouse layer using the clustering analysis method. The main idea of this method is to group similar things based on their similarity attributes in a multidimensional information space; a minimal clustering sketch is given after this list.
4. The front layer: it divides the business reports and data analysis into customer policies and financial risk control. According to this analysis, it gives direction and a path of collaboration for the business executives (Fig. 4).
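As a sketch of the clustering analysis used in the data mining layer, the following code groups hypothetical port-customer records with k-means; the feature columns and the choice of k = 2 are illustrative assumptions.

```python
# K-means clustering sketch for the data mining layer (hypothetical features).
import numpy as np

rng = np.random.default_rng(1)
# Rows: customers; columns: e.g. cargo volume (kt) and payment delay (days).
data = np.vstack([rng.normal([50, 5], [5, 2], (20, 2)),
                  rng.normal([10, 30], [3, 5], (20, 2))])

k = 2
centers = data[rng.choice(len(data), k, replace=False)]
for _ in range(100):
    # Assign each record to its nearest center.
    labels = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
    new = np.array([data[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new, centers):
        break
    centers = new

print("cluster centers:\n", centers)
```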
Fig. 4. Port business intelligence system [6]
3 Inference

The following values have been obtained from [4]. On comparing the genetic algorithm with the artificial bee colony algorithm, we obtained Table 1. ABC works better on the true variables, whereas the genetic algorithm works better on the false variables.

Table 1. Comparison of genetic algorithm and artificial bee colony

Analysis variable | Genetic algorithm | Artificial bee colony
True +ve | 233 | 374
False +ve | 237 | 75
False −ve | 274 | 127
True +ve rate | 0.47 | 0.74
False +ve rate | 0.18 | 0.15
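The rates in Table 1 can be reproduced (up to rounding) from the raw counts; the short computation below does so and also derives the positive predictive value (precision) referred to in the conclusion. Note that the false-positive rate additionally requires the true-negative count, which the table does not report, so it is not recomputed here.

```python
# Recomputing Table 1 rates from the raw counts.
counts = {
    "GA":  {"tp": 233, "fp": 237, "fn": 274},
    "ABC": {"tp": 374, "fp": 75,  "fn": 127},
}
for name, c in counts.items():
    tpr = c["tp"] / (c["tp"] + c["fn"])   # true positive rate (recall)
    ppv = c["tp"] / (c["tp"] + c["fp"])   # positive predictive value
    print(f"{name}: TPR={tpr:.2f}, precision={ppv:.2f}")
# GA:  TPR=0.46, precision=0.50
# ABC: TPR=0.75, precision=0.83
```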
Swarm intelligence has very few constraints on its applications and is hence used by the U.S. military for supervising unmanned vehicles; it is also being researched by NASA for implementation in planetary mapping. Port business decision making, on the other hand, is restricted by external factors in addition to being driven by internal data. The formats of heterogeneous data, such as structured records, audio, text and external enterprise web-page information, may also differ.
4 Conclusion

The paper summarizes and concludes that different frameworks in business systems are suitable for different requirements. The artificial bee algorithm gives better results in the true positive rate and the predicted values, suggesting that computers can exceed the human intelligence level. When the genetic algorithm and artificial bee colony are compared, ABC gives better true positive and positive predicted values. Business intelligence frameworks implementing the artificial bee algorithm should be adopted to increase the optimization of results in the future. In port intelligence, which consists of four main layers, after extraction, warehousing and mining of data are done, the front layer divides the business reports and data analysis into customer policies and financial risk control; according to this analysis, it gives direction and a path of collaboration for the business executives. Thus each of these frameworks helps us in a different way to analyze the data and give the required predictions.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Quan, X.I., Sanderson, J.: Understanding the artificial intelligence business ecosystem. IEEE Eng. Manage. Rev. 46(4), 22–25 (2018)
2. Kiruthika, J., Khaddaj, S.: Impact and challenges of using of virtual reality & artificial intelligence in businesses. In: 2017 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), Anyang, pp. 165–168 (2017)
3. Bai, S.A.: Artificial intelligence technologies in business and engineering (2011)
4. Anusha, R., Nallaperumal, K.: Business intelligence: an artificial bee colony optimization approach. In: 2014 IEEE International Conference on Computational Intelligence and Computing Research, pp. 1–4. IEEE (2014)
5. Wu, J.Y.: Computational intelligence-based intelligent business intelligence system: concept and framework. In: 2010 Second International Conference on Computer and Network Technology, pp. 334–338. IEEE (2010)
6. Lei, H., Yifei, H., Yi, G.: The research of business intelligence system based on data mining. In: 2015 International Conference on Logistics, Informatics and Service Sciences (LISS), pp. 1–5. IEEE (2015)
7. Denić, N., Nešić, Z., Radojičić, M., Vasović, J.V.: Some considerations on business intelligence application in business improvement. In: 2014 22nd Telecommunications Forum Telfor (TELFOR), pp. 1142–1145. IEEE (2014)
8. Gang, T., Kai, C., Bei, S.: The research & application of business intelligence system in retail industry. In: 2008 IEEE International Conference on Automation and Logistics, pp. 87–91. IEEE (2008)
9. Muthusamy, V., Slominski, A., Ishakian, V.: Towards enterprise-ready AI deployments minimizing the risk of consuming AI models in business applications. In: 2018 First International Conference on Artificial Intelligence for Industries (AI4I), pp. 108–109. IEEE (2018)
10. Qihai, Z., Tao, H., Tao, W.: Analysis of business intelligence and its derivative-financial intelligence. In: 2008 International Symposium on Electronic Commerce and Security, pp. 997–1000. IEEE (2008)
11. Valter, P., Lindgren, P., Prasad, R.: Artificial intelligence and deep learning in a world of humans and persuasive business models. In: 2017 Global Wireless Summit (GWS), pp. 209–214. IEEE (2017)
12. Rosenberg, L., Pescetelli, N.: Amplifying prediction accuracy using swarm AI. In: 2017 Intelligent Systems Conference (IntelliSys), pp. 61–65. IEEE (2017)
Image Completion of Highly Noisy Images Using Deep Learning

Piyush Agrawal and Sajal Kaushik

Bharati Vidyapeeth's College of Engineering, New Delhi, India
[email protected]
Abstract. Generating images from a noisy or blurred image is an active research problem in the field of computer vision that aims to regenerate refined images automatically in a content-aware manner. Various approaches have been developed by academia and industry, including modern ones applying convolutional neural networks, to produce more realistic images. In this paper, we present a novel approach that leverages the combination of capsule networks and generative adversarial networks for generating images in a more realistic manner. The approach tries to generate images that are locally and globally consistent in nature by leveraging the low-level and high-level features of an image.

Keywords: Capsule networks · Generative Adversarial Networks · Convolutional Neural Networks

1 Introduction
Generating images from highly noisy images is an active research problem. Various deep learning techniques, both convolutional and non-convolutional networks, have previously been used by academia and industry to tackle this problem. With the advent of generative adversarial networks it is possible to generate images from highly noisy images. The architecture of the generator and the discriminator is very important in determining the results; training stability and performance depend heavily on the architecture. Various techniques have been explored, such as batch normalization, stacked architectures, and multiple generators and discriminators.
The generative adversarial network proved to be a game changer in deep learning. It uses two components, a generator and a discriminator. The generator inputs random data and outputs a sample of data. The generated data is then fed into the discriminator network, which compares it with the real data available to it and determines whether it is real or fake. Ultimately the generator tries to minimize the min-max objective, while the discriminator does the opposite by maximizing it.
DCGANs (Deep Convolutional Generative Adversarial Networks) use deep convolutional layers in addition to the basic generative and discriminative network models. DCGANs have high similarity to GANs, but specifically focus on
leveraging deep convolutional networks in place of fully-connected networks. Spatial correlations in various areas of the images are captured by the conv-nets. DCGANs are likely to be more fitting and accurate for image datasets, whereas plain GANs are applicable over larger domains. This paper presents a brief comparison between the capsule GAN and DCGAN architectures. There are various non-learning techniques that try to encapsulate neighboring-pixel information in order to regenerate images from noisy images. Various variants of GANs, such as conditional generative adversarial networks, Laplacian pyramids of adversarial networks, deep convolutional generative adversarial networks, adversarial autoencoders, information-maximizing generative adversarial networks and bidirectional generative adversarial networks, have played a major role in image generation. The motivation behind this paper is to experiment with the results of capsule networks and GANs for removing noise from images. The paper uses capsule GANs to recover from some disadvantages of plain GANs: convergence of GANs is a big challenge with respect to the state-of-the-art problem of reconstructing images, and the combination of capsule networks and generative adversarial networks may give higher accuracy. The paper is organized as follows: related work is described in Sect. 2. Section 3 describes the methodology of the approach. Section 4 presents our experimental process and describes the different datasets used and how they are trained. Section 5 shows the results of the model and the graphs generated. Lastly, Sect. 6 concludes the paper and discusses the results.
2 Related Work
Various methods have been developed previously to complete noisy or blurred images [1]. GANs proved to be the most influential for completing blurred images [2]. Before GANs, convolutional networks played a major role in classifying and generating images [3,4]. There are also non-learning approaches, such as patch matching, that rely on propagating appearance information from neighboring pixels to the target region using some mechanism [5]. Image reconstruction from highly noisy images is an active research problem in the area of computer vision [?]. Basic image processing techniques have been used, but the results are not very satisfactory. Deep learning approaches leverage the correlations between the various patterns found in noisy images and real images.
2.1 Generative Adversarial Networks

The generative adversarial network model uses latent variables z to generate data that should belong to the data distribution of x [6–8]. The generator's probability distribution pg is defined over the data, with a prior p(z). The generator function G(z; θg) represents a mapping from the latent variable space to the space of the data. The discriminator function D(x; θd) maps the data space
to a single scalar representing the probability that x comes from the actual data distribution rather than from pg. The discriminator is trained to maximize the probability of correctly classifying whether an image comes from the generator or from the actual dataset:

$$V(D, G) = \mathbb{E}_{x}[\log D(x)] + \mathbb{E}_{z}[\log(1 - D(G(z)))] \qquad (1)$$
In Eq. 1, E(x) represents the expectation over points of the real distribution and D(x) the discriminator's output for such a point. z is the latent variable fed to the generative network; the generator's output G(z) is passed to the discriminator as D(G(z)).
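A minimal training-loop sketch of the objective in Eq. 1 is shown below, assuming PyTorch; the Generator and Discriminator modules, the latent size and the batch handling are placeholders, not the architecture used later in this paper.

```python
# Minimal GAN training step sketch (PyTorch assumed); Generator/Discriminator
# are placeholder nn.Modules ending in Tanh and Sigmoid respectively.
import torch
import torch.nn as nn

def train_step(G, D, real, opt_g, opt_d, z_dim=100):
    bce = nn.BCELoss()
    b = real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))).
    z = torch.randn(b, z_dim)
    fake = G(z).detach()                 # do not backprop into G here
    loss_d = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: instead of minimizing log(1 - D(G(z))), use the common
    # non-saturating form and maximize log D(G(z)).
    z = torch.randn(b, z_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```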
DCGAN
Deep Convolutional Generative Adversarial Network is a class of Convolutional Neural Networks which improves the performance of Gans and make Gans more stable. It is a Technique to generate more realistic images from the noisy images. DCGAN leverages the convolutional networks to get the spatial relationship between various parts of the images and thus use this information to improve the image generation using generative adversarial Networks [9]. DCGANS works better for large-scale scene understanding data. Original Gans are difficult to scale using Convolutional neural networks, thus DCGAN proposed an optimized network for Image Generation. 2.3
Conditional GAN
Conditional Gan is an extension of standard gan which generates samples conditioned on specific variables. An additional variable in the model, such that we generate and discriminate samples conditioned on that variable [10]. Main aim of the generator is to generate samples that are both realistic and match the context of the additional variables. 2.4
Info GAN
For image completion [11], several modifications such as optimal patch search have been proposed [12,13]. In particular [14], proposed a global-optimization based method that can obtain more consistent. 2.5
WGAN
With several reasons to support, GANs are very [difficult] to train. Wasserstein GAN (WGAN), an alternative function [15,16] shows that the convergence properties are better when compared with standard value as a result of value
P. Agrawal and S. Kaushik
function. This work uses the Wasserstein distance W(q, p), q and p being distributions, which can be understood as the lowest cost to move mass when the q is transformed to p. The value function is: x)] min max Ex∼Pr [D(x)] − Exˆ∼Pg [D(ˆ G D∈D
(2)
where D denotes the set of 1-Lipschitz functions. The different approaches above have been studied to improve state-of-the-art models that can solve the presented problem.
3 Approach

3.1 Capsule GANs

The paper proposes an architecture for creating capsules, which replace the neurons of a neural network for image generation. The capsules consist of nested sets of neurons. vj is the activation vector output by capsule j; it is defined such that its length describes the probability of the existence of the entity represented by its capsule. Longer vectors are squashed to values a little under 1, and shorter ones toward zero, with the help of the nonlinear activation function:
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \, \frac{s_j}{\|s_j\|} \qquad (3)$$
where vj is the output vector and sj the input of the activation function. Let ui be the output of the previous capsule layer, which acts as the input for the current layer j. The outputs of the previous layer are multiplied by a weight matrix Wij, which results in what is called a prediction vector uj|i. Each prediction vector is then multiplied by the coupling coefficient cij between the capsules of layers 'i' and 'j', and the results are summed to give sj.
Here, the sum of all coupling coefficients for a capsule is equal to 1. They are determined by the process of 'routing softmax', where the logits bij are the log prior probabilities that a capsule 'i' couples with a capsule 'j'; these log priors are learned discriminatively at the same time as the other parameters. The coupling coefficients are then learned iteratively by monitoring the agreement between the capsules of the different layers, measured by the scalar product of the output vj of a capsule of the given layer and the input prediction vector uj|i coming from the previous layer:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})} \qquad (4)$$
This agreement value aij is treated as a log likelihood and is added to the initial value of bij, which is then used to calculate the new values of the coupling coefficients via the routing softmax. A small sketch of the squashing and routing computations is given below.
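The following NumPy sketch implements Eq. 3 and a few routing updates of Eq. 4 for a toy layer; the layer sizes are arbitrary illustrative choices, not those of the discriminator described next.

```python
# Squashing (Eq. 3) and dynamic-routing iterations (Eq. 4), toy sizes.
import numpy as np

def squash(s, eps=1e-9):
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1 + norm2)) * s / np.sqrt(norm2 + eps)

rng = np.random.default_rng(0)
n_in, n_out, dim = 6, 3, 4                    # capsules in layer i, layer j, vector size
u_hat = rng.normal(size=(n_in, n_out, dim))   # prediction vectors u_{j|i}
b = np.zeros((n_in, n_out))                   # routing logits b_ij

for _ in range(3):                            # routing iterations
    c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # Eq. 4
    s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum -> s_j
    v = squash(s)                                         # Eq. 3 -> v_j
    b += np.einsum('iod,od->io', u_hat, v)                # agreement a_ij

print("output capsule lengths:", np.linalg.norm(v, axis=-1))
```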
3.2 Network Architecture and Implementation
This experiment consists of two architectures: the generator, which is responsible for generating images from noise, and a discriminator based on capsule networks [17], which validates the generated images. The generator is inspired by the DCGAN framework [18] and consists of 5 convolutional layers. These layers up-sample a 100-dimensional vector whose elements are sampled independently from a normal distribution with mean 0 and standard deviation 1. To regularize the network and increase stability, batch normalization [19] is applied between layers. Rectified linear units [20] are used as the nonlinearity between all layers except the final one, and the nonlinearity is applied to the output prior to batch normalization. In the end, 'tanh' [21] is used as the activation (Fig. 1).
Fig. 1. Generator architecture [9]
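A sketch of such a DCGAN-style generator is given below, assuming PyTorch; the exact channel counts are illustrative guesses, since the text only states the layer count, input size and activations.

```python
# DCGAN-style generator sketch: 100-d noise -> 64x64 RGB image (PyTorch).
import torch.nn as nn

def block(c_in, c_out):
    # Transposed convolution doubles the spatial size; BN + ReLU as in the text.
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

generator = nn.Sequential(
    # 100x1x1 -> 512x4x4
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
    block(512, 256),   # -> 256x8x8
    block(256, 128),   # -> 128x16x16
    block(128, 64),    # -> 64x32x32
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),         # -> 3x64x64 in [-1, 1]
)
```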
The discriminator of the model is based on a capsule network that uses dynamic routing, with a 16-dimensional capsule as the final output. The input layer takes 64 × 64 images, which are passed through 18 × 18 kernels resulting in 256 × 24 × 24 feature maps; these are passed through another kernel of size 18 × 18, and the resulting layer, after dynamic routing, gives one 16-dimensional capsule. Dynamic routing occurs between the primary capsule layer and the output digit capsule. An illustration is provided below (Fig. 2).
Fig. 2. Discriminator architecture
4 Experiments

4.1 Dataset

In this work, the prepared model is examined on several datasets. The motive of the examination is to evaluate how accurately the model can complete images from noisy inputs on different datasets. The developed model is evaluated on three datasets: CelebA [22], the Street View House Numbers (SVHN) [23] and Facescrub [24]. The CelebA dataset consists of more than 200,000 images, each having 40 different attributes, with 10,177 unique identities. Having so many unique identities, this dataset plays a crucial role in the model's understanding of facial attributes across different postures. The other dataset, Facescrub, with over 100,000 images of more than 500 unique faces, is also focused on improving facial recognition research. Its high-quality, manually cleaned images play a very important role for training and provide better
diversity. From both datasets, images were cropped and resized to 64 × 64 to keep only the face from each image. SVHN being trained separately, its images were cropped to 32 × 32 for input to the model.
4.2 Training Process

Data preprocessing: the two datasets, CelebA and Facescrub, are used together to train the model, while SVHN was trained separately. For the former combination, OpenFace [25] is used to identify the facial area and crop the image; it uses the mode called "outerEyesAndNose" to crop and then save the aligned images. The pre-trained OpenFace model on the OpenCV (Open Computer Vision) library was used for this purpose. For SVHN, the version with cropped images of 32 × 32 dimension is used directly. After cropping and resizing, the images are stored in a single object file. This object file is a NumPy object serialized with the pickle module, which prevents re-processing the images on each re-iteration of a batch in a limited-memory system. Once the images are stored in the object file, the uncompressed form of the images is loaded directly into memory without further pre-processing or decompression; images are read in batches of 32 from this file. Overall, this alleviated the training process by minimizing the time spent processing image pixels and converting them to NumPy arrays every time we train; a small sketch of this caching scheme is given below.
Training: the actual training consists of loading the object file of stored images and reading images from it in batches of 32. The generator is trained by providing batches of 64 noise vectors [26,27].
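A minimal sketch of this preprocessing cache is shown below; the file name, image shapes and normalization are illustrative assumptions, not details confirmed by the paper.

```python
# Cache preprocessed images as one pickled NumPy array, then read batches.
import pickle
import numpy as np

def build_cache(images, path="faces_64.pkl"):
    """images: iterable of HxWx3 uint8 arrays already cropped to 64x64."""
    stack = np.stack(list(images)).astype(np.float32) / 127.5 - 1.0  # to [-1, 1]
    with open(path, "wb") as f:
        pickle.dump(stack, f)

def iter_batches(path="faces_64.pkl", batch_size=32):
    with open(path, "rb") as f:
        data = pickle.load(f)            # whole array loaded once into memory
    idx = np.random.permutation(len(data))
    for start in range(0, len(data) - batch_size + 1, batch_size):
        yield data[idx[start:start + batch_size]]
```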
4.3 Visual Comparison

We compared the capsule-network-based GAN [17] with DCGAN-based [11] semantic inpainting [28,29], and the results offer an interesting comparison of how good both models are at the image generation task. Below are the results obtained by the DCGAN model on the Facescrub dataset, based on semantic inpainting. The amount of noise sent to the generator is quite high; due to this, it is expected that the results may degrade further. As shown in Fig. 6, CapsGAN on CelebA generated images with very low accuracy. Capsule networks leverage the information held in capsules, and heavy feature mixing degrades this information at each layer [11] (Figs. 3, 4 and 5).
Fig. 3. CapsGAN on Facescrub
Fig. 4. DCGAN on Facescrub
Fig. 5. Noise sent to generator
Fig. 6. CapsGAN on CelebA
5 Experimental Results
As described in the training process, the model is trained on three different datasets. While the model is trained on the SVHN dataset as well, our primary focus of evaluation is on the results obtained from the CelebA and Facescrub datasets. Figure 7 plots the generator loss over the epochs.
Fig. 7. Generator loss versus epochs
The Facescrub dataset is trained on an NVIDIA GTX 1060 6 GB GPU for 30 epochs, where each epoch covers approximately 1000 batches of images from the dataset; CelebA was trained similarly. Instead of choosing the number of epochs arbitrarily, it was chosen by visual result convergence. Figure 7 shows the generator results, calculated from loss values over 15,000 epochs. Our generator is based on the architecture given in Fig. 1. Figure 8 below shows the results obtained from the discriminator's loss values; this discriminator is based on the Capsule Network [17], whose implementation is presented here. The generator essentially tries to generate complete images from noisy inputs. Figure 7 shows a uniform rise in loss at regular intervals, which is quite evident in terms of the model's robustness; the peaks of the loss gradually decrease, indicating that the model is learning gradually. Due to the high number of epochs, the model takes around 3 days to train. The generator and discriminator follow a parallel mechanism, due to which training time increases exponentially. The discriminator accuracy is constant for most of the epochs, and the minor reductions in accuracy are due to the robustness of the model. Independently, the cumulative mean of the loss is increasing, which suggests that as the training data grows, the model's capability to learn the patterns decreases.
Fig. 8. Discriminator loss and accuracy versus epochs
6 Discussion and Extension
Generative Adversarial Networks are powerful tools, and their variant DCGAN is very popular, with several other variants launched recently, including InfoGAN [30]. Inspired by the success of Capsule Networks [17] over conventional CNNs, this work used a Capsule Network as the discriminator of a GAN on different datasets to examine how the results differ from one of the popular methods, DCGAN. In this work, the results are generated on different datasets of public faces. We hope that our work helps future research to improve generative models, allowing computers to automate the process of art making. We (authors) declare that there is no conflict of interest. No humans/animals were involved in this research work. We have used our own data.
References
1. Lagendijk, R.L., Biemond, J., Boekee, D.E.: Identification and restoration of noisy blurred images using the expectation-maximization algorithm. IEEE Trans. Acoust. Speech Signal Process. 38(7), 1180–1191 (1990)
2. Gregor, K., Danihelka, I., Graves, A., Wierstra, D.: DRAW: a recurrent neural network for image generation. CoRR abs/1502.04623, arXiv:1502.04623
3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
4. Im, D.J., Kim, C.D., Jiang, H., Memisevic, R.: Generating images with recurrent adversarial networks. CoRR abs/1602.05110, arXiv:1602.05110
5. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. CoRR abs/1311.2901, arXiv:1311.2901
6. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. CoRR abs/1406.2661. https://arxiv.org/abs/1406.2661
7. Theis, L., van den Oord, A., Bethge, M.: A note on the evaluation of generative models (2015). arXiv:1511.01844
8. Dosovitskiy, A., Springenberg, J.T., Brox, T.: Learning to generate chairs with convolutional neural networks. CoRR abs/1411.5928, arXiv:1411.5928
9. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434, arXiv:1511.06434
10. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
11. Amos, B.: Image completion with deep learning in TensorFlow. http://bamos.github.io/2016/08/09/deep-completion
12. Gorijala, M., Dukkipati, A.: Image generation and editing with variational info generative adversarial networks. arXiv preprint arXiv:1701.04568
13. Sun, J., Yuan, L., Jia, J., Shum, H.-Y.: Image completion with structure propagation. In: ACM Transactions on Graphics (ToG), vol. 24, pp. 861–868. ACM (2005)
14. Darabi, S., Shechtman, E., Barnes, C., Goldman, D.B., Sen, P.: Image melding: combining inconsistent images using patch-based synthesis. ACM Trans. Graph. 31(4), 82-1 (2012)
15. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875
16. Nagarajan, V., Kolter, J.Z.: Gradient descent GAN optimization is locally stable. In: Advances in Neural Information Processing Systems, pp. 5585–5595 (2017)
17. Sabour, S., Frosst, N., Hinton, G.E.: Dynamic routing between capsules. CoRR abs/1710.09829, arXiv:1710.09829
18. Suárez, P.L., Sappa, A.D., Vintimilla, B.X.: Infrared image colorization based on a triplet DCGAN architecture. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 212–217. IEEE (2017)
19. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
20. Maas, A.L., Hannun, A.Y., Ng, A.Y.: Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML, vol. 30, p. 3 (2013)
21. Wazwaz, A.-M.: The tanh method: solitons and periodic solutions for the Dodd-Bullough-Mikhailov and the Tzitzeica-Dodd-Bullough equations. Chaos Solitons Fractals 25(1), 55–63 (2005)
22. Liu, Z., Luo, P., Wang, X., Tang, X.: Large-scale CelebFaces Attributes (CelebA) dataset (2018). Accessed 15 Aug 2018
23. Goodfellow, I.J., Bulatov, Y., Ibarz, J., Arnoud, S., Shet, V.: Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082
24. Ng, H.W., Winkler, S.: A data-driven approach to cleaning large face datasets. In: IEEE International Conference on Image Processing (ICIP 2014), pp. 343–347 (2014). https://doi.org/10.1109/ICIP.2014.7025068
25. Amos, B., Ludwiczuk, B., Satyanarayanan, M.: OpenFace: a general-purpose face recognition library with mobile applications. Technical report, CMU-CS-16-118, CMU School of Computer Science (2016)
26. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., Chen, X.: Improved techniques for training GANs. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29, pp. 2234–2242. Curran Associates, Inc. (2016)
27. Dziugaite, G.K., Roy, D.M., Ghahramani, Z.: Training generative neural networks via maximum mean discrepancy optimization. arXiv:1505.03906 (2015)
28. Yeh, R.A., Chen, C., Lim, T., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with perceptual and contextual losses. CoRR abs/1607.07539, arXiv:1607.07539
29. Hinton, G.E., Sabour, S., Frosst, N.: Matrix capsules with EM routing. In: International Conference on Learning Representations (2018). https://openreview.net/forum?id=HJWLfGWRb
30. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. CoRR abs/1606.03657, arXiv:1606.03657
Virtual Vision for Blind People Using Mobile Camera and Sonar Sensors
Shams Shahriar Suny(&), Setu Basak, and S. M. Mazharul Hoque Chowdhury
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
{shams.cse,setu.cse,mazharul2213}@diu.edu.bd
Abstract. In this modern era of technology, almost all real-life challenges are solved using innovative technological solutions, yet many blind people still remain beyond the reach of modern, digitized technologies. Blind people cannot get proper directions along their destination path, and it is still hard for them to find out what object they are about to interact with. The proposed model will not only help in their easy movement but also give them a virtual vision. In this implementation we use a NodeMCU and a mobile camera guided by the OpenCV library, which uses Python to detect an object. The system will also analyze and report the object type to the user, creating a virtual vision for the user. In addition, sonar sensors paired with a vibration motor help blind persons sense that there is an obstacle ahead in that direction. Future work on this project will provide the blind person proper directions along his/her way using Google Maps. Keywords: Virtual vision · YOLO · Physically challenged · Mobile camera · Sensors
1 Introduction
There are many visually impaired individuals around us, and they face numerous issues moving from one place to another. The proposed model assists them by detecting the obstacles present before them. They can also learn the name of the object in front of them, so they have a virtual vision together with an obstacle avoidance system. In this paper we present a project that helps blind people by giving them a virtual vision. The project is built on image processing technology and Arduino-class hardware. The image processing takes the video frames from the camera and then finds the object in front, which is analyzed by the YOLO model in Python OpenCV. The system is capable of recognizing multiple objects from the video feed, but we keep only the single object closest to the camera for recognition. In this version we use a mobile camera that connects to the image processing server wirelessly. The project also helps visually impaired people overcome the obstacle before them by sounding a warning. At first, only a single sensor has been
utilized, but sensors can be mounted at various angles so the system can also detect whether there is a large hole in front of the user. Thus, at once, the user has a virtual vision along with an obstacle avoidance system.
2 Literature Review
Jain and his team members worked on a hand glove that can help a visually impaired person [1]. The glove included a camera connected to a Raspberry Pi device, with object detection to help the user know what object is in front of the glove. The authors used a DNN method to detect the object, and five vibrators give the user directions towards the object. Similar work can be found in [2], where Arora et al. discussed measuring the problems blind people face in everyday life. They developed a wearable device that also implements real-time object detection, so the user can know the object in front of it. The authors used the same DNN method, combined with single-shot multibox detection, and the work also includes speech to tell the user what is closest to him. In another work for blind people, Frankel tried to find all the objects in an image previously captured by the camera [3]. In this case, a mobile camera is used to take the pictures, which are then passed through an algorithm to build a 3D representation that helps locate all the objects from all possible angles. After the image has been processed, the user can retrieve it; a touch screen lets the user touch points in the picture to query any object detected by the algorithm. Bai and his team members tried to implement both obstacle avoidance and object detection [4]. Their device has a camera attached to a pair of sunglasses, but the user needs to hold the main device. The system uses a CNN method to detect the object in front of it; the camera, an RGB-D unit, captures images and sends them to the device for processing, and when an obstacle is detected in front of the user, the system beeps an alert. Islam et al. compared the variations among many systems [5], with wide-ranging details displayed in a table and a clean listing of all the sensors and actuators the projects need. Hogle and his team describe a patent for a new wirelessly connected wearable device to help visually impaired people [6]. The camera takes pictures of the current scene and sends them to the connected server; the outcome of the processing comes back to the device and lets the user know what type of object they are facing. It also has voice command control so the user can interact with the device properly. Rodríguez et al. presented an assisting system for blind people that helps them avoid obstacles [7]. José and his team proposed a navigation system for blind people [8], a model that helps the user navigate the path using different types of algorithms. Giudice and Legge discussed different sensory devices and optical technologies to assist blind people [9]; they proposed a method and also compared different models. Cardin et al. proposed a technique for visually challenged people [10]: a wearable obstacle
detection device. Shoval and his team proposed a mobile robot to help blind people move around like sighted humans [11]; they trained the system and then built a navigation belt to avoid obstacles. Lim et al. proposed a vest for blind people to avoid obstacles in front of the user [12]. Pradeep and his team used robotic vision to help blind people [13]. Borenstein and his team proposed the NavBelt to assist blind persons in moving around easily [14]. Borenstein et al. proposed the GuideCane framework for blind people to move easily [15].
3 Methodology
At the beginning, the camera starts functioning as expected. The continuous video feed goes into the YOLO model in Python: the model takes the picture feed, searches it for information, and returns the name of the recognized item. That name is then passed to the server to refresh its data. The second part is the server, which handles the data exchange between the camera and the server and between the main NodeMCU board and the server; it remains the fundamental communication link between the two channels. The model always keeps the latest data obtained from the client: if the client wants to know about the item, the server refreshes its data so all the connected parts are updated. The last part of this flow is the main NodeMCU board. The sensors are connected to the main board, which takes continuous readings from them (Fig. 1). A sketch of the camera-side detection loop is given after Fig. 1.
Fig. 1. Work flow of the system
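One hedged sketch of this workflow, using OpenCV's DNN module as a stand-in for the paper's YOLO-in-OpenCV setup; the stream URL, server endpoint, and model files are all placeholders, not values from the paper:

```python
import cv2
import requests

STREAM_URL = "http://192.168.0.10:8080/video"  # placeholder IP-camera stream
SERVER_API = "http://192.168.0.20/update"      # placeholder server endpoint

# Any YOLO config/weights pair could be substituted here.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0)

cap = cv2.VideoCapture(STREAM_URL)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5)
    if len(boxes) > 0:
        # The paper treats the largest (i.e. closest) object as the one to report.
        nearest = max(range(len(boxes)), key=lambda i: boxes[i][2] * boxes[i][3])
        requests.post(SERVER_API, data={"object": int(class_ids[nearest])})
```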
4 Design and Implementation
This is a system with both an IoT part and an image processing model, composed of several elements. First, consider the proposed system design. The mobile camera is connected wirelessly to the trained model. It grabs frames from the video feed and sends them to the
Python library, which then identifies the object. Initially, Python runs on the computer, so if the camera detects any object, the user is informed by voice, provided the user has pressed the active-state button. The main board connects to Wi-Fi so that it can reach the internet to initiate API communication, and it continuously checks for arriving data. Whenever the user presses the button to change the state, it means the user additionally wants to know the object type. All of this can be seen in Fig. 2.
Fig. 2. Full system design
The camera's working principle, the first-stage task of the project, is shown in Fig. 3. It is an IP-based camera, so we can view a live stream of what the camera has detected. The camera must be connected to the same network as the NodeMCU board. The camera takes the video feed and sends it to the YOLO Python library, after which the object can be detected through the mobile phone.
Fig. 3. Object detection.
The server is managed through the phpMyAdmin database. An API hit originates from the camera with the information the camera has observed, and the server's data is refreshed with it; Fig. 4 demonstrates this. There are other requests as well: at the main board end, there is a button that the user can press, so the API call keeps checking
whether the state of the camera has changed or not. If the user presses the button and wants to know what is in front of them, an API hit goes to the server and changes its state. At this point the camera checks whether the user wants to learn about the identified object; once the camera detects the object in front, it continuously announces the name of that particular object. A minimal sketch of such a state API follows Fig. 4.
Fig. 4. Server connection
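The authors' server is built on PHP/phpMyAdmin; purely to illustrate the state exchange, here is an equivalent minimal HTTP API sketched in Python with Flask (not the paper's implementation; the endpoint names and in-memory state are our own):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
state = {"object": None, "announce": False}  # in-memory stand-in for the database

@app.route("/update", methods=["POST"])
def update():            # camera side posts the detected object
    state["object"] = request.form.get("object")
    return jsonify(ok=True)

@app.route("/button", methods=["POST"])
def button():            # NodeMCU posts when the user presses the button
    state["announce"] = True
    return jsonify(ok=True)

@app.route("/state")
def get_state():         # both sides poll the current state
    return jsonify(state)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```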
Figure 5 shows the detailed design. Here we use the NodeMCU, which is a convenient replacement for the Arduino: an Arduino would need an extra Wi-Fi module whose configuration is fairly complex, whereas the NodeMCU has an integrated Wi-Fi module and is small in size. A sketch of the board-side loop follows Fig. 5.
Fig. 5. Main board connection
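The paper does not give the board firmware; as one hedged illustration, a MicroPython sketch for a NodeMCU driving an HC-SR04-style sonar sensor and a vibration motor could look like the following. The pin numbers, the sensor model, and the 80 cm threshold are all assumptions:

```python
# MicroPython sketch for NodeMCU (ESP8266): sonar sensor plus vibration motor.
from machine import Pin, time_pulse_us
import time

TRIG = Pin(12, Pin.OUT)   # pin assignments are assumptions
ECHO = Pin(14, Pin.IN)
MOTOR = Pin(5, Pin.OUT)

def distance_cm():
    TRIG.off(); time.sleep_us(2)
    TRIG.on(); time.sleep_us(10); TRIG.off()  # 10 us trigger pulse
    duration = time_pulse_us(ECHO, 1, 30000)  # echo time, 30 ms timeout
    return (duration / 2) / 29.1              # convert round trip to cm

while True:
    if 0 < distance_cm() < 80:  # 80 cm obstacle threshold (assumed)
        MOTOR.on()              # vibrate to warn the user
    else:
        MOTOR.off()
    time.sleep_ms(100)
```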
5 Results and Accuracy
With the information obtained from the image, the YOLO model processes it using the Python algorithm. The mobile camera's feed is shared in response to URL requests, and the object detection model then annotates the frame, so we obtain images in which the detected items are shown with bounding borders. The recognized items are collected into an array, and the
order of the array reflects the closest objects detected: the larger an object appears, the closer it is, so it is identified first. If no object is detected in the image, an empty array is passed, as indicated below. Figure 6 represents the detection of different objects. For all the scenarios we have tested, we obtained the expected outcomes successfully; across several tests, we achieved at most 90% accuracy in object detection. The larger an object is, the closer it moves to the first position of the array.
Fig. 6. Mobile camera object detection [7]
6 Conclusion
At the end of all this discussion, we can see that this prototype device can easily detect objects and can also avoid obstacles successfully. At this time there is only limited work for blind or physically impaired people, so we are trying to make something that can genuinely help and give them a better life. The proposed model will enhance the quality of life of visually impaired people to an unprecedented degree, and there is much future scope in this project to make life easier for blind people. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Jain, S., Varsha, S.D., Bhat, V.N., Alamelu, J.V.: Design and implementation of the smart glove to aid the visually impaired. In: 2019 International Conference on Communication and Signal Processing (ICCSP) (2019)
2. Arora, A., Grover, A., Chugh, R., Reka, S.: Real time multi object detection for blind using single shot multibox detector. Wirel. Pers. Commun. 107, 651 (2019)
3. Frankel, M.: Systems and methods for blind and visually impaired person environment navigation assistance. US20190026939A1. https://patents.google.com/patent/US20190026939A1
4. Bai, J., Liu, Z., Lin, Y., Li, Y., Lian, S., Liu, D.: Wearable travel aid for environment perception and navigation of visually impaired people. Electronics 8, 697 (2019)
5. Islam, M.M., Sadi, M.S., Zamli, K.Z., Ahmed, M.M.: Developing walking assistants for visually impaired people: a review. IEEE Sens. J. 19(8), 2814–2828 (2019)
6. Hogle, R., Beckman, R.: Object detection, analysis, and alert system for use in providing visual information to the blind. US20190070064A1. https://patents.google.com/patent/US20190070064A1
7. Rodríguez, A., Yebes, J.J., Alcantarilla, P.F., Bergasa, L.M., Almazán, J., Cela, A.: Assisting the visually impaired: obstacle detection and warning system by acoustic feedback. Sensors 12, 17476–17496 (2012). https://doi.org/10.3390/s121217476
8. José, J., Farrajota, M., Rodrigues, J.M.F., du Buf, J.M.H.: The SmartVision local navigation aid for blind and visually impaired persons. JDCTA (2011)
9. Giudice, N.A., Legge, G.E.: Blind navigation and the role of technology. In: The Engineering Handbook of Smart Technology for Aging, Disability, and Independence. Wiley (2008)
10. Cardin, S., Thalmann, D., Vexo, F.: Wearable obstacle detection system for visually impaired people. In: Haptex 2005 (2005)
11. Shoval, S., Borenstein, J., Koren, Y.: Mobile robot obstacle avoidance in a computerized travel aid for the blind. IEEE (1994)
12. Shin, B.-S., Lim, C.-S.: Obstacle detection and avoidance system for visually impaired people. In: Oakley, I., Brewster, S. (eds.) HAID 2007. LNCS, vol. 4813, pp. 78–85. Springer, Heidelberg (2007)
13. Pradeep, V., Medioni, G., Weiland, J.: Robot vision for the visually impaired. IEEE (2010)
14. Shoval, S., Borenstein, J.: The Navbelt, a computerized travel aid for the blind based on mobile robotics technology. IEEE (1998)
15. Borenstein, J., Ulrich, I.: The GuideCane, a computerized travel aid for the active guidance of blind pedestrians. In: Proceedings of the 1997 IEEE International Conference on Robotics and Automation, Albuquerque, New Mexico, April 1997
Study on Performance of Classification Algorithms Based on the Sample Size for Crop Prediction
I. Rajeshwari1(&) and K. Shyamala2
1 Computer Science, Queen Mary's College, Chennai, Tamilnadu, India
[email protected]
2 Computer Science, Dr. Ambedkar Government Arts College, Chennai, Tamilnadu, India
[email protected]
Abstract. An important issue in agricultural planning is the selection of crops for cultivation. This work presents an approach which uses data mining classification techniques to predict the crops to be cultivated based on the available soil nutrients. The dataset considered in this work has been taken from the soil test centres of Dindigul district, Tamilnadu. The parameters are the various nutrients present in the soil samples collected from different regions of Dindigul district. The properties of the soils are analyzed to find the crops suitable for cultivation in the region. The dataset is taken in 9 different sample sizes, and seven different classification algorithms are deployed on these soil datasets. The performance of the algorithms is analyzed based on certain metrics, using the confusion matrix and the classification report. The decision tree is found to be the best algorithm for the soil dataset. Keywords: Crop prediction · KNN · Classification · Decision tree · Naïve Bayes
1 Introduction
Agriculture plays an important role in the Indian economy. Although crop selection for cultivation is linked to a number of factors such as soil, weather and water, this work investigated soil nutrient availability. The aim of this work was to determine the suitability of classification algorithms for different sizes of datasets. As nutrition is important for human health, soil nutrients are important for plant growth. Proper soil nutrition will prevent nutrient-related plant stress and crop losses through pests, diseases, and poor post-harvest quality. From the soil, the roots of a plant receive the nutrients that help it grow. Knowledge about soil health and its maintenance is very important for sustaining crop productivity [1]. A soil test gives a brief idea about soil structure and mineral composition ratios. The Indian Government has established soil test centers at every district headquarters [2]. Soil testing is a chemical method for estimating the nutrient-supplying power of a soil, i.e., soil fertility evaluation [3]. The soil test provides information about various chemical properties such as EC and pH, along with the macro nutrients N, P, K and the micro nutrients Zn, B, Cu, Fe [4].
The main aim of the paper is to assist farmers in selecting crops for cultivation. Data mining classification techniques are applied to the soil nutrient datasets to predict the crops to be cultivated. For this, the given dataset is divided into two sets, viz., a training set and a test set. The training sets are randomly sampled from the dataset, and the remaining samples form the test set. The classifier is built in the learning step by analysing the training set; prediction of the class labels is done in the classification step. The predictive accuracy of a classifier is estimated from the test set: the percentage of test samples that are correctly classified is the accuracy of the classifier. The best way to achieve higher accuracy is to test out different algorithms, and the best one can be selected by cross-validation. Classification algorithms like decision tree, KNN, linear SVM, kernel SVM, logistic regression, naive Bayes and random forest are deployed on the soil datasets, and their performance is analysed based on certain metrics like accuracy score, Cohen's kappa, precision, recall and F-measure, Hamming loss, explained variance score, mean absolute error, mean squared error and mean squared logarithmic error. The confusion matrix and classification report are used for analysis. The core task is a comparative study of these classification algorithms, applying them to soil datasets of 9 different sizes. The organization of this paper is as follows. In Sect. 2, the literature review conducted is described and the study to be done is introduced. Section 3 describes the dataset collection and its preprocessing activities. Section 4 discusses the results of the classification algorithms for predicting the crop to be cultivated. Finally, Sect. 5 includes the conclusion and future work.
2 Literature Review
Deshmukh et al. [5] analysed and compared the ADTree, Bayes Network, Decision Table, J48, Logistic, Naive Bayes, NBTree, PART, RBFNetwork and SMO algorithms on five datasets using WEKA. Correctly Classified Instances (CCI), Incorrectly Classified Instances (ICI), Kappa Statistic (KS), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) were the metrics taken for the analysis. The paper concluded that no single classification algorithm could provide the best predictive model for all datasets. Ramesh et al. [6] analysed and compared the performance of Naïve Bayes, Bayesian Network, Naive Bayes Updatable, J48 and Random Forest on a soil dataset based on relative absolute error, mean absolute error, correctly classified instances and Cohen's kappa statistic, and concluded that the Naïve Bayes classifier is the most efficient of the classification techniques considered. Manjula et al. [1] used Naïve Bayes, decision tree and a hybrid approach of Naïve Bayes and decision tree for classifying a soil dataset; performance was analysed based on two factors, accuracy and execution time, and it was concluded that the hybrid classification algorithm performed better than the other two methods. Kumar et al. [7] analysed and compared the behavior of Naïve Bayes, J48 (C4.5) and JRip on soil profile data and concluded that the JRip model is the best classifier for soil sample data analysis; performance was analysed based on correctly classified instances, incorrectly classified instances, accuracy and mean absolute error. Baskar et al. [8] used Naïve Bayes, J48 (C4.5) and JRip on samples and concluded that the J48
model turned out to be the best classifier based on correctly classified instances, incorrectly classified instances, accuracy and mean absolute error, with ten-fold validation. Hemageetha et al. [9] analysed the Naïve Bayes, J48, Bayesian network and JRip classifiers on a soil dataset based on pH value to predict the suitability of soil for crop cultivation using WEKA. The metrics used were correctly classified instances, incorrectly classified instances, accuracy, Cohen's kappa statistic and mean absolute error, with ten-fold validation. It was concluded that J48 outperformed all other classifier models. Pahwa et al. [10] analyzed and compared the behaviour of different kinds of classification algorithms on medical datasets taken from the literature. Naive Bayes, decision tree induction, multilayer perceptron and logistic regression were applied on three different datasets and compared based on TP rate, FP rate, precision, recall, classification accuracy and execution time in seconds. They concluded that MLP is more accurate and better than the others, but takes more time. Shah et al. [11] compared the performance of decision tree, Bayesian network and k-nearest neighbor algorithms for predicting breast cancer with the help of WEKA, based on lowest computing time and accuracy; it was concluded that Naïve Bayes is the superior algorithm. Nabi et al. [12] evaluated the performance of Naïve Bayes, logistic regression, decision tree and random forest for predicting diabetic patients based on correctly classified instances, incorrectly classified instances, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Root Relative Squared Error (RRSE) and the confusion matrix. It was concluded that logistic regression has more correctly classified instances with low RMSE and RRSE, whereas Naïve Bayes has low RAE and MAE. From the review, it was found that most authors used few techniques and analysed them on few metrics. In this study, more techniques are applied on datasets of 9 different sizes and analysed on more metrics. Classification algorithms like decision tree, KNN, kernel SVM, linear SVM, logistic regression, naive Bayes and random forest are deployed on the soil datasets, and their performance is analysed based on metrics like accuracy score, Cohen's kappa, precision, recall and F-measure, Hamming loss, explained variance score, mean absolute error, mean squared error and mean squared logarithmic error using PYTHON. The confusion matrix and classification report are used for analysis.
3 Research Methodology
For this work, data was collected from the soil test centers in Dindigul district; samples from various villages covering almost the entire district were collected. Nearly 44,000 samples with 12 nutrients, viz., EC, pH, N, P, K, Fe, Zn, Cu, B, OC, S and Mn, were collected. The collected data was preprocessed by removing unwanted, noisy and blank data; extreme values in each attribute are treated as noisy data. 35,700 samples from the processed data were taken. The data was initially coded in EXCEL and finally converted to comma-delimited .csv. Using Principal Component Analysis on the soil data in PYTHON, components that account for up to 80% of the variance were
considered, and hence the first eight components, pH, N, P, K, S, Zn, Fe and Mn, were taken as the principal soil components. With these 8 features, a target class was added which gives the set of 11 crops that can be cultivated; the class label is given in numeric form. From this dataset, 9 datasets of different sizes, viz., 110, 220, 550, 1100, 2200, 5500, 11000, 22000 and 33000, were formed with balanced classes. The various classification algorithms were applied on the datasets with 10% of the data taken as test data, as sketched below.
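An illustrative scikit-learn sketch of the described setup (not the authors' code): the CSV path and column names are placeholders, and PCA is shown selecting components by explained variance, a simplification of the paper's feature-selection step:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, classification_report

df = pd.read_csv("soil_samples.csv")           # placeholder path
X, y = df.drop(columns=["crop"]), df["crop"]   # 'crop' is the assumed target label

# Keep enough components to explain 80% of the variance, mirroring the
# criterion used in the paper to pick the eight principal nutrients.
pca = PCA(n_components=0.80)
X_reduced = pca.fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_reduced, y, test_size=0.1, random_state=0)

clf = DecisionTreeClassifier().fit(X_tr, y_tr)  # repeat for the other six classifiers
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))
print(classification_report(y_te, pred))
```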
4 Result, Analysis and Review of Outcomes
The algorithms were executed on the processed 9 datasets in .CSV (comma-separated values) files using PYTHON, and the various metrics were analysed. The performance of the algorithms was analysed in two ways: first, how each algorithm works on the different datasets, and second, how the various algorithms perform on each dataset. Table 1 shows the outcome of the various algorithms based on the metrics that should be maximal, and Table 2 the metrics that should be minimal, on the dataset of size 110. Results were obtained similarly for all datasets. For sample size 110, the corresponding comparison charts for Tables 1 and 2 are given in Fig. 1.

Table 1. Performance of various algorithms on the dataset of size 110

| Algorithm  | ATR   | AT    | AS     | AM     | AP   | AR   | AF   | EVS    | CKS    |
|------------|-------|-------|--------|--------|------|------|------|--------|--------|
| DT         | 1     | 0.929 | 0.9286 | 0.9818 | 1    | 0.93 | 0.95 | 0.9949 | 0.9195 |
| KerSVM     | 0.384 | 0.364 | 0.3636 | 0.3008 | 0.41 | 0.36 | 0.36 | 0.6085 | 0.3125 |
| KNN        | 0.586 | 0.545 | 0.5455 | 0.3769 | 0.65 | 0.55 | 0.56 | 0.4213 | 0.4811 |
| LinSVM     | 0.596 | 0.636 | 0.6364 | 0.4587 | 0.53 | 0.64 | 0.57 | 0.8915 | 0.5769 |
| LogREG     | 0.556 | 0.455 | 0.4545 | 0.4133 | 0.41 | 0.45 | 0.42 | 0.3766 | 0.3774 |
| NaiveBayes | 0.98  | 0.818 | 0.8182 | 0.8148 | 0.82 | 0.82 | 0.79 | 0.9532 | 0.7905 |
| RF         | 0.96  | 0.545 | 0.5455 | 0.5841 | 0.48 | 0.55 | 0.46 | 0.6096 | 0.4860 |
Table 2. Performance on the dataset of size 110

| Algorithm  | ASD     | MAE    | MSE    | MSLE   | HL     | ET |
|------------|---------|--------|--------|--------|--------|----|
| DT         | 0.05455 | 0.0714 | 0.0714 | 0.0013 | 0.0714 | 14 |
| KerSVM     | 0.13754 | 2.0000 | 6.9091 | 0.2905 | 0.6364 | 18 |
| KNN        | 0.11823 | 1.5455 | 9.0000 | 0.3374 | 0.4545 | 15 |
| LinSVM     | 0.14990 | 0.8182 | 2.0909 | 0.0655 | 0.3636 | 43 |
| LogREG     | 0.12247 | 1.8182 | 9.8182 | 0.3733 | 0.5455 | 18 |
| NaiveBayes | 0.13899 | 0.3636 | 0.7273 | 0.0473 | 0.1818 | 16 |
| RF         | 0.20582 | 1.6364 | 8.1818 | 0.3948 | 0.4545 | 25 |
From both the charts of Fig. 1, it can be seen that the decision tree yields the best metrics, followed by Naive Bayes and linear SVM. The algorithms that performed best on each dataset are listed in Table 3.
Fig. 1. Performance of various algorithms on the dataset of size 110

Table 3. Algorithms that top the performance based on the resultant metrics

| Dataset size | Maximum metrics   | Minimum metrics     |
|--------------|-------------------|---------------------|
| 110          | DT, NB, LinearSVM | DT, NB, LinearSVM   |
| 220          | DT, NB, RF, KNN   | DT, NB, KNN, kerSVM |
| 550          | DT, NB, RF, KNN   | DT, NB, KNN         |
| 1100         | DT, RF, NB, KNN   | DT, NB, RF, KNN     |
| 2200         | DT, RF, NB, KNN   | DT, RF, NB, KNN     |
| 5500         | DT, RF, NB, KNN   | DT, KNN, RF         |
| 11000        | DT, RF, KNN, NB   | DT, RF, KNN         |
| 22000        | DT, RF, KNN, NB   | DT, RF, KNN         |
| 33000        | DT, RF, KNN, NB   | DT, RF, KNN         |
Table 3 makes it clear that the decision tree algorithm yields the highest accuracies and the lowest error rates for all datasets. After it, when the sample size is smaller, i.e. less than 1100, Naive Bayes can be used; for larger sample sizes, random forest can be deployed. Figure 2 shows the performance of each algorithm individually on the various datasets. For the decision tree, accuracies are above 0.92; they increase with the size of the dataset, reach the maximum at size 11000, and deteriorate thereafter. It also yields zero error when the size is between 2200 and 22000. For kernel SVM, as size increases, accuracy increases and the error rate decreases, with a maximum resultant accuracy below 0.8. Linear SVM yields accuracies below 0.63, with no direct correlation between dataset size and accuracy; it gives a better result when the size is 11000. For KNN, performance increases with size, with accuracies below 0.95. Logistic regression yields accuracies below 0.62 and shows good performance when the sample size is 5500. Naïve Bayes performance is not directly correlated with dataset size, but its accuracies were good at sample size 33000, although with higher error rates. For random forest, accuracies increase with dataset size, reach the maximum at size 22000, and deteriorate thereafter; its error rate also decreases as the size increases, with a minimum at size 22000.
Table 4 shows the execution time of the various algorithms on the datasets. Linear SVM, kernel SVM and logistic regression take considerably longer; the remaining algorithms are compared in Fig. 3. The random forest algorithm executes fastest when the sample size is less than 1100; as the sample size increases, the Naïve Bayes algorithm executes fastest, with random forest and KNN the next fastest.
Fig. 2. Performance of decision tree algorithm on the datasets
The outcome of the study is that the decision tree is the best algorithm for soil datasets of any size among the 7 algorithms considered, with a small compromise in execution time. Other than the decision tree, if execution time is taken into consideration, Naive Bayes can be used for sample sizes less than 1100 and random forest for larger datasets.
Table 4. Execution time in ms

| Algo       | 110 | 220 | 550 | 1100 | 2200  | 5500  | 11000 | 22000 | 33000 |
|------------|-----|-----|-----|------|-------|-------|-------|-------|-------|
| DT         | 14  | 16  | 17  | 21   | 29    | 46    | 62    | 94    | 145   |
| KerSVM     | 18  | 21  | 31  | 63   | 147   | 696   | 23200 | 7640  | 17800 |
| KNN        | 15  | 17  | 20  | 25   | 34    | 76    | 170   | 421   | 908   |
| LinSVM     | 43  | 131 | 335 | 688  | 14300 | 42100 | 93300 | 24300 | 51000 |
| LogREG     | 18  | 19  | 26  | 38   | 68    | 148   | 305   | 915   | 2140  |
| NaiveBayes | 16  | 18  | 17  | 17   | 26    | 36    | 48    | 75    | 116   |
| RF         | 25  | 26  | 27  | 33   | 42    | 71    | 108   | 167   | 256   |
Fig. 3. Performance of algorithms on the datasets based on execution time
5 Conclusion and Future Work
This study conducted a comparison between the classification algorithms decision tree, KNN, kernel SVM, linear SVM, logistic regression, naive Bayes and random forest on soil datasets of 9 sizes using PYTHON. The algorithms were compared on metrics like accuracy score, Cohen's kappa, precision, recall and F-measure, Hamming loss, explained variance score, mean absolute error, mean squared error and mean squared logarithmic error; the confusion matrix and classification report were used for analysis. It is concluded that the decision tree is suitable for these datasets, with a small compromise in execution time, for predicting the crops to be cultivated. The other algorithms that rank high are Naive Bayes, when the sample size is less than 1100, and random forest, for larger sample sizes. When execution time is given importance, Naive Bayes can be used, followed by KNN for sample sizes less than 5500 and random forest for larger datasets. The work can be extended further by comparing these algorithms on different sizes of various datasets other than the soil dataset, on still larger datasets, and by varying the parameters of the algorithms; for example, the number of trees in the random forest can be changed and analysed. Also, the percentage of data taken for testing and training can be varied and analysed.
References
1. Manjula, E., Djodiltachoumy, S.: Data mining technique to analyze soil nutrients based on hybrid classification. Int. J. Adv. Res. Comput. Sci. 8(8) (2017). http://dx.doi.org/10.26483/ijarcs.v8i8.4794, ISSN 0976-5697
2. RamaKrishna, B.V., Satyanarayana, B.: Agriculture soil test report data mining for cultivation advisory. Int. J. Comput. Appl. (2250-1797), 6(2), 11–16 (2016)
3. Singh, A.K., Singh, S.R.K., Tomar, K.: Soil testing towards sustainable agriculture and land management: farmer beliefs and attitudes. Asian J. Soil Sci. 8(2), 290–294 (2013)
4. Dogra, A.K., Wala, T.: A comparative study of selected classification algorithms of data mining. IJCSMC 4(6), 220–229 (2015). ISSN 2320-088X
5. Deshmukh, B., Patil, A.S., Pawar, B.V.: Comparison of classification algorithms using WEKA on various datasets. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 4(2), 85–90 (2011)
6. Ramesh, V., Ramar, K.: Classification of agricultural land soils: a data mining approach. Agric. J. 6(3), 82–86 (2011). ISSN 1816-9155
7. Kumar, V.P., Velmide, L.: Data mining plays a key role in soil analysis of Warangal region. Int. J. Sci. Res. Publ. 4(3), 1–3 (2014). ISSN 2250-3153
8. Baskar, S.S., Arockiam, L., Charles, S.: Applying data mining techniques on soil fertility prediction. Int. J. Comput. Appl. Technol. Res. 2(6), 660–662 (2013)
9. Hemageetha, N., Nasira, G.M.: Analysis of soil condition based on pH value using classification techniques. IOSR-J. Comput. Eng. 18(6), 50–54 (2016). e-ISSN 2278-0661, p-ISSN 2278-8727
10. Pahwa, P., Papreja, M., Miglani, R.: Performance analysis of classification algorithms. IJCSMC 3(4), 50–58 (2014). ISSN 2320-088X
11. Shah, C., Jivani, A.G.: Comparison of data mining classification algorithms for breast cancer prediction. In: Conference Paper, July 2013. https://www.researchgate.net/publication/269270867, https://doi.org/10.1109/ICCCNT.2013.6726477
12. Nabi, M., Wahid, A., Kumar, P.: Performance analysis of classification algorithms in predicting diabetes. Int. J. Adv. Res. Comput. Sci. 8(3), 456–461 (2017). ISSN 0976-5697
Global Normalization for Fingerprint Image Enhancement
Meghna B. Patel1(&), Satyen M. Parikh1, and Ashok R. Patel2
1 A.M. Patel Institute of Computer Studies, Ganpat University, Gujarat, India
[email protected], [email protected]
2 Florida Polytechnic University, Lakeland, FL, USA
[email protected]
Abstract. Fingerprint image enhancement remains a vital step in recognizing or verifying the identity of a person. Noise is introduced during the acquisition of the fingerprint image: poor-quality images are captured, leading to inaccurate levels of variation in gray-level values along the ridges and furrows because of non-uniform ink and inconsistent contact of the finger with the scanner. These poor-quality images affect the minutiae extraction algorithm, which may extract incorrect minutiae and degrade fingerprint matching during post-processing. Normalization is a pre-processing step that increases image quality by removing the noise and altering the range of pixel intensity values; the mean and variance are used to reduce the variation in gray-level values along ridges and valleys. This paper shows the result of applying the well-known global normalization and demonstrates by empirical analysis that normalization remains a good enhancement step, producing a noise-free image useful for the next steps of fingerprint recognition. Keywords: Fingerprint · Normalization · Global normalization · Block normalization · Image enhancement
1 Introduction
Due to the widespread use of the internet, information is electronically transferred through connected networks, and the security of information has become a major issue during data transmission. Identifying the true person and protecting the information from adversaries play a vital role [1, 2]. The identity of a person can be recognized or verified using biometric authentication, which is categorized into physiological and behavioral features of an individual, such as fingerprints, hand geometry, face, iris, voice, signature and keystroke [3]. Based on the BCC market research report [4], the fingerprint was the most popular biometric in the world up to 2015; see Fig. 1 below. Likewise, the latest BCC Research report, titled "Biometrics: Technologies and Global Markets", computed the market at $14.9 billion in 2015, estimated to extend up to $41.5 billion by 2020 [5]. The fingerprint provides the highest degree of reliability amongst biometric modalities like hand, voice, iris, face, etc. [6].
Fig. 1. The global market of biometric technologies from 2008–2015 [5]
Fingerprints are distinctive and remain unchanged during a person's entire life [7]. Also, based on the study in [8], the probability that two fingerprints are similar is very low, around 1 in 1.9 × 10^15. Fingerprints are the marks or patterns generated after pressing the tip of a finger on any surface. A fingerprint consists of ridge and valley patterns: the papillary lines of the finger are called ridges, and the gaps between ridges are called valleys [1]. The ridges are displayed as dark lines, and the white areas between ridges are the valleys; see Fig. 2 below. Fingerprint recognition is done by first extracting minutiae and then matching the minutiae. Minutiae are the features of a fingerprint; see Fig. 3 below.
Fig. 2. Ridges and furrows (valleys) of fingerprint [4]
Fingerprint recognition proceeds in two parts: (1) pre-processing and (2) post-processing. The processes called normalization, image enhancement, orientation estimation, binarization and thinning are considered pre-processing phases.
Fig. 3. Minutiae of fingerprint
Post-processing includes minutiae extraction, detection and removal of false minutiae, core point extraction and minutiae matching [9–12]. Improving finger image quality is a challenging task because poor-quality finger images are generated due to dry or wet skin, or skin damaged by age, scratches, wear, etc. [6]. These factors affect the ridge structure: connected ridges become disconnected and disconnected ridges become connected. A false ridge structure leads to false minutiae being extracted and gives false fingerprint matches. The paper is organized in different sections: Sect. 2 covers previous work done on normalization, Sect. 3 the proposed work on global normalization using mean and variance, Sect. 4 the performance measurement and result discussion, and Sect. 5 the conclusion and future enhancement.
2 Previous Work Done on Normalization
Pixel-wise enhancement is used in image processing; in pixel-wise enhancement, the new pixel value depends only on the earlier pixel value, so it cannot affect the clarity of the ridge and furrow structure. Many methods are used at the initial stage of image enhancement, such as histogram equalization, contrast stretching, normalization and Wiener filtering [6]. Various finger image enhancement techniques are proposed in [13, 14]. Most techniques enhance the image from a binary image, whereas some enhance the finger image in gray scale. Normalization is the first phase when applying enhancement on a gray-scale image. To increase the quality of finger images, researchers have proposed block-wise implementations performing adaptive normalization based on block processing: using each block's statistics, the input finger image is divided into sub-blocks and normalized per block [13, 14]. The research paper [15] states that in pre-processing, fingerprint image enhancement can be done in two phases: (1) using a conventional filter to enhance the image without changing filter parameters, and (2) using a conventional filter to enhance the structure of ridges and furrows (valleys) while changing the filter parameters based on ridge orientation and frequency. The author
introduces a comparison of combinations of local, global and block-local normalization in the first phase to analyse their efficiency. The proposed work follows the widely accepted normalization method established by Hong et al. [16].
3 Global Normalization Using Mean and Variance
Images are captured by a scanner during image acquisition; at that time, poor-quality images are captured, leading to inaccurate levels of variation in gray-level values along the ridges and furrows because of non-uniform ink and inconsistent contact of the finger with the scanner. Normalization is an important pre-processing step that increases the quality of images by altering the range of pixel intensity values to remove the noise; it is sometimes called contrast stretching. This process reduces the variation in gray-level values along ridges and furrows using the mean and variance. The following Eq. (1) is used in the normalization process [6, 16] to determine the new intensity value of each pixel:

$$G(i,j) = \begin{cases} M_0 + \sqrt{\dfrac{VAR_0\,\big(I(i,j) - M\big)^2}{VAR}} & \text{if } I(i,j) > M \\[1ex] M_0 - \sqrt{\dfrac{VAR_0\,\big(I(i,j) - M\big)^2}{VAR}} & \text{otherwise} \end{cases} \tag{1}$$
Here, I(i, j) is the gray-level value of pixel (i, j), M and VAR are the mean and variance of the image, and M0 and VAR0 are the desired mean and variance values. After applying Eq. (1) and using trial and error to find the desired mean and variance values best suited to the databases, M0 = 100 and VAR0 = 100 were finally chosen for the fingerprint databases. The steps of the proposed normalization algorithm are described in Fig. 4.
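The paper's implementation is in Java; a direct NumPy translation of Eq. (1) with the chosen M0 = VAR0 = 100 might look like this (a sketch, ours rather than the authors' code):

```python
import numpy as np

def normalize(img, m0=100.0, var0=100.0):
    """Global normalization of a grayscale fingerprint image, per Eq. (1)."""
    img = img.astype(np.float64)
    m, var = img.mean(), img.var()                # image mean M and variance VAR
    dev = np.sqrt(var0 * (img - m) ** 2 / var)    # per-pixel deviation term
    return np.where(img > m, m0 + dev, m0 - dev)  # add above the mean, subtract below
```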
4 Performance Measurement and Result Discussion
The proposed work is implemented in the Java language. To check the result of the normalization process, experiments were done using two databases: FVC2000 [17, 18], with the subsets DB1, DB2, DB3 and DB4, and FingerDOS [19, 20]. Tables 1 and 2 give a detailed description of the databases. The proposed work is evaluated using different measurement parameters: (i) MSE and PSNR values, (ii) computational time, and (iii) quality of the image. (i) MSE and PSNR value: PSNR stands for Peak Signal-to-Noise Ratio. Equation (3) below is used to measure the quality between the original and reconstructed finger images; a higher PSNR and lower MSE indicate that the reconstructed image has better quality. MSE stands for Mean Squared Error. To calculate PSNR, first calculate the MSE using Eq. (2), then calculate PSNR using Eq. (3).
Fig. 4. Flowchart of normalization process

Table 1. Database FVC2000 [17, 18]

| FVC2000 Set B | Sensor type                | Image size | No. of impressions | Resolution    |
|---------------|----------------------------|------------|--------------------|---------------|
| DB1           | Low-cost optical sensor    | 300 × 300  | 10 × 8             | 500 dpi       |
| DB2           | Low-cost capacitive sensor | 256 × 364  | 10 × 8             | 500 dpi       |
| DB3           | Optical sensor             | 448 × 478  | 10 × 8             | 500 dpi       |
| DB4           | Synthetic generator        | 240 × 320  | 10 × 8             | About 500 dpi |

Table 2. FingerDOS database [19, 20]

| FingerDOS          |                                                                                                                      |
|--------------------|----------------------------------------------------------------------------------------------------------------------|
| Sensor type        | Optical sensor (SecuGen iD-USB SC)                                                                                     |
| Image size         | 260 × 300                                                                                                              |
| No. of impressions | 3600 = 60 × 6 × 10, i.e. subjects = 60; fingers = 6 (index, middle and thumb of right and left hand); impressions = 10 |
| Resolution         | 500 PPI                                                                                                                |
$$\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\big(x_i - y_i\big)^2 \tag{2}$$

where N is the total number of pixels in the input image.

$$\mathrm{PSNR} = 10\,\log_{10}\frac{L^2}{\mathrm{MSE}} \tag{3}$$
where L is the number of discrete gray levels. Table 3 shows a comparative study of the input and normalized images with their PSNR and MSE values. The comparison shows that the normalized images have higher PSNR and lower MSE values than the input images, demonstrating that the proposed work gives better results in terms of image brightness and contrast.

Table 3. Comparative study of MSE, PSNR and computational time for original and normalized images

| Images        | Original MSE | Original PSNR | Original time | Normalized MSE | Normalized PSNR | Normalized time |
|---------------|--------------|---------------|---------------|----------------|-----------------|-----------------|
| FVC2000_101   | 10.634       | 37.365        | 176           | 5.584          | 39.342          | 156             |
| FVC2000_102   | 09.551       | 38.438        | 126           | 6.542          | 38.469          | 113             |
| FingerDOS_101 | 13.096       | 36.468        | 138           | 7.049          | 40.634          | 118             |
| FingerDOS_102 | 12.765       | 37.287        | 135           | 5.045          | 40.332          | 116             |
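Equations (2) and (3) translate directly to NumPy (a sketch; L = 255 for 8-bit grayscale images):

```python
import numpy as np

def mse(x, y):
    x, y = x.astype(np.float64), y.astype(np.float64)
    return np.mean((x - y) ** 2)                     # Eq. (2)

def psnr(x, y, levels=255):
    return 10 * np.log10(levels ** 2 / mse(x, y))    # Eq. (3)
```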
(ii) Computational time: The calculation time, in milliseconds, is taken for the original and normalized images; the results are shown in Table 3. (iii) Quality of the image: The visual appearance of the fingerprint images improves after applying global normalization to the grayscale image, and noise is also reduced. To show the result, two images each were selected from the FVC2000 and FingerDOS databases. In the images, red marks show where the noise occurring due to pressure on the scanner has been removed. The implemented result is shown in Fig. 5.
Fig. 5. Visual appearance of the original and normalized images (panels: FVC2000_101, FVC2000_102, FingerDOS_101, FingerDOS_102) [7]
5 Conclusion and Future Work
The proposed work shows an implementation of global normalization. It finds the noise that occurs due to pressure on the scanner and removes it by altering the range of pixel intensity values. The performance on the original and enhanced normalized images is quantified using the MSE and PSNR values; the execution time and the visual appearance of the images are also considered in the comparative study. The experimental results demonstrate that the enhanced normalized images provide better results than the original images. The other fingerprint recognition phases, like image enhancement, orientation estimation, binarization and thinning, are performed after removing the noise using normalization. To improve the overall accuracy of fingerprint recognition, future work will try to enhance each phase by applying different algorithms. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References
1. Lee, H.C., Gaensslen, R.E.: Advances in Fingerprint Technology. CRC Press, Boca Raton (2001)
2. Ratha, N.K., Bolle, R.M.: Fingerprint image quality estimation. IBM Computer Science Research Report (1999)
3. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans. Circuits Syst. Video Technol. 14(1), 4–20 (2004)
4. Biometrics: Technologies and Global Markets. Published: November 2010, Report Code: IFT042C. http://www.bccresearch.com/market-research/information-technology/biometrics-technologies-markets-ift042c.html. Accessed Apr 2015
5. Biometrics: Technologies and Global Markets. Published: January 2016, Report Code: IFT042E. https://www.bccresearch.com/market-research/information-technology/biometrics-technologies-markets-report-ift042e.html. Accessed Oct 2017
6. Maltoni, D., Maio, D., Jain, A.K.: Handbook of Fingerprint Recognition, 3rd edn. Springer, New York (2009)
7. Carlson, D.: Biometrics - Your Body as a Key, 2 June 2014. http://www.dynotech.com/articles/biometrics.shtml
8. Mario, D., Maltoni, D.: Direct gray-scale minutiae detection in fingerprints. IEEE Trans. Pattern Anal. Mach. Intell. 19(1), 27–40 (1997)
9. Patel, M.B., Patel, R.B., Parikh, S.M., Patel, A.R.: An improved O'Gorman filter for fingerprint image enhancement. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 200–209. IEEE, August 2017
10. Patel, M.B., Parikh, S.M., Patel, A.R.: Performance improvement in preprocessing phase of fingerprint recognition. In: Satapathy, S., Joshi, A. (eds.) Information and Communication Technology for Intelligent Systems, pp. 521–530. Springer, Singapore (2019)
11. Patel, M.B., Parikh, S.M., Patel, A.R.: An improved approach in fingerprint recognition algorithm. In: International Conference on Computational Strategies for Next Generation Technologies (NEXTCOM 2017), CT Institute of Engineering Management & Technology, Shahpur, Jalandhar, 25–26 November 2017. Springer CCIS Series (2017). ISSN 1865-0929
12. Patel, M., Parikh, S.M., Patel, A.R.: An improved approach in core point detection algorithm for fingerprint recognition. In: 3rd International Conference on Internet of Things and Connected Technologies. Elsevier (2018)
13. Kim, B.G., Kim, H.J., Park, D.J.: New enhancement algorithm for fingerprint images. In: Proceedings of 16th International Conference on Pattern Recognition, vol. 3, pp. 879–882. IEEE (2002)
14. Shi, Z., Govindaraju, V.: A chaincode based scheme for fingerprint feature extraction. Pattern Recogn. Lett. 27(5), 462–468 (2006)
15. Kocevar, M., Kacic, Z.: Efficiency analysis of compared normalization methods for fingerprint image enhancement. ARPN J. Syst. Softw. 3(3), 40–45 (2013)
16. Hong, L., Wan, Y., Jain, A.: Fingerprint image enhancement: algorithm and performance evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 777–789 (1998)
17. Maio, D., Maltoni, D., Cappelli, R., Wayman, J.L., Jain, A.K.: FVC2000: fingerprint verification competition. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 402–412 (2002)
18. FVC2000 (n.d.). http://bias.csr.unibo.it/fvc2000/databases.asp. Accessed June–July 2018
19. Bong, F.F.L.D.: FingerDOS: a fingerprint database based on optical sensor. WSEAS Trans. Inf. Sci. Appl. 12(29), 297–304 (2015)
20. Fingerprint Database Based on Optical Sensor (FingerDOS) (n.d.). https://fingerdos.wordpress.com/. Accessed June–July 2018
Recurrent Neural Network for Content Based Image Retrieval Using Image Captioning Model
S. Sindu1 and R. Kousalya2
1 Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore, India
[email protected]
2 Department of Computer Applications, Dr. N.G.P. Arts and Science College, Coimbatore, India
Abstract. With the tremendous growth in the collection of digital images by social media, e-commerce applications, medical applications and so on, there is a need for content based image retrieval (CBIR). An automatic retrieval process is one of the main focuses of CBIR, whereas the traditional keyword based search approach is time consuming. Semantic image retrieval is performed using CBIR, where the user query is matched based on the perception of the contents of the image rather than a query in text format. One of the main research issues in CBIR is the semantic gap, which can be reduced by deep learning. This paper focuses on an image captioning model for content based image retrieval using recurrent and convolutional neural networks in deep learning.
Keywords: Content based image retrieval · Deep learning · Convolutional neural network · Recurrent neural network
1 Introduction
The traditional method for retrieving images based on content uses a text based query to retrieve the images from the database. Since large numbers of images are generated by different applications, text based image search is tedious. In content based image retrieval, the search is performed taking into consideration the contents, rather than metadata of the image such as text descriptions, tags, annotations etc. [1]. The term content refers to low level features such as color, shape and texture, or to higher level features which can be derived from the image and thus ensure semantic image retrieval. One of the challenging research issues in CBIR is that low level features such as color, texture and shape are not sufficient to describe the higher level semantic information; to bridge this gap, the Recurrent Neural Network (RNN) and Convolutional Neural Network (CNN) are used.
1.1 Convolutional Neural Network for CBIR
Content based image retrieval uses a convolutional neural network to detect and recognize images. Convolutional neural networks are a special type of neural network which
works very well with data that is spatially connected, and it is used for image processing and image classification. Using a convolutional neural network, objects can be detected and recognized. Images are made of pixels, but individual pixels by themselves do not represent any useful information; a group of pixels present adjacent to each other, however, could represent features like shapes (eyes, nose etc.). A normal neural network will not be able to understand the image features correctly. A CNN is composed of many layers and it automatically learns features like colors, shapes, textures, relationships between shapes etc. The various layers in a convolutional neural network are the input layer, convolution layer, pooling layer, fully connected layer and output layer. Convolution and pooling are the two special types of layers helpful for image recognition in CNNs. The input layer is the first layer in the neural network and is fed with normalized image pixel data: raw image pixel data is converted to a grey scale image, and the pixel data is normalized by dividing the pixel intensities by 255 to bring the pixel information into the range 0 to 1. The output layer is at the end of the neural network and gives the final classification. The convolution layer convolves the images by using filters. Pooling layers remove unnecessary parts from the image.
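The layer stack described above can be made concrete with a short sketch. This is a minimal illustration in Keras, not the authors' model: the layer sizes, the 64 × 64 grey-scale input and the 10-way output are assumptions chosen only to show the input normalization, convolution, pooling and fully connected stages.

import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                 # normalized grey-scale pixels in [0, 1]
    layers.Conv2D(32, (3, 3), activation='relu'),    # convolution layer: learns local filters
    layers.MaxPooling2D((2, 2)),                     # pooling layer: drops unnecessary detail
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                # stack/flatten before the dense layers
    layers.Dense(64, activation='relu'),             # fully connected layer
    layers.Dense(10, activation='softmax'),          # output layer: final classification
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

pixels = np.random.randint(0, 256, (8, 64, 64, 1)).astype('float32')
model.predict(pixels / 255.0)                        # divide by 255 → range 0 to 1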
1.2 Recurrent Neural Network
Sequential characteristics of data are recognized by an RNN, which also predicts the pattern of the next likely scenario. It is commonly used in natural language generation, text generation and voice recognition. An RNN can handle dynamic temporal data in a better way [3]. The input to an RNN is time series data, and the output can also be time series data. An RNN can handle different types of input and output, such as varying or fixed input and varying or fixed output. The structure of an RNN is shown in Fig. 1 along with a normal feed forward neural network. It has an input layer, hidden layers and an output layer. Recurrent neural networks have loops which allow information to persist, and they are recursive in structure [4].
Fig. 1. Structure of recurrent neural network
Some of the applications based on the input type are text recognition, sentiment classification, chatbots, image captioning and so on. Image captioning is used in content based image retrieval.
Generating a textual description for an image based on the objects and actions in the image is called image captioning. Generating captions for images is a challenging research issue. The role of image captioning in retrieving images based on content is to achieve semantic similarity by capturing the textual data associated with the images. The caption generated from an image is shown in Fig. 2; the corresponding caption generated for the image is "A small aeroplane flying on the sky". The rest of this paper is organized as follows. In Sect. 2 a brief literature survey is made of the existing works. Section 3 describes the proposed method, followed by the experimental evaluation and the final conclusion.
Fig. 2. Caption generated from RNN [7]
2 Literature Survey
Zhu et al. [5] used an unsupervised algorithm defined by semantic meaning with hashing functions. The purpose of their research is to enhance the capability of making unique distinctions of hash codes by extracting semantics automatically from the noisy associated texts. In the traditional approach, learning proceeds by maintaining the visual similarity of images, which is differentiated from their hash code formulation using a unified unsupervised framework. In [6] the authors proposed a CBIR schema. To improve the accuracy of the image retrieval system, color, texture and shape features are fine-tuned and optimized. The noise and image translation problems are solved by using wavelet and curvelet features for textures. Finally, for optimization they used the particle swarm optimization algorithm. When compared with traditional CBIR systems, an optimal solution and better accuracy are reached in their model. Content based image retrieval can extract object features from the images in the database; this is called an object-based image retrieval system. Segmentation is used for finding the similarity between the database image and the user query image [7]. Object based retrieval systems are suitable for retrieval tasks where the objects are easily separated from the environment and have unique identification by colors or textures [8]. In [9], remote sensing images are used for region based retrieval, where a graph-based theoretical approach with unsupervised learning is used in their framework. A graph model is used on images, and this graph based model can easily find location based feature distributions. Here the resemblance between the image provided by
the user and the image in the database is based on graph similarity. In [10], a template based caption generation method is used by the authors, which uses fixed template sentences. In this method explicit annotation is required for each class. An image is linked to a sentence and a score is computed; importance is given to the meaning obtained from the image together with the sentence, and the score is calculated. Strong enough captions are not generated using this method because it uses rigid templates. Image retrieval based caption generation is proposed by the authors in [11]. Similar images based on the content are retrieved, and the query images are also compared based on the captions. Transfer based caption generation strategies are used in their work. To compose image descriptions, a tree based method is used. The database image content is shown in the form of tree fragments which are used to denote the expressive phrases. The fragments of a tree for a newly composed description are pruned. A neural network approach for caption generation is used by the authors in [12]. Their aim is framing a sentence which is the description of the given image. In their work they presented a learning approach which is sequential, with minimal assumptions made on the structure that follows a sequence. Two LSTMs are used in their work: one LSTM is used to map the input to a vector of fixed dimensionality, and another LSTM, which is deep, is used for decoding from the vector to the target sequence. To understand text, a log-bilinear multimodal approach is proposed by Kiros et al. [13]. In their approach, a word contained in an image is compared with the previous word by probability distribution. A convolutional neural network is used for image-text modeling, and it is used to learn word representations and image features by model training. In [3, 4] recurrent neural networks are used for caption generation. A multimodal neural network approach is used here. Given a previous word, the next word is generated by probability. A deep recurrent neural network for text generation and a convolutional neural network for image identification are the two different neural network types used in their model. The interaction between the two models is performed in a multimodal layer to form a new mRNN model. Xu et al. in [14] used a model which automatically learns to describe the content of the image. The model is trained in a deterministic manner.
3 Proposed Methodology
Existing CBIR methodologies use pixel level feature similarity, which results in a large semantic gap. The main disadvantage of the existing system is the object type mismatch between the query image and the retrieved images. To overcome these difficulties, the proposed model uses a CNN and an RNN for content based image retrieval. By combining both CNN and RNN, the semantic gap can be reduced further by giving importance to the total number of objects, the relative positions of objects, actions and so on. The steps to perform image captioning using CNN and RNN are listed below and the flow chart is shown in Fig. 3. The Neural Image Captioning model works as follows:
• A standard pre-trained CNN model like VGG16, VGG19, ResNet or Inception is used as an image feature extractor
Fig. 3. Flow chart for image captioning model (stages: Start → Input dataset → Select batch of image dataset → Image preprocessing → Feature extraction in CNN → Image caption generation from extracted features using LSTM → Image repository and caption database → Caption and image indexing → Text similarity engine → Resultant images with captions)
• The output of the CNN layer, which is the filtered feature vectors of the input image, is fed into an RNN
• The image caption is encoded into a vectorised form using a word-to-vector process
• The RNN is trained to generate image captions by matching the image features from the CNN network with the corresponding caption of the image from the image database
• There is an attention mechanism which helps the RNN learn to pay selective attention and focus only on a part of the image to generate the caption (a sketch of this wiring follows the list)
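As a rough sketch of this pipeline, the wiring below follows a common "merge" style encoder–decoder in Keras. It is an assumption-level illustration, not the authors' exact architecture: the attention mechanism is omitted for brevity, VOCAB_SIZE, MAX_LEN and the layer widths are made up, and pre-trained ImageNet weights (weights='imagenet') would be loaded in practice.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

VOCAB_SIZE, MAX_LEN = 5000, 20                        # assumed vocabulary / caption length

cnn = VGG16(weights=None, include_top=False, pooling='avg',
            input_shape=(224, 224, 3))                # weights='imagenet' in practice

img_in = layers.Input(shape=(224, 224, 3))
img_vec = layers.Dense(256, activation='relu')(cnn(img_in))      # image feature vector

cap_in = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, 256, mask_zero=True)(cap_in)  # word-to-vector encoding
seq = layers.LSTM(256)(emb)                           # summary of the caption prefix

merged = layers.add([img_vec, seq])                   # inject image features into the decoder
out = layers.Dense(VOCAB_SIZE, activation='softmax')(merged)     # next-word probabilities

captioner = models.Model([img_in, cap_in], out)
captioner.compile(optimizer='adam', loss='sparse_categorical_crossentropy')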
3.1 The Methodology Is Proposed in Three Steps
• Step 1: Object detection using convolutional neural network
• Step 2: Image captioning phase to generate text captions for images
• Step 3: Text similarity based ranking phase on image captions to identify similar images.
Step 1: Object detection using convolutional neural network. The most important layer in a CNN is the convolution layer; it is the first hidden layer, which performs convolutions — a matrix multiplication operation with specific types of filters. Filters are usually 3 × 3 matrices which convolve the image into processed forms such as edge detection, depth detection etc. Each 3 × 3 sub-section of the original matrix is multiplied with the filter, the result is stored as output, and a fixed stride is taken to start the next matrix multiplication operation. This also reduces the input matrix dimension. The pooling layer helps in extracting the important features and removing unessential parts of an image. Pooling helps in dimension reduction and also provides rotational invariance to some extent. Some of the pooling operations are max, min, sum and average. Fully connected layers are the last part of a convolutional neural network before the output layer; they get their input from the previous pooling layer, which also acts as a normal input layer. Image data is stacked, or flattened, before being fed to the fully connected layer. By the time the image reaches the fully connected layers, the image data is dimensionally reduced from the original image size, and the important features have been highlighted and extracted. The activation function used is the ReLU activation function. The use of the convolutional neural network bridges the semantic gap for object-type matching and ranking within other low level features. Further semantic gaps remain, such as the number of objects, the relative positions of objects, actions etc., which can be resolved using the Recurrent Neural Network.
Step 2: Image captioning phase to generate text captions for images. A Recurrent Neural Network is used to generate captions for the images. The output of the CNN, which is the filtered feature vectors of the input image, is fed into the RNN. The problem with the RNN is the vanishing gradient and its short memory. Hence the RNN uses an architecture called Long Short-Term Memory (LSTM) networks, as shown in Fig. 4; the purpose of the LSTM is to extend the memory for the required period of time. The LSTM has two types of cells, short-term memory cells and long-term memory cells, from which a decision is made either to retain the information in memory for a long time by looping it back into the network or to discard it. The memory in an LSTM is in the form of gated cells, where gated means that the cell decides whether to store the information in memory or to discard it. All this is performed by using different gates such as the learn gate, forget gate and remember gate. The caption generated using CNN and LSTM is shown in Fig. 5.
Fig. 4. LSTM architecture
Fig. 5. Image captioning using CNN and RNN(16)
To further improve the performance of the LSTM, an attention mechanism is used; in this mechanism there is an encoder LSTM which works as a normal LSTM. The intermediate outputs from the encoder LSTM are taken as input. For each input sequence, the model is trained to learn to pay selective attention to the items that are related to items in the output sequence. Filtering is performed for each input sequence item based on some conditions, and the output sequence is generated. This type of attention-based sequence mechanism is useful for applications like image caption generation, making the best use of the convolutional and recurrent neural networks. The sequence-based attention mechanism can be applied to computer vision problems to help the convolutional neural network pay attention to parts of an image when outputting a sequence, such as a caption.
Step 3: Text similarity based ranking phase on image captions to identify similar images. The content based image retrieval model uses the image captioning model to generate a caption for the user query image. The database holds images along with their captions. A text similarity based ranking phase is applied on the image captions to identify similar images: the query image caption is compared with the database image captions, and if the captions are similar, the relevant image is retrieved. Cosine similarity is used for the text similarity processing. To convert an image into a caption and to find similarities between texts, natural language processing is used. To understand the context of the image for caption generation and to perform text similarity, Part-of-Speech tagging and the BLEU (BiLingual Evaluation Understudy) score are used. To measure the BLEU score, N-grams are used.
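A minimal sketch of the cosine-similarity ranking step, assuming a simple bag-of-words representation of the captions (the caption strings below are illustrative, not from the authors' database):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def cosine_rank(query_caption, db_captions, top_k=5):
    # bag-of-words vectors for the query caption and every database caption
    m = CountVectorizer().fit_transform([query_caption] + db_captions)
    m = m.toarray().astype(float)
    q, db = m[0], m[1:]
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-sims)[:top_k]          # indices of the most similar captions

db = ["a couple of birds sitting on top of a tree",
      "a baseball player swinging a bat",
      "a small aeroplane flying in the sky"]
print(cosine_rank("two birds on top of a tree", db, top_k=2))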
3.2 Part of Speech (POS) Tagging
POS tagging is used to tag each word based on its grammar; it is the process of marking up a word in a text as corresponding to a particular part of speech, based on both its definition and its context. In the proposed work, to reduce the semantic gap in the retrieved images, higher importance is given to the noun and the verb of the generated caption: we have chosen to "mandatorily match the noun" and have given the next "higher priority to the verb".
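A small sketch of this tagging step with NLTK, where nouns (NN*) are treated as the mandatory match and verbs (VB*) as the next priority; the caption string is illustrative, and resource names may vary slightly with the NLTK version:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

caption = "a couple of birds sitting on top of a tree"
tags = nltk.pos_tag(nltk.word_tokenize(caption))
nouns = [w for w, t in tags if t.startswith('NN')]   # mandatory match
verbs = [w for w, t in tags if t.startswith('VB')]   # next-higher priority
print(nouns, verbs)   # e.g. ['couple', 'birds', 'top', 'tree'] ['sitting']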
3.3 N-Gram
N-gram refers to the process of combining nearby words together for representation purposes, where N represents the number of words to be combined.
Example: A group of girls sitting under a tree
1-Gram - 'A', 'group', 'of', 'girls', 'sitting', 'under', 'a', 'tree'
2-Gram - 'A group', 'group of', 'of girls', 'girls sitting', 'sitting under', 'under a', 'a tree'
3-Gram - 'A group of', 'group of girls', 'of girls sitting', 'girls sitting under', 'sitting under a', 'under a tree'.
Breaking down natural language into n-grams is essential for maintaining counts of the words occurring in sentences, which forms the backbone of the traditional mathematical processes used in Natural Language Processing.
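A minimal n-gram generator matching the example above (a sketch, not the authors' implementation):

def ngrams(sentence, n):
    words = sentence.split()
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "A group of girls sitting under a tree"
print(ngrams(sentence, 2))   # ['A group', 'group of', ..., 'a tree']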
4 Experimental Evaluation
The dataset used for evaluating the performance of the proposed system is the MSCOCO dataset. It is implemented using TensorFlow and Keras; TensorFlow is one of the best libraries for implementing deep learning, and Keras is built on top of TensorFlow. The metric used for evaluating the image captioning model is the BLEU metric. All the images from the validation dataset are taken for calculating the BLEU score, and the average score is 66.5, which shows that the sentences generated from query images are very similar to the sentences generated from the pretrained captioning model. The custom metric used to evaluate the performance of the image captioning model for content based image retrieval is Mean Average Precision. The number of images used for calculating the metrics is limited to 5000 images from the validation set of the MSCOCO dataset because of performance constraints. To better understand the closing of the semantic gap, the results of the newly proposed metrics are shown below in Table 1, and the corresponding images are shown in Figs. 6, 7, 8 and 9.

Table 1. Mean Average Precision
Mean Average Precision for Noun matched between generated and top 5 result captions: 61.71
Mean Average Precision for Verb matched between generated and top 5 result captions: 46.18
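The BLEU computation itself can be sketched with NLTK's sentence-level implementation, which scores n-gram overlap as described in Sect. 3.3; the reference and candidate sentences below are illustrative, not drawn from the MSCOCO evaluation:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "couple", "of", "birds", "sitting", "on", "a", "tree"]]
candidate = ["two", "birds", "sitting", "on", "a", "tree"]
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(100 * score, 1))   # BLEU on a 0–100 scale, as reported above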
The input image given to the proposed image caption generator is shown in Fig. 6, and the corresponding relevant image retrieval is shown in Fig. 7. The caption generated for the input image is "a couple of birds sitting on top of a tree", and the results show that importance is given to the noun and verb and that the related top five images are retrieved. This shows that images are retrieved with the relevant content, thus reducing the semantic gap.
Fig. 6. Input bird image
Fig. 7. Relevant top 5 images [5]
The input image with an action is shown in Fig. 8 and the corresponding output image is shown in Fig. 9. The caption generated for the input image is "A baseball player swinging a bat". The output image also shows a relevant image with the same action. This shows that while retrieving images from the database, the system searches for images with a similar action as well.
Fig. 8. Input image with action [6]
Fig. 9. Output image with action
5 Conclusion and Scope for Further Research
From a large database, content based image retrieval is used to search for the most relevant images based on the content of the image. To reduce the semantic gap in retrieval, image captioning using a recurrent neural network and a convolutional neural network is proposed in this work. Object classification is performed using the CNN, and the output of the CNN is sent as input to the RNN. The RNN uses an LSTM to persist the needed information in memory. In summary, object detection is performed using a convolutional neural network, the image captioning phase uses the RNN to generate text captions for images, and finally text similarity based ranking is performed on the image captions to identify similar images. Non-textual features are not captured in the current work; this is a future scope. Features like texture patterns can also be matched to further bridge the semantic gap in the retrieved results.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Yasmin, M., Sharif, M., Mohsin, S.: Use of low level features for content based image retrieval: survey. Res. J. Recent Sci. 2(11), 65–75 (2013) 2. Zhou, W., Li, H., Tian, Q.: Recent advance in content-based image retrieval: a literature survey. arXiv preprint arXiv:1706.06064 (2017) 3. Mao, J., Xu, W., Yang, Y., Wang, J., Yuille, A.L.: Deep captioning with multimodal recurrent neural networks. In: ICLR (2015) 4. Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR (2015) 5. Zhu, L., Shen, J., Xie, L.: Unsupervised visual hashing with semantic assistant for content-based image retrieval. IEEE Trans. Knowl. Data Eng. 29, 472–486 (2016) 6. Fadaei, S., Amirfattahi, R., Ahmadzadeh, M.R.: New content-based image retrieval system based on optimised integration of DCD, wavelet and curvelet features. IET Image Process. 11, 89–98 (2016) 7. Danezis, G., Gürses, S.: A critical review of 10 years of privacy technology. In: Proceedings of the 4th Surveillance and Society Conference (2010) 8. Hoiem, D., Sukthankar, R., Schneiderman, H., Huston, L.: Object based image retrieval using the statistical structure of images? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2004) 9. Rashtchian, C., Hockenmaier, J., Forsyth, D.A.: Every picture tells a story: generating sentences from images. In: ECCV (4) (2010) 10. Chaudhuri, B.: Region-based retrieval of remote sensing images using an unsupervised graph-theoretic approach. IEEE Geosci. Remote Sens. Lett. 13(7), 987–991 (2016) 11. Kuznetsova, P., Ordonez, V., Berg, T., Choi, Y.: Treetalk: composition and compression of trees for image descriptions. TACL 2, 351–362 (2014)
12. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: NIPS (2014) 13. Kiros, R., Salakhutdinov, R., Zemel, R.S.: Multimodal neural language models. In: ICML (2014) 14. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: ICML (2015)
Using Deep Learning on Satellite Images to Identify Deforestation/Afforestation
Apurva Mhatre, Navin Kumar Mudaliar, Mahadevan Narayanan, Aaditya Gurav, Ajun Nair, and Akash Nair
Department of Computer Engineering, SIES Graduate School of Technology, Navi Mumbai, India
{apurva.mhatre16, navin.mudaliar16, mahadevan.narayanan16, aaditya.gurav16, ajun.nair16, akash.nair16}@siesgst.ac.in
Abstract. As of 2018, the area covered by forests was determined to be 30.6% of the world's land surface, which is roughly 3.8 billion hectares. With the exponential increase in population, the pressure on natural resources increases, which results in cutting trees for agriculture and for industrial purposes, thereby leading to deforestation. In this paper, we investigate the spatial distribution of vegetation cover by using a Convolutional Neural Network (CNN) and satellite images. The features of the objects identified by the CNN are extremely complex because of the number of filters used to identify various patterns. Supervised learning (learning with annotated data) is used to train the CNN model, as it results in better accuracy while determining the forest cover. We leverage this technology of instance segmentation to correctly determine the forest cover in a particular satellite image, which radically improves the accuracy.
Keywords: Deforestation · Instance segmentation · Computer vision · Convolutional Neural Network
1 Introduction
India is home to 1.37 billion people [8]; being the 2nd most populous country on the planet, with just 3.287 million km2 of land, there is always a burden on the country's natural resources, which include the forests. The Indian sub-continent has all types of geographical terrain, right from ice-clad mountains to hot deserts, which makes India home to a wide variety of animal species. The exponential growth of the human population has surged the need for space to build civilizations. In India the forest cover was 21% by the end of the last decade [7, 9], but it is constantly increasing in this decade at an average of 0.2% per annum according to Government data. The main challenges that lead to the above-mentioned problems are:
• In India, the infrastructure needed to survey forest land constantly and periodically does not exist, and we lack human resources in the Ministry of Environment, Forest and Climate Change.
• Much of the Himalayan ranges are geographically challenging to monitor.
• Changes usually come to notice only after a substantial portion of forest has been lost.
This paper consists of related works, the methodology used, implementation details and the results. The proposed solution consists of 3 phases: preparing the dataset, feature extraction, and object detection with instance segmentation, as we want to determine the approximate area of our region of interest. The satellite images will contain forest areas at certain locations in the image that the model has to classify correctly. A Convolutional Neural Network is used here, which is a deep learning algorithm that takes an image as input, extracts relevant features of objects by using learnable weights and biases, and can distinguish between various objects. The pre-processing task, which is an important aspect of traditional machine learning techniques, is minimal here, as the network is able to learn the characteristics with enough training. Mask Region-CNN (RCNN) [1, 2, 6] is an instance segmentation technique that identifies not only the bounding boxes but also the location of each pixel of the desired object. Its operation is twofold: region identification, then classifying the said regions and generating masks and bounding boxes. This is done by using a Fully Convolutional Network (FCN) with the feature map as input, which gives 0 if a pixel is not associated with the object and 1 otherwise. Our model is able to segment the forest and non-forest covers with accuracy over 90%, which has various applications. It can be used for comparing images from previous dates to find the deforestation rate over time. The model can be improved by training with a dataset of bigger size and variety. The model was tested on images unseen by it, which yielded results with substantial accuracy.
2 Related Works
Similar work [3] has been done in a competition organized by Planet. The objective was to detect deforestation in the Amazon Basin. The dataset was already provided by Planet with both JPEG and GeoTIFF images (the latter provides information about the weather conditions inside an image), but the major difference was the approach: the winners of the competition used a custom CNN model at first. Around 30 labels were determined, which included mining, lakes, human habitation, etc. They compared the frequency of detection of the labels and inferred whether there was a significant change in the topology of the land. Another work [4] used an Artificial Neural Network (ANN) and GIS to determine the relationship between deforestation and socioeconomic factors. It simulates the deforestation rate over a set of parameters which is defined in advance and predicts the part of the forest that was destroyed. Similarly, to establish the relation between forest structural features, stem density and basal area, an ANN [10] and remote sensing were used. The ANN generated a map of basal area which showed climatic influences on the forest structure along with structural heterogeneity.
Similar work [11] has been done using Land Change Modeler of TerrSet, a software analytical tool based on GIS that analyses historical data of forests and generates risk maps as well as future forest loss in the Democratic Republic of Congo. Similarly, Efficient Bayesian Multivariate Classifier (EBMC) [12] and Bayesian networks were used to identify the factors that are associated with direct deforestation drivers and to predict the risk of deforestation.
3 Methodology
3.1 Collection and Annotation of Dataset
Collection of a dataset is always a challenge, and satellite images are not easy to obtain. We collected images from a Kaggle dataset, and the rest were acquired from Planet, a website which provides satellite images over a timeline. Satellite images have an accuracy of up to 3 m. High-resolution images were essential to capture the details of our region of interest; the details in a high-resolution image make feature detection possible using the deep learning algorithm. Annotation of images has been done with the pixel annotation tool [5], which is an open-source tool.
3.2 Feature Extraction and Detection
As shown in Fig. 1, the model extracts thousands of features using filters which get better during each epoch of training. The layers at the beginning of the network extract features of low relevance, whereas the latter layers extract features of very high relevance. This forms a feature map of lower dimension compared to the image. As it is a Feature Pyramid Network (FPN) consisting of 2 pyramids, one receives the features of high relevance and the other receives the features of low relevance. The localization of the object is done by prediction, where the Region Proposal Network (RPN) scans the FPN. Two outputs are generated by the RPN: the anchor class and the bounding boxes. The anchor class is either a background class or a foreground class. Then the image is further passed on to the hidden layers, followed by max pooling and normalization. There are two stages in Mask RCNN: depending on the input image, it performs localization of the object, then it predicts the class of the object, refines the bounding box and generates a mask. Both stages are connected to the backbone structure. The model accepts the outputs of the Region Proposal Network, which are the ROIs, and produces classifications and bounding boxes as its output. With the help of these ROIs, 28 × 28 pixel masks are generated as outputs. During testing, these masks are scaled up. In Fig. 2, the model depicts the mask formed on the input image. Once the model has localized the vegetation via bounding boxes, it localizes it pixel by pixel by forming a mask on it. The mask is then highlighted using OpenCV: the background is transformed to black and the mask is transformed to white.
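The mask post-processing described in the last sentence can be sketched as follows; the threshold and image size are assumptions, and the random input stands in for a Mask R-CNN per-pixel output:

import cv2
import numpy as np

def binarize_mask(scores, threshold=0.5):
    # scores: H × W per-pixel vegetation scores in [0, 1] (stand-in for Mask R-CNN output)
    mask = np.zeros(scores.shape, dtype=np.uint8)
    mask[scores >= threshold] = 255        # masked vegetation → white, background → black
    return mask

scores = np.random.rand(145, 145)
mask = binarize_mask(scores)
cover_fraction = (mask == 255).mean()      # approximate vegetation-cover area of the frame
cv2.imwrite('vegetation_mask.png', mask)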
The architecture of the system is given in Fig. 3. The bounding-box regression loss is

L_{\mathrm{reg}} = \sum_{i \in \{x, y, w, h\}} \left( t_i - d_i(p) \right)^2 + \lambda \lVert W \rVert^2

and the loss minimized for the masks is the average binary cross-entropy

L_{\mathrm{mask}} = -\frac{1}{m^2} \sum_{1 \le i, j \le m} \left[ y_{ij} \log \hat{y}_{ij}^{k} + (1 - y_{ij}) \log \left( 1 - \hat{y}_{ij}^{k} \right) \right]
Fig. 1. Masked RCNN framework for instance segmentation [5]
Fig. 2. Masked RCNN pixel-level labelling [9]
Fig. 3. Architecture of the system
4 Implementation
4.1 Training
The model was trained for over 12 h on a machine with a GTX 1070 graphics card. The model was given annotated images from the training dataset, and it generates segmentation masks for each instance of an object in the image. The model was progressively trained over 15 epochs. The error in distinguishing forest from human-made vegetation cover is still above 62%. This can cause inconsistent results and give us misleading information, but it can be solved by training the model with a huge and varied dataset. The output is a bounding box around the detected features and the masked feature with edge detection. Since we had no use for the bounding box, it was ignored in the output, and only the instance segmentation output is used to generate the mask.
4.2 Testing
The model was then given inputs from the test dataset, on which it had an accuracy of over 90%. It was not able to classify a few images that had little detail in them. This could be improved by changing the learning rate and using more variety in the dataset during the training process. In Fig. 4, the model is able to classify and perform instance segmentation when the dataset is balanced, and it could detect the vegetation and non-vegetation patches with an accuracy over 90%. During the earlier stages of training, the model was able to detect only the vegetation present in the image, with an accuracy over 90%, by forming the mask only on the part of the image with vegetation, as shown in Fig. 5.
Fig. 4. Result on test image
Fig. 5. Result on test image
5 Conclusion
The goal of the project was to segment forest and non-forest cover using satellite images of any forest/region of interest. The solution was implemented as a proof of concept. The model was tested on a validation dataset of 120 images, on which its accuracy was 90%. The results were not extremely accurate on a few unseen images, but the deviations shown were in the expected direction. This could be improved further by including more images in the training dataset with a lot of variety; by further training the model we can reduce this error and make the system highly efficient and reliable.
Acknowledgements. We would like to offer our special thanks to Prof. Pranita Mahajan for her valuable and constructive suggestions during the planning and development of this research work. Her willingness to give her time so generously has been very much appreciated.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Tang, C., Feng, Y., Yang, X., Zheng, C., Zhou, Y.: The object detection based on deep learning. In: 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, pp. 723–728 (2017) 2. Zhang, X., An, G., Liu, Y.: Mask R-CNN with feature pyramid attention for instance segmentation. In: 2018 14th IEEE International Conference on Signal Processing (ICSP), Beijing, China, pp. 1194–1197 (2018) 3. Kan, W.: Planet: Understanding the Amazon from Space (2017). https://www.kaggle.com/c/ planet-understanding-the-amazon-fromspace/. Accessed 10 June 2017 4. Ahmadi, V.: Using GIS and artificial neural network for deforestation prediction. Preprints 2018, 2018030048. https://doi.org/10.20944/preprints201803.0048.v2 5. Abreheret: Pixel Annotation tool (2017). https://github.com/abreheret/PixelAnnotationTool 6. Ren, S., He, K., Girshick, R., Zhang, X., Sun, J.: Object detection networks on convolutional feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1476–1481 (2017) 7. FAO: The State of the World’s Forests 2018 - Forest pathways to sustainable development, Rome. Licence: CC BY-NC-SA 3.0 IGO (2018) 8. India Population, 12 July 2019. http://worldpopulationreview.com/countries/india. Accessed 30 July 2019 9. Forest Cover of India from 1987 to 2015. https://community.data.gov.in/forest-cover-ofindia-from-1987-to-2015/ 10. Ingram, J., Dawson, T., Whittaker, R.: Mapping tropical forest structure in southeastern Madagascar using remote sensing and artificial neural networks. Remote Sens. Environ. 94, 491–507 (2005). https://doi.org/10.1016/j.rse.2004.12.001 11. Goldman, E., Harris, N., Maschler, T.: Predicting future forest loss in the democratic republic of the Congo’s CARPE landscapes. Technical Note, World Resources Institute, Washington, D.C. (2015) 12. Dlamini, W.M.: Analysis of deforestation patterns and drivers in Swaziland using efficient Bayesian multivariate classifiers. Model. Earth Syst. Environ. 2 (2016). https://doi.org/10. 1007/s40808-016-0231-6
An Application of Cellular Automata: Satellite Image Classification
S. Poonkuntran1, V. Abinaya1, S. Manthira Moorthi2, and M. P. Oza2
1 Velammal College of Engineering and Technology, Madurai, Tamilnadu, India
[email protected], [email protected]
2 Indian Space Research Organization, Space Application Centre, Ahmedabad, India
Abstract. Cellular automata (CA) is an emerging area of research which is being applied in many fields. Satellite image classification is one such area, where CA is mainly used for addressing declassification, misclassification and uncertainty. The paper presents an application of CA in satellite image classification. The parallelepiped algorithm in supervised classification has been taken as a base, and CA rules were created and added to it in the proposed scheme. In the comparative analysis, a test sample of Indian Pines has been used in experiments with the proposed scheme and the parallelepiped classifier. The results show a 6.96% improvement in the accuracy of the proposed classifier with respect to the parallelepiped classifier.
Keywords: Image classification · Cellular automata · Satellite image classification · Remote sensing
1 Introduction
Cellular Automata (CA) is an emerging concept originating in biological systems, used to study diseases and their patterns through an evolutionary computing model. After the success of CA in biological systems, it has been used in other areas such as gaming, cryptography, classification and prediction. A cellular automaton consists of cells; the cell is the basic structuring element on which the cellular automaton operates. It is a two dimensional block forming a portion of the object on which CA is applied, such as an image or texture. Each cell has its own values drawn from a property of the object, and it is initially assigned a particular state. This state of the cell is then changed through a set of rules that governs the CA [10–12]. One of the recent applications of cellular automata is satellite image classification, in which information about an object is acquired without having physical contact with the object [14]. Such classification becomes important in remote sensing, and its results are demanded by Geographic Information Systems (GIS), Global Positioning Systems (GPS) and Remote Sensing (RS) for better understanding of the data and its relations. This accelerates the classification to detect areas of interest in the image such as water resources, land, agriculture and roadways [1, 4, 10, 11].
The performance of a satellite image classifier is based on the number of training sets and their quality. Uncertainty is an issue in any classifier, where pixels may be classified into more than one class; it is essential to address uncertainty to improve the accuracy of the classifier [1–5]. The other issues include declassification, where pixels are not classified into any class due to the limitations of the classifier and its properties, and misclassification, where pixels are classified into wrong classes. Cellular Automata (CA) has been found to be the best technique for addressing the above-mentioned issues [1–14]. This paper discusses an application of cellular automata in satellite image classification against the parallelepiped algorithm, which is a conventional and widely used technique in supervised classification. The paper is organized as follows. The first section introduces cellular automata and satellite image classification, and discusses the reasons for using cellular automata in classification. Section 2 discusses the related work and its challenges. Section 3 introduces the proposed CA based satellite image classification method and its algorithm. Section 4 discusses the experimental results and a comparative analysis of the proposed scheme. Section 5 discusses conclusions and future enhancements.
2 Related Works
The classification process is categorized into two classes: supervised and unsupervised. Supervised classification requires the input data and the number of output classes well in advance. The user supplies samples for all the output classes initially; this is known as the training set. The classifier then classifies the input data using the training set given by the user. The output classes defined here are non-overlapping classes; however, minimal overlap is allowed depending on the application. Many algorithms have been proposed in the literature for performing supervised classification on satellite images. The parallelepiped algorithm is one which is conventional and widely used in remote sensing applications [10–14].
Parallelepiped: The parallelepiped algorithm uses an n-dimensional data space in which the multispectral satellite data is represented. The mean and standard deviation are taken as the base elements for every output class specified in n dimensions, covering all the bands of the given satellite image. For every spectral value in a given band of the satellite image, the value is checked against the threshold range formed by the mean and standard deviation of the class. If it lies in the range, the spectral value is assigned to the class; otherwise it is not classified. This process may classify a pixel into more than one class, which leads to uncertainty in classification. Again, pixels may not be classified into any class, which creates declassification issues, because the algorithm checks against threshold ranges formed by the mean and standard deviation [16]. The threshold ranges contain spectral values and do not use the contextual values of the pixel in the image. The parallelepiped algorithm is given by Eq. (1).
M(\mathrm{Class}, \mathrm{Band}) - SD(\mathrm{Class}, \mathrm{Band}) \le SV(x, \mathrm{Band}) \le M(\mathrm{Class}, \mathrm{Band}) + SD(\mathrm{Class}, \mathrm{Band}) \qquad (1)
Here, M(Class, Band) refers to the mean of a particular output class in a given band, and SD(Class, Band) refers to the standard deviation of that class in the given band. The spectral value of pixel x in a given band is given by SV(x, Band). Here Band = 1, 2, 3, …, b, where b is the total number of bands in the satellite image. Similarly, Class = 1, 2, 3, …, n, where n is the total number of output classes in the satellite image. However, all classification techniques mainly attempt to understand the pixels in an image and the relationships between them. This understanding plays an important role in classifier accuracy. Pixels carry spectral and contextual information. The spectral information is the intensity values produced by the method of image acquisition; it is also commonly referred to as pixel values. The contextual information is a further level of understanding of a pixel with respect to its neighbors; it is useful for knowing how a pixel is located in an image and which objects it refers to. The common issues with the parallelepiped classifier are declassification and uncertainty, and in a few cases pixels may be classified into wrong classes, which leads to misclassification. The major reason for these issues is that only spectral information is used in classification. It is observed that when spectral information is combined with contextual information, it improves the accuracy and also addresses uncertainty, declassification and misclassification [4, 16]. Cellular automata (CA) is an emerging concept which is being applied in many fields such as physics, chemistry, computer science, biology and mathematics [12]. It is an iterative procedure that changes the state value of every initially defined cell over the iterations. A cell is a pattern defined by the user, and it is given an initial state value at the beginning. Later, these state values change through the iterations based on a set of conditions that forms the rule set. The cell patterns, number of states and set of rules are the basic components of a CA, and they are defined by the user based on the application. The cell is a pattern having a central component surrounded by neighbors; this structure brings the contextual information of the central component into the application. The cell pattern does not change during the iterations; instead, its state value changes. This step addresses the uncertainty related to the central component of a cell. Thereby, CA has become popular in solving uncertainty in classification.
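Equation (1) can be sketched directly in NumPy; the array shapes below are assumptions, and the per-class mean/SD would come from the user-supplied training samples:

import numpy as np

def parallelepiped(image, mean, sd):
    # image: (M, N, B) spectral values; mean, sd: (C, B) per-class training statistics
    px = image[:, :, None, :]                        # (M, N, 1, B)
    inside = (px >= mean - sd) & (px <= mean + sd)   # Eq. (1), tested per band
    hits = inside.all(axis=-1)                       # (M, N, C): in range on every band
    n_classes = hits.sum(axis=-1)                    # 0 → declassified, 1 → classified,
    return hits, n_classes                           # >1 → uncertain

image = np.random.rand(145, 145, 200)
mean = np.random.rand(16, 200)
sd = 0.2 * np.ones((16, 200))
hits, n_classes = parallelepiped(image, mean, sd)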
3 Proposed Work
3.1 Cellular Automata
As per the above discussion, cellular automata can be applied to data in any dimension. This paper focuses on cellular automata for image classification, where the image is 2-dimensional data and a grid of pixels forms a cell in the CA. The image is first divided into a number of cells, which form the basic elements of the CA. A cell of an image is defined by the neighborhood of a pixel in the image. The rule set of the CA is formed
based on the neighborhood of pixels. Each cell is assigned a state value that represents the initial class assigned to it in the classification. This state value is then modified based on the rule set over the iterations; the number of iterations is decided based on the application. According to the literature, von Neumann was the first person to use this model, in a universal constructor application. In 1950, the model was explored for biological system development. In 1970, John Conway invented the "Game of Life" using cellular automata, where each cell has two states, namely alive and dead. The mathematical definition of a cellular automaton is given by a function c: Z^d → S, where Z^d is a cellular space of dimension d, the elements c of Z^d are called cells, and S is a set of states that describes the number of output classes in the classification. The function assigns each cell to a state. This paper focuses on a 2-dimensional space in the CA. The following CA interpretation is used in the work.
Spatial Dimension: The spatial dimension of the CA specifies the dimension in which the CA is used. Since our classification is for images, the spatial dimension is 2 and the shape is a grid.
Size of Neighborhood: It specifies how many neighbor pixels need to be considered for the CA operation. The common neighborhood patterns are von Neumann, which considers 4 neighbors; Moore, which considers 8; and extended Moore, which considers 16 neighbors.
States: The state of each cell is defined by two values: 1. the class to which the pixel belongs; 2. the type of the pixel. Here the pixel will be of one of four types: 1. Classified – the pixel is classified into a single class. 2. Border – the pixel is at the border of classes. 3. Noisy – the pixel is corrupted and not classified into any class; simply, a pixel that does not belong to any of the other three types. 4. Uncertain – the pixel is classified into more than one class.
Transition Function: The set of rules that defines the movement of a cell from one state to another in the CA. It is an important component that brings evolution to the CA.
Iteration: It specifies the number of iterations used in the CA for evolution.
3.2 Proposed Work
To showcase an application of cellular automata, we have developed a classifier that first classifies the images using the parallelepiped algorithm with training samples given by the user. Then, the four rules given below, created using cellular automata, are applied to the results for further improvement.
Rule 1: If the number of spectral output classes of a pixel of an image is 1, and the spectral output classes of the neighborhood pixels are Not Classified or the same as the actual pixel, then the pixel is classified as "Classified".
Rule 2: If the number of spectral output classes of a pixel of an image is 1, and the spectral output classes of the neighborhood pixels are different from the actual pixel class, then the pixel is classified as "Border".
Rule 3: If the number of spectral output classes of a pixel of an image is 1 and the spectral output class is Noisy, then the pixel is classified as "Noisy".
Rule 4: If the number of spectral output classes of a pixel of an image is greater than 1, then the pixel is classified as "Uncertain".
The CA uses this set of rules to fine-tune the results. The rule set combines the spectral values with contextual details. The contextual details of the image are taken through the Moore neighborhood pattern applied to the results of the parallelepiped classifier. The contextual information of a pixel is then processed in the proposed cellular automata classifier by using the above-mentioned four rules. The proposed CA_Classifier algorithm is given below.
Proposed CA_Classifier (I, T, G)
// I – input satellite image of size M×N×K: M – no. of rows, N – no. of columns, K – no. of bands
// T – training sample
// G – ground truth
1. Read I, T, G
2. Set the threshold Thr
3. For Iter = 1 to NumberOfIterations
       For k = 1 to K
           For i = 1 to M*N
               // Perform parallelepiped classification
               CL = ParallelepipedClassifier(I(i, k), T, Thr)
               // The output of the conventional classifier is updated through CA
               CA = CA_Classifier(CL, Neighbours)
       // Update the threshold
       Thr = Thr + 1
4. Return CA
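A direct transcription of Rules 1–4 for a single cell may clarify the transition function; this is a sketch with illustrative class labels, not the authors' MATLAB code:

def ca_rule(pixel_classes, neighbour_classes):
    # pixel_classes: set of spectral classes the parallelepiped step assigned to the pixel
    # neighbour_classes: classes of the 8 Moore neighbours
    if len(pixel_classes) > 1:
        return 'Uncertain'                            # Rule 4
    c = next(iter(pixel_classes))
    if c == 'Noisy':
        return 'Noisy'                                # Rule 3
    if all(n == c or n == 'NotClassified' for n in neighbour_classes):
        return 'Classified'                           # Rule 1
    return 'Border'                                   # Rule 2

print(ca_rule({'Forest'}, ['Forest'] * 5 + ['NotClassified'] * 3))   # → 'Classified'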
To quantitatively measure the performance of the proposed CA classifier, the parallelepiped classifier is taken as the baseline and its results are compared to those of the proposed scheme.
4 Experimental Results
The proposed CA classifier has been implemented in Matlab 2016b using sample hyperspectral images taken from the Computational Intelligence Group (CIG) of the University of the Basque Country (UPV/EHU), Spain [13]. The sample images are satellite images taken from the AVIRIS sensor for earth observation. Indian Pines is the sample image used in the experiment. For the quantitative analysis of the results, the following parameters were used as metrics [15].
1. True Positive (TP): A true positive is an outcome where the classifier correctly classifies the positive class.
2. True Negative (TN): A true negative is an outcome where the classifier correctly classifies the negative class.
3. False Positive (FP): A false positive is an outcome where the model incorrectly classifies the positive class.
4. False Negative (FN): A false negative is an outcome where the model incorrectly classifies the negative class.
5. Total Positive (P): It is calculated by P = TP + FN.
6. Total Negative (N): It is calculated by N = FP + TN.
7. Sensitivity: The true positive rate, calculated by TP/P.
8. Specificity: The true negative rate, calculated by TN/N.
9. Accuracy: The accuracy specifies how accurately the classifier classifies the pixels. It is given by (TP + TN)/(P + N).
10. Error: The error is the difference between the predicted classification and the actual classification. It is given by 1 − Accuracy.
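A small helper, sketched below, computes these metrics from the TP/FP/FN/TN counts (here fed with the Class 1 row of Table 1):

def metrics(tp, fp, fn, tn):
    p, n = tp + fn, fp + tn                  # total positives / total negatives
    accuracy = (tp + tn) / (p + n)
    return {'sensitivity': tp / p, 'specificity': tn / n,
            'accuracy': accuracy, 'error': 1 - accuracy}

print(metrics(9293, 1483, 83, 10166))        # Class 1 of Table 1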
4.1 Indian Pines
This image was taken over Indian Pines, a test site in north-western Indiana, through the AVIRIS sensor, and is specified in 224 spectral bands at 145 × 145 pixel resolution. The band wavelengths range from 0.4 to 2.5 × 10−6 m. The spectral bands mainly contain agriculture, forest and vegetation; the scene also contains a rail line, highways, housing and other building structures. The final image contains 200 bands after removing the water absorption bands. The ground truth is given for 16 classes, which are not mutually exclusive. A sample band image and the ground truth are shown in Fig. 1.
Fig. 1. (a) Sample band of Indian pines (b) Ground truth for 16 classes.
4.2 Results and Comparative Analysis
As mentioned in Sect. 4, the experiment has been done with two approaches. First, we experimented with the proposed scheme, which uses both the parallelepiped algorithm and the CA rules for classification. Second, we used only the parallelepiped algorithm to compare against the results of the proposed scheme. The positive and negative class portfolio of the proposed scheme is tabulated in Table 1 and that of the parallelepiped classifier in Table 2. It is found that the true positive rate of the proposed scheme is improved by 12.78% compared to the parallelepiped classifier, while the true negative rate is improved only by 0.5%. The proposed approach (Approach 1) outperforms the parallelepiped classifier (Approach 2) with a 6.96% improvement in overall classification accuracy. The sensitivity and specificity of the proposed scheme are improved by 8% and 0.50%, respectively, with respect to the parallelepiped classifier. The comparative analysis is tabulated in Table 3.

Table 1. Positive/Negative values of Indian Pines using Approach 1 (parallelepiped with CA rules)
Class   True positive   False positive   False negative   True negative
1       9293            1483             83               10166
2       38              8                178              20801
3       1247            181              178              19419
4       739             91               162              20033
5       212             25               183              20605
6       410             73               163              20379
7       633             97               185              20110
8       22              6                161              20836
9       416             62               175              20372
10      19              1                188              20817
11      841             131              183              19870
12      2112            343              139              18431
13      512             81               158              20274
14      175             30               187              20633
15      1087            178              168              19592
16      337             49               181              20458
17      83              10               177              20755
Table 2. Positive/Negative values of Indian Pines using Approach 2 (parallelepiped only)
Class   True positive   False positive   False negative   True negative
1       8721            2055             137              10112
2       36              10               242              20377
3       1184            244              229              19368
4       706             124              215              19980
5       202             35               255              20533
6       393             90               246              20296
7       587             143              231              20064
8       22              6                240              20757
9       389             89               218              20329
10      18              2                244              20761
11      791             181              249              19804
12      2001            454              231              18339
13      483             110              209              20223
14      167             38               252              20568
15      1033            232              217              19543
16      314             72               251              20388
17      78              15               234              20698
Table 3. Comparative analysis
Parameter     Approach 1   Approach 2
Accuracy      86.45        79.49
Error         13.55        20.51
Sensitivity   62.63        54.62
Specificity   98.84        98.32
5 Conclusions
This paper presented an application of cellular automata in satellite image classification. The proposed CA method uses the parallelepiped algorithm, a conventional supervised classifier, as a base, and applies a CA rule set to its results. The CA improves the accuracy of the parallelepiped classifier by combining spectral values with contextual values. The experimental results show a 6.96% improvement in the overall accuracy of the proposed classifier compared to the parallelepiped classifier. The true positive rate and true negative rate of the proposed classifier are improved by 12.78% and 0.5%, respectively. It is concluded that the overall accuracy of the classifier is improved when the parallelepiped classifier is combined with CA; thereby, CA will become an important component in satellite image classification. The proposed CA scheme will be further extended with KNN and CNN classifiers to check its sustained performance.
Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Aponte, A., Moreno, J.A.: Cellular automata and its application to the modelling of vehicular traffic in the city of Caracas. In: Proceedings of the 7th International Conference on ACRI, vol. 4173, pp. 502–511 (2006) 2. Avolio, M.V., Errera, A., Lupiano, V., Mazzanti, P., Di Gregorio, S.: Development and calibration of a preliminary cellular automata model for snow avalanches. In: Proceedings of the 9th International Conference on ACRI, vol. 6350, pp. 83–94 (2010) 3. Sikdar, B.K., Paul, K., Biswas, G.P., Yang, C., Bopanna, V., Mukherjee, S., Chaudhuri, P. P.: Theory and application of GF(2P) cellular automata as on-chip test pattern generator. In: Proceedings of 13th International Conference on VLSI Design India, pp. 556–561 (2000) 4. Espínola, M., Piedra-Fernández, J.A., Ayala, R., Iribarne, L., Wang, J.Z.: Contextual and hierarchical classification of satellite images based on cellular automata. IEEE Trans. Geosci. Remote Sens. 53(2), 795–809 (2015) 5. Balzter, H., Braun, P., Kühler, W.: Cellular automata models for vegetation dynamics. Ecol. Modell. 107(2/3), 113–125 (1998) 6. Bandini, S., Bonomi, A., Vizzari, G.: A cellular automata based modular illumination system. In: Proceedings of the 9th International Conference on ACRI, vol. 6350, pp. 334–344 (2010) 7. Das, S., Chowdhury, D.R.: Generating cryptographically suitable non linear maximum length cellular automata. In: Proceedings of the 9th International Conference on ACRI, vol. 6350, pp. 241–250 (2010) 8. Doi, T.: Quantum cellular automaton for simulating static magnetic fields. IEEE Trans. Magn. 49(5), 1617–1620 (2013) 9. Dzwinel, W.: A cellular automata model of population infected by periodic plague. In: Proceedings of the 6th International Conference ACRI, vol. 3305, pp. 464–473 (2004) 10. Espínola, M., et al.: ACA multiagent system for satellite image classification. In: Proceedings of the PAAMS, vol. 157, pp. 93–100. AISC (2012) 11. Espínola, M., et al.: A hierarchical and contextual algorithm based on cellular automata for satellite image classification. In: Proceedings of the CiSE, pp. 1–4 (2011) 12. An Application of Cellular Automata in Hermetic Systems: https://www.hermetic.ch/pca/ pca.htm 13. López-Fandiño, J., Priego, B., Heras, D.B., Argüello, F.: GPU projection of ECAS-II segmenter for hyperspectral images based on cellular automata. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 10(1), 20–28 (2017). https://doi.org/10.1109/jstars.2016.2588530 14. Nichele, S., Ose, M.B., Risi, S., Tufte, G.: CA-NEAT: evolved compositional pattern producing networks for cellular automata morphogenesis and replication. IEEE Trans. Cogn. Dev. Syst. 10(3), 687–700 (2018). https://doi.org/10.1109/TCDS.2017.2737082 15. The UPV/EHU Hypersepctral Images Data Set. http://www.ehu.eus/ccwintco/index.php/ Hyperspectral_Remote_Sensing_Scenes 16. Xiang, M., Hung, C.-C., Pham, M., Kuo, B.-C., Coleman, T.: A parallelepiped multispectral image classifier using genetic algorithms. In: Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2005, Seoul (2005), pp. 4-pp. https:// doi.org/10.1109/igarss.2005.1526216
Morlet Wavelet Threshold Based Glow Warm Optimized X-Means Clustering for Forest Fire Detection B. Pushpa(&) and M. Kamarasan Computer and Information Science, Annamalai University, Chidambaram, India [email protected], [email protected]
Abstract. Forest fire detection is a significant problem to be resolved for the protection of life and property. The clustering performance of existing fire detection algorithms is poor. In order to resolve this limitation, the Morlet Wavelet Threshold-based Glow Warm Optimized X-Means Clustering (MWT-GWOXC) technique is proposed. Initially, this technique takes a number of video frames as input and initializes a number of clusters. The proposed approach then initializes the glow warm population with the video frames. Next, the technique calculates the fitness function of all clustered video frames and identifies a pre-fire stage, fire stage or critical fire stage. If any video frame is not clustered, the technique employs a Bayesian probability criterion, which determines the frame's highest probability of becoming a cluster member and thereby improves clustering accuracy. The simulation results demonstrate that the technique is able to increase fire detection accuracy and also minimize fire detection time. Keywords: Bayesian probability criterion · Color · Fitness function · Intensity · Spatio-temporal energy · Texture · Video frames · X-means clustering
1 Introduction Forest fire is a serious hazard in different places around the world. Recently, large fires and the protection of life and goods have required effective prevention, which is achieved through video surveillance-based fire detection. However, the false positive rate of conventional techniques is high; therefore, the proposed technique is introduced to achieve higher accuracy in finding forest fires in video. K-medoids clustering was designed in [1] to improve fire flame discovery performance, but its false-positive detections were very high. A Dirichlet Process Gaussian mixture model-based approach was introduced in [2] for autonomous flame recognition depending on the color, dynamics, and flickering traits of flames with higher accuracy, but the time complexity of fire flame identification was high. Video fire detection was presented in [3] with the application of a Gaussian Mixture Model using multi-colour features; however, reducing the false alarm rate was not considered. Based on color, spatial and temporal information, a covariance matrix-based fire and flame discovery method was introduced in [4], but its flame detection accuracy was poor. Remote identification of forest fires from video
signals was presented in [5] with the help of classifiers using K-SVD learned dictionaries; however, the computational complexity involved in forest fire discovery was not minimized. Spatio-temporal flame modeling and dynamic texture examination were introduced in [6] for video-based fire recognition, but the average frame rate was not efficient. The ASTER Digital Elevation Model (DEM) was used in [7] for determining fire hazard with improved accuracy; however, the processing time taken for analyzing fire risks was not addressed. A robust approach for smoke recognition was presented in [8] with the application of deep belief networks, but fire detection was not attained in this approach. A visual analysis method was introduced in [9] to increase the speed of fire flame recognition in video with the help of logistic regression and temporal smoothing, but the false positive ratio of fire recognition was high. An efficient forest fire detection method was designed in [10] with the help of background subtraction and color segmentation; however, its forest fire identification performance was not efficient. In order to address the above-mentioned issues, the MWT-GWOXC technique is designed, with its main contributions described below. To achieve enhanced fire detection accuracy for video sequences through clustering, the MWT-GWOXC technique is developed. Morlet Wavelet Transformation is applied to reduce the time complexity of the fire detection process: given the intrinsic nature of flame pixels, wavelet signals easily expose the random traits of a signal, helping the technique attain more robust detection of flames in video with lower time. To minimize the false positive rate of fire discovery compared to existing work, if any video frame does not group into a cluster, X-Means Clustering utilizes a Bayesian probability criterion that computes the frame's maximum probability of becoming a cluster member. This helps the technique improve fire detection accuracy. This paper is organized as follows. Section 2 portrays the related works. Section 3 presents the detailed process of the MWT-GWOXC technique. In Sect. 4, the experimental setting of the proposed technique is demonstrated. Section 5 explains the results and discussion of certain parameters. Finally, in Sect. 6, the conclusion of the research work is discussed.
2 Related Works Video-based fire discovery with the help of a wavelet-based rule method using RGB and HSV color spaces and wavelet-based spatial analysis was presented in [11]; however, fire detection accuracy was not enhanced. A Support Vector Machine classifier was designed in [12] to identify fire and/or smoke during the earliest stages of a fire with a real-time alarm system, but the time and space complexity of this algorithm was very high. A novel method was developed in [13] to discover fires by analyzing videos obtained from surveillance cameras according to color, shape disparity, and motion analysis; however, the true positive rate of fire identification was not sufficient. A set of motion traits based on motion determination was presented in [14] to discover fire flames in videos, but the ratio of incorrect detections was high. Real-time multi-characteristic-based fire flame identification was introduced in [15] to increase reliability; however, the specificity of fire flame detection was poor. A saliency-based
method was introduced in [16] for enhancing real-time fire detection performance in video frames, but the processing time needed for fire detection was very high. An enhanced probabilistic approach was designed in [17] with the application of two trait representations to carry out color-based fire detection. A probability-based framework was introduced in [18] to find flames based on robust features and randomness testing, but its computational cost remained an open issue. A Cumulative Geometrical Independent Component Analysis (C-GICA) model was presented in [19] with the aim of enhancing the video fire detection rate; however, the ratio of video frames that were clustered was not sufficient. Multi-characteristic fusion-based fast video flame recognition was introduced in [20], but the computational time of flame detection remained an open issue. A flame detection approach was introduced in [21] for improving classification accuracy; however, this method did not involve complex rules. Video-based fire detection was designed in [22] with combined saliency detection and convolutional neural networks to extract features for classification, but the false alarm rate of fire detection was not considered. A flame and smoke detection system was introduced in [23] to detect fire incidents using QuickBlaze; however, the frame error rate was not considered. A novel video smoke detection method was designed in [24] with both color and motion features, but the false alarm rate was not considered.
3 Proposed Technique The Morlet Wavelet Threshold-based Glow Warm Optimized X-Means Clustering technique is introduced with the aim of enhancing forest fire detection performance with minimal time complexity. Glow warm swarm optimization is employed in the technique to provide an optimal solution that effectively clusters the different types of flame (i.e., pre-fire stage, fire stage and critical fire stage) in an input video sequence with higher accuracy. Figure 1 shows the architecture diagram of the technique for attaining enhanced forest fire flame identification performance. As presented in Fig. 1, the MWT-GWOXC technique initially takes a video database as input, which contains a number of video files that are converted into video frames. Then, the technique defines a number of clusters and initializes the glow warm population with the video frames. Consequently, the technique applies Morlet Wavelet Transformation to each video frame, partitioning each frame into a number of sub-blocks. Next, the technique determines a fitness function for each frame based on multiple characteristics such as color, spatio-temporal energy, intensity and texture. Using the measured fitness function, the technique then groups the video frames into the corresponding clusters (i.e., pre-fire stage, fire stage and critical fire stage). The detailed process of the technique is described below.
Fig. 1. Architecture diagram of MWT-GWOXC technique
Let us consider an input video dataset that includes many videos, represented as $V_i = \{V_1, V_2, \ldots, V_n\}$, where $n$ denotes the total number of videos available in the given dataset. The videos are initially converted into a number of frames denoted as $F_i = \{F_1, F_2, \ldots, F_m\}$, where $m$ represents the number of frames. The proposed technique initializes $x$ clusters.
Fig. 2. Morlet’s Wavelet Transformation of video frame
After that, the technique initializes the population with a number of glow warms (i.e., video frames). Then, Morlet Wavelet Transformation is applied in the MWT-GWOXC technique to split the images into a number of sub-blocks, as depicted in Fig. 2, which presents the process involved in Morlet's Wavelet Transformation. During this process, the sub-block with lower frequency is repeatedly separated into further blocks to increase fire flame detection performance with lower time consumption. From that, the Morlet Wavelet Transformation of a video frame $F_i(t)$ at time $t$ is formulated as follows:
$$F_{wavelet}(u,v) = \frac{1}{\sqrt{|u|}} \int_{-\infty}^{\infty} F_i(t)\, \psi\!\left(\frac{t-v}{u}\right) dt \qquad (1)$$
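As an illustrative sketch (not the authors' MATLAB implementation), the transform in Eq. (1) can be approximated numerically; the real-valued Morlet mother wavelet form and the scale grid used here are assumptions for demonstration:

```python
import numpy as np

def morlet(t, w0=5.0):
    # Real-valued Morlet mother wavelet (assumed form for illustration)
    return np.exp(-0.5 * t**2) * np.cos(w0 * t)

def morlet_wavelet_transform(signal, scales, dt=1.0):
    # Discrete approximation of Eq. (1):
    # F(u, v) = |u|^(-1/2) * integral of F_i(t) * psi((t - v)/u) dt
    t = np.arange(len(signal)) * dt
    coeffs = np.empty((len(scales), len(signal)))
    for i, u in enumerate(scales):
        for j, v in enumerate(t):
            psi = morlet((t - v) / u)
            coeffs[i, j] = np.sum(signal * psi) * dt / np.sqrt(abs(u))
    return coeffs

# Example: transform one row of a grayscale video frame
row = np.random.rand(64)              # stand-in for pixel intensities
coeffs = morlet_wavelet_transform(row, scales=[1, 2, 4, 8])
```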
From (1), the video frame is divided into a number of sub-blocks, where $F_{wavelet}(u,v)$ is the wavelet transformation of a frame. After completing the transformation process, the technique determines the fitness value for each frame based on the color, spatio-temporal energy, intensity and texture of the fire flame. Color is one of the most significant characteristics for identifying fire flame. Let $p_1, p_2, \ldots, p_n$ be fire-colored training RGB samples of the distribution to be approximated. Then the probability density of a pixel $p_t$ is obtained using a kernel $u$ as
$$\Pr(p_t) = \frac{1}{n} \sum_{i=1}^{n} u(p_t - p_i) \qquad (2)$$
From (2), $u$ refers to a Gaussian kernel. Here $u = N(0, DC)$, where $DC$ denotes the diagonal covariance matrix with a different standard deviation $l_j$ for each color channel $j$. For each pixel, the fire color probability is evaluated as
$$p_C(i,j) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{3} \frac{1}{\sqrt{2\pi l_j^2}}\, e^{-\frac{1}{2}\frac{(p_t - p_i)^2}{l_j^2}} \qquad (3)$$
For each video frame $F_i$, the average fire color probability of the pixels $p(i,j)$ in the block is determined and represented as the total fire color:
$$C_{F_i} = \frac{1}{n} \sum_{i,j} p_C(i,j) \qquad (4)$$
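A minimal sketch of Eqs. (2)-(4) (illustrative only; the per-channel bandwidths and the sample array are assumptions, not the authors' values):

```python
import numpy as np

def fire_color_probability(block, samples, bandwidth):
    # Gaussian-kernel density of each pixel against fire-colored RGB
    # training samples, averaged over the block (Eqs. (2)-(4)).
    # block: (H, W, 3) floats; samples: (n, 3); bandwidth: (3,) per channel
    pixels = block.reshape(-1, 3)                        # (H*W, 3)
    diff = pixels[:, None, :] - samples[None, :, :]      # (H*W, n, 3)
    norm = 1.0 / np.sqrt(2.0 * np.pi * bandwidth**2)
    dens = norm * np.exp(-0.5 * diff**2 / bandwidth**2)  # per-channel kernels
    p_c = dens.sum(axis=(1, 2)) / len(samples)           # double sum of Eq. (3)
    return p_c.mean()                                    # block average, Eq. (4)
```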
From (4), $n$ represents the number of pixels in the block and $p_C(i,j)$ indicates the color probability of each pixel. The spatio-temporal energy is another significant feature for discriminating real fire from fire-colored objects in video frames. Due to wind or the kind of burning material, the shape of a flame varies continuously; therefore, a fire-colored object causes lower spatial variation within a given time interval than a real fire. Thus, the temporal difference of the spatial energy at pixel $p_{ST}(i,j)$ within the last frames of a temporal window $T$ is
$$p_{ST}(i,j) = \frac{1}{T} \sum_{t=0}^{T-1} \left(e_t(i,j) - \bar{e}(i,j)\right)^2 \qquad (5)$$
From (5), $e_t$ denotes the spatial energy of the pixel at time instance $t$ and $\bar{e}$ indicates the overall (mean) spatial energy. From that, the total spatio-temporal energy is calculated by averaging the individual energies of the pixels in the block:
$$ST_{F_i} = \frac{1}{n} \sum_{i,j} p_{ST}(i,j) \qquad (6)$$
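A short sketch of Eqs. (5)-(6), assuming a stack of precomputed per-frame spatial-energy maps (the stack itself is an assumed input for illustration):

```python
import numpy as np

def spatio_temporal_energy(energy_stack):
    # Temporal variance of per-pixel spatial energy over a window of T
    # frames (Eq. (5)), averaged over the block (Eq. (6)).
    # energy_stack: (T, H, W) spatial-energy maps e_t(i, j)
    mean_e = energy_stack.mean(axis=0)                  # mean energy per pixel
    p_st = ((energy_stack - mean_e) ** 2).mean(axis=0)  # Eq. (5)
    return p_st.mean()                                  # Eq. (6)
```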
From (6), the total spatio-temporal energy is measured for each video frame. Next, for each block the mean intensity of the fire pixels is obtained as
$$I_{F_i} = \frac{1}{n} \sum_{i,j} p_I(i,j) \qquad (7)$$
From (7), $n$ refers to the total number of pixels in the block and $p_I(i,j)$ denotes the fire intensity of a pixel. The average of the individual flickering contributions of the pixels $p(i,j)$ in the block represents the fire texture feature of each frame, measured as
$$T_{F_i} = \frac{1}{n} \sum_{i,j} p_f(i,j) \qquad (8)$$
From (8), $n$ denotes the number of pixels in the block and $p_f(i,j)$ signifies the flickering characteristic of a pixel. Based on the measured fire color, spatio-temporal energy, intensity and texture, the technique computes the fitness function as
$$d_{F_i} = \{C_{F_i},\, ST_{F_i},\, I_{F_i},\, T_{F_i}\} \qquad (9)$$
From (9), the fitness function is measured for each video frame $F_i$. The technique then defines threshold fitness functions, where threshold values are assigned for color, spatio-temporal energy, intensity and texture. From that, the MWT-GWOXC technique clusters the input video frames as
$$z = \begin{cases} \text{group } F_i \text{ into pre-fire stage}, & \text{if } d_{F_i} < d_{MinT} \\ \text{group } F_i \text{ into fire stage}, & \text{if } d_{MinT} < d_{F_i} < d_{MaxT} \\ \text{group } F_i \text{ into critical fire stage}, & \text{if } d_{F_i} > d_{MaxT} \end{cases} \qquad (10)$$
From (10), $z$ represents the clustering output and $d_{F_i}$ denotes the fitness function value of video frame $F_i$, the $i$th frame in the given video sequence, whereas $d_{MinT}$ refers to the minimum threshold fitness function and $d_{MaxT}$ indicates the maximum threshold fitness function. After calculating the fitness value, the technique verifies whether the fitness function of the input video frame, $d_{F_i}$, is less than the minimum threshold $d_{MinT}$. If this condition is true, the technique clusters the video frame into the pre-fire stage. Otherwise, the technique checks whether the fitness function of the video frame is greater than the minimum threshold $d_{MinT}$ and less than the maximum threshold $d_{MaxT}$; when this condition is satisfied, the technique clusters the input video frame into the fire stage. If the fitness function $d_{F_i}$ is greater than the maximum threshold $d_{MaxT}$, the technique groups the frame into the critical fire stage.
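The three-way rule of Eq. (10) can be sketched directly; note the assumption here that the fitness of Eq. (9) has been collapsed to a single scalar score (e.g., a weighted combination of the four features), which the paper does not specify:

```python
def cluster_stage(d_fi, d_min, d_max):
    # Eq. (10): map a frame's scalar fitness value to a fire-stage cluster.
    if d_fi < d_min:
        return "pre-fire"
    elif d_fi < d_max:
        return "fire"
    else:
        return "critical-fire"
```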
If a video frame does not belong to any of the clusters, the technique improves clustering accuracy with the help of a Bayesian probability criterion. This helps to cluster all video frames into a particular cluster with minimal error rate. From that, the Bayesian probability criterion is estimated as
$$BP_{F_i} = \frac{d_{F_2}(t) - d_{F_1}(t)}{\sum \left(d_{F_n}(t) - d_{F_1}(t)\right)} \qquad (11)$$
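A small sketch of one reading of Eq. (11) (an assumption: the fitness values are treated as scalars and the denominator sums the gaps over all candidate clusters):

```python
import numpy as np

def bayesian_criterion(d_frame, d_ref, d_all):
    # Relative fitness gap of an unclustered frame against a reference
    # cluster fitness, normalized over all candidate clusters (Eq. (11)).
    gaps = np.asarray(d_all) - d_ref
    return (d_frame - d_ref) / gaps.sum()   # assumes a nonzero denominator
```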
From (11), $BP_{F_i}$ is the Bayesian probability criterion, which gives the highest probability of a video frame becoming a member of a particular cluster. Here $d_{F_2}(t)$ denotes the fitness function of video frame $F_2$ at time $t$, $d_{F_1}(t)$ represents the fitness function of video frame $F_1$ at time $t$, and $d_{F_n}(t)$ denotes the fitness function of video frame $n$ at time $t$. This process is repeated until all video frames in the video are grouped. Through efficient clustering of video frames with lower time, the technique increases fire detection accuracy as well as reducing fire detection time compared to existing works.

Input: Number of videos $V_i$
Output: Improved fire detection accuracy
Step 1: Begin
Step 2: For each input video $V_i$
Step 3: Split $V_i$ into a number of frames $F_i$
Step 4: Initialize $x$ number of clusters
Step 5: Initialize the number of frames $F_i$
Step 6: For each $F_i$
Step 7: Decompose $F_i$ into a number of sub-blocks
Step 8: Determine fitness function $d_{F_i}$
Step 9: If $d_{F_i} < d_{MinT}$, then
Step 10: Cluster $F_i$ into pre-fire stage
Step 11: else if ($d_{MinT} < d_{F_i} < d_{MaxT}$)
Step 12: Cluster $F_i$ into fire stage
Step 13: else ($d_{F_i} > d_{MaxT}$)
Step 14: Cluster $F_i$ into critical fire stage
Step 15: End If
Step 16: If any video frame is not clustered, then
Step 17: Measure Bayesian probability
Step 18: else
Step 19: Stop the clustering process
Step 20: End if
Step 21: End For
Step 22: End

Algorithm 1. Morlet Wavelet Thresholding Glow Warm Optimized X-Means Clustering
Algorithm 1 explains the step-by-step process of the MWT-GWOXC technique for attaining enhanced clustering performance.
4 Experimental Settings The experimental evaluation of the proposed MWT-GWOXC technique is implemented in MATLAB with the help of the FIRESENSE database [25], which comprises many forest fire videos. For the simulation process, the technique considers various numbers of video frames in the range of 25–250. The simulation is conducted for many instances with respect to different numbers of video frames. The performance of MWT-GWOXC is compared with two existing works, namely K-medoids clustering [1] and the Dirichlet Process Gaussian mixture model [2].
5 Performance Analysis The effectiveness of the MWT-GWOXC technique is analyzed using the following parameters: fire detection accuracy, fire detection time and false positive rate, with the aid of tables and graphs.
5.1 Performance Result of Fire Detection Accuracy
Fire Detection Accuracy (FDA) is determined as the ratio of the number of frames that are correctly clustered to the total number of input video frames. FDA is estimated in percentage (%) and calculated as
$$FDA = \frac{M_{CC}}{m} \times 100 \qquad (12)$$
From (12), the accuracy of fire flame detection in a video sequence is estimated with respect to different numbers of frames. Here, $M_{CC}$ denotes the number of frames that are correctly clustered and $m$ represents the total number of video frames considered for the simulation work.
Sample Mathematical Calculation
• Proposed MWT-GWOXC technique: the number of correctly clustered frames is 220 and the total number of frames is 250. The fire detection accuracy is then obtained as
$$FDA = \frac{220}{250} \times 100 = 88\%$$
As presented in Fig. 3 below, the MWT-GWOXC technique increases fire detection accuracy by 8% compared to K-medoids clustering [1] and by 16% compared to the Dirichlet Process Gaussian mixture model [2].
Fig. 3. Simulation results of fire detection accuracy
Fig. 4. Simulation results of fire detection time
5.2 Performance Result of Fire Detection Time
Fire Detection Time (FDT) measures the amount of time required for clustering video frames into the pre-fire, fire and critical fire stages. FDT is determined in milliseconds (ms) and obtained as
$$FDT = m \times t(C_{SF}) \qquad (13)$$
From (13), the time needed for detecting forest fire flames is evaluated. Here, $m$ denotes the number of input video frames, whereas $t(C_{SF})$ indicates the time utilized for finding fire flame through clustering in a single video frame.
Sample Mathematical Calculation
• Proposed MWT-GWOXC technique: the time taken for clustering a single video frame is 0.09 ms and the total number of video frames is 250. FDT is then estimated as
$$FDT = 250 \times 0.09 = 22.5 \approx 23\ \text{ms}$$
As illustrated in Fig. 4, the MWT-GWOXC technique decreases fire detection time by 19% compared to K-medoids clustering [1] and by 31% compared to the Dirichlet Process Gaussian mixture model [2].
5.3 Performance Result of False Positive Rate
False Positive Rate (FPR) is evaluated as the ratio of the number of frames that are incorrectly clustered to the total number of input video frames. FPR is determined in percentage (%) and estimated as
$$FPR = \frac{M_{IC}}{m} \times 100 \qquad (14)$$
From (14), the false positive rate of fire flame detection is measured. Here, $M_{IC}$ represents the number of frames that are inaccurately clustered and $m$ refers to the total number of video frames.
Sample Mathematical Calculation
• Proposed MWT-GWOXC technique: the number of wrongly clustered frames is 30 and the total number of frames is 250. FPR is then calculated as
$$FPR = \frac{30}{250} \times 100 = 12\%$$
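The three evaluation measures of Eqs. (12)-(14) can be computed directly from the clustering outcome. The following sketch (an illustration, not the authors' code) assumes lists of predicted and true stage labels and a measured per-frame clustering time:

```python
def evaluate(predicted, actual, time_per_frame_ms):
    # Eqs. (12)-(14): accuracy, detection time and false positive rate
    m = len(actual)
    m_cc = sum(p == a for p, a in zip(predicted, actual))  # correctly clustered
    fda = m_cc / m * 100                                   # Eq. (12)
    fdt = m * time_per_frame_ms                            # Eq. (13)
    fpr = (m - m_cc) / m * 100                             # Eq. (14)
    return fda, fdt, fpr

# e.g. evaluate(pred, truth, 0.09) -> (88.0, 22.5, 12.0) for the samples above
```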
As demonstrated in Fig. 5, the proposed MWT-GWOXC technique provides a lower false positive rate than the existing K-medoids clustering [1] and the Dirichlet Process Gaussian mixture model [2].
Fig. 5. Simulation result of false positive rate
The MWT-GWOXC technique minimizes the false positive rate by 40% compared to K-medoids clustering [1] and by 59% compared to the Dirichlet Process Gaussian mixture model [2].
6 Conclusion An efficient MWT-GWOXC technique is designed with the goal of attaining higher fire detection accuracy via clustering. This goal is achieved with the aid of Morlet Wavelet Transformation, glow warm swarm optimization and the X-Means Clustering algorithm. The efficiency of the technique is evaluated by measuring fire detection accuracy, fire detection time and false positive rate, and by comparing them with two conventional works. The experimental results show that the technique gives better performance, with enhanced fire detection accuracy and reduced fire detection time compared with the conventional works. In future work, we will concentrate on reducing the energy consumption of fire detection. In addition, future work will analyze more parameters to further improve the performance of the proposed technique.
References 1. Khatami, A., Mirghasemi, S., Khosravi, A., Lim, C.P., Nahavandi, S.: A new PSO-based approach to fire flame detection using K-medoids clustering. Expert Syst. Appl. 68, 69–80 (2017) 2. Li, Z., Mihaylova, L.S., Isupova, O., Rossi, L.: Autonomous flame detection in videos with a Dirichlet process gaussian mixture color model. IEEE Trans. Ind. Inform. 14(3), 1146–1154 (2018) 3. Han, X.-F., Jin, J.S., Wang, M.-J., Jiang, W., Gao, L., Xiao, L.-P.: Video fire detection based on Gaussian Mixture Model and multi-color features. Signal Image Video Process. 11(8), 1419–1425 (2017) 4. Habiboğlu, Y.H., Günay, O., Çetin, A.E.: Covariance matrix-based fire and flame detection method in video. Mach. Vis. Appl. 23(6), 1103–1113 (2012) 5. Rosas-Romero, R.: Remote detection of forest fires from video signals with classifiers based on K-SVD learned dictionaries. Eng. Appl. Artif. Intell. 33, 1–11 (2014) 6. Dimitropoulos, K., Barmpoutis, P., Grammalidis, N.: Spatio-temporal flame modeling and dynamic texture analysis for automatic video-based fire detection. IEEE Trans. Circuits Syst. Video Technol. (TCSVT) 25(2), 339–351 (2015) 7. Babu, K.V.S., Roy, A., Prasad, P.R.: Forest fire risk modeling in Uttarakhand Himalaya using TERRA satellite datasets. Eur. J. Remote. Sens. 49(1), 1–16 (2016) 8. Pundir, A.S., Raman, B.: Deep belief network for smoke detection. Fire Technol. 53(6), 1943–1960 (2017) 9. Kong, S.G., Jin, D., Li, S., Kim, H.: Fast fire flame detection in surveillance video using logistic regression and temporal smoothing. Fire Saf. J. 79, 37–43 (2016) 10. Mahmoud, M.A.I., Ren, H.: Forest fire detection using a rule-based image processing algorithm and temporal variation. Math. Probl. Eng. 2018, 1–8 (2018). Article ID 7612487 11. Gupta, A., Bokde, N., Marathe, D., Kishore, : A novel approach for video based fire detection system using spatial and texture analysis. Indian J. Sci. Technol. 11(19), 1–17 (2018) 12. Ho, C.-C.: Nighttime fire/smoke detection system based on a support vector machine. Math. Probl. Eng. 2013, 1–7 (2013). Article ID 428545 13. Foggia, P., Saggese, A., Vento, M.: Real-time fire detection for video surveillance applications using a combination of experts based on color, shape and motion. IEEE Trans. Circuits Syst. Video Technol. 25(9), 1545–1556 (2015) 14. Mueller, M., Karasev, P., Kolesov, I., Tannenbaum, A.: Optical flow estimation for flame detection in videos. IEEE Trans. Image Process. 22(7), 2786–2797 (2013) 15. Chi, R., Zhe-Ming, L., Ji, Q.-G.: Real-time multi-feature based fire flame detection in video. IET Image Process. 11(1), 31–37 (2017) 16. Jia, Y., Yuan, J., Wang, J., Fang, J., Zhang, Q., Zhang, Y.: A saliency-based method for early smoke detection in video sequences. Fire Technol. 52(5), 1271–1292 (2016) 17. Zhang, Z., Shen, T., Zou, J.: An improved probabilistic approach for fire detection in videos. Fire Technol. 50(3), 745–752 (2014) 18. Wang, D.-C., Cui, X., Park, E., JinHakilKim, C.: Adaptive flame detection using randomness testing and robust features. Fire Saf. J. 55, 116–125 (2013) 19. Rong, J., Zhou, D., Yao, W., Gao, W., Chen, J., Wang, J.: Fire flame detection based on GICA and target tracking. Opt. Laser Technol. 47, 283–291 (2013) 20. Chen, J., He, Y., Wang, J.: Multi-feature fusion based fast video flame detection. Build. Environ. 45(5), 1113–1122 (2010)
21. Prema, C.E., Vinsley, S.S., Suresh, S.: Efficient flame detection based on static and dynamic texture analysis in forest fire detection. Fire Technol. 54(1), 255–288 (2015) 22. Shi, L., Long, F., Lin, C., Zhao, Y.: Video-based fire detection with saliency detection and convolutional neural networks. In: International Symposium on Neural Networks. LNCS, vol. 10262, pp. 299–309 (2017) 23. Qureshi, W.S., Ekpanyapong, M., Dailey, N.M., Rinsurongkawong, S., Malenichev, A., Krasotkina, O.: QuickBlaze: early fire detection using a combined video processing approach. Fire Technol. 52(5), 1293–1317 (2016) 24. Chunyu, Y., Jun, F., Jinjun, F., Yongming, Z.: Video fire smoke detection using motion and colour features. Fire Technol. 46(3), 651–663 (2010) 25. FIRESENSE database. https://zenodo.org/record/836749
Advance Assessment of Neural Network for Identification of Diabetic Nephropathy Using Renal Biopsies Images Yogini B. Patil(&) and Seema Kawathekar Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar, Marathwada University, Aurangabad, Maharashtra, India [email protected], [email protected]
Abstract. In the current era of medical science, the computer plays an important role in disease identification, improving the accuracy and robustness of medical tests. A significant cause of chronic kidney disease is diabetic nephropathy, whose end stage is renal failure. Research has been carried out to identify the causes of diabetic nephropathy, and computational technology enhances its usability for diabetic nephropathy research. The experiments were executed on a dataset of 226 images. A classifier was applied over geometric features such as area, perimeter, eccentricity, mean, standard deviation and correlation. A neural network was used for classification of the dataset, with hidden-layer variations of 10, 15 and 20. The performance is calculated using true positive and true negative statistical measures. The classifier achieved an error rate of 0.1133 (11.33%) and an accuracy of 88.66% over the two-class classification of normal versus diabetic nephropathy data. From these experimental results, the authors recommend the neural network as a strong and dynamic classifier for diabetic nephropathy identification. Keywords: Diabetic nephropathy · Classification · Perimeter · Otsu · Edge · Neural network
1 Introduction In medical image processing terms, diabetes causes injury to the small blood vessels of the kidney. Diabetes affects the functionality of the kidney and is responsible for a multigenic disease of the human body. Diabetic nephropathy is an advanced consequence of diabetes that is responsible for illness of the blood vessels: it affects the size of the blood capillaries and their functionality. It is a type of microangiopathy that damages functionality through thickening of the blood vessels and of structures of the eye, and it is responsible for microvascular sickness of the blood structure and differentiates patients of the first and second types [1]. There are numerous ways of observing biopsy images: light microscopy, immunofluorescence and electron microscopy studies are essential to differentiate diabetic kidney disease and will help to indicate structural
changes such as mesangial enlargement, diffuse glomerular basement membrane thickening and nodular glomerulosclerosis [1]. Graphical representation is easy to understand and convenient for sharing information, and the image is the most suitable graphical structure for transmitting a message. An image transmits data through basic features such as shape, size, area and resolution, and depicts spatial information such that it can be recognized as an object. Deriving information from an image is an easy and robust task carried out by human beings in day-to-day life; much information is shared and transmitted through images and their observable features. On the basis of the observable features perceived by human eyes, an image is classified for a specific purpose [2]. For this experiment, the data are normalized with respect to the resolution, size and shape of the images. Edge detection and preprocessing have been performed using the structural features of the image, such as points, lines, curves and surfaces; the features can be corner points, edge points and salient points of the image object [4].
2 Related Work This section provides a detailed explanation and highlights the contributions of the work done by researchers in this domain. Diabetic nephropathy is a small-vessel disease of the capillaries that affects patients with type 1 and type 2 diabetes. The structural and pathological changes in the glomerulus cause thickening of the glomerular basement membrane (GBM). These structural changes in the GBM are poorly understood by pathology experts; they are correlated with other diseases and associated with abnormal treatment of patients [4]. Glomerular diameter and Bowman's space width are very important for detecting various kidney-related diseases, for which the renal corpuscle and renal objects such as the glomerular capillary wall are examined; this can be helpful for histopathological analysis of kidney images, especially of the glomerulus. Automation of glomerular detection is not an easy task, and it is challenging work that has been addressed using dynamic intensity analysis. Techniques based on the median filter and morphological operations have been applied for glomerular diameter extraction, and Bowman's space is measured using the same technique; this solution was tested on renal corpuscle images of twenty-one rats [4]. PathoSpotter-K is a special assistant for pathology experts that provides automated result analysis: classical image processing and pattern recognition methodology applied to glomerulus identification achieved an accuracy of 88.3 ± 3.6%. The results indicate that the approach can be applied to the development of systems designed to train pathology students and specialists in morphological analysis [5]. In another study, 34 TEM images of kidney biopsy samples were taken; the accuracy of segmentation remains a challenging task even though a great deal of research has been done on this subject. There is no universal solution for this problem, and the method proposed there is a modification of the original Chan-Vese algorithm [7].
3 Methodology For the experimental analysis, the database plays a key role. To date, in the era of medical image processing, no standard database is available. This research collected a dataset that was already preprocessed by the medical community for different applications.
3.1 Database Collection
Obtaining the experimental database through collection or creation is a challenging task. We searched several literature and online sources to find standard images of kidney glomeruli. Various online pathological sources, online pathology and educational sites, and medical portals were also visited; however, there was no single resource. The database for this research work was collected from online medical libraries and sources such as the National Center for Biotechnology Information (NCBI) and kidneypathology.com [8–13].
3.2 Normalization
The images collected from the above different sources were not in the same format, that is, dimension, height, width, pixels, horizontal and vertical resolution, bit depth, etc. For the analysis of disease from these images, they should be in the same format, and this was a main task in this work. Using a MATLAB program, all images were first brought to 512 * 512 dimension (width and height of 512 pixels), horizontal and vertical resolution of 96 dpi, and bit depth of 32. These images are stored in a MySQL database, and using an ODBC data connection the images were loaded into MATLAB for further experiments.
3.3 Segmentation
Image segmentation is the process that divides an image into non-adaptive subregions; the structure of the segmented image is used to identify the solution. In medical image processing, various segmentation methodologies are applied; for this research, Otsu segmentation is employed. This segmentation method works on clustering-based thresholding, for which the image is converted from grayscale to binary. It is a threshold-based algorithm that minimizes the intra-class variance of the pixels in the image subsections. The algorithm assumes that the image contains two categories of pixels following a bi-modal histogram (foreground pixels and background pixels), then calculates the optimum threshold separating the two categories so that their combined spread (intra-class variance) is minimal or, equivalently (because the sum of pairwise squared distances is constant), the inter-class variance is maximal. Consequently, Otsu's method is roughly a one-dimensional, discrete analog of Fisher's discriminant analysis. The steps of this segmentation are shown in Fig. 1; here the result at step n = 4 is taken for feature extraction. The quality of the Otsu segmentation is evaluated using the peak signal-to-noise ratio (PSNR) and signal-to-noise ratio (SNR) statistical measures: the SNR-based quality is 90% and the PSNR-based quality of segmentation is 100% [14].
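As a minimal illustration of the thresholding rule described above (not the authors' MATLAB implementation), Otsu's threshold can be computed from an image histogram as follows:

```python
import numpy as np

def otsu_threshold(gray):
    # Choose the threshold that maximizes between-class variance
    # over a 256-bin histogram of a grayscale image.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# binary = gray >= otsu_threshold(gray)
```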
Fig. 1. Detailed steps of Otsu image segmentation
3.4 Classification
In the era of medical image processing and machine learning, the neural network is a robust and dynamic classifier for recognition, with its basic functionality rooted in statistical learning. Such classification has been applied to character, handwritten digit and text recognition and, more recently, to satellite image classification [15]. Artificial neural networks and other nonparametric classifiers operate directly on the input images. SVMs, by contrast, function by nonlinearly projecting the training data in the input space to a feature space of higher (possibly infinite) dimension by use of a kernel function. This results in a linearly separable dataset that can be separated by a linear classifier, permitting the classification of image datasets that are typically not linearly separable in the input space. In many instances, classification in high-dimensional feature spaces leads to over-fitting in the input space; in SVMs, however, over-fitting is controlled through the principle of structural risk minimization: the empirical risk of misclassification is minimized by maximizing the margin between the data points and the decision boundary. In practice this criterion is softened to the minimization of a cost factor involving both the complexity of the classifier and the degree to which marginal points are misclassified; the tradeoff between these factors is managed through a margin-of-error parameter (usually designated C) that is tuned through cross-validation procedures. The functions used to project the data from input space to feature space are typically known as kernels (or kernel machines), examples of which include
polynomial, Gaussian (more commonly referred to as radial basis functions) and quadratic functions [16, 17].
4 Experimental Analysis This experiment is tested over the 226 collected and normalized images. The images are normalized to 512 by 512 and 256 by 256 dimensional sizes. The flowchart of the experimental work is shown in Fig. 2. For this challenging research, the experimental flow diagram is divided into two main modules. Module one covers the collection of the images from various medical sources and their normalization to prepare a standard dataset; whether our dataset is standardized was verified by a medical expert. In the second module, the research is done using computational image processing methods, in which the data pass through preprocessing, segmentation, feature extraction, classification and recognition of the results.
Fig. 2. Flow diagram of the experimental work [7]
Two original samples of the collected images from internet sources are shown in Fig. 3. The normalized image samples of the same size, width and resolution are shown in Fig. 4.
Fig. 3. Sample of two original databases collected from online sources [8]
Fig. 4. Normalized image set of sample 2 from collected dataset [9]
The preprocessing of the collected normalized images has been done using the histogram and histogram equalization approach. The graphical representation of the preprocessing of the above two sample images using image processing histograms is shown in Fig. 5.
Fig. 5. Preprocessing of the collected normalized two sample image [11]
Otsu segmentation is effectively applied over the dataset of 226 images, at threshold levels 2, 3 and 4. The graphical representations for threshold 2 and threshold 4 are shown in Figs. 6 and 7 respectively.
Fig. 6. Otsu segmentation of two sample images at threshold 2
Fig. 7. Otsu segmentation of two sample images at threshold 4
Fig. 8. ROC curve of the performance for the neural network approach
The neural network classification is applied over the features collected from the dataset. The system uses a 70% training, 15% testing and 15% validation split. The performance of the neural network is tested with respect to the hidden layer size; from observation, the performance varies with the hidden layer. The system is tested with 10, 15 and 20 hidden-layer configurations. The graphical representation of the neural network experiment, the ROC curve for testing, is shown in Fig. 8; the ROC curves for testing, training and validation are shown in Fig. 9.
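A minimal sketch of this experiment using scikit-learn (an assumption for illustration; the paper's experiments were run in MATLAB). Here X is assumed to be the matrix of geometric features and y the binary normal/nephropathy labels, split 70/15/15 with an MLP trained for each hidden-layer size:

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def run_experiment(X, y):
    # 70% train, then split the remaining 30% evenly into validation/test
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.70)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50)
    for hidden in (10, 15, 20):
        clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=1000)
        clf.fit(X_train, y_train)
        print(hidden, clf.score(X_val, y_val), clf.score(X_test, y_test))
```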
Fig. 9. ROC curve for all training, testing and validation for the neural network
The classification results of the neural network with respect to hidden layers of 10, 15 and 20 are shown in Table 1. The performance of the neural network is calculated on the basis of the true positive, true negative, false positive and false negative ratios. The neural network achieved 88.66% accuracy in the classification experiment.

Table 1. Experimental classification results of neural network

Sr. No | Parameter for classification | Hidden layer 10 | Hidden layer 15 | Hidden layer 20 | Average
1 | True positive ratio (TPR) | 0.61 | 0.541 | 0.5671 | 0.573
2 | True negative ratio (TNR) | 0.4286 | 0.285 | 0.98 | 0.565
3 | False negative ratio (FNR) | 0.5714 | 0.7143 | 0.625 | 0.637
4 | False positive ratio (FPR) | 0.3323 | 0.3212 | 0.3169 | 0.323
5 | Accuracy | 88.8286 | 88.7302 | 88.4375 | 88.665
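The ratios of the kind reported in Table 1 are usually derived from confusion-matrix counts; the sketch below gives the standard definitions (the paper's exact computation may differ, so this is illustrative only):

```python
def classification_ratios(tp, tn, fp, fn):
    # Standard confusion-matrix ratios of the kind used in Table 1
    tpr = tp / (tp + fn)                       # true positive ratio (sensitivity)
    tnr = tn / (tn + fp)                       # true negative ratio (specificity)
    fnr = fn / (tp + fn)                       # false negative ratio
    fpr = fp / (fp + tn)                       # false positive ratio
    acc = (tp + tn) / (tp + tn + fp + fn) * 100
    return tpr, tnr, fnr, fpr, acc
```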
5 Conclusion In this research, the experiments were tested on a dataset of 226 images collected from various medical online sources. The database was normalized with MySQL and MATLAB tools. The preprocessing of the dataset was done using edge detection and histogram methods, and Otsu segmentation was applied to the resulting images, from which geometric features were extracted. A neural network with hidden-layer variations of 10, 15 and 20 was applied over the resulting features. The performance of the classifier is calculated using true positive and true negative statistical measures. The system achieved 88.66% accuracy for the identification of diabetic nephropathy in a two-class classification using the neural network. The authors recommend the neural network as a robust and dominant methodology for diabetic nephropathy identification. Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References 1. Alsaad, K.O., Herzenberg, A.M.: Distinguishing diabetic nephropathy from other causes of glomerulosclerosis: an update. J. Clin. Pathol. 60(1), 18–26 (2007) 2. Gupta, S., Nijhawan, S.: Comparative analysis of text detection using SVM, KNN and NN. International Journal of Science and Research (IJSR) ISSN (Online), 2319-7064 (2015) 3. Tervaert, T.W.C., Mooyaart, A., Amann, K., Cohen, A., Cook, H.T., Drachenberg, C., Ferrario, F., Fogo, A., Haas, M., De Heer, E., Joh, K., Laure, H.N., Radhakrishnan, J., Seshan, S., Ingeborg, M.B., Bruijn, J.: Pathologic classification of diabetic nephropathy. J. Am. Soc. Nephrol. JASN 21, 556–563 (2010). https://doi.org/10.1681/asn 4. Kotyk, T., et al.: Measurement of glomerulus diameter and Bowman’s space width of renal albino rats. Comput. Methods Prog. Biomed. 126, 143–153 (2016). https://doi.org/10.1016/j. cmpb 5. Barros, G.O., et al.: PathoSpotter-K: a computational tool for the automatic identification of glomerular lesions in histological images of kidneys. Sci. Rep. 7, 46769 (2017) 6. Rangayyan, R.M., Kamenetsky, I., Benediktsson, H.: Segmentation and analysis of the glomerular basement membrane in renal biopsy samples using active contours: a pilot study. J. Digit. Imaging 23(3), 323–331 (2009). https://doi.org/10.1007/s10278-009-9188-6 7. Ravi, M., Hegadi, R.S.: Detection of glomerulosclerosis in diabetic nephropathy using contour-based segmentation. In: International Conference on Advanced Computing Technologies and Applications (ICACTA-2015), pp. 244–249 (2015) 8. Image database. http://medpics.ucsd.edu 9. Image database. http://peir.path.uab.edu 10. Image database. https://library.med.utah.edu 11. Image database. www.ajkd.org 12. Image database. www.ncbi.nlm.nih.gov 13. Image database. www.kidneypathology.com 14. Patil, Y.B., Kawathekar, S.S.: Automated analysis of diabetics nephropathy using image segmentation techniques for renal biopsy images. Int. Multidiscip. J. Pune Res. 3(4) (2017). Impact Factor: 2.46, ISSN 2455-314X 15. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995) 16. Huang, C., Davis, L.S., Townshed, J.R.G.: An assessment of support vector machines for land cover classification. Int. J. Remote Sens. 23, 725–749 (2002) 17. Foody, M.G., Mathur, A.: Toward ıntelligent training of supervised ımage classifications: directing training data acquisition for SVM classification. Remote Sens. Environ. 93, 107– 117 (2004)
Advanced Techniques to Enhance Human Visual Ability – A Review A. Vani1(&) and M. N. Mamatha2 1
Sapthagiri College of Engineering, VTU, Belgaum, India [email protected] 2 BMS College of Engineering, VTU, Belgaum, India [email protected]
Abstract. The human eye is one of the most complex organs in the human body. The eyes, the integral part of the visual system, receive signals, and the brain computes the obtained visual details; the brain helps in the detection and interpretation of information in visible light. This complex process includes the reception of light, binocular perception, estimation of distance and categorization of objects. Any anomaly in this process causes blindness or visual impairment. According to a WHO survey, approximately 1.3 billion people live with some form of vision impairment, so visual aids are essential to enhance visual ability for the visually impaired. In this paper, a detailed review of different types of advanced techniques to enhance vision is presented.
BioMEMs tACS IoT Machine learning
1 Introduction The International organization WHO estimates that around 1.3 billion people around the world are suffering from visual impairment. This can be broadly classified as distance and near vision impairment [1]. It includes mild, moderate, severe impairment and blindness. Globally, there are many causes for visual impairment which includes uncorrected refractive errors, age related macular degeneration, cataract, glaucoma, diabetic retinopathy, trachoma, corneal opacity. More than 80% of the visual impairment are unavoidable [2]. There is a need for technological intervention for avoiding these circumstances. The solution many be simple with refractive error correction using glasses, while some solution includes surgeries like cataract disease. For irreversible vision problem rehabilitation can be achieved with visual neuroprosthesis [3]. The technological intervention can be broadly classified as invasive and non invasive techniques. Invasive techniques requires surgeries to restore vision. The non invasive techniques uses modern tools and techniques to help restore vision. The Fig. 1 shows the classification of advance techniques involved in restoring vision for visually impaired.
© Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 1116–1124, 2020. https://doi.org/10.1007/978-3-030-37218-7_117
Fig. 1. Advanced techniques for enhancing visual ability (taxonomy: invasive BioMEMs covering retinal prosthesis and the visual evoked potential signal; non-invasive covering transcranial alternating current stimulation, machine learning and deep learning, and IoT systems)
2 Invasive Techniques
2.1 Retinal Prosthesis
Prosthetic devices are electronic devices that substitute an organ's function. The retinal prosthesis is an implantable device that substitutes for the phototransduction process of the retina, which is significantly affected by the retinal disease called retinitis pigmentosa [3]. In the normal eye, photoreceptors in the outer layer of the retina trigger phototransduction, generating an electrical signal in the presence of a light stimulus. These complex signals are processed by the retina before reaching the ganglion cells, and axonal processes from the retinal ganglion cells form the optic nerve, which sends signals to the visual cortex. In diseases such as retinitis pigmentosa, the outer layers of the retina are gradually lost, causing progressive visual loss; however, the inner layers still work. Restoration of vision is therefore possible with suitable devices that receive the light signal and process it to deliver electrical impulses to the remaining layers of the visual pathway [4]. Figure 2 shows the Alpha-IMS subretinal microphotodiode array retinal prosthesis. The size of the device is about 3 mm * 3 mm * 70 µm, operating at 5–7 Hz. Energy is provided by an external coil placed behind the ear, which supplies power and communication magnetically and wirelessly. Similarly, there are other specific designs for retinal implants, such as the Argus II retinal prosthesis system, EPI-RET3, the Intelligent Retinal Implant System (IRIS V2), the suprachoroidal retinal prosthesis and the subretinal retinal prosthesis. The main limitation of this approach is that it is invasive, and the technology is suitable only for eyes where only the retinal layers are affected; the rest of the pathway, including the optic nerve and the temporal and occipital lobes of the brain, must be normal for good vision.
2.2 VEP Signal Technique
Whenever we see an object, our eyes generate an electrical signal due to the visual stimulus, which is referred to as the Visual Evoked Potential (VEP) signal. This signal is
Fig. 2. Fundus photo of Retina Implant Alpha-IMS device [3]
considered to be universal, just like an ECG signal from the heart. In most cases people lose their vision through injury; such injuries cause damage to the frontal lobe and eye, whereas the actual vision process takes place in the occipital lobe. The occipital lobe computes images based on the VEP signal, meaning that the computation block of the brain is still working. It is therefore possible to generate an artificial evoked potential that can be applied to the brain; this evoked potential fires the required neurons in the occipital lobe [5]. Figure 3 shows the block diagram of the VEP-signal-based object detection system. Visual stimuli for various images are displayed on a screen, and the subject is allowed to view the displayed images. EEG electrodes are placed on the head according to the 10–20 international electrode placement system, and the EEG signals for different images, such as a standard checkerboard, faces and objects, are recorded. The Visual Evoked Potential can be extracted from the EEG signals, and these VEP signals form the database.
Fig. 3. Block diagram of VEP signal extraction with object detection system [5].
A camera is placed on the visually impaired person, and simple object and face recognition techniques are applied to recognize faces or objects. Once the object or face is recognized, the corresponding VEP signal can be sent to a BioMEMS device placed on the occipital lobe of the brain. The VEP signal helps provide partial vision to the
visually impaired individual. The BioMEMS device, a microelectrode, fires the required neurons in the occipital lobe of the brain. Figure 4 shows the simulation results of VEP signal extraction from an EEG signal for face recognition, and Fig. 5 shows VEP signal extraction for a standard checkerboard image. The OpenViBE simulation software is a multi-platform tool used for signal processing and other applications. In the figures, the image stored in the database and the corresponding EEG signal are displayed, and the output averaged VEP signal is extracted from the EEG signal [6]. This signal can be applied to a BioMEMS device implanted on the brain; the BioMEMS then fires the neurons in the occipital lobe, creating a partial image in the brain.
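A minimal sketch of the VEP extraction described above (illustrative, not the authors' OpenViBE pipeline): the averaged VEP is commonly obtained by averaging stimulus-locked EEG epochs, which suppresses background EEG activity. The window length and onset indexing here are assumptions:

```python
import numpy as np

def extract_vep(eeg, stimulus_onsets, window=256):
    # Average stimulus-locked EEG epochs to recover the evoked potential.
    # eeg: 1-D EEG trace from an occipital electrode; onsets: sample indices
    epochs = [eeg[s:s + window] for s in stimulus_onsets
              if s + window <= len(eeg)]
    return np.mean(epochs, axis=0)   # averaged VEP waveform
```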
Fig. 4. Simulation result for VEP extracted signal from face recognition.
3 Noninvasive Techniques
3.1 transcranial Alternating Current Stimulation (tACS) Device
Most available blind-aid devices use audio cues, and other devices involve surgery for an implant. There is a need for visual feedback from a non-invasive device: non-invasive visual cues are more intuitive and appealing than auditory feedback or invasive systems. This can be achieved with the transcranial Alternating Current Stimulation (tACS) device [7].
Fig. 5. Simulation result for VEP extracted signal from standard checker board
The tACS device provides a very low electrical stimulation (in µA) to the posterior part of the brain. The applied electrical stimulation is perceived as flashes of light by the visually impaired, and a change in the electrical characteristics causes a change in the rate of the flashes of light. The rate of the flashes of light can therefore be used as visual feedback for the visually impaired. As a proof of concept [7], a test was carried out with nine blindfolded individuals in the age group of 20-35 years, mimicking visual impairment. Three electrical stimulation settings, slow, medium and fast, were presented to the individuals; these are perceived as flashes of light at different rates. The electrodes used can be conventional Ag/AgCl adhesive foam electrodes with electrolyte gel. As shown in Fig. 6, the Fp1 and A1 bipolar combination is used to stimulate the left side of the brain, and Fp2 and A2 are used for the right side of the brain. The flickering rate perceived in the brain can be varied by changing the frequency and current intensity of the tACS device, and the individuals were able to distinguish slow (8 Hz), medium (10 Hz) and fast (20 Hz) flickering rates as flashes of light in the brain. These visual cues can be used by the visually impaired to detect the proximity of objects by making the tACS device sensor-triggered, as shown in Fig. 7. Sensors such as an ultrasound sensor, which gives distance information, can be interfaced to the tACS device; based on the proximity, the tACS generates an electrical signal that changes the flashes of light. Thus it provides visual cues for the visually impaired with a non-invasive device. Figure 8 shows the brain stimulation results for both the left and right sides of the brain, with the frequency kept constant at 8 Hz and the current intensity varied in steps of 100 µA from 200 µA to 1200 µA. The results are shown for nine subjects who were
Fig. 6. Electrode placement for tACS device
Fig. 7. Block diagram of sensor triggered tACS
Fig. 8. Left and right side brain stimulation at a frequency of 8 Hz for different current intensities
blindfolded to mimic visual impairment. According to the graph, the subjects could feel the change in flickering rate as the current intensity was varied. Figure 9 shows the brain stimulation results for both the left and right sides of the brain, with the frequency kept constant at 10 Hz and the current intensity varied in steps of 100 µA from 200 µA to 1200 µA. Similar results were obtained for a 20 Hz frequency with the current varied from 200 µA to 1200 µA.
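As a sketch of the sensor-triggered mode of Fig. 7 (the distance bands and the specific mapping below are hypothetical assumptions, not values from the paper), an ultrasound distance reading could be mapped onto the three validated stimulation rates so that closer obstacles produce faster perceived flicker:

```python
def proximity_to_frequency(distance_cm):
    # Hypothetical mapping from obstacle distance to tACS frequency,
    # using the slow/medium/fast rates the subjects could distinguish.
    if distance_cm < 50:
        return 20   # fast flicker: obstacle very close
    elif distance_cm < 150:
        return 10   # medium flicker
    else:
        return 8    # slow flicker: path clear
```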
Fig. 9. Left and right side brain stimulation at a frequency of 10 Hz for different current intensities
3.2 Image Processing, Machine Learning and Deep Learning Tools
Eye disorders are a major health problem globally [8]. There are many types of age-related eye diseases, such as diabetic retinopathy, glaucoma and age-related macular degeneration, among others [9, 10]. There is a need for proper computer-aided analysis and classification tools for eye diseases to help restore vision. Features are extracted from images of diseased eyes, and classification strategies are applied to the eye dataset. New technologies such as machine learning and deep learning tools can be used to diagnose eye diseases, with millions of medical images used for training [9]. With these techniques, the tool is able to categorize healthy and unhealthy signs of the eyes, and the algorithms can detect conditions such as glaucoma, macular degeneration and fluid in the retina as accurately as six expert ophthalmologists [9, 10].
3.3 IoT Devices
The Internet of Things (IoT) connects devices together, and IoT technology plays an important role in the health sector. A sensor can be connected to eyeglasses; when the sensor detects a threat, it immediately sends a message to the nearest hospital [11], and through GPS the location of the patient can also be detected. IoT with such monitoring can improve health and prevent disease [12].
4 Conclusion and Future Work In this paper, recent technologies for enhancing visual ability are discussed in detail, and Table 1 describes the current status of all the technologies. Improvement of all the advanced techniques, both invasive and non-invasive, is essential. For invasive techniques, biocompatibility is crucial for safety, as implants may cause tissue damage; optimal materials for stimulation and implantation, and long-term implantation studies, are therefore essential. The tACS device provides visual feedback as flashes of light, but this information is very limited for a visually impaired individual, so there is a further need for partial vision to convey more information. ML and
Table 1. Advanced engineering solution and its current status of major initiatives

Techniques | Type | Placement of device | Number of electrodes | Status
Retinal prosthesis | Invasive | Placed in retina | 25–65 | Commercially available in Europe
VEP signal | Invasive | Placed in occipital lobe | 6 | Research on placing BioMEMs on the occipital lobe in progress
tACS | Non-invasive | On left and right frontal/posterior side of brain | 2 | Pilot study found successful
Machine learning | Non-invasive | Only images of eyes are taken | 0 | Early detection of eye diseases such as glaucoma, age-related macular degeneration and diabetic retinopathy found successful
IoT system | Non-invasive | Placed on eyeglasses | 0 | IoT with smart sensor integration helps to contact doctor or guardian
ML and deep learning techniques are provided with millions of images for better diagnosis of a few eye diseases; these techniques should be further extended to all types of eye diseases. Smart-sensor-based vision and threat detection can be integrated to form a complete eye solution.

Compliance with Ethical Standards

✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data and author photo.
References

1. World Health Organization visual impairment report 2018. https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment
2. Age-Related Eye Disease Study Research Group: Risk factors associated with age-related macular degeneration: a case-control study in the age-related eye disease study: age-related eye disease study report number 3. Ophthalmology 107(12), 2224–2232 (2000)
3. https://eyewiki.aao.org/Retina_Prosthesis#Background
4. Liu, J., Sun, X.: A survey of vision aids for the blind. In: 2006 6th World Congress on Intelligent Control and Automation, Dalian, pp. 4312–4316 (2006)
5. Suraj, H., Yadav, A.V.B.: The use of VLSI to generate vision in blind people using a suitable neuroprosthesis implant of BIOMEMS in brain. In: 2011 IEEE International Conference on Computer Science and Automation Engineering, Shanghai, pp. 179–183 (2011)
6. Vani, A., Mamatha, M.N.: Artificial generation of visual evoked potential to enhance visual ability. World Acad. Sci. Eng. Technol. Int. J. Med. Health Biomed. Bioeng. Pharm. Eng. 10(9), 443–447 (2016)
7. Vani, A., Mamatha, M.N.: Visual feedback system for visually impaired using tACS – a proof of concept study, vol. 6, no. 1, pp. 738–743, January 2019. ISSN 2349-5162. http://www.jetir.org/
8. https://www.nature.com/articles/d41586-018-02431-1
9. Selvathi, D., Suganya, K.: Support vector machine based method for automatic detection of diabetic eye disease using thermal images. In: 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT), pp. 1–6. IEEE (2019)
10. Bhatia, K., Arora, S., Tomar, R.: Diagnosis of diabetic retinopathy using machine learning classification algorithm. In: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), pp. 347–351. IEEE (2016)
11. Prouski, G., Jafari, M., Zarrabi, H.: Internet of things in eye diseases, introducing a new smart eyeglasses designed for probable dangerous pressure changes in human eyes. In: 2017 International Conference on Computer and Applications (ICCA), pp. 364–368. IEEE (2017)
12. Bhatt, Y., Bhatt, C.: Internet of things in HealthCare. In: Bhatt, C., Dey, N., Ashour, A.S. (eds.) Internet of Things and Big Data Technologies for Next Generation Healthcare. SBD, vol. 23, pp. 13–33. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-49736-5_2
Cross Media Feature Retrieval and Optimization: A Contemporary Review of Research Scope, Challenges and Objectives

Monelli Ayyavaraiah1 and Bondu Venkateswarlu2

1 AITS-Rajampet, Dayananda Sagar University, Bangalore, India [email protected]; 2 Dayananda Sagar University, Bangalore, India [email protected]
Abstract. Predictive analytics that learns from cross-media is one of the significant research objectives of contemporary data science. Cross-media information retrieval, often denoted cross-media feature retrieval and optimization, is crucial and still at an infant stage. Traditional approaches to predictive analytics are portrayed in the context of uni-dimensional media such as text, image, or signal. Ensemble learning strategies are the alternative if the given learning corpus is of multidimensional media (a combination of two or more of text, image, video, and signal). However, contributions that correlate the information of divergent dimensions of a given learning corpus are still in a nascent stage; this is termed cross-media feature retrieval and optimization. This manuscript is intended to brief the recent escalations and future research scope with regard to cross-media feature retrieval and optimization. To this end, a contemporary review of recent contributions is portrayed in this manuscript.

Keywords: Cross-media retrieval · Media-gap · UCCG · Multi-modal embedding · DTRNN
1 Introduction

Several domains, such as healthcare, find it vital to associate diversified media standards such as text, signals, and images, which together are often denoted cross-media. Methods of processing cross-media to extract information (often denoted features) have a significant role in the learning process of predictive analysis. Methods such as pattern recognition and computer vision are closely bonded to cross-media and have evinced huge significance in the performance of the corresponding domains [1]. Cross-media feature retrieval and optimization is challenging because the cross-media learning corpus is formed by interacting with manifold aspects under diverse conditions, comprising changes in illumination, stop-words, and background interference. The works [2, 3] present that many retrieval methods have concentrated on a single medium, such as retrieval of image, text, audio, video or signal.

© Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 1125–1136, 2020. https://doi.org/10.1007/978-3-030-37218-7_118
Nevertheless, the challenge is intricate. The unavoidable heterogeneity gap among cross-media information, deliberated extensively as the fundamental barrier to multi-modal feature retrieval, is the main reason the challenge is intricate. The usual model for bridging the gap is to map the cross-media data into a common feature space where related data lie much closer to each other, so that queries can be carried out and expected to return the desired outcomes. Moreover, the challenge requires the machine to understand the descriptions in texts and images and their connections, instead of just the prominent verbs and nouns in texts or the salient objects in images. With the improvement of network applications and information technology, instantly released information is growing and spreading quickly, and the kind of information has expanded from text to multimedia. The voluminous quantity of multimedia information and its unstructured data formats make information retrieval, monitoring and topic analysis more intricate. It is significant for an information retrieval method to precisely examine all types of related data and combine topic-relevant cross-media results from diverse sources. Hence, a cross-media topic analysis and information retrieval model is required to extract features from all probable media data, combined into an identical semantic space for processing. The work [4] presents that a cross-media topic analysis and information retrieval model can enhance decision-making capacity and effectiveness; moreover, knowledge can be extracted from a cross-media dataset to reflect contemporary burning topics, and event status can be evaluated accurately. With the increase of multimedia data such as image, audio, 3D model, text and video, CMR (cross-media retrieval) has become a significant domain, where users obtain outcomes across several media kinds by submitting one query of a single media type. The "media-gap" is the challenging issue of CMR: the depictions of diverse media types are inconsistent and lie in diverse feature spaces, so assessing similarities among them is hugely challenging. Numerous models have been proposed to address this problem by examining the correlations comprised in cross-media data. Many contemporary models are designed for retrieval across two media types, mostly text and images, yet CMR emphasizes a variety of media. The objective of CMR is to identify the most related texts for a given video or image, and conversely. Another significant difference is that video or image captioning concentrates only on the video or image and its text, which is not simple to extend to other media types. CMR is preferable for two reasons. First, providing related outcomes across all media types is more convenient and resourceful for the consumers. Second, diverse kinds of media instances are always used to describe the same semantic concept from diverse factors; hence, in CMR, diverse media types can enhance each other, making retrieval comparatively robust to noise. Besides, users may submit any type of media data as a query, which is useful and convenient.
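To illustrate the common-space querying idea mentioned above, the following sketch assumes image and text features have already been projected into one shared space and retrieves the nearest text items for an image query by cosine similarity; the embeddings are random placeholders, not features from any cited system.

```python
# Sketch of querying a common feature space: once image and text features
# live in one space, cross-media retrieval reduces to a nearest-neighbour
# search. The embeddings here are random placeholders, not learned ones.
import numpy as np

rng = np.random.default_rng(0)
image_query = rng.normal(size=64)           # a projected image feature
text_corpus = rng.normal(size=(1000, 64))   # projected text features

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                     # cosine similarity to every text item
    top = np.argsort(-scores)[:k]
    return top, scores[top]

idx, scores = cosine_top_k(image_query, text_corpus)
print("top text indices:", idx, "scores:", np.round(scores, 3))
```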
2 Review of Literature

Online multimedia resources are rich in visual information, text and audio. Mapping semantics from low level to high level is often indefinite. To resolve this ambiguity in mapping, some researchers carried out joint analysis of cross-media data, related to content and context. The work [5] suggested multi-modality depictions of web videos comprising latent semantic, audio, surrounding and visual features, and implemented a late-fusion method that trains classifiers separately for diverse modalities and merges the classifier outputs to estimate an unknown video's category. Further, content analysis based on the attention in videos or images is utilized to approximate regions, events and objects of consumer interest [6]; structural information of web pages, such as neighboring context and links, has also been exploited to facilitate CMR [7–9]. The work [10] observed that image representation can be viewed as analogous to the "cross-lingual retrieval issue" and suggested the "CMRM (cross-media relevance model)". The work [1] built a "UCCG (uniform cross-media correlation graph)" from the features of media objects and their information. To perform CMR, a positive score is allotted to the query instance, the score is spread along the graph, and objects of the target modality with maximum scores are returned. To improve retrieval performance, relevant short-term and long-term feedback is utilized for UCCG optimization. The works [1, 11] implement CMR at the semantic level based on ontology. The work [12] presents a combined speculative method of mining multimedia semantics, which utilizes fundamental multimedia speculation characteristics to attain CMR. The work [13] proposes a method that integrates semantic description of spatio-temporal context with CMR to implement a prototype. The work [14] suggests a multi-information method relating the semantic representation between the retrieval map and multimedia information to complete CMR. The work [15] presents that contemporary CMR algorithms concentrate mostly on feature extraction and classification of cross-media information. Besides existing content-based retrieval systems, a novel approach has come into existence, known as CMR: extracting the correlation amid heterogeneous multimedia documents, where the modality of the input media document and the retrieved outcomes need not be the same. The works [16, 17] present several cross-media systems concentrated on combining and handling heterogeneous multimedia data. The work [18] proposed a "dual cross-media relevance method" for annotating and retrieving images automatically that deliberates both text-text and text-image associations. The works [15, 19] present that a gathering of several modalities of media objects with a huge amount of similarity amid them is called an MMD (multimedia document). The work [16] presents that generating MMDs in practice might lead to performance problems due to differences in the associations among the existing images and texts. In recent years, several image annotation models for internet images have been proposed, as there is rapid growth in internet image data. The work [20] presents that similar images are searched from the web by extracting common descriptions and representations from the surrounding explanations of similar images in the
form of annotations for the image query. The work [21] presents a new "kNN-sparse graph-based semi-supervised learning" method proposed to consecutively harness unlabeled and labeled data, which tackles the issue of annotating a huge-scale image corpus through label propagation over noisily tagged web images. The work [22] introduced a novel scalable learning algorithm for "multiclass image classification" on the basis of a trace-norm regularization penalty and "multinomial logistic loss". The work [23] presents that utilizing "stochastic gradient descent optimization" can scale learning to thousands of classes and millions of images. Image tagging is utilized extensively in image-sharing websites on the internet, since users can simply search, manage and generate images with the assistance of tags. The work [24] established a "large-scale automated photo tagging" model through discovering social images and presented a retrieval-based method for tagging photos automatically. The work [25] suggested a machine learning structure for extracting social images and examined its application to tagging images automatically. The work [26] presents utilizing image tagging information to improve image understanding capabilities, suggesting a new "multi-task multi-feature metric learning method" that models the information-sharing technique among diverse learning tasks. This method is capable of learning simultaneously through semantic information and a multi-task learning structure on the basis of social tagging; hence both get assistance from the information offered by each other. Generally, social images comprise tags and other literal descriptions; therefore CMR models are resourceful for improving image understanding. The work [27] presents the notion of semantic propagation on the basis of relevance feedback and cross-modality query expansion, which enhances retrieval performance when the query is initialized either through an instance image or a keyword. The work [28] suggested a cross-modality method that does not require all training data in the form of pairs of corresponding text and image; the method exploits the accessibility of a few image-text pairs, along with text-to-text and image-to-image similarities, to learn intrinsic associations among texts and images. "Multi-modal embedding" models map cross-media data into a feature space in order to search for identical or related cross-media objects. The work [29] presents that "kernelized canonical correlation analysis (kCCA)" is utilized for mapping visual and textual information, and formed a breakthrough in segmentation and annotation. The work [30] developed multi-modal deep Boltzmann machines for depicting visual and textual information jointly. The work [31] learned the cross-media representation by optimizing a list-wise ranking problem; the objective of this contribution is to bi-directionally rank cross-media data. Contemporarily, there is a developing domain of contributions that automatically generate image captions. Feng et al. integrated language methods, multi-modal matching methods and visual detectors to yield image captions. The work [32] presents that the model's performance is predicted to have better or equal quality.
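For illustration of the multi-modal embedding idea, the sketch below fits plain linear CCA on synthetic paired image/text features; the cited work [29] uses the kernelized variant (kCCA), so this is only a simplified stand-in.

```python
# Simplified stand-in for the embedding step: plain linear CCA fitted on
# synthetic paired image/text features (the cited work [29] uses the
# kernelized variant, kCCA). Paired views become correlated per component.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 10))  # shared semantics behind both views
img = latent @ rng.normal(size=(10, 50)) + 0.1 * rng.normal(size=(n, 50))
txt = latent @ rng.normal(size=(10, 30)) + 0.1 * rng.normal(size=(n, 30))

cca = CCA(n_components=10).fit(img, txt)
img_c, txt_c = cca.transform(img, txt)  # both views in the common space

corr = [np.corrcoef(img_c[:, k], txt_c[:, k])[0, 1] for k in range(3)]
print("first canonical correlations:", np.round(corr, 3))
```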
The work [33] presents a generative method based on a deep recurrent structure, which integrates advances from machine translation and computer vision. The work [34] introduced a "deep visual-semantic embedding" method
for detecting visual objects utilizing labeled image data and semantic information collected from unlabeled text. The work [35] proposed a "DTRNN (Dependency Tree Recursive Neural Network)" for textual information processing. The work [36] suggested a method that embeds fragments of sentences and images into a common space. Among the models mentioned above, all utilize an RNN for processing textual information and use inner products between cross-media data features to explore similarity. Except for the method of the last contribution, the remaining methods extract global features. For images, the background generally occupies a huge portion even though it is not significant; and when it comes to sentences, the mined keywords hardly match the salient content of the image. Therefore, inner products over global features might cause unavoidable mistakes. Nevertheless, as stated in contribution [36], sentence fragments are not always suitable, specifically when manifold adjectives modify one noun; moreover, it is intricate to correspond every fragment in one image with every phrase or word in the related sentence. The work [15] explored the understanding of "harmonizing hierarchical manifolds for multimedia document semantics" and CMR. The work [15] presents a tri-space and ranking-based model, a heterogeneous similarity measure for CMR; whereas other existing models concentrate on a third common space, the suggested tri-space model concentrates on all the features. Learning the joint representation of cross-media with semi-supervised and sparse regularization, and measuring content similarity among diverse media, is the main challenge. The work [37] proposed a novel deep learning method aimed at retrieving related cross-media. The suggested method focuses on two stages, distance detection and feature extraction; after attaining the feature information, it sorts and eradicates redundant feature vectors. The work [38] suggested a new "cross-media distance metric learning framework on the basis of multi-view matching & sparse feature selection". The model engaged "sparse feature selection" to choose a subset of related features and eradicate unnecessary features among the audio features and "high-dimensional image features". Further, the model built a "Multi-modal Semantic Graph" to find the "embedded manifold cross-media correlation". Canonical correlations are then merged with multi-information into "multi-view matching", constructing a "Cross-media Semantic Space" aimed at "cross-media distance measure". Comparisons and simulations verified the superiority, applicability and validity of the method from diverse factors; an important confine is that the database is relatively small due to the insufficient standard database and audio data. The work [16] applied a "supervised coarse-to-fine semantic hashing model" for effective similarity search over datasets by learning a shared space. Generally, by leveraging the semantic similarity among samples to build a coarse-to-fine similarity matrix, the Hamming space is learned. An iterative updating method is then implemented to derive locally optimal solutions. Finally, to further develop the discrimination of the hashing codes, an "orthogonal rotation matrix" is learned by reducing the quantization loss while preserving the optimality of the undisturbed solution.
Simulations on the extensively utilized NUS-WIDE and Wiki datasets showed that the suggested model surpassed the contemporary models. The cost of training affects the performance of sophisticated learning methods such as kernel learning.
The work [39] introduced an "SCH (Semantic Consistency Hashing)" model for cross-modal retrieval. SCH learns a shared semantic space by simultaneously taking into account both intra-modal and inter-modal semantic correlations. A similar depiction is learned utilizing "non-negative matrix factorization" for samples with diverse modalities in order to preserve inter-modal semantic consistency; in the meantime, a neighbor-preserving algorithm is adopted to preserve the semantic consistency of each modality. A productive optimization algorithm is suggested to lessen the time complexity, and the model is also easily extended to image classification, annotation and other learning domains. The training cost of "CMFH (Collective Matrix Factorization Hashing)" is more than that of the SCH model on huge training datasets; hence the SCH model might not be utilized effectively in small-scale implementations. The work [40] discussed the efficiency of CMR models in "SBIR (sketch-based image retrieval)", which have been implemented successfully in image-text matching; contemporary models could not tackle the cross-modal issue of SBIR. Simulation results displayed that subspace learning can effectively model the sketch-photo domain gap, although the computational complexity of the analysis model is very high in the SBIR procedure. The proposed "HSNN (Heterogeneous Similarity measure with Nearest Neighbors)" [41] extracts intra-media correlation: heterogeneous similarity is achieved by computing the probability that two media objects belong to the same semantic class. The "CMCP (Cross-Media Correlation Propagation)" method deals simultaneously with negative and positive correlations among media objects of diverse media types. The negative correlation is very significant because it offers effective special information. Moreover, the method is capable of propagating correlation among heterogeneous modalities. Lastly, both CMCP and HSNN are adaptable, such that any conventional similarity measure might be included; an effective ranking method is learned through the further fusion of manifold similarity measures by AdaRank aimed at CMR (Table 1).
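As a simplified illustration of the hash-based lookup shared by several of the surveyed methods, the sketch below binarizes features with a single random projection and ranks database items by Hamming distance; real models such as SCH or CMFH learn these projections rather than drawing them at random.

```python
# Simplified illustration of hash-based cross-media lookup: features are
# binarized and ranked by Hamming distance. Methods such as SCH or CMFH
# *learn* the projection; a shared random projection is used here purely
# to show the code-space search itself.
import numpy as np

rng = np.random.default_rng(2)
dim, bits = 64, 32
projection = rng.normal(size=(dim, bits))  # stand-in for a learned hash

def hash_codes(feats: np.ndarray) -> np.ndarray:
    return (feats @ projection > 0).astype(np.uint8)

db = hash_codes(rng.normal(size=(5000, dim)))      # e.g. image codes
query = hash_codes(rng.normal(size=(1, dim)))[0]   # e.g. a text code

hamming = np.count_nonzero(db != query, axis=1)    # distance in code space
print("closest items:", np.argsort(hamming)[:5])
```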
Table 1. Contemporary contributions and their properties.

| Reference no. | Technique employed | Performance measure | Dataset | Benefits | Confines |
|---|---|---|---|---|---|
| [42] | Multi-graph cross-modal hashing | Inter-media hashing, cross-view hashing, composite hashing through manifold information sources | NUS-WIDE, Wikipedia | The Nyström approximation method is utilized for constructing the productive graph, and it displays improved performance compared with two unsupervised cross-modal hashing models | Designing the productive graph is intricate, with growing complexity in graph construction whose time should remain linear |
| [43] | Semantic correlations | Joint representation learning, canonical correlation analysis & cross-modal factor analysis | NUS-WIDE, Wikipedia & X-Media datasets | Among the 3 databases, this has greater efficiency than contemporary models | The data weight value, which would help increase system performance, is not utilized |
| [44] | Correlation among attributes | Ranking score | Shoes dataset | This model takes advantage of the structural sparsity of demands, and simulation outcomes display improved performance compared with contemporary image retrieval methods on identical queries | The image retrieval performance needs to be enhanced, and only images are retrieved in this model |
| [45] | Semi-supervised semantic factorization hashing | MAP score of cross-modal hash value, recall & precision | NUS-WIDE, Wikipedia dataset | This technique alters semantic tags into hash codes, which preserves the better semantic values; simulation outcomes display that its effectiveness exceeds contemporary models | The decay of values causes issues in the eigenvalue, whereas orthogonality limits the hash code |
| [46] | Link confidence measure | Hit ratio | Local dataset | This model achieves intermediate match outcomes and estimates potential similarities, which assists greater graph connectivity under confined computational resources | An assessment is not provided because of the intricacy of building the objective criteria |
| [47] | Semantic entity projection | MAP score | Wikipedia dataset | The suggested model utilizes this projection model, and simulation outcomes display that the system attains better results compared with other contemporary models | The correlation among entity levels is not exploited fully, and it provides low results for smaller quantities of data |
| [48] | Semantic information iterative optimization | Precision, MAP score & recall | X-Media & Wikipedia dataset | The iterative optimization algorithm resolves the resulting optimization issue and displays improved outcomes in both cross-media and single-media retrieval over state-of-the-art models | Many types of correlation amid diverse media could be utilized to enhance the performance |
| [49] | Joint graph regularized heterogeneous metric learning | CCA+SMN, CCA, JGRHML & CFA | X-Media & Wikipedia dataset | This assists in learning a higher level of semantic metric with label propagation | The "jointly modeling manifold modality" is not utilized in this model, which would assist its use in several applications |
| [50] | A cross-modal self-taught learning (CMSTL) | Precision, MAP & recall | NUS-WIDE & Wikipedia articles | Hierarchical generation is utilized for achieving the multimodal theme, and it provides improved performance in cross-modal retrieval | The method is to be enlarged towards image hashing and also over other multimodal media |
| [51] | Bayesian personalized ranking based heterogeneous metric learning | Precision, MAP & recall | Corel5k image, Wikipedia dataset | They combine heterogeneous and homogeneous graph regularization into the required value and generated enhanced performance | This model is carried out on images and requires testing on other media such as video, text & audio |
| [52] | Deep multimodal learning model | MAP score | NUS-WIDE 10k dataset, Wikipedia | Simulation outcomes show that the suggested method is more effective when compared with the other models | This process needs to utilize unlabeled data in the method to enhance system performance |
3 The Issues and Challenges in Cross-media Feature Retrieval Optimization

The scope of data accumulation in the recent era of computer-aided automation and networks is massive. This tempo of massive data collection often leads to an accumulated corpus with unstructured and dynamic formats, large volume, and multidimensional factors. Semantic gaps are the significant constraints on cross-media feature retrieval and optimization, and the semantic gaps noted in cross-media sources are:

(i) between corpus tuples framed under temporal barriers,
(ii) vivid dimensions of data representation formats,
(iii) distribution diversity of the data projections (a) within a format and (b) between the formats of the cross-media.

Hence, the critical research objectives of cross-media-centric predictive analytics by machine learning strategies entail defining significant methods of feature retrieval and optimization from cross-media having the semantic gaps stated above. The methodological approaches that surpass these semantic gaps with regard to feature retrieval and optimization are:

- Transforming the features of the diversified mediums of the cross-media from multivariate to univariate. This is often done by considering the features of divergent dimensions of the cross-media data that have significant correlation.
- Adapting the significance of the diversified features of the vivid mediums of the cross-media, scaled by the significant associability between the features from the vivid mediums.
- Identifying the diversity or similarity of the horizontal and vertical distributions of the values projecting the corresponding features. This enables maximal learning accuracy by partitioning the features into diversified tuples, such that the entries of each tuple evince high correlation; a hedged sketch of such correlation-based grouping is given after this list.
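The sketch referred to above partitions synthetic features into highly correlated groups; the data and the 0.9 correlation threshold are assumptions made purely for illustration.

```python
# Hedged sketch of correlation-based feature grouping: partition features
# into tuples whose members are highly correlated. Data and the 0.9
# threshold are synthetic assumptions for illustration.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=500)   # feature 1 tracks feature 0
X[:, 5] = -X[:, 4] + 0.05 * rng.normal(size=500)  # feature 5 tracks feature 4

corr = np.abs(np.corrcoef(X, rowvar=False))
threshold = 0.9
groups, assigned = [], set()
for i in range(corr.shape[0]):
    if i in assigned:
        continue
    group = [j for j in range(corr.shape[0])
             if j not in assigned and corr[i, j] >= threshold]
    assigned.update(group)
    groups.append(group)
print("correlated feature groups:", groups)  # e.g. [[0, 1], [2], [3], [4, 5], ...]
```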
4 Conclusion

The contribution of this manuscript is a contemporary review intended to identify the significance, scope and challenges of cross-media information retrieval and optimization. The term "information" is often denoted as features with regard to predictive analytics using machine learning. The contemporary review portrayed in this manuscript evinces considerable research scope in cross-media feature retrieval and optimization. Possible methodological approaches, which can be adopted by future research, are also concluded.

Compliance with Ethical Standards

✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References

1. Zhuang, Y.-T., Yang, Y., Fei, W.: Mining semantic correlation of heterogeneous multimedia data for cross-media retrieval. IEEE Trans. Multimed. 10(2), 221–229 (2008)
2. Ma, Q., Nadamoto, A., Tanaka, K.: Complementary information retrieval for cross-media news content. Inf. Syst. 31(7), 659–678 (2006)
3. Mao, X., et al.: Parallel field alignment for cross media retrieval. In: Proceedings of the 21st ACM International Conference on Multimedia. ACM (2013)
4. Wu, Y.: Information management and information service in emergency forecast system. J. Doc. Inf. Knowl. 5, 73–75 (2006)
5. Yang, L., et al.: Multi-modality web video categorization. In: Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval. ACM (2007)
6. Liu, H., et al.: A generic virtual content insertion system based on visual attention analysis. In: Proceedings of the 16th ACM International Conference on Multimedia. ACM (2008)
7. Gao, B., et al.: Web image clustering by consistent utilization of visual features and surrounding texts. In: Proceedings of the 13th Annual ACM International Conference on Multimedia. ACM (2005)
8. Noah, S.A.M., et al.: Exploiting surrounding text for retrieving web images. J. Comput. Sci. 4(10), 842–846 (2008)
9. Rege, M., Dong, M., Hua, J.: Graph theoretical framework for simultaneously integrating visual and textual features for efficient web image clustering. In: Proceedings of the 17th International Conference on World Wide Web. ACM (2008)
10. Jeon, J., Lavrenko, V., Manmatha, R.: Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM (2003)
11. Hu, T., et al.: Ontology-based cross-media retrieval technique. Comput. Eng. 8 (2009)
12. Xu, D.-Z., Yi, T.: Multi-ontology query based on ontology matching. Comput. Technol. Dev. 18(11), 13–17 (2008)
13. Yi, Y., Tongqiang, G., Yueting, Z., Wenhua, W.: Cross-media retrieval based on synthesis reasoning model. J. Comput.-Aided Des. Comput. Graph. 9 (2009)
14. Yang, L., Fengbin, Z., Baoqing, J.: Research of cross-media information retrieval model based on multimodal fusion and temporal-spatial context semantic. J. Comput. Appl. 29(4), 1182–1187 (2009)
15. Guo, Y.-T., Luo, B.: Image semantic annotation method based on multi-modal relational graph. J. Comput. Appl. 30(12), 3295–3297 (2010)
16. Yang, Y., et al.: Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. IEEE Trans. Multimed. 10(3), 437–446 (2008)
17. Yang, J., Li, Q., Zhuang, Y.: Octopus: aggressive search of multi-modality data using multifaceted knowledge base. In: Proceedings of the 11th International Conference on World Wide Web. ACM (2002)
18. Kim, H.H., Park, S.S.: Mediaviews: a layered view mechanism for integrating multimedia data. In: International Conference on Object-Oriented Information Systems. Springer, Heidelberg (2003)
19. Liu, J., et al.: Dual cross-media relevance model for image annotation. In: Proceedings of the 15th ACM International Conference on Multimedia. ACM (2007)
20. Zhuang, Y., et al.: Manifold learning based cross-media retrieval: a solution to media object complementary nature. J. VLSI Signal Process. Syst. Signal Image Video Technol. 46(2–3), 153–164 (2007)
21. Martinec, R., Salway, A.: A system for image–text relations in new (and old) media. Vis. Commun. 4(3), 337–371 (2005)
22. Wang, X.-J., et al.: Annotating images by mining image search results. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1919–1932 (2008)
23. Li, X., Snoek, C.G.M., Worring, M.: Learning social tag relevance by neighbor voting. IEEE Trans. Multimed. 11(7), 1310–1322 (2009)
24. Harchaoui, Z., et al.: Large-scale image classification with trace-norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012)
25. Perronnin, F., et al.: Towards good practice in large-scale learning for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012)
26. Wu, L., et al.: Distance metric learning from uncertain side information with application to automated photo tagging. In: Proceedings of the 17th ACM International Conference on Multimedia. ACM (2009)
27. Wu, P., et al.: Mining social images with distance metric learning for automated image tagging. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM (2011)
28. Wang, S., et al.: Multi-feature metric learning with knowledge transfer among semantics and social tagging. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE (2012)
29. Zhang, H.J., Su, Z.: Improving CBIR by semantic propagation and cross modality query expansion. In: Proceedings of the International Workshop on Multimedia Content-Based Indexing and Retrieval, Brescia (2001)
30. Jia, Y., Salzmann, M., Darrell, T.: Learning cross-modality similarity for multinomial data. In: 2011 International Conference on Computer Vision. IEEE (2011)
31. Socher, R., Fei-Fei, L.: Connecting modalities: semi-supervised segmentation and annotation of images using unaligned text corpora. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE (2010)
32. Srivastava, N., Salakhutdinov, R.R.: Multimodal learning with deep Boltzmann machines. In: Advances in Neural Information Processing Systems (2012)
33. Wu, F., et al.: Cross-media semantic representation via bi-directional learning to rank. In: Proceedings of the 21st ACM International Conference on Multimedia. ACM (2013)
34. Fang, H., et al.: From captions to visual concepts and back. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
35. Vinyals, O., et al.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
36. Frome, A., et al.: Devise: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems (2013)
37. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
38. Karpathy, A., Joulin, A., Fei-Fei, L.F.: Deep fragment embeddings for bidirectional image sentence mapping. In: Advances in Neural Information Processing Systems (2014)
39. Jiang, B., et al.: Internet cross-media retrieval based on deep learning. J. Vis. Commun. Image Represent. 48, 356–366 (2017)
40. Zhang, H., et al.: A cross-media distance metric learning framework based on multi-view correlation mining and matching. World Wide Web 19(2), 181–197 (2016)
41. Yao, T., et al.: Supervised coarse-to-fine semantic hashing for cross-media retrieval. Digit. Signal Process. 63, 135–144 (2017)
42. Yao, T., et al.: Semantic consistency hashing for cross-modal retrieval. Neurocomputing 193, 250–259 (2016)
43. Xu, P., et al.: Cross-modal subspace learning for fine-grained sketch-based image retrieval. Neurocomputing 278, 75–86 (2018)
44. Zhai, X., Peng, Y., Xiao, J.: Cross-media retrieval by intra-media and inter-media correlation mining. Multimed. Syst. 19(5), 395–406 (2013)
45. Xie, L., Zhu, L., Chen, G.: Unsupervised multi-graph cross-modal hashing for large-scale multimedia retrieval. Multimed. Tools Appl. 75(15), 9185–9204 (2016)
46. Peng, Y., et al.: Semi-supervised cross-media feature learning with unified patch graph regularization. IEEE Trans. Circuits Syst. Video Technol. 26(3), 583–596 (2016)
47. Cao, X., et al.: Image retrieval and ranking via consistently reconstructing multi-attribute queries. In: European Conference on Computer Vision. Springer, Cham (2014)
48. Deng, J., Du, L., Shen, Y.-D.: Heterogeneous metric learning for cross-modal multimedia retrieval. In: International Conference on Web Information Systems Engineering. Springer, Heidelberg (2013)
49. Kim, K., et al.: Match graph construction for large image databases. In: European Conference on Computer Vision. Springer, Heidelberg (2012)
50. Huang, L., Peng, Y.: Cross-media retrieval via semantic entity projection. In: International Conference on Multimedia Modeling. Springer, Cham (2016)
51. Zhai, X., Peng, Y., Xiao, J.: Learning cross-media joint representation with sparse and semi-supervised regularization. IEEE Trans. Circuits Syst. Video Technol. 24(6), 965–978 (2014)
52. Zhai, X., Peng, Y., Xiao, J.: Heterogeneous metric learning with joint graph regularization for cross-media retrieval. In: Twenty-Seventh AAAI Conference on Artificial Intelligence (2013)
53. Xie, L., et al.: Cross-modal self-taught learning for image retrieval. In: International Conference on Multimedia Modeling. Springer, Cham (2015)
54. Wang, J., et al.: Semi-supervised semantic factorization hashing for fast cross-modal retrieval. Multimed. Tools Appl. 76(19), 20197–20215 (2017)
55. Qi, J., Huang, X., Peng, Y.: Cross-media retrieval by multimodal representation fusion with deep networks. In: International Forum of Digital TV and Wireless Multimedia Communication. Springer, Singapore (2016)
Conversion from Lossless to Gray Scale Image Using Color Space Conversion Module

C. S. Sridhar1, G. Mahadevan2, S. K. Khadar Basha3, and P. Sudir3

1 Bharathiar University, Coimbatore, India [email protected]; 2 ANNAI College, Kumbakonam, India [email protected]; 3 SJCIT, Chickballapur, India [email protected], [email protected]
Abstract. Compression plays a vital role in better utilization of available memory space and bandwidth. Hence it is widely used for image archiving, image transmission, multimedia applications, cloud computing, E-commerce, etc. For effective presentation and utilization of digital images, specific techniques are required to reduce the number of bits. There are two types of images, namely lossy and lossless. Conversion from lossless to lossy images typically requires specialized and licensed software. A software prototype module has been developed using a color space conversion module in OpenCV, which converts a lossless image to a lossy image after resizing, and further converts the lossy image to a gray scale image and vice-versa. Lossless images such as TIFF, BITMAP and PNG are initially resized, converted to a lossy image and further converted to gray scale. On conversion to gray scale it has been found that the number of bits is considerably reduced. Performance analysis of the software prototype is obtained by comparing the decompressed image quality with respect to compression ratio, compression factor, PSNR (Peak Signal to Noise Ratio) and MSE (Mean Square Error). The results of the developed color space module are compared and presented in the form of tables and snapshots.

Keywords: Color space module · OpenCV · Bitmap · PNG · Tiff · JPEG
1 Introduction The compression of digital images has been largely focused in research area in recent years. New algorithms are developed for compression and compact representation of images. Digital image processing exploits the problem of redundancy by taking the exact amount of information required. It is a process that intends to yield a compact representation of an image and redundant data. The compression occurs by taking the benefits of redundancy information of the image. This property is exploited for achievement of better storage space and bandwidth. Compression of image is achieved by exploiting the properties of inter pixel, coding redundancy, psycho visual redundancy [4]. There are two types of image compression, namely lossless and lossy image © Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 1137–1145, 2020. https://doi.org/10.1007/978-3-030-37218-7_119
Bandwidth plays a very important role in the transmission of images over wireless networks. Hence it becomes very crucial to reduce the size of the image before transmission.
2 Methodology

2.1 Image Compression
Basically there are two types of image compression (Fig. 1).
Fig. 1. Compression types
Lossless image compression reduces the size of an image without any quality loss; hence it retains the highest quality of the image at the cost of more storage space. In lossy image compression the size of the image reduces significantly by losing some data from the original. It plays a very important role for bloggers, photography, E-commerce and cloud computing.

2.2 Parameters for Measurement
Measurement mainly depends on the compressed file, the algorithms and the exploitation of redundancy in the image. The parameters considered for measurement are compression ratio, compression factor, percentage of saving, mean square error and peak signal to noise ratio. Figure 2, shown below, describes the block diagram of the software prototype module developed using an open-source tool. Initially the incoming image is resized using an
Fig. 2. Block diagram for lossless to Gray scale and vice-versa
interpolation technique. The lossless image is then converted to a lossy image after resizing [2, 3]. The lossy image is further converted to a gray scale image using the color space conversion module, and the converted gray scale image is stored. The gray scale image can be converted back towards the original in the reverse process using a decompression algorithm [9].
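A minimal OpenCV sketch of the forward path in Fig. 2 is given below; the file names, scale factor and JPEG quality are illustrative choices, and the reverse (decompression) path is not reproduced.

```python
# Minimal OpenCV sketch of the forward path in Fig. 2: resize a lossless
# image, re-encode it as lossy JPEG, then convert to gray scale. The file
# names, scale factor and JPEG quality are illustrative choices; the
# reverse (decompression) path is not reproduced here.
import cv2

img = cv2.imread("input.tiff")                        # lossless source
resized = cv2.resize(img, None, fx=0.8, fy=0.8,
                     interpolation=cv2.INTER_CUBIC)   # interpolation step
cv2.imwrite("lossy.jpg", resized,
            [cv2.IMWRITE_JPEG_QUALITY, 90])           # lossless -> lossy
gray = cv2.cvtColor(cv2.imread("lossy.jpg"),
                    cv2.COLOR_BGR2GRAY)               # color space conversion
cv2.imwrite("gray.png", gray)                         # stored gray scale image
```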
2.3 Equations

1. $\text{compression ratio} = \dfrac{\text{compressed image size}}{\text{original image size}}$

2. $\text{compression factor} = \dfrac{\text{original image size}}{\text{compressed image size}}$

3. $\text{percentage of saving in bits} = \dfrac{\text{original image size} - \text{compressed image size}}{\text{original image size}} \times 100$

4. $\mathrm{MSE} = \dfrac{1}{m\,n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2$, where $I$ denotes the original image and $K$ denotes the decompressed image.

5. $\mathrm{PSNR} = 10 \log_{10}\dfrac{MAX_I^2}{\mathrm{MSE}} = 20 \log_{10}\dfrac{MAX_I}{\sqrt{\mathrm{MSE}}} = 20 \log_{10}(MAX_I) - 10 \log_{10}(\mathrm{MSE})$

Qualitative measurements are taken to compare different image enhancement algorithms, and the parameters considered for measurement are chosen according to the requirements (Fig. 3).
Fig. 3. Flow chart for the prototype model
Compression ratio is a dimensionless quantity, measured as the ratio of the compressed image size to the original image size. Compression factor is the reciprocal of the compression ratio. Percentage of saving is the number of bits saved after compression [6, 8]. The methodology of the flow chart for conversion from lossless to grayscale is the same as that explained for the block diagram.
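The measurements of Eqs. (1)–(5) can be sketched as follows; as an assumption of this sketch, sizes are taken from the files on disk rather than pixel counts, and the file names are placeholders.

```python
# Sketch of the measurements in Eqs. (1)-(5). As an assumption of this
# sketch, image sizes are taken from the files on disk, and the file names
# are placeholders; I and K must have the same dimensions.
import os
import cv2
import numpy as np

orig_bytes = os.path.getsize("input.tiff")
comp_bytes = os.path.getsize("gray.png")
ratio = comp_bytes / orig_bytes                        # Eq. (1)
factor = orig_bytes / comp_bytes                       # Eq. (2)
saving = (orig_bytes - comp_bytes) / orig_bytes * 100  # Eq. (3)

I = cv2.imread("input.tiff", cv2.IMREAD_GRAYSCALE).astype(np.float64)
K = cv2.imread("decompressed.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
mse = np.mean((I - K) ** 2)                            # Eq. (4)
psnr = 20 * np.log10(255.0) - 10 * np.log10(mse)       # Eq. (5), MAX_I = 255

print(f"ratio={ratio:.2f} factor={factor:.3f} saving={saving:.1f}%")
print(f"MSE={mse:.3f} PSNR={psnr:.2f} dB")
```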
2.4 Comparative Results

See Table 1.
Table 1. Comparison of results

| Image Format | Original Image Size in Pixels | Compressed Image Size in Pixels | Compression Ratio | Compression Factor | Percentage of Saving |
|---|---|---|---|---|---|
| PNG | 432000 | 276480 | 0.64 | 1.5625 | 36% |
| TIFF | 262144 | 128164 | 0.48 | 2.045 | 51.11% |
| BITMAP | 480000 | 307200 | 0.64 | 1.5625 | 36% |
| JPEG | 341000 | 242110 | 0.71 | 1.408 | 29% |

| Image Format (Lossy) | Mean Square Error | PSNR |
|---|---|---|
| PNG | 12.175 | 37.276 |
| TIFF | 8.4717 | 38.851 |
| BITMAP | 6.842 | 39.778 |
| JPEG | 19.8113 | 35.161 |
(Snapshots: TIFF format — original image, resized image, and lossy (JPEG) image.)
3 Conclusion and Future Enhancement

This software prototype module can convert a lossless image to a gray scale image using open-source software. From the results it is evident that there is a great saving in the number of bits after converting to gray scale, while the highest quality of the decompressed image has been maintained. In future, the color space conversion module can be implemented in hardware [7].

Acknowledgement. We kindly acknowledge NCH Software for the use of the trial version of the Pixillion image converter for comparison of results.

Compliance with Ethical Standards

✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References

1. Sridhar, C.S., Mahadevan, G., Khadar Basha, S.K.: Design and implementation of a color space conversion module using open computer vision. J. Adv. Res. Dyn. Control Syst. 12(special issue), 2321–2329 (2017). ISSN 1943-023X
2. Sridhar, C.S., Mahadevan, G., Khadar Basha, S.K.: Compression of image using hybrid combination of Haar wavelet transform and color space model in noisy environment. Int. J. Eng. Technol. 7(1.5), 283–285 (2018)
3. Sridhar, C.S., Mahadevan, G., Khadar Basha, S.K.: Transmission of images using image compressions for low powered wireless networks. J. Adv. Res. Dyn. Control Syst. 12(special issue), 2158–2165 (2018). ISSN 1943-023X
4. Donoho, D.L., Vetterli, M., Devore, R.A., Daubechies, I.: Data compression and harmonic analysis. IEEE Trans. Inf. Theory 44, 2435–2476 (1998)
5. Haar, A.: Zur Theorie der orthogonalen Funktionensysteme. Math. Ann. 69(3), 331–371 (1910)
6. Besslich, P., Trachtenberg, E.A.: The sign transform: an invertible non-linear transform with quantized coefficients. In: Moraga, C. (ed.) Theory and Application of Spectral Technique, pp. 150–160. University Dortmund Press, Dortmund (1988)
7. Chen, F., Chandrakasan, A.P., Stojanovic, V.M.: Design and analysis of a hardware-efficient compressed sensing architecture for data compression in wireless sensors. IEEE J. Solid-State Circuits 47(3), 744–756 (2012)
8. Díaz, M., et al.: Compressed sensing of data with a known distribution. Appl. Comput. Harmon. Anal. 45(3), 486–504 (2018)
9. Duda, K., Turcza, P., Marszałek, Z.: Low complexity wavelet compression of multichannel neural data. In: 2018 International Conference on Signals and Electronic Systems (ICSES). IEEE (2018)
10. Nassar, M.R., Helmers, J.C., Frank, M.J.: Chunking as a rational strategy for lossy data compression in visual working memory. Psychol. Rev. 125(4), 486 (2018)
11. Sahin, A., Martin, O., O'Hare, J.J.: Intelligent data compression. U.S. Patent Application No. 10/013,170
12. Parvanova, R., Todorova, M.: Compression of images using wavelet functions. Appl. Comput. Technol. ACT, 90 (2018)
13. Zemliachenko, A.N., Abramov, S.K., Lukin, V.V., Vozel, B., Chehdi, K.: Preliminary filtering and lossy compression of noisy remote sensing images. In: Image and Signal Processing for Remote Sensing XXIV, vol. 10789, p. 107890V. International Society for Optics and Photonics, October 2018
14. Gregor, K., Besse, F., Rezende, D.J., Danihelka, I., Wierstra, D.: Towards conceptual compression. In: Advances in Neural Information Processing Systems, pp. 3549–3557 (2016)
Iris Recognition Using Bicubic Interpolation and Multi Level DWT Decomposition

C. R. Prashanth1, Sunil S. Harakannanavar2, and K. B. Raja3

1 Dr. Ambedkar Institute of Technology, Bangalore, Karnataka, India [email protected]; 2 S. G. Balekundri Institute of Technology, Belagavi, Karnataka, India [email protected]; 3 University Visvesvaraya College of Engineering, Bangalore, Karnataka, India
Abstract. Iris recognition is endorsed as one of the most important biometric modalities, as it involves live tissue verification. An iris recognition model is presented having multiple descriptors, viz. DWT, ICA and BSIF, for producing hybrid features. A 2nd-level DWT is performed to produce unique iris features. The high frequency bands of the DWT are processed by bicubic interpolation to generate new sub-bands. Inverse DWT is applied on the low frequency and interpolated high frequency sub-bands for reconstruction of images. The features are selected based on ICA in collaboration with BSIF for extraction of features. Test image and database features are matched using ED to accept or reject the sample on the CASIA database. The iris model achieved better accuracy compared with other iris biometric systems.

Keywords: Biometrics · Physiological · Bicubic interpolation · Behavioral · Euclidean distance
1 Introduction

The biometric trait "iris" falls under the physiological characteristics of humans used to authenticate an individual. The unique features of the iris texture provide a good basis for the authentication of a person. Compared to other biometric traits, iris recognition is considered the most important, as it involves live tissue verification. The reliability of the authentication system depends on the acquired signal and the recorded signal of a live person, which cannot be manufactured as a template. The human eye consists of the pupil, iris and sclera [3]. The texture of the iris is quite complex and unique, so it is considered one of the most secure biometric traits for identifying an individual. It is reliable and accurate for identification, and so it is widely accepted in national identification programs processing the data of millions of people. The iris is the muscular portion around the pupil. It is observed that no two irises are the same, even for the twins of the same parents. It forms the region between the sclera and the pupil and has an unusual structure that generates numerous features (crypts, collarettes, freckles, corona, etc.). Here, an iris recognition model is presented. Multistage decomposition of DWT is performed for extraction of features. Bicubic interpolation is applied on the higher-frequency components of the 1st-stage DWT to produce a new set of sub-bands, and inverse DWT is used to reconstruct the image.

© Springer Nature Switzerland AG 2020 S. Smys et al. (Eds.): ICCVBIC 2019, AISC 1108, pp. 1146–1153, 2020. https://doi.org/10.1007/978-3-030-37218-7_120
The BSIF and ICA are adopted for obtaining hybrid features, and the features are matched using an ED classifier. The structure of the paper is as follows: existing methodologies of iris recognition are explained in Sect. 2, the proposed iris model is described in Sect. 3, performance analysis is presented in Sect. 4, and the work is concluded in Sect. 5.
2 Related Work

Neda et al. [1] adopted a multilayer perceptron NN to recognize the human iris. The two-dimensional Gabor kernel technique was employed to extract the detailed coefficients of the iris, and experiments were conducted on CASIA v-3. The model needs to be extended with combinations of fuzzy systems and MLPNN-PSO methods using different classification methods. Syed et al. [2] described HT and an optimized light-version technique to recognize iris samples on smartphones. The generated iris features are compared with test features using HD on the CASIA v4 dataset to measure accuracy. Bansal et al. [4] designed an adaptive fuzzy filter to remove noise. The features extracted using the Haar transformation are compared with test features using HD on the IITD database. However, the model was developed only for images corrupted with impulse noise, and the method needs to be extended to other types of noise to improve the recognition rate. Li [5] applied HT and calculus techniques to locate the iris in eye samples. Iris features are extracted using a Gabor filter bank, and the coefficients of test and database images are matched using ED. The experiments were conducted on their own iris database consisting of 8 individuals and 16 different iris samples. Since the model was tested only on a limited number of samples, the recognition rate decreases for a large-scale database. Muthana et al. [6] adopted PCA and Fourier descriptors to obtain genuine image features. The significant details of the iris patterns are represented by high-spectrum coefficients, whereas low-spectrum coefficients provide the general description of the iris pattern. The performance of the iris model was evaluated with three classifiers, viz. cosine, Euclidean and Manhattan, to accept or reject the image samples. Some of the existing algorithms for iris recognition are tabulated in Table 1.

Table 1. Summary of different existing methodologies for iris recognition

| Authors | Techniques | Limitations |
|---|---|---|
| Kiran et al. [7] | BSIF and SIFT | Iris model needs to be improved for larger databases, and a new deep method should be developed to improve accuracy |
| Kavita et al. [8] | LGW and Haar wavelet | Work to be carried out on other artificial intelligence techniques to improve FRR |
| Aparna et al. [9] | Combinations of Haar transform and sum-block algorithms are used | Algorithms should be developed to be robust in their results |
| Ximing et al. [10] | 2D Gabor wavelet transform | Work to be carried out on other artificial intelligence techniques to improve accuracy |
3 Proposed Methodology

In the proposed model, multistage decomposition of DWT is performed for extraction of unique iris features. Bicubic interpolation is applied on the higher-frequency components of the 1st-stage DWT to produce new sub-bands. Further, inverse DWT is used to reconstruct the image. The final iris features are generated using BSIF and an ICA filter. Finally, features are matched using an ED classifier on the CASIA database. The iris model is given in Fig. 1.
Fig. 1. Block diagram of proposed model
Sample CASIA iris images are considered to extract the iris region. Morphological operations are carried out on the eye image to obtain segmented iris regions (left and right), each side having a size of 40 pixels. The obtained regions are concatenated to form a template, as shown in Fig. 2.
Fig. 2. Iris template creation [12]
Fig. 3. Decomposition of 1st level 2D-DWT on iris template
The 2D DWT [9, 10] is employed on the iris template to produce the LL1 band, which holds the significant content of the iris, while the other bands hold the detailed data. Figures 3 and 4 show the decomposition of the 1st-level and 2nd-level 2D-DWT on the iris template.
Fig. 4. Decomposition of 2nd level 2D-DWT on iris template
Bicubic interpolation is employed on the detail region of the DWT for generation of new sub-band images. The IDWT is used for reconstruction of images from sub-bands having smaller bandwidths. A bank of ten 5×5 ICA filters is adopted to filter the iris template, which in turn produces binary images [11]. The binarized features are extracted using BSIF, and a sample BSIF image is given in Fig. 5.
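A hedged sketch of this sub-band processing, using PyWavelets, is given below; the Haar wavelet, the 2× upsampling factor and the way the interpolated bands re-enter the IDWT are assumptions made for illustration, not details fixed by the paper.

```python
# Hedged sketch of the sub-band processing, using PyWavelets: a 2-level
# Haar DWT of the iris template, bicubic interpolation of the level-1
# detail bands, and IDWT reconstruction. The wavelet choice, the 2x factor
# and how the interpolated bands re-enter the IDWT are assumptions.
import cv2
import numpy as np
import pywt

template = cv2.imread("iris_template.png",
                      cv2.IMREAD_GRAYSCALE).astype(np.float32)

LL1, (LH1, HL1, HH1) = pywt.dwt2(template, "haar")  # 1st-level decomposition
LL2, (LH2, HL2, HH2) = pywt.dwt2(LL1, "haar")       # 2nd-level on LL1

def upsample(band: np.ndarray) -> np.ndarray:
    """New sub-band via bicubic interpolation of a detail band."""
    h, w = band.shape
    return cv2.resize(band, (2 * w, 2 * h), interpolation=cv2.INTER_CUBIC)

new_bands = [upsample(b) for b in (LH1, HL1, HH1)]

# resize the interpolated bands back to the LL1 grid so IDWT shapes match
details = tuple(cv2.resize(b, LL1.shape[::-1], interpolation=cv2.INTER_AREA)
                for b in new_bands)
recon = pywt.idwt2((LL1, details), "haar")          # reconstructed image
print("reconstructed shape:", recon.shape)
```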
Fig. 5. BSIF output
Finally, the hybrid features are matched using the ED classifier for identification of an individual using Eq. (1) [12]:

$$\text{Euclidean Distance} = \sum_{i=1}^{M} (P_i - q_i)^2 \qquad (1)$$

where $M$ is the number of coefficients, $P_i$ denotes the coefficient values of the database image and $q_i$ those of the test image.
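A small sketch of the matching step of Eq. (1), with synthetic feature vectors and an assumed acceptance threshold:

```python
# Sketch of the matching step of Eq. (1): squared Euclidean distance
# between a probe feature vector and every enrolled template, with an
# assumed accept/reject threshold. All vectors are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(4)
database = rng.normal(size=(60, 256))               # enrolled hybrid features
probe = database[7] + 0.01 * rng.normal(size=256)   # a genuine attempt

dists = np.sum((database - probe) ** 2, axis=1)     # Eq. (1) per template
best = int(np.argmin(dists))
THRESHOLD = 1.0                                     # assumed operating point
print("best match:", best,
      "accepted" if dists[best] < THRESHOLD else "rejected")
```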
4 Experimental Results

The accuracy of the iris model is evaluated on the CASIA iris database [13–15]. The first 6 images of the database are used to calculate FAR, FRR and TSR [14], while the remaining images of the database are used as out-of-database samples to evaluate FAR [16–19]. Table 2 gives the EER, OTSR and MTSR for different combinations of PIDs and PODs. It is seen that the values of OTSR and EER vary and do not remain constant as PID increases with POD kept constant. The MTSR for the combinations [20, 30, 40, 50, 60]:[30] of PID:POD is 100%. Figure 6 shows the plot of the performance parameters for the 40:30 combination of PID and POD. As the threshold value increases, the FRR decreases while the FAR and TSR increase. A TSR of 95% and an EER of 5% are recorded for the proposed iris model for the 40:30 combination of PID and POD. Table 3 gives the EER, OTSR and MTSR for different values of PID and POD. It is seen that the values of OTSR and EER remain 93.33% and 6.67%, respectively, as POD increases with PID kept constant. The MTSR for the combinations [30]:[20, 30, 40, 50, 60] of PID:POD is 100%.
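The sketch below shows how FRR, FAR and TSR curves of the kind plotted in Figs. 6 and 7 can be traced by sweeping the distance threshold; the genuine and impostor score distributions are synthetic, not the CASIA results.

```python
# Hedged sketch of tracing FRR/FAR/TSR curves like those in Figs. 6-7 by
# sweeping the distance threshold over genuine and impostor score sets;
# the scores below are synthetic, not the CASIA results.
import numpy as np

rng = np.random.default_rng(5)
genuine = rng.normal(0.4, 0.1, 300)    # distances for true matches
impostor = rng.normal(1.0, 0.2, 300)   # distances for out-of-database probes

for t in np.linspace(0.2, 1.2, 6):
    frr = np.mean(genuine >= t) * 100  # genuine samples wrongly rejected
    far = np.mean(impostor < t) * 100  # impostors wrongly accepted
    tsr = 100 - frr                    # genuine samples successfully recognized
    print(f"t={t:.2f}  FRR={frr:5.1f}%  FAR={far:5.1f}%  TSR={tsr:5.1f}%")
```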
Table 2. EER, OTSR and MTSR for different values of PID and POD

PID  POD  EER   OTSR   MTSR
20   30   10    90     100
30   30   6.67  93.33  100
40   30   5     95     100
50   30   7     93     100
60   30   10    90     100
Figure 7 shows the plot of the performance parameters for the 30:40 combination of PID and POD. For the iris model, as the threshold value increases the FRR decreases while the FAR and TSR increase. A TSR of 93.33% and an EER of 6.67% are recorded for the proposed iris model at the 30:40 combination of PID and POD.
Fig. 6. Simulation of Performance parameters for 40:30 (PID:POD)
Table 3. EER, OTSR and MTSR for different values of PID and POD

PID  POD  EER   OTSR   MTSR
30   20   6.67  93.33  100
30   30   6.67  93.33  100
30   40   6.67  93.33  100
30   50   6.67  93.33  100
30   60   6.67  93.33  100
Fig. 7. Simulation of Performance parameters for 30:40 (PID:POD)
5 Conclusion
In this paper, second-level DWT decomposition is performed to produce unique iris features. Bicubic interpolation is applied to the high-frequency DWT bands to generate new sub-bands, and the inverse DWT is applied to the lower- and higher-frequency sub-bands of the DWT and interpolation stages to reconstruct the image. Features are then extracted using ICA in collaboration with BSIF. Test-image and database features are matched using ED to accept or reject a sample on the CASIA database. In future, the model should be extended with a fusion of different domains, viz. the spatial and transform domains, to improve its recognition.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Ahmadi, N., Akbarizadeh, G.: Hybrid robust iris recognition approach using iris image preprocessing, two-dimensional Gabor features and multi-layer perceptron neural network/PSO. IET Biometrics 7(2), 153–162 (2018)
2. Ali, S., Shah, M.A., Javed, T., Abdullah, S., Zafar, M.: Iris recognition system in smartphones using light version recognition algorithm. In: IEEE International Conference on Automation & Computing, pp. 1–6 (2017)
3. Choudhury, B., Then, P., Issac, B., Raman, V., Haldar, M.K.: A survey on biometrics and cancellable biometrics systems. Int. J. Image Graph. 18(1), 1–28 (2018)
4. Bansal, R., Juneja, A., Agnihotri, A., Taneja, D., Sharma, N.: A fuzzified approach to Iris recognition for mobile security. In: IEEE International Conference on Electronics, Communication and Aerospace Technology, pp. 246–251 (2017)
5. Li, Z.: An Iris recognition algorithm based on coarse and fine location. In: IEEE International Conference on Big Data Analysis, pp. 744–747 (2017)
6. Hamd, M., Ahmed, S.: Biometric system design for Iris recognition using intelligent algorithms. Int. J. Modern Educ. Comput. Sci. 10(3), 9–16 (2018)
7. Joshi, K., Agrawal, S.: An Iris recognition based on robust intrusion detection. In: Annual IEEE India Conference, pp. 1–6 (2016)
8. Arunalatha, J.S., Rangaswamy, Y., Shaila, K., Raja, K.B., Anvekar, D., Venugopal, K.R., Iyengar, S.S., Patnaik, L.M.: IRHDF: Iris recognition using hybrid domain features. In: Annual IEEE India Conference, pp. 1–5 (2015)
9. Gale, A.G., Salankar, S.S.: Evolution of performance analysis of Iris recognition system by using hybrid method of feature extraction and matching by hybrid classifier for Iris recognition system. In: IEEE International Conference on Electrical, Electronics and Optimization Techniques, pp. 3259–3263 (2016)
10. Alkassar, S., Woo, W.L., Dlay, S.S., Chambers, J.A.: Robust sclera recognition system with novel sclera segmentation and validation techniques. IEEE Trans. Syst. Man Cybern. Syst. 47(3), 474–486 (2017)
11. Zhao, D., Fang, S., Xiang, J., Tian, J., Xiong, S.: Iris template protection based on local ranking. Int. J. Secur. Commun. Netw. 2018, 1–9 (2018)
12. Kaur, B., Singh, S., Kumar, J.: Robust Iris recognition using moment invariants. Wireless Pers. Commun. 99(2), 799–828 (2018)
13. Bhagat, K., Deshmukh, R., Patil, P., Kirange, D.K., Waghmare, S.: Iris recognition using radon transform and GLCM. In: IEEE International Conference on Advances in Computing, Communications and Informatics, pp. 2257–2263 (2017)
14. Wei, X., Wang, H., Wan, H., Scotney, B.: Multiscale feature fusion for face identification. In: IEEE International Conference on Cybernetics, pp. 1–6 (2017)
15. Wang, Q., Liu, Z., Tong, S., Yang, Y., Zhang, X.: Efficient Iris localization via optimization model. Hindawi Math. Probl. Eng. 2017, 1–9 (2017)
16. Joshi, K., Agrawal, S.: An Iris recognition based on robust intrusion detection. In: IEEE Annual India Conference, pp. 1–6 (2016)
17. Vishwakarma, D.K., Jain, D., Rajora, S.: Iris detection and recognition using two-fold techniques. In: International Conference on Computing, Communication and Automation, pp. 1046–1051 (2017)
18. Zhang, W., Xue, P., Wang, C.: Application of convolution neural network in Iris recognition technology. In: IEEE International Conference on Systems and Informatics, pp. 1169–1174 (2017)
19. Samant, P., Agarwal, R., Bansal, A.: Enhanced discrete cosine transformation feature based iris recognition using various scanning techniques. In: IEEE International Conference on Electrical, Computer and Electronics, pp. 660–665 (2017)
Retinal Image Classification System Using Multi Phase Level Set Formulation and ANFIS

A. Jayachandran (1), T. Sreekesh Namboodiri (2), and L. Arokia Jesu Prabhu (3)

1 Department of CSE, Presidency University, Bangalore, India, [email protected]
2 Department of CSE, PSN College of Engineering and Technology, Tirunelveli, India, [email protected]
3 Department of CSE, Sri Sakthi Institute of Engineering and Technology, Coimbatore, India, [email protected]
Abstract. Computerized systems for identifying eye maladies are significant in the ophthalmology field. Conventional systems for the detection of eye disease depend on labour-intensive inspection of the retinal segments. This work presents a new supervised technique for hemorrhage detection in digital retinal images. The strategy utilizes an ANFIS scheme for pixel classification and computes a 5-D vector composed of grey-level and Cross Section Profile (CSP) based features for pixel representation. Classification of diseases is a crucial aspect of eye disease categorization through image processing techniques. The categorization of diseases according to pathogen groups is a significant research domain and potentially a challenging area of work. Various classification techniques for single as well as multiple diseases are identified. Classification and detection are very similar, but in classification the primary focus is on the categorization of the various diseases and then their classification according to pathogen groups.
Keywords: DR · CSP · ANFIS · Classification · Retinal imaging · Feature extraction
1 Introduction
Recent advances in computer technology have enabled the development of various kinds of fully automatic computer-based diagnosis. Presently, medical image analysis is a research area that attracts a great deal of interest from both researchers and physicians. Electronic medical imaging and analysis techniques using different modalities have facilitated early diagnosis [1, 2]. The presence of hemorrhages is commonly used to diagnose DR or hypertensive retinopathy by utilizing the classification scheme of Scheie. In addition to detecting microaneurysms, it is difficult for ophthalmologists to find them in non-contrast fundus images. The contrast in a microaneurysm image is extremely low; thus, ophthalmologists for the
most part recognize microaneurysms by utilizing fluorescein angiograms. However, it is hard to use fluorescein as a contrast medium for diagnosing all the medical examinees subjected to mass screening. A few automated methods have been reported to evaluate the changes in morphology of retinal vessels indicative of retinal diseases. Some of the procedures quantify the vessel morphology as an average value representing the entire vessel network, e.g., average tortuosity [3]. Recently, however, vessel morphology measurement specific to arteries or veins was observed to be related to disease.

Contours usually contain key visual information of an image. In computer vision, contours have been widely used in many practical tasks. Although quite a few contour detection methods have been developed over the past several decades, contour detection is still a challenging problem in the image field. Among the non-learning approaches, many early methods, such as the famous Canny detector, find contours by extracting edges where the brightness or color changes sharply. However, such methods usually employ regular kernels, e.g., the Gaussian filter and Gabor filter, to measure the extents of local changes, and thus can hardly deal with textures. To address this problem, many texture suppression methods have been proposed: examples are the method based on non-classical receptive field inhibition, the method based on sparseness measures, the method based on surround-modulation, etc. It has been validated that texture suppression can help improve contour detection performance. Nonetheless, these methods still mainly use low-level local features, and some of them are computationally heavy, which leads to difficulties in practical applications [4–6].

In classification, SVM is a popular supervised machine learning technique designed primarily to solve binary classification tasks. In an SVM, the classification task is performed by finding the optimum separating hyperplane with maximum margin using the structural risk minimization (SRM) principle [7]. Since the present study deals with four binary classification tasks, SVM is chosen in this work. In the case of non-linear SVMs, a method exists to map the training samples to a high-dimensional feature space. For this mapping, different kernel functions are used, and the mapping operation satisfies Mercer's theorem; the kernel functions available in an SVM are linear, polynomial and radial basis function. k-nearest neighbour (kNN) is another well-known machine learning technique widely used for solving different classification problems. The advantages of the kNN algorithm are its robustness and relative ease of implementation. A kNN assigns a class to the test data by a majority voting technique, depending on the most frequent class among its nearest neighbours [8–10]. In a kNN algorithm there are two important parameters: the first is the distance measure and the second is the k value. In this study, the performance of the classifier is evaluated using the Euclidean distance and by varying the k value from 2 to 10.

One downside of these methodologies is their reliance upon strategies for finding the starting points, which must always be either at the optic nerve or at subsequently identified branch points. Vessels were detected by means of mathematical morphology [11, 12], and matched filters were applied in conjunction with other techniques.
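Since the classifier described above votes over the k nearest neighbours under the Euclidean distance (with k varied from 2 to 10), a minimal C++ sketch of that scheme is given below; the flat feature-vector representation and the function name knn_classify are assumptions of this sketch, not the authors' code.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Sample {
    std::vector<double> features;
    int label;
};

// Classify a test vector by majority vote over its k nearest training
// samples under the Euclidean distance, as described in the text.
int knn_classify(const std::vector<Sample>& train,
                 const std::vector<double>& x, std::size_t k) {
    std::vector<std::pair<double, int>> dist;  // (distance, label)
    for (const Sample& s : train) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double diff = s.features[i] - x[i];
            d2 += diff * diff;
        }
        dist.emplace_back(std::sqrt(d2), s.label);
    }
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());

    std::map<int, int> votes;  // label -> count among the k nearest
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];
    return std::max_element(votes.begin(), votes.end(),
                            [](const auto& a, const auto& b) {
                                return a.second < b.second;
                            })->first;
}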
The planned hemorrhage detection strategy is presented in Sect. 2, the feature extraction and ANFIS classifier in Sects. 3 and 4, the experimental results in Sect. 5 and the conclusion in Sect. 6.
2 Hemorrhages Detection System
In this work, we propose a system for optic disc localization and segmentation. The method can be used as an initial step in the design of a CAD system for many retinal diseases. The proposed method is based on an ANFIS system. The optic disc is located using the brightest pixel in the green channel of the retinal image, and a region-detection algorithm is employed to precisely extract the optic disc. The segmented optic disc is approximated by the nearest possible circle to calculate its diameter in numbers of pixels. The method is designed and implemented using a hospital fundus database, and it is quantitatively evaluated by ophthalmology domain experts on publicly available fundus databases, namely STARE and DRIVE. In this study, GOA is utilized for designing an optimized CNN to segment the ROI areas from the background. The duty of the GOA here is to adjust the hyper-parameters of the CNN to achieve better performance than manual adjustment. The candidate solutions in this configuration are a sequence of integers. To prevent system error, the minimum (min) and maximum (max) limits are taken as 2 and the size of the sliding window, respectively; the minimum value 2 is the smallest allowed value for max pooling, so that no lower size exists. It should be noted that this optimization problem has an inequality constraint: the value of the sliding window should be less than the input data size. The half-value precision of the CNN is selected as the cost function in the validation process. Because of the presence of both the CNN and the GOA, the overall configuration has a high computational cost, since each agent of the swarm requires the CNN to be trained on the dataset by applying the backpropagation technique for 1000 iterations. After initializing and evaluating the cost of the agents, the positions of the search agents are updated based on the GOA parameters, and the process repeats until the stop criteria are achieved. The preprocessing stage is given in Fig. 1.
3 Extraction of Features
The aim of the feature-extraction stage is pixel characterization by means of a feature vector, a pixel representation in terms of some quantifiable measurements which may be effectively used in the classification stage to decide whether pixels belong to a genuine vessel or not. A simple breadth-first search algorithm is applied for the calculation of grayscale morphological reconstruction. Pixels of the image are processed sequentially and compared with their 8-neighbours. If all neighbours have a lower intensity, the pixel itself is a local maximum region (LMR). If there is a neighbouring pixel with higher intensity, the current pixel cannot be a local maximum. Pixels of an LMR are treated as individual candidates, and the pixel with the maximum final score represents the region; this strategy is referred to as non-maximum suppression. To examine the surroundings of a single maximum pixel in a microaneurysm (MA) candidate region, the intensity values along discrete line segments of various orientations, whose central pixel is the candidate pixel, are recorded. In this way, a set of cross-sectional intensity profiles is obtained. On the obtained cross-section profiles, a peak detection step
Fig. 1. Processing steps. (a) Input image [4] (b) De-blurred green channel image (c) Contrast enhanced image (d) Brightest pixel detected (marked as a dot in the optic disc) (e) Marking of Region-of-interest (f) Extraction of foreground object
is performed. Our aim is to decide whether a peak is present at the centre of the profile, i.e., at the location of the candidate point, for a particular direction. Several properties of the peak are computed, and the final feature set comprises a set of statistical measures that show how these values vary as the orientation of the cross-section changes. In this way, the variation of significant properties, such as the symmetry and shape of the structure and its difference from the background, can be expressed numerically. According to the two clues, lumen and illumination highlight, we processed the retinal images with histogram equalization as the first step to enhance image contrast. There are, of course, other image-enhancement algorithms, such as linear contrast stretching, the retinex model and so on; however, the transformation must be homeomorphic, since related research [11] showed that in an image the shape information of objects is represented through the level sets of the image, in other words, it is contained in the isolux lines of the image. In this paper we adopted histogram equalization to improve image contrast; it makes the illumination of tumors prominent. To allow for the whole cell's boundary, which is embedded in the black background, we adopt a multiphase level set formulation: in theory, two boundaries can represent four regions, but since the region of interest is wholly included in the bigger boundary, we use two boundaries to represent three regions in this work. Moreover, this method shows a desirable performance in the presence of intensity inhomogeneities for both synthetic and real images. Therefore, the method is good at intensity
inhomogeneities and is suitable for the illumination highlight. At the same time, the method is appropriate for the other clue, the lumen highlight. We consider the three-phase retinal image domain to be segmented into three disjoint regions Ω1, Ω2 and Ω3, while two level set functions φ1 and φ2 are used to define the three regions.
4 ANFIS System
ANFIS is a hybrid intelligent neuro-fuzzy inference system working under a Takagi-Sugeno-type fuzzy inference framework. For the prediction task we assume a fuzzy inference system with four inputs, i.e., front obstacle distance, right obstacle distance, left obstacle distance and target angle, which are gathered from sensors, and each input variable has three membership functions (MF). A Takagi-Sugeno-type fuzzy inference system with if-then rules is then set up according to Eq. (1):

f_n = p_n X_1 + q_n X_2 + r_n X_3 + s_n X_4 + u_n \qquad (1)
In the ANFIS classifier, gradient descent and backpropagation algorithms are used to adjust the parameters of the membership functions (fuzzy sets) and the weights of the defuzzification stage (neural networks) for fuzzy neural systems; ANFIS thus applies two techniques in updating parameters. ANFIS is a FIS implemented in the framework of an adaptive fuzzy neural network. It combines the explicit knowledge representation of a FIS with the learning power of ANNs; the objective of ANFIS is to fuse the best features of fuzzy systems and neural networks. The classifier is utilized to recognize the abnormalities in the retinal images [12–15]. In general, the input layer consists of seven neurons corresponding to the seven features. The output layer consists of one neuron indicating whether the retina is normal or abnormal, and the hidden layer changes according to the number of rules that give the best recognition rate for each group of features. The framework is a combination of neural networks and fuzzy systems in which the neural network is used to determine the parameters of the fuzzy system. It largely removes the need for manual tuning of the fuzzy system's parameters. Through the learning abilities of the neural network, together with the advantages of the rule-based fuzzy system, the performance can be improved significantly, and a neuro-fuzzy system can also provide a mechanism to incorporate past observations into the classification process. In a neural network, the training essentially builds the system.
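To make the Takagi-Sugeno machinery of Eq. (1) concrete, the sketch below evaluates the first-order rule consequents and combines them by the usual weighted average over the rule firing strengths; the firing strengths w_n are assumed to be supplied by the premise membership functions, which are omitted here, so this illustrates the inference step rather than the trained classifier.

#include <cstddef>
#include <vector>

struct SugenoRule {
    double p, q, r, s, u;  // consequent parameters of Eq. (1)
};

// Evaluate f_n = p_n*X1 + q_n*X2 + r_n*X3 + s_n*X4 + u_n for every rule and
// defuzzify by the weighted average of the rule outputs, using the firing
// strengths w as weights (standard Takagi-Sugeno ANFIS output layer).
double anfis_output(const std::vector<SugenoRule>& rules,
                    const std::vector<double>& w,
                    double x1, double x2, double x3, double x4) {
    double num = 0.0, den = 0.0;
    for (std::size_t n = 0; n < rules.size(); ++n) {
        const double f = rules[n].p * x1 + rules[n].q * x2 +
                         rules[n].r * x3 + rules[n].s * x4 + rules[n].u;
        num += w[n] * f;
        den += w[n];
    }
    return num / den;
}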
5 Experimental Results
5.1 Performance Measures
To assess the performance of the proposed method on an experimental image, the resulting segmentation is compared with its corresponding gold-standard image.
Five standard evaluation metrics have been used for the analysis of the proposed method: sensitivity, PPV, NPV, specificity and accuracy. The hemorrhage segmentation results of the proposed method are given in Fig. 2.
Fig. 2. (a) Fundus image [7] (b) Hemorrhage obtained
5.2 Evaluation of Proposed System
This technique was assessed on DRIVE and STARE database images with available gold-standard images, since the images' dark background outside the FOV is easily distinguished. Sensitivity, specificity, positive predictive value, negative predictive value and accuracy values were computed for each image considering FOV pixels only. Since FOV masks are not provided for the STARE images, they were generated with an approximate extent of 650 × 550. The experimental results for sensitivity, specificity, PPV, NPV and accuracy on the DRIVE database are shown in Fig. 3 and on the STARE database in Fig. 4.
6 Conclusion
Hemorrhages are difficult to distinguish from background variations since they normally have low contrast. Automatic detection of hemorrhages can be confused by other dark areas in the image, such as the vessels, fovea and microaneurysms. Hemorrhages have a variable size, are often so small that they can easily be mistaken for image noise or microaneurysms, and there is no standard database that characterizes hemorrhages by shape. The most frequent false detection occurs when vessels are adjacent to or overlapping with hemorrhages, so a robust hemorrhage detection strategy is required. The proposed methodology contains feature extraction and classification, and the advantage of the framework is that it helps the physician make a final decision without uncertainty. Our proposed hemorrhage segmentation strategy does not require any user intervention and has stable performance on both normal and abnormal images. The proposed hemorrhage detection algorithm delivers over 96% segmentation accuracy on both databases.
Fig. 3. Classification results of proposed method using DRIVE database
Fig. 4. Classification results of proposed method using STARE database
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Doi, K.: Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput. Med. Imaging Graph. 31, 198–211 (2007)
2. Sundaraj, G.K., Jayachandran, A.: Abnormality segmentation and classification of multi model brain tumor in MR images using Fuzzy based hybrid kernel SVM. Int. J. Fuzzy Syst. 17(3), 434–443 (2016)
3. Ronald, P.C., Peng, T.K.: A Textbook of Clinical Ophthalmology: A Practical Guide to Disorders of the Eyes and Their Management, 3rd edn. World Scientific Publishing Company, Singapore (2003)
4. Mahiba, C., Jayachandran, A.: Severity analysis of diabetic retinopathy in retinal images using hybrid structure descriptor and modified CNNs. Measurement 135, 762–767 (2019)
5. Sukkaew, L., Makhanov, B., Barman, S., Panguthipong, S.: Automatic tortuosity-based retinopathy of prematurity screening system. IEICE Trans. Inf. Syst. 91(12), 2868–2874 (2008)
6. Jayachandran, A., Dhanasekaran, R.: Automatic detection of brain tumor in magnetic resonance images using multi-texton histogram and support vector machine. Int. J. Imaging Syst. Technol. 23(2), 97–103 (2013)
7. Niemeijer, M., Xu, X., Dumitrescu, A., Gupta, P., Ginneken, B., et al.: Automated measurement of the arteriolar-to-venular width ratio in digital color fundus photographs. IEEE Trans. Med. Imaging 30(11), 1941–1950 (2011)
8. Vickerman, M., Keith, P., Mckay, T.: VESGEN 2D: automated, user-interactive software for quantification and mapping of angiogenic and lymphangiogenic trees and networks. Anat. Record 292, 320–332 (2009)
9. Jayachandran, A., Dhanashakeran, R., Sugel Anand, O., Ajitha, J.H.M.: Fuzzy information system based digital image segmentation by edge detection. In: 2010 IEEE International Conference on Computational Intelligence and Computing Research, 28–29 December 2010
10. Leandro, J.J., Cesar, J.R., Jelinek, H.F.: Blood vessels segmentation in retina: preliminary assessment of the mathematical morphology & the wavelet transform techniques. In: Proceedings on Computer Graphics and Image Processing, pp. 84–90 (2001)
11. Jayachandran, A., Dhanasekaran, R.: Multi class brain tumor classification of RETINAL images using hybrid structure descriptor and fuzzy logic based RBF kernel SVM. Iranian J. Fuzzy Syst. 14(3), 41–54 (2017)
12. Zana, F., Kelin, J.C.: Segmentation of vessel-like patterns using mathematical morphology and curvature evaluation. IEEE Trans. Image Process. 10, 1010–1019 (2001)
13. Al-Rawi, M., Karajeh, H.: Genetic algorithm matched filter optimization for automated detection of blood vessels from digital retinal images. Comput. Methods Programs Biomed. 87, 248–253 (2007)
14. Jayachandran, A., Dhanasekaran, R.: Severity analysis of brain tumor in RETINAL images using modified multi-texton structure descriptor and kernel-SVM. Arabian J. Sci. Eng. 39(10), 7073–7086 (2014)
15. Jayachandran, A., Dhanasekaran, R.: Brain tumor detection using fuzzy support vector machine classification based on a texton co-occurrence matrix. J. Imaging Sci. Technol. 57(1), 10507-1–10507-7 (2013)
FSO: Issues, Challenges and Heuristic Solutions

Saloni Rai and Amit Kumar Garg

Electronics and Communications Department, Deenbandhu Chhotu Ram University of Science and Technology, Murthal, Haryana, India
[email protected], [email protected]
Abstract. The most dominant and viable advanced technology for broadband wireless applications is free space optical communication (FSO), which provides unlicensed spectrum, very high data rates and no electromagnetic interference. Atmospheric turbulence has a great effect on the FSO link, and the quality of the light beam deteriorates while propagating over long distances through the atmosphere. The signal exhibits random fluctuations in the presence of atmospheric turbulence, which degrades system performance. Bit Error Rate (BER), Outage Probability (OP) and Quality Factor are the most important factors that decide the quality of data transmission. The free space optical communication concept is introduced in this paper along with its issues, challenges and heuristic solutions. The findings in the paper demonstrate the deployment of different methods for enhancing system performance. It is seen that by using heuristic solutions such as FSO-WDM and hybrid RF/FSO, better BER results can be obtained in comparison to simple FSO for high-speed wireless communication.
Keywords: Free space optical communication · Atmospheric turbulence · LOS · BER · FSO/RF · OOK · DPSK · QPSK · NRZ · RZ
1 Introduction
To transmit data wirelessly, free space optical communication (FSO) is a well-suited approach which uses light propagating in free space. It is a recent, growing technology that uses point-to-point laser links to transmit data from transmitter to receiver over a direct LOS [3–14]. This technology provides unlicensed spectrum, flexibility, cost effectiveness, bandwidth scalability, speed of deployment and security, and is an optimal technology for future generations. An FSO link works similarly to fiber-optic communication, but the channel is air [2]. Its range extends from about 100 m to a few kilometers. With the increasing demand for wireless broadband communications, it has received appreciable attention because of its low cost compared to fiber-optic communication, offering low dispersion with high data rates and a very low bit error rate. It is attaining market acceptance as a functional, high-bandwidth access tool. It is preferred over traditional RF systems, which have a limited frequency spectrum. This system is appropriate for regions where optical fiber cables cannot be deployed, such as hilly areas [4, 5]. The problems of limited frequency spectrum and the cost of laying fiber can be relieved by the
implementation of this communication technology. The various challenges for this system are atmospheric disturbances such as rain, snow, dust, haze, fog, aerosols, gases and various other suspended particles, as well as geometric loss, atmospheric loss, blocking and shadowing, which introduce significant attenuation into the system. The reliability and functionality of the system are adversely affected by atmospheric conditions. Several techniques have been deployed to enhance system reliability and effectiveness and to mitigate these atmospheric impairments [8–11]. Techniques like hybrid RF/FSO, WDM and modulation schemes can be deployed at the physical layer, while retransmission, reconfiguration and rerouting can be implemented at the network layer. This paper is arranged as follows: in Sect. 2 the basic architecture of the FSO link is given along with its schematic diagram; issues and challenges are presented in Sect. 3; heuristic solutions to enhance system performance are presented in Sect. 4; results and discussions are presented in Sect. 5; and the conclusion and future scope are given in Sect. 6.
2 Functional Schematic of FSO
The functional schematic of FSO communication is illustrated in Fig. 1. It consists of a transmitter, a receiver and the channel, i.e., air. The information from the source is communicated over this system. To modulate the data for transmission, different modulation techniques such as OOK, BPSK, QPSK and PPM can be implemented. The source output modulates an optical carrier, a laser, which is then transmitted through the atmospheric channel. The transmitter converts the incoming electrical signal into optical form and transmits it into the atmospheric channel. The atmospheric channel through which the optical beam passes poses many challenges for the transmitted signal, such as rain, snow, dust, haze, fog, aerosols, gases and various other suspended particles, as well as geometric loss, atmospheric loss, blocking and shadowing, which introduce significant attenuation; thus the signal has to be properly modulated before transmission. The signal is optically collected and detected at the receiving side, which converts the received optical signal back to electrical form [12].
Fig. 1. Functional schematic diagram of FSO [12]
The quality of data transmission and the system performance can be measured by the bit error rate (BER), the probability that an error occurs in a bit, i.e., a '1' bit turns into a '0' bit or a '0' bit turns into a '1' bit. The typical BER required for commercial optical wireless systems is 10^-9 [13]. The system should have an error probability below the acceptable rate at the given data rate.

\mathrm{BER} = \frac{\text{Number of errors}}{\text{Total number of bits sent}} \qquad (1)

\mathrm{BER} = \frac{P_0}{2} + \frac{P_1}{2} \qquad (2)

where P_0 is the probability of receiving '0' instead of '1' and P_1 is the probability of receiving '1' instead of '0'. The factor 1/2 arises because '0' and '1' are equally likely events [13]. Figure 2 represents BER vs. range under different weather conditions. Due to excessive heating, turbulence, particle fluctuations and scintillation, atmospheric conditions lead to temporal and random spatial changes.
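As a quick numeric check of Eqs. (1) and (2), the C++ sketch below computes the BER both ways; all bit counts and probabilities are illustrative values, not measurements from the paper.

#include <cstdio>

int main() {
    // Eq. (1): BER counted as errored bits over transmitted bits.
    const double errors = 3.0, total_bits = 1.0e9;
    const double ber_counted = errors / total_bits;   // 3e-9

    // Eq. (2): BER from the two conditional error probabilities; the 1/2
    // factors reflect equally likely '0' and '1' symbols.
    const double P0 = 4.0e-9, P1 = 2.0e-9;            // illustrative
    const double ber_model = P0 / 2.0 + P1 / 2.0;     // 3e-9

    std::printf("BER (counted) = %.1e, BER (model) = %.1e\n",
                ber_counted, ber_model);
    return 0;
}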
Fig. 2. BER v/s range (different weather conditions) [14]
3 Issues and Challenges
Various challenges occur with the deployment of FSO technology, and there is a need to reduce these challenges to harness the complete capability of the technology.
3.1 Atmospheric Disturbances
Atmospheric turbulence is the challenge that affects FSO the most. Fluctuations in the refractive index degrade link performance due to inhomogeneities in
temperature [15, 16]. Parameters such as link length and operating wavelength, together with fog, rain and gases between transmitter and receiver, lead to attenuation through photon absorption and scattering [17]. The Gamma-Gamma model is used for atmospheric turbulence, i.e., "weak turbulence (WT), moderate turbulence (MT) and strong turbulence (ST) conditions" [31]. When the signal is transmitted over the system it suffers from misalignment and atmospheric turbulence between transmitter and receiver; using the Gamma-Gamma model, the PDF of I_a is expressed as (the standard Gamma-Gamma density, reconstructed here)

f_{I_a}(I_a) = \frac{2(\alpha\beta)^{(\alpha+\beta)/2}}{\Gamma(\alpha)\,\Gamma(\beta)}\, I_a^{\frac{\alpha+\beta}{2}-1}\, K_{\alpha-\beta}\!\left(2\sqrt{\alpha\beta I_a}\right) \qquad (3)

where Γ(·) denotes the Gamma function, K_{α−β}(·) the modified Bessel function of the second kind, α the number of large-scale turbulence cells and β the number of small-scale turbulence cells. The most dominant form of interference for free-space optics is fog. It consists of small water droplets, so due to moisture it may reflect and scatter light. Its visibility range lies between 0–2,000 m, so the phrases "advection fog" and "convection fog" are used [18]. Rain also has an influence on the link, but its impact is comparatively less than that of other weather conditions [19]. Attenuation due to snow lies in the range 30–350 dB/km, which is more than rain and less than fog [29, 30]. The attenuation factor ranges from 0 to 3 dB/km for clear weather conditions [19].
3.2 Pointing Error (PE)
Pointing errors occur between transceivers because FSO equipment is placed on high-rise structures in urban areas, and building sway degrades the performance of FSO links [20].
3.3 Physical Obstructions
FSO is LOS communication, so any barrier in the path can interrupt the communication. These interruptions are temporary, and the link can be easily and automatically resumed [21].
3.4 Link Margin (LM)
The link margin can be calculated from the power received at the receiver [27]. It is expressed as

\mathrm{LM} = 10\log\frac{P_r}{P_s} \qquad (4)

where P_r is the received power and P_s the receiver sensitivity, with P_r > P_s. The value of P_s lies between −20 dBm and −40 dBm.
3.5 Geometric Attenuation (GE)
Geometric attenuation is the square of the ratio of the receiver aperture diameter (d_r) to the sum of the transmitter aperture diameter (d_t) and θL, where θ is the beam divergence and L the link length [27, 28]:

\mathrm{GE} = \left(\frac{d_r}{\theta L + d_t}\right)^2 \qquad (5)
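The two link-budget quantities of Eqs. (4) and (5) can be evaluated directly, as sketched below; the powers, aperture diameters, divergence and range are illustrative assumptions, not values from the paper.

#include <cmath>
#include <cstdio>

int main() {
    // Eq. (4): link margin LM = 10*log10(Pr/Ps), with Pr > Ps.
    const double Pr = 1.0e-6;   // received power in W (illustrative)
    const double Ps = 1.0e-7;   // receiver sensitivity in W (about -40 dBm)
    const double LM = 10.0 * std::log10(Pr / Ps);   // 10 dB margin

    // Eq. (5): geometric attenuation from the aperture diameters dr, dt,
    // beam divergence theta (rad) and link length L (m).
    const double dr = 0.08, dt = 0.025, theta = 2.0e-3, L = 1000.0;
    const double GE = std::pow(dr / (theta * L + dt), 2.0);

    std::printf("LM = %.1f dB, GE = %.5f\n", LM, GE);
    return 0;
}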
Table 1 lists various weather conditions and the corresponding attenuation values in dB/km. As explained above, weather conditions have a major influence on the performance of FSO links.

Table 1. Weather conditions vs. attenuation value [22]

Weather conditions   Attenuation value (dB/km)
Advection fog        4.28
Convection fog       15.6
Light rain           6.27
Heavy rain           19.8
Haze                 20.68
4 Heuristic Solutions to Enhance System Performance
Different mitigation techniques are deployed to increase FSO reliability and performance; these techniques can be deployed at the physical layer or the network layer. At the physical layer, wavelength division multiplexing (WDM) can be deployed to reduce atmospheric attenuation; it offers high capacity, so high data rates can be transmitted over long distances. It is a multiplexing technique in which many optical signals are multiplexed onto one medium at different wavelengths to maximize bandwidth usage [1]. Weather dependence is a significant factor in FSO link performance, and the WDM system is designed to overcome this. FSO and RF share some common features, but the impact of atmospheric conditions on them is not the same [24]. Fog is the major issue affecting the performance of the FSO link, whereas the RF link is least influenced by fog; similarly, rain does not have any effect on the FSO link, but greatly influences RF link performance [25, 26]. Hybrid RF/FSO systems are therefore deployed to provide good signal quality: the hybrid system does not experience abrupt changes in turbulence, and the weather conditions are effectively steadier. Different modulation schemes can also be deployed to increase the efficiency of the system. OOK is the simplest scheme and is bandwidth efficient: the light source transmits logic '1' when it is on and logic '0' in the off state; it is also known as non-return-to-zero (NRZ) modulation. Return-to-zero (RZ) coding can also be used, in which the logic '1' returns to zero in the middle of the symbol; RZ is more sensitive than NRZ [35]. Apart from these schemes, coherent modulation schemes like differential phase shift keying (DPSK) and binary phase shift keying (BPSK) can also be used; under adverse weather conditions the BER performance of BPSK is better than that of OOK.
Pulse position modulation (PPM) is an orthogonal modulation technique and needs both slot and symbol synchronization [33, 34].
At the network layer, retransmission of data is one solution; it is useful under strong atmospheric turbulence, since repeated transmission of data increases the probability of correct detection. Rerouting of transmit paths is another solution: when the direct connection is disabled, networking between nodes can reduce communication problems by enabling alternative routes from sender to receiver, and the best route can be selected while considering traffic demands. It provides point-to-multipoint transceivers. Table 2 lists the mitigation techniques deployed at the physical and network layers to increase FSO system reliability and performance.

Table 2. Mitigation techniques

Physical layer          Network layer
Hybrid RF/FSO           Retransmission
WDM                     Rerouting
Aperture averaging      Reconfiguration
Modulation and coding   Replay
5 Results and Discussions
Simulation results of the different techniques are discussed in this section. The BER values under different weather conditions are recorded in Table 3. It is observed that the BER decreases in clear weather conditions and is improved using the WDM technique.

Table 3. Weather conditions vs. BER values (WDM)

Weather conditions   BER values (distance = 2.5 km)
Thin fog             10^-12
Light fog            10^-10
Medium fog           10^-4
Heavy fog            10^-2
Clear weather        10^-34
Haze                 10^-15
Figure 3 represents the graph of BER vs. range under different weather conditions for the WDM system, showing the BER values over the transmission range for each condition. It can be seen that deploying the WDM technique improves the functionality of the system: as shown in the graph, in clear weather the link ranges beyond 13 km, while under heavy haze it drops to 1 km [23].
Fig. 3. BER vs. range (WDM FSO) [23]
Figure 4 shows a comparison between hybrid RF/FSO links and a plain FSO link for moderate and strong turbulence. The graph illustrates that the FSO link has a higher error rate compared to the hybrid RF/FSO network. By using hybrid RF/FSO links, the reliability and system performance are improved, as can be seen in the graph [32].
Fig. 4. FSO link vs hybrid RF for strong turbulence and moderate turbulence [32]
6 Conclusion and Future Scope
In this paper a brief survey of FSO, its issues, challenges and heuristic solutions is presented. The major advantages of FSO show that it is a good and reliable technology for future generations. The adverse effects on the performance of an FSO link result from atmospheric turbulence, and certain mitigation techniques are deployed to improve the reliability of the system. Different mitigation techniques like hybrid FSO/RF systems, WDM-FSO systems and modulation schemes are the latest approaches to improve system performance and reliability for faster communication. The obtained results show that
better performance can be achieved by deploying mitigation techniques. By implementing the above techniques, FSO can be deployed in future for high-speed switching systems and free-space exploration, and it can be made analogous to satellite communication, which is a major requirement nowadays.
References
1. Kaur, G., Singh, H.: Simulative investigation of 32x5 Gb/s DWDM-FSO system using semiconductor optical amplifier under different weather conditions. Int. J. Adv. Res. Comput. Sci. 8(4), 392–397 (2017)
2. Mazin, A.A.A.: Performance analysis of WDM-FSO link under turbulence channel. World Sci. News 50, 160–173 (2016)
3. Elgala, H., Mesleh, R., Haas, H.: Indoor optical wireless communication: potential and state-of-the-art. IEEE Commun. Mag. 49(9), 56–62 (2011). https://doi.org/10.1109/mcom.2011.6011734
4. Esmail, M.A., Ragheb, A., Fathallah, H., Alouini, M.S.: Investigation and demonstration of high speed full-optical hybrid FSO/fiber communication system under light sand storm condition. IEEE Photon. J. 9(1) (2017). Article no. 7900612
5. Khalighi, M.A., Uysal, M.: Survey on free space optical communication: a communication theory perspective. IEEE Commun. Surv. Tutors. 16(4), 2231–2258 (2014)
6. Ghassemlooy, Z., Popoola, W., Rajbhandari, S.: Optical Wireless Communications: System and Channel Modeling with Matlab. CRC Press Taylor and Francis Group, Boca Raton (2013)
7. Luong, D.A., Truong, C.T., Pham, A.T.: Effect of APD and thermal noises on the performance of SC-BPSK/FSO systems over turbulence channels. In: 18th Asia-Pacific Conference on Communications (APCC), pp. 344–349 (2012)
8. Ramirez-Iniguez, R., Idrus, S.M., Sun, Z.: Atmospheric transmission limitations. In: Optical Wireless Communications: IR for Wireless Connectivity, p. 40. Taylor & Francis Group, LLC, London (2008)
9. Li, J., Liu, J.Q., Taylor, D.P.: Optical communication using subcarrier PSK intensity modulation through atmospheric turbulence channels. IEEE Trans. Commun. 55, 1598–1606 (2007)
10. Popoola, W.O., Ghassemlooy, Z.: BPSK subcarrier intensity modulated free space optical communication in atmospheric turbulence. J. Light Wave Technol. 27, 967–973 (2009)
11. Fried, D.L.: Optical heterodyne detection of an atmospherically distortion wave front. Proc. IEEE 55, 57–77 (1967)
12. Alkholidi, A., Altowij, K.: Effect of clear atmospheric turbulence on quality of free space optical communications in Western Asia. In: Das, N. (ed.) Optical Communications Systems (2012)
13. Majumdar, A.K., Ricklin, J.C.: Free Space Laser Communications: Principles and Advances. Springer, Heidelberg (2008). ISBN-13 978-0-387-28652-5
14. http://www.laseroptronics.com/index.cfm/id/57-66.htm
15. Andrews, L.: Atmospheric Optics. SPIE Optical Engineering Press, Bellingham (2004)
16. Andrews, L., Phillips, R., Hopen, C.: Laser Beam Scintillation with Applications. SPIE Optical Engineering Press, Bellingham (2001)
17. Gagliardi, R., Karp, S.: Optical Communications. Wiley, New York (1995)
18. Kim, I., Mcarthur, B., Korevaar, E.: Comparison of laser beam propagation at 785 and 1550 nm in fog and haze for optical wireless communications. In: Proceedings of SPIE, vol. 4214, pp. 26–37 (2001)
19. Akiba, M., Ogawa, K., Walkamori, K., Kodate, K., Ito, S.: Measurement and simulation of the effect of snow fall on free space optical propagation. Appl. Opt. 47(31), 5736–5743 (2008)
20. Achour, M.: Simulating free space optical communication; part I, rain fall attenuation. In: Proceedings of SPIE, vol. 3635 (2002)
21. Bouchet, O., Marquis, T., Chabane, M., Alnaboulsi, M., Sizun, H.: FSO and quality of service software prediction. In: Proceedings of SPIE, vol. 5892, pp. 1–12 (2005)
22. Niaz, A., Qamar, F., Ali, M., Farhan, R., Islam, M.K.: Performance analysis of chaotic FSO communication system under different weather conditions. Trans. Emerg. Telecommun. Technol. (2019). https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.3486
23. Sarangal, H., Singh, A., Malhotra, J., Thapar, S.S.: Performance evaluation of hybrid FSO-SACOCDMA system under different weather conditions. J. Opt. Commun. (2018). https://doi.org/10.1515/joc-2018-0172
24. Chatzidiamantis, N.D., Karagiannidis, G.K., Kriezis, E.E., Matthaiou, M.: Diversity combining in hybrid RF/FSO systems with PSK modulation. In: IEEE International Conference on Communications (2011)
25. Nadeem, F., Kvicera, V., Awan, M.S., Leitgeb, E., Muhammad, S.S., Kandus, G.: Weather effects on hybrid FSO/RF communication link. IEEE J. Sel. Areas Commun. 27(9), 1687–1697 (2009)
26. Ghoname, S., Fayed, H.A., El Aziz, A.A., Aly, M.H.: Performance evaluation of an adaptive hybrid FSO/RF communication system: impact of weather attenuation. Iran. J. Sci. Technol. Trans. Electr. Eng. (2019). https://doi.org/10.1007/s40998-019-00244-0
27. Alma, H., Al-Khateeb, W.: Effect of weather conditions on quality of free space optics links (with focus on Malaysia). In: 2008 International Conference on Computer and Communication Engineering, Kuala Lumpur, pp. 1206–1210 (2008). https://doi.org/10.1109/ICCCE.2008.4580797
28. Chaudhary, S., Amphawan, A.: The role and challenges of free-space systems. J. Opt. Commun. 35, 327–334 (2014). https://doi.org/10.1515/joc-2014-0004
29. Kim, I.I., Korevaar, E.: Availability of free space optic (FSO) and hybrid FSO/RF systems. LightPointe tech report. http://www.opticalaccess.com
30. Kaushal, H., Kaddoum, G.: Optical communication in space: challenges and mitigation techniques. IEEE Commun. Surv. Tutor. https://doi.org/10.1109/comst.2016.2603518
31. Sarkar, D., Metya, S.K.: Effects of atmospheric weather and turbulence in MSK based FSO communication system for last mile users. Telecommun. Syst. (2019). https://doi.org/10.1007/s11235-019-00602-7
32. Singh, N.: To analyze the aftermath of using hybrid RF/FSO link over FSO link under various turbulence of atmosphere. Int. J. Eng. Res. Technol. (IJERT) 8(05) (2019). http://www.ijert.org, ISSN 2278-0181, IJERTV8IS050421
33. Majumdar, A.K., Ricklin, J.C.: Free-Space Laser Communications Principles and Advances. Springer, New York (2008)
34. David, F.: Scintillation loss in free-space optic systems. In: LASER 2004, San Jose, USA, vol. 5338 (2004)
35. Henniger, H., Wilfert, O.: An introduction to free space optical communications. Radio Eng. 19(2), 203–212 (2010)
Design of Canny Edge Detection Hardware Accelerator Using xfOpenCV Library

Lokender Vashist and Mukesh Kumar

National Institute of Technology Kurukshetra, Kurukshetra 136119, Haryana, India
[email protected], [email protected]
Abstract. Real-time image processing on a CPU requires more computational power, and processing speed decreases accordingly. An FPGA, with its parallel-processing capability, performs complex processes with much higher speed and lower power consumption while sharing the load of the CPU; hence it can be used for edge detection in real time, where a large number of pixels must be processed. In this paper we develop programmable logic for the Canny edge detection algorithm intended for the Zynq 7000 based Zybo board. The xfOpenCV library is used to generate the hardware logic for the algorithm after applying a wrapper function, so as to make it portable to the AXI-Stream interface of the Zynq device. Better edges are detected, even for noisy images, using the xfOpenCV library. FPGA resource utilization is very low; thus edge detection done using the Zynq 7000 device will be up to 100x faster and more reliable than a CPU's.
Keywords: xfOpenCV library · Vivado HLS · Canny edge detection algorithm · Zybo board
1 Introduction
Real-time image and video processing has been a key element in the development of algorithms for object detection, video surveillance, medical imaging, multimedia, space technology and security [4]. Image segmentation, image categorization, object detection and feature tracking are key functions in the above-mentioned applications. The video pipeline contains edge detection as its initial phase. An image in general contains a large number of edges, and detecting them on a CPU takes a lot of computational time. Moreover, in a real-time framework the processing becomes more complex, and high computational power is required to handle the enormous number of pixels associated with the incoming video [9]. At high frame rates, CPUs with little computation capability become unreliable for correct edge detection in a real-time framework. The FPGA, with its parallel-processing capability, is a suitable candidate for the above-mentioned applications. FPGAs use hardware accelerators to reduce the amount of processing to be done by the CPU, thereby helping to improve speed and energy efficiency [1]. Some pre-processing functions, like filtering and thresholding, which are high-speed operations, can be implemented in a hardware accelerator before sending pixels to the CPU, which reduces the data to be processed in the CPU, leaving only the regions of interest to be taken
care of. Also, control logic can be implemented directly in the FPGA without the interaction of the CPU, which reduces the required computational power and increases speed considerably. Gudis, Lu et al. [12] used hardware accelerators for computer vision applications in an embedded solution to describe an architecture framework; they provided a structure using a comprehensive library of real-time vision hardware accelerators and a software architecture. In [13] the use of the OpenCV library and its implementation on the Zynq platform is shown for a road-sign detection application. The algorithm was illustrated using a Zynq Z7020 based ZedBoard, in which the input image of 1920 × 1080 resolution came from an ON Semiconductor VITA-2000 sensor attached via the FPGA Mezzanine Card (FMC) slot. The programmable logic in the design was used for pre-processing, image analysis and image segmentation. In this paper we implement the Canny edge detection technique on the Zynq Z7010 based Digilent Zybo board. The Canny edge detection hardware accelerator is generated using Vivado HLS by setting constraints and applying optimization through the directives of the HLS tool. The xfOpenCV library provided by Xilinx for the SDSoC environment is used, containing the top-level Canny edge detector function for synthesis with HLS. As the xfOpenCV function generates the input and output ports of the IP as pointers, it cannot be used directly for the Zynq Z7010, which requires the input and output ports to follow the AXI-Stream format. Hence a wrapper function is designed to transform the original function into AXI-Stream format, and synthesis is then performed on this wrapper function.
2 Canny Edge Detection Algorithm
This is one of the most widely used algorithms for detecting edges, as its performance is reliable for noisy images [11]. It is based on kernel convolution [10]. The steps involved in this algorithm are described below.
1. Smoothening of image
The very first step is to remove noise using a Gaussian filter, which removes high-frequency components from the image. Convolving the input image I(i, j) with the Gaussian filter G results in the smoothened image [2]:

F(i, j) = I(i, j) * G \qquad (1)
2. Gradients in horizontal and vertical direction
The Sobel operator is used to find the gradients in the horizontal and vertical directions. This operator is applied to every pixel to obtain edges where the change in greyscale intensity is maximum. The Sobel operators in the horizontal and vertical directions are

H = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad V = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
These operators are convolved with the smoothened image, giving the gradients in the respective directions:

G_h = F(i, j) * H \quad\text{and}\quad G_v = F(i, j) * V \qquad (2)
The magnitude and direction of the gradient at a pixel are given by

G = \sqrt{G_h^2 + G_v^2} \quad\text{and}\quad \theta = \arctan\!\left(\frac{G_v}{G_h}\right) \qquad (3)
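For reference, the gradient computation of Eqs. (2) and (3) corresponds to the short OpenCV C++ sketch below; this is an illustrative software implementation, separate from the hardware accelerator developed later in the paper.

#include <opencv2/opencv.hpp>

// Compute Gh, Gv by Sobel convolution on the smoothed image, then the
// per-pixel gradient magnitude G and direction theta of Eq. (3).
void gradients(const cv::Mat& smoothed, cv::Mat& mag, cv::Mat& angle) {
    cv::Mat gh, gv;
    cv::Sobel(smoothed, gh, CV_32F, 1, 0, 3);   // horizontal gradient Gh
    cv::Sobel(smoothed, gv, CV_32F, 0, 1, 3);   // vertical gradient Gv
    cv::cartToPolar(gh, gv, mag, angle, true);  // G and theta (in degrees)
}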
3. Non-maximum suppression
To obtain thin edges, for every pixel only the local maxima in the gradient image are kept and the rest are deleted. Starting from the gradient direction (e.g., 45°) of the current pixel K(i, j), find the gradient magnitudes of the neighbours in the positive and negative gradient directions and mark them P(i, j) and N(i, j) respectively. If the edge strength of the pixel K(i, j) is greater than that of both P(i, j) and N(i, j), mark it as an edge pixel; if not, discard or partially suppress it.
4. Hysteresis thresholding
As noise is still present, spurious local maxima are found. To suppress them, choose two thresholds T_HIGH and T_LOW. If the gradient magnitude is less than T_LOW, discard the edge. If the magnitude is higher than T_HIGH, preserve it. If it falls in between, keep the edge only if any of its neighbouring pixels has a magnitude greater than T_HIGH; otherwise remove it.
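The full four-step pipeline is available in OpenCV as cv::Canny, which the C++ sketch below drives end to end; the input file name, the Gaussian kernel size and the two hysteresis thresholds (T_LOW = 50, T_HIGH = 150) are illustrative choices.

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    cv::Mat blurred, edges;
    cv::GaussianBlur(img, blurred, cv::Size(5, 5), 1.4);  // step 1: smoothing
    // Steps 2-4 (Sobel gradients, non-maximum suppression and hysteresis
    // thresholding) are performed inside cv::Canny.
    cv::Canny(blurred, edges, 50, 150);
    cv::imwrite("edges.png", edges);
    return 0;
}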
3 Implementation of Canny Edge Detection on Vivado HLS Using xfOpenCV Library Targeted for Zybo Board
Most emerging embedded processor and SoC devices require hardware logic to be written in Verilog/VHDL, which is difficult for complex applications. The automated flow of an HLS tool enables the hardware logic to be written in a high-level language such as C/C++ [3]. In this work we use the Xilinx Vivado HLS tool to convert the Canny edge detection function from the xfOpenCV library into an HDL implementation (Verilog/VHDL), which is then synthesized using the Vivado tool targeted for the FPGA (Zybo board) [5]. The IP generated for the Canny algorithm is targeted for the Zybo board with the Zynq 7000 Z7010 device. The Zynq 7000 combines an ARM Cortex-A9 processor with an FPGA, which provides extremely high speed [6]. The generated IP has to be used in accordance with the AXI4-Stream protocol for its input and output ports. Figure 1 shows the architecture of the Zynq 7000 SoC. The algorithm to generate a wrapper function around the original top-level Canny edge detection function from the xfOpenCV library is discussed below, so as to generate the IP in accordance with the AXI protocol. In the following, H is the height of the image and W its width; input and output are objects of class hls::stream, which defines the image objects in AXI format; xf::AXIvideo2xfMat is the function which converts input video in AXI format into the xf::Mat image format, and xf::xfMat2AXIvideo is the
Fig. 1. System architecture of Zynq 7000 SoC [5]
function which converts an xf::Mat image object back to an AXI-format video object. The steps of the algorithm are described below (see the sketch after this list):
1. Denote the original top-level function from the xfOpenCV library for synthesis, written in C++, as: Canny_accel(source image, destination image, kernel matrix)
2. Denote the wrapper function for synthesis as: xf_ip_accel_app(input, output, H, W, kernel matrix).
3. Use the HLS interface pragma directives:
#pragma HLS INTERFACE axis port = in
#pragma HLS INTERFACE axis port = out
4. Define two temporary image objects of the xf::Mat class, temp_in1 and temp_out1.
5. Use the pragma directives given below to define the temporary HLS stream variables:
#pragma HLS stream variable = temp_in1.data
#pragma HLS stream variable = temp_out1.data
6. Inside the dataflow pragma perform the steps below:
a. Call xf::AXIvideo2xfMat(input, temp_in1), which converts the AXI-format input video into the xf::Mat object temp_in1.
b. Now call Canny_accel to obtain the edges: Canny_accel(temp_in1, temp_out1, kernel)
c. After this, convert temp_out1 back to AXI video: xf::xfMat2AXIvideo(temp_out1, output)
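A hedged C++ sketch of the wrapper built from these steps is given below. The function and conversion names (xf_ip_accel_app, Canny_accel, xf::AXIvideo2xfMat, xf::xfMat2AXIvideo) come from the text; the header paths, template parameters (XF_8UC1, XF_NPPC1), the ap_axiu pixel width and the 1280 × 720 frame bound are assumptions that should be checked against the installed xfOpenCV version.

#include "hls_stream.h"
#include "ap_axi_sdata.h"
#include "common/xf_common.h"   // xf::Mat and pixel-type macros (assumed path)
#include "common/xf_infra.h"    // AXIvideo2xfMat / xfMat2AXIvideo (assumed path)

#define MAX_H 720
#define MAX_W 1280

typedef ap_axiu<8, 1, 1, 1>    axi_pixel;   // 8-bit grayscale AXI4-Stream beat
typedef hls::stream<axi_pixel> axi_stream;

// Top-level xfOpenCV Canny function named in the text (defined elsewhere).
void Canny_accel(xf::Mat<XF_8UC1, MAX_H, MAX_W, XF_NPPC1>& src,
                 xf::Mat<XF_8UC1, MAX_H, MAX_W, XF_NPPC1>& dst,
                 unsigned char kernel[9]);

// Wrapper exposing the accelerator through AXI4-Stream ports (steps 1-6).
void xf_ip_accel_app(axi_stream& input, axi_stream& output,
                     int height, int width, unsigned char kernel[9]) {
#pragma HLS INTERFACE axis port=input
#pragma HLS INTERFACE axis port=output

    xf::Mat<XF_8UC1, MAX_H, MAX_W, XF_NPPC1> temp_in1(height, width);
    xf::Mat<XF_8UC1, MAX_H, MAX_W, XF_NPPC1> temp_out1(height, width);
#pragma HLS stream variable=temp_in1.data
#pragma HLS stream variable=temp_out1.data

#pragma HLS dataflow
    xf::AXIvideo2xfMat(input, temp_in1);       // AXI stream -> xf::Mat
    Canny_accel(temp_in1, temp_out1, kernel);  // Canny on the frame
    xf::xfMat2AXIvideo(temp_out1, output);     // xf::Mat -> AXI stream
}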
4 Results
4.1 Simulation Results
The input image and the output image generated after C simulation in Vivado HLS, using the Canny edge detection function of the xfOpenCV library, are shown in Figs. 2 and 3 respectively.
Fig. 2. Input image for C-simulation in Vivado HLS [9]
Figure 2 is the still image given as input to the standalone application developed in Vivado HLS for C simulation; the input image file has a resolution of 1280 × 720. Figure 3 shows the edges detected using Canny edge detection after C simulation. As visible in Fig. 3, edges are present at every pixel having a gradient change.
Fig. 3. Output Image after C-simulation
4.2 IP Generated Before and After Application of C++ Wrapper Function
The IP generated before applying the wrapper function, with its input and output ports generated as pointers, is shown in Fig. 4; the IP generated after applying the wrapper function xf_ip_accel_app is shown in Fig. 5.
Fig. 4. IP generated before applying the wrapper function on top level C function for canny edge detection from xfOpenCV library.
Fig. 5. IP generated after applying wrapper function
The IP generated after C synthesis and export in Vivado HLS is depicted in Fig. 4. It contains input ports on the left and output ports on the right of the block, which follow the pointer-type standard. Figure 5 depicts the IP generated by running C synthesis and export on the wrapper function; this IP block has input and output ports following the AXI standard protocol, which is the port standard used on the Zynq processor.
4.3 Resource Utilization
Table 1 gives an estimate of the FPGA resources that will be used by the generated RTL in the system design. Utilization of BRAM_18K is only 5%, which is used to implement the line buffer for the Canny edge detection algorithm. Utilization of flip-flops is 4%, used to implement the window buffer of the Canny edge detection C++ code.
Table 1. Resource Utilization of the hardware accelerator for canny edge detection.
5 Conclusion and Future Work

In this paper we have successfully taken the top-level Canny edge detector function of the xfOpenCV library through Vivado HLS and converted it into a synthesizable hardware accelerator whose ports follow the AXI4-Stream format, with the help of a C++ wrapper function. The simulated output image indicates that the detected edges are better in terms of brightness and clarity; the algorithm from xfOpenCV gives better results than its counterpart in the OpenCV library. The resource utilization report indicates that only a small fraction of the available FPGA resources is used, so more complex blocks can be integrated alongside the accelerator to enhance the system design. In the classic Canny edge detection system a Gaussian filter is used for smoothing, which smooths the image without preserving some edges; we can instead use a median filter [7] or a bilateral filter, which smooth the image while preserving edges, change the C++ code in the xfOpenCV library accordingly, and generate IP for that. Secondly, in image thresholding the upper and lower thresholds are fixed by the user, which results in the loss of edges in real time [8]. Automatic setting of the threshold values can be developed, and this part of the Canny algorithm can be changed in the xfOpenCV C++ function to obtain better edges.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
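As an illustration of the future work suggested in the conclusion above, the following software sketch combines median filtering with median-based automatic thresholds using the standard OpenCV C++ API. The 0.66/1.33-times-median rule is one common heuristic, not the method prescribed by this paper, and a hardware version would require the corresponding xfOpenCV code changes described in the conclusion.

#include <algorithm>
#include <opencv2/opencv.hpp>
#include <vector>

// Median-filtered Canny with automatically chosen thresholds.
// Assumes an 8-bit single-channel input image.
cv::Mat autoCanny(const cv::Mat& gray) {
    cv::Mat smoothed;
    cv::medianBlur(gray, smoothed, 5);  // edge-preserving smoothing

    // Median intensity of the smoothed image.
    std::vector<uchar> pixels(smoothed.begin<uchar>(), smoothed.end<uchar>());
    std::nth_element(pixels.begin(), pixels.begin() + pixels.size() / 2,
                     pixels.end());
    double median = pixels[pixels.size() / 2];

    double lower = std::max(0.0, 0.66 * median);    // hysteresis low threshold
    double upper = std::min(255.0, 1.33 * median);  // hysteresis high threshold
    cv::Mat edges;
    cv::Canny(smoothed, edges, lower, upper);
    return edges;
}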
References
1. Asano, S., Maruyama, T., Yamaguchi, Y.: Performance comparison of FPGA, GPU and CPU in image processing. In: 2009 International Conference on Field Programmable Logic and Applications, pp. 126-131. IEEE (2009)
2. Kumar, M., Saxena, R.: Algorithm and technique on various edge detection: a survey. Sig. Image Process. 4(3), 65 (2013)
3. Chinnadurai, M., Joseph, M.: High level synthesis tools-an overview from model to implementation. Middle-East J. Sci. Res. 22, 241-254 (2014)
4. Nayagam, M.G., Ramar, K.: A survey on real time object detection and tracking algorithms. Int. J. Appl. Eng. Res. 10(9), 8290-8297 (2015)
5. Amara, A.B., Pissaloux, E., Atri, M.: Sobel edge detection system design and integration on an FPGA based HD video streaming architecture. In: 2016 11th International Design & Test Symposium (IDT), pp. 160-164. IEEE (2016)
6. Sadri, M., Weis, C., Wehn, N., Benini, L.: Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ. In: Proceedings of the 10th FPGAworld Conference, p. 5. ACM (2013)
7. Dong, Y., Li, M., Li, J.: Image retrieval based on improved Canny edge detection algorithm. In: 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC), 20-22 December 2013, pp. 1453-1457 (2013)
8. Bao, Y., Wu, D.: An improved Canny algorithm based on median filtering and adaptive threshold in robot training system. In: 2015 6th International Conference on Manufacturing Science and Engineering. Atlantis Press (2015)
9. Stefaniga, S.-A., Gaianu, M.: Performance analysis of morphological operation in CPU and GPU for medical images. In: 2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 377-384. IEEE (2017)
10. Vincent, O.R., Folorunso, O.: A descriptive algorithm for Sobel image edge detection. In: Proceedings of Informing Science & IT Education Conference (InSITE), vol. 40, pp. 97-107. Informing Science Institute, California (2009)
11. Zeng, J., Li, D.: An improved Canny edge detector against impulsive noise based on CIELAB space. In: 2010 International Symposium on Intelligence Information Processing and Trusted Computing, pp. 520-523. IEEE (2010)
12. Gudis, E., Lu, P., Berends, D., Kaighn, K., Wal, G., Buchanan, G., Chai, S., Piacentino, M.: An embedded vision services framework for heterogeneous accelerators. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 598-603 (2013)
13. Russell, M., Fischaber, S.: OpenCV based road sign recognition on Zynq. In: 2013 11th IEEE International Conference on Industrial Informatics (INDIN), pp. 596-601. IEEE (2013)
Application of Cloud Computing for Economic Load Dispatch and Unit Commitment Computations of the Power System Network

Nithiyananthan Kannan1(&), Youssef Mobarak1,2, and Fahd Alharbi1

1 Faculty of Engineering Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia {nmajaknap,ysoliman,fahdalharbi}@kau.edu.sa
2 Faculty of Energy Engineering, Aswan University, Aswan, Egypt
Abstract. The objective of this research paper is to discuss the impact of implementing economic load dispatch and unit commitment solutions for online power system applications in a cloud computing environment. Implementing the economic load dispatch and unit commitment calculations for a multi-area power system in the cloud addresses a real-time monitoring arrangement that is otherwise affected by interoperability problems among conventional client-server architectures. Existing load dispatch and commitment computing cannot meet the expectations of efficient monitoring and accurate delivery of results as the power system network expands massively day by day. We therefore suggest implementing economic load dispatch and unit commitment through threads in a single-server, multi-client architecture, treating each power system client as a thread instance. Current attempts to solve these problems with high-performance computing infrastructures have created further issues of scalability, maintenance and investment. The proposed method is an ideal replacement due to its high capacity, very high scalability and lower cost across all services.

Keywords: Cloud computing · High-performance computing · Unit commitment · Economic load dispatch · Power system applications
1 Introduction

Over the past twenty years power systems have expanded constantly, growing from isolated systems into highly interconnected networks. This interconnection has led power systems to develop regional and international grids worldwide to achieve the maximum possible reliability. Recent growth in power extraction from distributed sources through various renewable energy sources has increased the complexity of integrating power systems and their management. This network growth has brought many problems, such as large-scale data storage, large-scale computing, and greater hardware and software requirements. A large number of research works have reported the contribution of computers to electrical applications [1-32]. Among the various types of online power system analysis, economic load dispatch and
unit commitment monitoring play the wider role of utilizing resources in an optimized way to produce power at a much cheaper cost per unit, as indicated in Fig. 1. Economic load dispatch (ELD) is the algorithm which allocates generation levels among power plants so as to meet the load demand as cheaply as possible. The objective of the ELD algorithm is to find the optimal power output of each plant such that all constraints are satisfied and the fuel cost is reduced to an optimal value, as shown in Fig. 1.
Fig. 1. Display of economic load dispatch concept
Normally ELD problems are formulated as quadratic equations and solved by computer programs based on numerical methods, using standard optimization algorithms such as lambda iteration, dynamic programming and gradient methods. Equation (1) gives the total fuel cost:

F_T = \sum_{i=1}^{n} F(P_i) = \sum_{i=1}^{n} \left( a_i P_i^2 + b_i P_i + c_i \right) \quad (1)

where F_T is the overall fuel cost, a_i, b_i and c_i are the cost coefficients, and P_i is the power generated by unit i. Equation (2) gives the minimum and maximum power limits:

P_{i,\min} \le P_i \le P_{i,\max}, \quad i = 1, 2, \ldots, n \quad (2)

Equation (3) gives the power demand, which is related to the transmission line losses:

P_D = \sum_{i=1}^{n} P_i - P_{loss} \quad (3)

Equation (4) gives the power loss based on the B matrix:

P_{loss} = \sum_{i=1}^{n} \sum_{j=1}^{n} P_i B_{ij} P_j \quad (4)
Here the B_{ij} are the elements of the loss coefficient matrix B. ELD reduces the cost of the energy sources and provides better control over emissions. ELD also identifies expensive, non-performing power plants, improves power system reliability and efficiency, and makes it possible to change output in a short span of time. Allotting optimal power generation to the different power plants helps reduce the fuel cost to a minimum. The same optimization methods are extended to allocate the generating units within a power plant so as to reduce the fuel cost to the next level. This method of allocation is called unit commitment. Unit commitment determines which units should be ON or OFF. Unit commitment (UC), as the name suggests, is an hourly commitment schedule of units determined ahead of time, varying from a few hours to a week, with the main goal of meeting load demand. In general, UC schedules are determined a day ahead. The hourly load demand for the UC problem is available from precise load forecasting. The only optimizing criterion in determining the UC schedule is the cost of generation, which needs to be minimized over the planning period while fulfilling all system constraints arising from the generating units' physical capabilities and the transmission system's network configuration. A generating unit has various limitations, such as minimum up/down times, maximum and minimum generation limits, and ramp rate limits; similarly, the transmission network configuration governs the maximum power flow possible through the lines, while transformer tap settings and phase shifting angle limits impose further physical constraints. The UC problem is a combination of two sub-problems: one determines the generating units to be committed, and the other focuses on the amount of generation from each of these committed units. Generating units exhibit different operating efficiencies and performance characteristics, which are reflected in the required inputs; thus the cost of generation also depends on the amount of output from each committed unit apart from the choice of generating units. A UC problem is therefore solved in two stages, and the combination of generating units yielding the least final production cost is chosen as the UC schedule for the hour, as shown in Fig. 2.
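The lambda-iteration method mentioned above can be summarized with a short worked example. The sketch below is a minimal C++ illustration that neglects transmission losses and uses made-up unit data: it bisects on the incremental cost lambda, sets each unit to P_i = (lambda - b_i)/(2 a_i) clamped to its limits, and stops when total generation meets the demand.

#include <algorithm>
#include <iostream>
#include <vector>

// One generating unit with quadratic fuel cost a*P^2 + b*P + c.
struct Unit {
    double a, b, c;     // cost coefficients
    double pMin, pMax;  // generation limits (MW)
};

// Lambda iteration: bisect on the incremental cost lambda until the
// dispatched total matches the demand (losses neglected).
std::vector<double> dispatch(const std::vector<Unit>& units, double demand) {
    double lo = 0.0, hi = 1000.0;  // bracket for lambda
    std::vector<double> p(units.size());
    for (int it = 0; it < 100; ++it) {
        double lambda = 0.5 * (lo + hi), total = 0.0;
        for (std::size_t i = 0; i < units.size(); ++i) {
            p[i] = std::clamp((lambda - units[i].b) / (2.0 * units[i].a),
                              units[i].pMin, units[i].pMax);
            total += p[i];
        }
        (total < demand ? lo : hi) = lambda;  // raise or lower lambda
    }
    return p;
}

int main() {
    // Illustrative coefficients only, not data from this paper.
    std::vector<Unit> units = {{0.008, 7.0, 200.0, 10.0, 85.0},
                               {0.009, 6.3, 180.0, 10.0, 80.0},
                               {0.007, 6.8, 140.0, 10.0, 70.0}};
    for (double pi : dispatch(units, 150.0))  // 150 MW total demand
        std::cout << pi << " MW\n";
}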
Fig. 2. Combination of generating units
2 Cloud Computing and Its Benefits

Cloud computing is a technology that uses multiple computers in a network connected to the Internet to meet fast-growing web-based service requests. It is capable of providing a variety of services to the business community and to society. Cloud computing technology can serve with great flexibility and provide computing resources at minimal cost. This technology has created a new revolution in the electronic services of various disciplines. In this research work, we explore the features of cloud computing services and their suitability for ELD and UC applications. A large number of cloud service providers are available in the market; they play a vital role in the effective utilization of resources and offer a user-friendly approach that enables all enterprises, small to big, to benefit [33-42] (Fig. 3).
Fig. 3. Cloud computing features
The powerful computational capacity, rapid scalability, efficiency, high reliability, flexibility and high cost effectiveness are the major benefits of cloud computing that can be exploited for important power system applications such as ELD and UC. Cloud computing offers three important services, as shown in Fig. 4: Software as a Service, Platform as a Service and Hardware as a Service. In Software as a Service, the service is offered via the Internet, and several standard software applications can be accessed by users as a shared resource; the end user can use the software without installing anything on his own computer. Platform as a Service is offered by several service providers: anyone who wants to deploy their own applications can upload them to the cloud server, and as usage grows the cloud computing architecture supports a high level of scalability, freeing developers to code their applications without worrying about the platform infrastructure requirements. In Hardware as a Service, some providers offer their infrastructure to users; instead of offering the physical infrastructure itself, they offer their hardware in a virtualized environment. This service is used for computing on, or storing, large data [42-52].
Fig. 4. Services of cloud computing
3 Cloud Computing Based Economic Load Dispatch and Unit Commitment Solutions

To give fast and reliable solutions for the existing economic load dispatch and unit commitment calculations, it is necessary to upgrade the systems, improve data communication among power system clients, overcome the complexity of the operations and increase the computing levels. This complexity in handling data indicates the need for highly accurate control, operation and coordination capabilities, with efficient computing techniques, for online monitoring of load dispatch and unit commitment. In view of these existing problems, and to overcome the associated computing difficulties, online economic load dispatch and unit commitment calculation in a cloud computing environment is proposed, as shown in Fig. 5. This work outlines a new approach to developing cloud-based power system solutions in a distributed environment without additional investment.
Fig. 5. Proposed architecture of cloud computing based ELD and UC
The application so developed has a character of super-scalability, achieved through the roughly one hundred million servers offered by Google's cloud computing platform and by other well-known vendors holding large numbers of servers. The cloud-based ELD and UC model is virtualized: it supports power system clients in various places through an interface to the ELD and UC service. The required ELD and UC results arrive from the cloud, not from a typical physical server; the economic load dispatch calculations are carried out at different locations somewhere in the cloud, unknown to the individual power system client. This constitutes decentralized computing and strong virtualization. This high-reliability model can recover under disaster or other abnormal situations, as the application and power system information are backed up at three different positions in the architecture. Since the ELD and UC applications are developed in the cloud, they are scalable enough to cope with the vast expansion of the power system network. Using cloud computing also makes the entire application cheaper: no additional hardware needs to be bought, fault tolerance comes naturally, computing resources are utilized effectively and there are no maintenance charges. The shortcomings of the current economic load dispatch and unit commitment calculations are thereby eliminated in the proposed design. The system is classified into four architectural layers, as shown in Fig. 6: the physical layer, the application layer, the service-oriented architecture (SOA) layer and the access layer. The economic load dispatch service is deployed on the SOA layer of the cloud computing platform, and the various power system clients access the server with their data through the access layer. The model provides five types of interface services: data, graphics, topology, ELD calculation and web services. The cloud computing platform enables communication through server, network, storage and application virtualization.
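The single-server, multi-client threading idea from the abstract can be illustrated with a small sketch: each power system client is one thread instance that submits its demand to a shared ELD service. This is a schematic only; the eldService stub below is a hypothetical placeholder for the real cloud-hosted calculation, and in practice the call would be a remote service request rather than a local function.

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex coutMutex;  // serialize console output across client threads

// Hypothetical stand-in for the cloud-hosted ELD calculation; here it
// simply splits the demand evenly across three notional units.
std::vector<double> eldService(double demand) {
    return std::vector<double>(3, demand / 3.0);
}

// One power system client, modeled as a thread instance.
void powerSystemClient(int id, double demand) {
    std::vector<double> schedule = eldService(demand);
    std::lock_guard<std::mutex> lock(coutMutex);
    std::cout << "client " << id << ":";
    for (double p : schedule) std::cout << ' ' << p << " MW";
    std::cout << '\n';
}

int main() {
    double demands[] = {120.0, 95.0, 210.0};  // illustrative area demands
    std::vector<std::thread> clients;
    for (int i = 0; i < 3; ++i)
        clients.emplace_back(powerSystemClient, i, demands[i]);
    for (std::thread& t : clients) t.join();
}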
Fig. 6. Virtualization of proposed economic load dispatch system.
Since the legacy power system hardware structure itself is utilized for the cloud implementation, no additional investment is needed; hence the approach can command a very attractive market share against conventional energy management organizations, even though the initial cost of the proposed cloud-based system is higher. Since smart grids and micro grids are the future of power systems, the proposed model will be widely utilized by energy management organizations, so the financial risk is very small. The proposed cloud-based architecture is robust, fault tolerant and highly secure in nature, which makes implementing the cloud for legacy power systems a good climate for a business model. Implementing cloud-based economic load dispatch calculations also creates jobs for young electrical and computer engineers in the development, deployment and maintenance areas.
4 Results and Conclusion

Successful implementation of the proposed model yields a cloud-based distributed environment that monitors, at regular intervals, the online load dispatch and commitment of generating units for a multi-area power system network, as shown in Figs. 7 and 8.
Fig. 7. Power system client side virtualization of economic load dispatch output
Fig. 8. Power system client side virtualization unit commitment output
The proposed systems, developed on a cloud architecture, are highly scalable and sustainable. They have the ability to interoperate with any kind of legacy power system in a platform-independent, language-independent and architecture-independent way through XMLisation of the power system data. Although conventional calculations for load dispatch and unit commitment are available, the importance of this research lies in the unique way a large number of power system clients access the service in a distributed environment, which enables strong interoperability among the various types of power system clients across various platforms. Each power system client approaches the ELD cloud service with its dynamic power system data and obtains results in a distributed environment at regular intervals. The proposed model will be tested and validated on standard IEEE power system data; accordingly, the outcome of the proposed work can be implemented for big real-time power system networks. After implementing this system, load dispatch and commitment solutions for multi-area power system networks are performed effectively, with the advantages of language, architecture and platform independence. The proposed model reduces the complexity of collecting, processing and analyzing the power system data, and it also provides solutions to legacy issues. Deploying this new technology will bring a welcome revolution to the energy management sector, especially in smart grid and micro grid applications: it makes the connection of renewable energy sources into the grid easier, and many energy management centers will adopt it for the benefit of their service.

Acknowledgements. This project was funded by the Deanship of Research Grant, King Abdulaziz University, Jeddah, Grant No (D-201-829-1440). The authors, therefore, gratefully acknowledge the DSR technical and financial support.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Thomas, S., Nithiyananthan, K.: A novel method to implement MPPT algorithms for PV panels on a MATLAB/SIMULINK environment. J. Adv. Res. Dyn. Control Syst. 10(4), 31-40 (2018)
2. Gowrishankar, K., Thomas, S., Nithiyananthan, K.: Wireless integrated-sensor network based subsea tunnel monitoring system. J. Adv. Res. Dyn. Control Syst. 10(12), 647-658 (2018)
3. Gowrishankar, K., Nithiyananthan, K., Mani, P., Thomas, S.: GSM based dual power enhanced LED display notice board with motion detector. Int. J. Eng. Technol. 7(2.8 Special Issue 8), 559-566 (2018)
4. Sing, T.Y., Nithiyananthan, K., Siraj, S.E.B., Raguraman, R., Marimuthu, P.N.: Application of cluster analysis and association analysis model based power system fault identification. Eur. J. Sci. Res. 138, 16-28 (2016)
5. Nair, P., Nithiyananthan, K., Sing, T.Y., Raguraman, R., Siraj, S.E.B.: Enhanced R package-based cluster analysis fault identification models for three phase power system network. Int. J. Bus. Intell. Data Min. 14, 106-120 (2019)
6. Nithiyananthan, K., Varma, S.: MATLAB simulations based identification model for various points in global positioning system. Int. J. Comput. Appl. 138(13), 15-18 (2016)
7. Amudha, A., Nithiyananthan, K., Raja, S.S., Sundar, R.: Unit commitment calculator model for three phase power system network. Int. J. Power Renew. Energy Syst. 3(1), 37-42 (2016)
8. Gowrishankar, K., Nithiyananthan, K., Mani, P., Venkatesan, G.: Neural network based mathematical model for feed management technique in aquaculture. J. Adv. Res. Dyn. Control Syst. 9(Special Issue 18), 1142-1161 (2017)
9. Sing, T.Y., Siraj, S.E.B., Raguraman, R., Marimuthu, P.N., Nithiyananthan, K.: Cosine similarity cluster analysis model based effective power systems fault identification. Int. J. Adv. Appl. Sci. 4(1), 123-129 (2017)
10. Nair, P., Nithiyananthan, K., Dhinakar, P.: Design and development of variable frequency ultrasonic pest repeller. J. Adv. Res. Dyn. Control Syst. 9(Special Issue 12), 22-34 (2017)
11. Nithiyananthan, K., Sundar, R., Ranganathan, T., Raja, S.S.: Virtual instrumentation based simple economic load dispatch estimator model for 3-phase power system network. J. Adv. Res. Dyn. Control Syst. 9(11), 9-15 (2017)
12. Amudha, A., Raja, T.S., Nithiyananthan, K., Sundar, R.: Virtual stability estimator model for three phase power system network. Indones. J. Electr. Eng. Comput. Sci. 4(3), 520-525 (2016)
13. Nithiyananthan, K., Nair, M.P.: An effective cable sizing procedure model for industries and commercial buildings. Int. J. Electr. Comput. Eng. 6(1), 34-39 (2016)
14. Nair, M.P., Nithiyananthan, K.: Feasibility analysis model for mini hydropower plant in Tioman Island, Malaysia. Distrib. Gener. Altern. Energy J. 31(2), 36-54 (2016)
15. Umasankar, U., Nithiyananthan, K.: Environment friendly voltage up-gradation model for distribution power systems. Int. J. Electr. Comput. Eng. 6(6), 2516-2525 (2016)
16. Kumaar, G.S., Nithiyananthan, K., Thomas, S., Karthikeyan, S.P.: MATLAB/SIMULINK simulations based modified SEPIC DC to DC converter for renewable energy applications. J. Adv. Res. Dyn. Control Syst. 11(4), 285-295 (2019)
17. Venkatesan, G., Nithiyananthan, K.: Static and transient thermal analysis of power factor converters in switched reluctance motor drive. J. Adv. Res. Dyn. Control Syst. 10(13), 1656-1662 (2018)
18. Ramachandran, V., Nithiyananthan, K.: Effective data compression model for on-line power system. Int. J. Eng. Model. 27(3-4), 101-109 (2014)
19. Nithiyananthan, K., Ramachandran, V.: Distributed mobile agent model for multi area power systems automated online state estimation. Int. J. Comput. Aided Eng. Technol. 5(4), 300-310 (2013)
20. Ramachandran, V., Nithiyananthan, K.: Versioning-based service-oriented model for multi-area power system online economic load dispatch. Comput. Electr. Eng. 39(2), 433-440 (2013)
21. Ramachandran, V., Nithiyananthan, K.: Location independent distributed model for on-line load flow monitoring for multi-area power systems. Int. J. Eng. Model. 24(1-4), 21-27 (2011)
22. Nithiyananthan, K., Loomba, A.K.: MATLAB/Simulink based speed control model for converter controlled DC drives. Int. J. Eng. Model. 24(1-4), 49-56 (2011)
23. Thomas, S., Nithiyananthan, K., Eski, A.: Investigations on transient behavior of an energy conservation chopper fed DC series motor subjected to a change in duty cycle. J. Green Eng. 9(1), 92-111 (2019)
24. Nithiyananthan, K., Ramachandran, V.: A plug and play model for JINI based on-line relay control for power system protection. Int. J. Eng. Model. 21(1-4), 65-68 (2008)
25. Ramachandran, V., Nithiyananthan, K.: A distributed model for capacitance requirements for self-excited induction generators. Int. J. Autom. Control 2(4), 519-525 (2008)
26. Ramachandran, V., Nithiyananthan, K.: Component model simulations for multi-area power system model for on-line economic load dispatch. Int. J. Emerg. Electr. Power Syst. 1(2), Art no: 1011
27. Elavenil, V., Nithiyananthan, K.: CYMGRD based effective earthing design model for substation. Int. J. Comput. Appl. Eng. Sci. 1(3), 341-346 (2011)
28. Nithiyananthan, K., Ramachandran, V.: Distributed mobile agent model for multi-area power system on-line economic load dispatch. Int. J. Eng. Model. 17(3-4), 87-90 (2004)
29. Mobarak, Y.A., Hemeida, A.M., Nithiyananthan, K.: Voltage and frequency based load dependent analysis model for Egyptian power system network. J. Adv. Res. Dyn. Control Syst. 11(6), 971-978 (2019)
30. Hau, L.K., Kasilingam, G., Nithiyananthan, K.: Development of prototype model for wireless based control pick and place robotic vehicle. TELKOMNIKA Indones. J. Electr. Eng. 14(1), 110-115 (2015)
31. Nithiyananthan, K., Ramachandran, V.: A distributed model for multi-area power systems on-line dynamic security analysis. In: IEEE Region 10 Annual International Conference, Proceedings/TENCON, vol. 3, pp. 1765-1767 (2002)
32. Nithiyananthan, K., Ramachandran, V.: EJB based component model for distributed load flow monitoring of multi-area power systems. Int. J. Eng. Model. 15(1-4), 63-67 (2002)
33. Huang, Q., Zhou, M., Zhang, Y., Wu, Z.: Exploiting cloud computing for power system analysis. In: 2010 International Conference on Power System Technology, Hangzhou, pp. 1-6 (2010)
34. Dongxu, Y., Hua, W., Hongbo, W.: Architecture design of power system fault calculation based on cloud computing technology. In: 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, pp. 1-5 (2017)
35. Pepermans, G., et al.: Cloud generation: definition, benefits and issues. Energy Policy 33(6), 787-798 (2005)
36. Di Santo, M., et al.: A cloud architecture for online power systems security analysis. IEEE Trans. Ind. Electron. 51(6), 1238-1248 (2004)
37. Li, Z., Liu, Y.: Reactive power optimization using agent based grid computing. In: The 7th International Power Engineering Conference (2005)
38. Huang, Q., Qin, K., Wang, W.: Development of a grid computing platform for electric power system applications. In: Power Engineering Society General Meeting. IEEE (2006)
39. Al-Khannak, R., Bitzer, B.: Load balancing for cloud and integrated power systems using grid computing. In: Clean Electrical Power, ICCEP 2007 (2007)
40. Schainker, R., et al.: On-line dynamic stability analysis using cloud computing. In: Power and Energy Society General Meeting - Conversion and Delivery of Electrical Energy in the 21st Century. IEEE (2008)
41. Huang, Q., et al.: Cloud state estimation with PMU using grid computing. In: Power & Energy Society General Meeting. IEEE (2009)
42. Ali, M., Dong, Z.Y., Zhang, P.: Adoptability of grid computing technology in power systems analysis, operations and control. IET Gener. Transm. Distrib. 3(10), 949-959 (2009)
43. Di Silvestre, M.L., Gallo, P., Ippolito, M.G., Sanseverino, E.R., Sciumè, G., Zizzo, G.: An energy blockchain, a use case on tendermint. In: 2018 IEEE International Conference on Environment and Electrical Engineering and 2018 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe), Palermo, pp. 1-5 (2018)
44. Vecchiola, C., Pandey, S., Buyya, R.: High-performance cloud computing: a view of scientific applications. In: 10th International Symposium on Pervasive Systems, Algorithms and Networks (ISPAN), pp. 4-16 (2009)
45. Evangelinos, C., Hill, C.: Cloud computing for parallel scientific HPC applications: feasibility of running coupled atmosphere-ocean climate models on Amazon's EC2. In: Cloud Computing and Its Applications (2008)
46. Feng, Y., Li, P., Liang, S.: The design concepts of power analysis software for smart dispatching in China Southern Power Grid. South. Power Syst. Technol. 4(1), 29-34 (2010). (in Chinese)
47. Geberslassie, M., Bitzer, B.: Cloud computing for renewable power systems. In: International Conference on Renewable Energies and Power Quality (2012)
48. Agamah, S., Ekonomou, L.: A PHP application library for web-based power systems analysis. In: European Modelling Symposium. IEEE (2015)
49. Milano, F.: An open source power system analysis toolbox. IEEE Trans. Power Syst. 20(3), 1199-1206 (2005)
50. Saha, A.K.: Challenges in power systems operations, analyses, computations and application of cloud computing in electrical power systems. In: 2018 IEEE PES/IAS PowerAfrica, Cape Town, pp. 208-213 (2018)
51. Wang, X.Z., Ge, Z.Q., Ge, M.H., Wang, L., Li, L.: The research on electric power control center credit monitoring and management using cloud computing and smart workflow. In: 2018 China International Conference on Electricity Distribution (CICED), Tianjin, pp. 2732-2735 (2018)
52. Venugopal, G., Jees, S.A., Kumar, T.A.: Cloud model for power system contingency analysis. In: 2013 International Conference on Renewable Energy and Sustainable Energy (ICRESE), Coimbatore, pp. 26-31 (2013). https://doi.org/10.1109/ICRESE.2013.6927827
Hamiltonian Fuzzy Cycles in Generalized Quartic Fuzzy Graphs with Girth k

N. Jayalakshmi1,2, S. Vimal Kumar3(B), and P. Thangaraj4

1 Bharathiar University, Coimbatore 641046, India
2 Department of Mathematics, RVS Technical Campus-Coimbatore, Coimbatore 641402, Tamilnadu, India [email protected]
3 Department of Mathematics, Dr.R.K. Shanmugam College of Arts & Science, Indili, Kallakurichi 606213, Tamilnadu, India [email protected]
4 Department of CSE, Bannari Amman Institute of Technology, Sathiamangalam 638401, Tamilnadu, India [email protected]
Abstract. In this paper, we determine the Hamiltonian fuzzy cycles in generalized quartic fuzzy graphs on n (n ≥ 19) vertices with girth k, where k = 10, 11, 12, 13, 14, ...; this complements the known result on Hamiltonian fuzzy cycles in quartic fuzzy graphs with girth k, k = 3, 4, 5, 6, 7, 8, 9. Moreover, we prove the maximum number of Hamiltonian fuzzy cycles in a generalized quartic fuzzy graph on n (n ≥ 6) vertices with girth k (k ≥ 3).

Keywords: Fuzzy graph · Quartic fuzzy graph · Hamiltonian cycles · Hamiltonian fuzzy cycles · Girth

1 Introduction
Graph theory serves as a source of mathematical models for successfully analysing several concrete real-world issues. Fuzzy graph theory, though still very young, has been growing very fast and has numerous applications in modern science and technology, particularly in the fields of information technology, neural networks, cluster analysis and medical diagnosis. The authors in [8] discussed the total degree of totally regular fuzzy graphs, and [5] addressed the concept of fuzzy if-then rules in k-regular fuzzy graphs. Vimal Kumar et al. [6,12] introduced the concepts of domination and dynamic chromatic number for 4-regular graphs with girth 3. Rosenfeld et al. [10] discussed paths, cycles and connectedness results for fuzzy graphs. The author of [9] proposed the concept of hyper-Hamiltonian generalized Petersen graphs. Recently, Vimal Kumar et al. [3] introduced the concept of Hamiltonian fuzzy cycles in quartic fuzzy graphs with girth k, k = 3, 4, 5, 6, 7, 8, 9. Throughout this paper, we discuss various aspects of Hamiltonian fuzzy cycles (HFCs) in quartic fuzzy graphs (QFGs)
with girth k, where k = 10, 11, 12, 13, 14, .... For more information about regular graphs and fuzzy graphs we refer the reader to [1,2,4,7,11]. In this paper, our aim is to extend the result from [3] and to find the maximum number (max. no.) of Hamiltonian fuzzy cycles (HFCs) in generalized quartic fuzzy graphs (GQFGs) with girth k, for n ≥ 6 and k = 3, 4, 5, ...

1.1 Preliminaries
We first give a few definitions that are useful for the developments in the succeeding sections.

Definition 1.1.1 [3]. A graph G∗ : (V, E) is said to be quartic (4-regular) if each vertex has degree 4, i.e., d(v) = 4 for all v ∈ V.

Definition 1.1.2 [3]. A cycle C in a graph G∗ : (V, E) is said to be a Hamiltonian cycle if it covers all the vertices of G∗ exactly once, except the end vertices.

Definition 1.1.3 [3]. The girth of G∗ is the length of a shortest cycle in G∗, denoted by k.

Definition 1.1.4 [3]. Let V be a non-empty set. A fuzzy graph G : (σ, μ) is a pair of functions σ : V → [0, 1] and μ : V × V → [0, 1] such that μ(v1, v2) ≤ σ(v1) ∧ σ(v2) for all v1, v2 in V. Here σ(v1) ∧ σ(v2) denotes the minimum of σ(v1) and σ(v2), and v1v2 denotes the edge between v1 and v2.

Definition 1.1.5 [3]. Let G : (σ, μ) be a fuzzy graph on G∗ : (V, E). G is a quartic fuzzy graph if each vertex σ(v) in G has the same degree κ, i.e., d[σ(v)] = κ for all v ∈ V.

Definition 1.1.6 [3]. If a fuzzy cycle C in a fuzzy graph G covers all the vertices of G exactly once, except the end vertices, then the cycle is called a Hamiltonian fuzzy cycle.

The GQFGs with girth 3, 4 and 5 are presented in Figs. 1, 2 and 3, respectively.
Fig. 1. Fuzzy graphs G with girth 3
Fig. 2. Fuzzy graphs G with girth 4
Fig. 3. Fuzzy graphs G with girth 5

2 Main Results
In this section, we provide the HFCs in GQFGs with girth k, for k = 10, 11, 12, ..., 20. Finally, the main theorem, we provide the HFCs in GQFGs with girth k, for k = 3, 4, 5, .... Theorem 2.1. For n ≥ 19, GQFG with girth 10 is decomposable into 1, if n is a multiple of 3 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 19) vertices with girth 10. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is multiples 3, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 10 is one. Case 2: Assume that n is not a multiples 3, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C2 : P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 or P1 P9 P8 P7 P6 P5 P4 P3 P2 P10 or P1 P5 P9 P4 P8 P3 P7 P6 P2 P10 or P1 P8 P6 P4 P2 P9 P7 P5 P3 P10 or P1 P6 P2 P7 P3 P8 P4 P9 P5 P10 or P1 P3 P5 P7 P9 P2 P4 P6 P8 P10 , where P1 = σ(v1 )σ(v10 )σ(v19 )σ(v28 )σ(v37 )...σ(v9m−8 ), P2 = σ(v2 )σ(v11 )σ(v20 )σ(v29 )σ(v38 )...σ(v9m−7 ), P3 = σ(v3 )σ(v12 )σ(v21 )σ(v30 )σ(v39 )...σ(v9m−6 ), P4 = σ(v4 )σ(v13 )σ(v22 )σ(v31 )σ(v40 )...σ(v9m−5 ), P5 = σ(v5 )σ(v14 )σ(v23 )σ(v32 )σ(v41 )...σ(v9m−4 ), P6 = σ(v6 )σ(v15 )σ(v24 )σ(v33 )σ(v42 )...σ(v9m−3 ), P7 = σ(v7 )σ(v16 )σ(v25 )σ(v34 )σ(v43 )...σ(v9m−2 ), P8 = σ(v8 )σ(v17 )σ(v26 )σ(v35 )σ(v44 )...σ(v9m−1 ), P9 = σ(v9 )σ(v18 )σ(v27 )σ(v36 )σ(v45 )...σ(v9m ),
P10 = σ(v1 ) and v9m−8 ,v9m−7 ,v9m−6 ,v9m−5 ,v9m−4 ,v9m−3 , v9m−2 , v9m−1 ,v9m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 10 is two. This completes the proof. Theorem 2.2. For n ≥ 21, GQFG with girth 11 is decomposable into 1, if n is even , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 21) vertices with girth 11. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Suppose n is even, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 11 is one. Case 2: Suppose n is not even, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C3 : P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P10 or P11 P20 P19 P18 P17 P16 P15 P14 P13 P12 P10 or P11 P14 P17 P20 P13 P16 P19 P12 P15 P18 P10 or P11 P18 P15 P12 P19 P16 P13 P20 P17 P14 P10 or P11 P19 P17 P15 P13 P20 P18 P16 P14 P12 P10 or P11 P13 P15 P17 P19 P12 P14 P16 P18 P20 P10 , where P11 = σ(v1 )σ(v11 )σ(v21 )σ(v31 )σ(v41 )...σ(v10m−9 ), P12 = σ(v2 )σ(v12 )σ(v22 )σ(v32 )σ(v42 )...σ(v10m−8 ), P13 = σ(v3 )σ(v13 )σ(v23 )σ(v33 )σ(v43 )...σ(v10m−7 ), P14 = σ(v4 )σ(v14 )σ(v24 )σ(v34 )σ(v44 )...σ(v10m−6 ), P15 = σ(v5 )σ(v15 )σ(v25 )σ(v35 )σ(v45 )...σ(v10m−5 ), P16 = σ(v6 )σ(v16 )σ(v26 )σ(v36 )σ(v46 )...σ(v10m−4 ), P17 = σ(v7 )σ(v17 )σ(v27 )σ(v37 )σ(v47 )...σ(v10m−3 ), P18 = σ(v8 )σ(v18 )σ(v28 )σ(v38 )σ(v48 )...σ(v10m−2 ), P19 = σ(v9 )σ(v19 )σ(v29 )σ(v39 )σ(v49 )...σ(v10m−1 ), P20 = σ(v10 )σ(v20 )σ(v30 )σ(v40 )σ(v50 )...σ(v10m ), P10 = σ(v1 ) and v10m−9 , v10m−8 , v10m−7 , v10m−6 , v10m−5 , v10m−4 , v10m−3 , v10m−2 , v10m−1 , v10m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 11 is two. This completes the proof. Theorem 2.3. For n ≥ 23, GQFG with girth 12 is decomposable into 1, if n is multiple of 11 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 23) vertices with girth 12. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is multiple of 11, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 12 is one.
Case 2: Assume that n is not a multiple of 11, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C4 : P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P10 or P21 P31 P30 P29 P28 P27 P26 P25 P24 P23 P22 P10 or P21 P30 P28 P26 P24 P22 P31 P29 P27 P25 P23 P10 or P21 P23 P25 P27 P29 P31 P22 P24 P26 P28 P30 P10 or P21 P25 P29 P22 P26 P30 P23 P27 P31 P24 P28 P10 or P21 P26 P31 P25 P30 P24 P29 P23 P28 P22 P27 P10 , where P21 = σ(v1 )σ(v12 )σ(v23 )σ(v34 )σ(v45 )...σ(v11m−10 ), P22 = σ(v2 )σ(v13 )σ(v24 )σ(v35 )σ(v46 )...σ(v11m−9 ), P23 = σ(v3 )σ(v14 )σ(v25 )σ(v36 )σ(v47 )...σ(v11m−8 ), P24 = σ(v4 )σ(v15 )σ(v26 )σ(v37 )σ(v48 )...σ(v11m−7 ), P25 = σ(v5 )σ(v16 )σ(v27 )σ(v38 )σ(v49 )...σ(v11m−6 ), P26 = σ(v6 )σ(v17 )σ(v28 )σ(v39 )σ(v50 )...σ(v11m−5 ), P27 = σ(v7 )σ(v18 )σ(v29 )σ(v40 )σ(v51 )...σ(v11m−4 ), P28 = σ(v8 )σ(v19 )σ(v30 )σ(v41 )σ(v52 )...σ(v11m−3 ), P29 = σ(v9 )σ(v20 )σ(v31 )σ(v42 )σ(v53 )...σ(v11m−2 ), P30 = σ(v10 )σ(v21 )σ(v32 )σ(v43 )σ(v54 )...σ(v11m−1 ), P31 = σ(v11 )σ(v22 )σ(v33 )σ(v44 )σ(v55 )...σ(v11m ), P10 = σ(v1 ) and v11m−10 , v11m−9 , v11m−8 , v11m−7 , v11m−6 , v11m−5 , v11m−4 , v11m−3 , v11m−2 , v11m−1 , v11m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 12 is two. This completes the proof. Theorem 2.4. For n ≥ 25, GQFG with girth 13 is decomposable into 1, if n is even and multiple of 27 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 25) vertices with girth 13. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is even and multiple of 27, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 13 is one. Case 2: Assume that n is odd and not a multiple of 27, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C5 : P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 P42 P43 P10 or P32 P43 P42 P41 P40 P39 P38 P37 P36 P35 P34 P33 P10 or P32 P39 P34 P41 P36 P43 P38 P33 P40 P35 P42 P37 P10 or P32 P37 P42 P35 P40 P33 P38 P43 P36 P41 P34 P39 P10 , where P32 = σ(v1 )σ(v13 )σ(v25 )σ(v37 )σ(v49 )...σ(v12m−11 ), P33 = σ(v2 )σ(v14 )σ(v26 )σ(v38 )σ(v50 )...σ(v12m−10 ), P34 = σ(v3 )σ(v15 )σ(v27 )σ(v39 )σ(v51 )...σ(v12m−9 ), P35 = σ(v4 )σ(v16 )σ(v28 )σ(v40 )σ(v52 )...σ(v12m−8 ), P36 = σ(v5 )σ(v17 )σ(v29 )σ(v41 )σ(v53 )...σ(v12m−7 ), P37 = σ(v6 )σ(v18 )σ(v30 )σ(v42 )σ(v54 )...σ(v12m−6 ), P38 = σ(v7 )σ(v19 )σ(v31 )σ(v43 )σ(v55 )...σ(v12m−5 ), P39 = σ(v8 )σ(v20 )σ(v32 )σ(v44 )σ(v56 )...σ(v12m−4 ), P40 = σ(v9 )σ(v21 )σ(v33 )σ(v45 )σ(v57 )...σ(v12m−3 ),
P41 = σ(v10 )σ(v22 )σ(v34 )σ(v46 )σ(v58 )...σ(v12m−2 ), P42 = σ(v11 )σ(v23 )σ(v35 )σ(v47 )σ(v59 )...σ(v12m−1 ), P43 = σ(v12 )σ(v24 )σ(v36 )σ(v48 )σ(v60 )...σ(v12m ), P10 = σ(v1 ) and v12m−11 , v12m−10 , v12m−9 , v12m−8 , v12m−7 , v12m−6 , v12m−5 , v12m−4 , v12m−3 , v12m−2 , v12m−1 , v12m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 13 is two. This completes the proof. Theorem 2.5. For n ≥ 27, GQFG with girth 14 is decomposable into 1, if n is multiple of 13 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 27) vertices with girth 14. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is multiple of 13, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in quartic fuzzy graphs with girth 14 is one. Case 2: Assume that n is not a multiple of 13, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C6 : P44 P45 P46 P47 P48 P49 P50 P51 P52 P53 P54 P55 P56 P10 or P44 P54 P51 P48 P45 P55 P52 P49 P46 P56 P53 P50 P47 P10 or P44 P52 P47 P55 P50 P45 P53 P48 P56 P51 P46 P54 P49 P10 or P44 P50 P56 P49 P55 P48 P54 P47 P53 P56 P52 P45 P51 P10 or P44 P46 P48 P50 P52 P54 P56 P45 P47 P49 P51 P53 P55 P10 or P44 P53 P49 P45 P54 P50 P46 P55 P51 P47 P56 P52 P48 P10 or P44 P55 P53 P51 P49 P47 P45 P56 P54 P52 P50 P48 P46 P10 , where P44 = σ(v1 )σ(v14 )σ(v27 )σ(v40 )σ(v53 )...σ(v13m−12 ), P45 = σ(v2 )σ(v15 )σ(v28 )σ(v41 )σ(v54 )...σ(v13m−11 ), P46 = σ(v3 )σ(v16 )σ(v29 )σ(v42 )σ(v55 )...σ(v13m−10 ), P47 = σ(v4 )σ(v17 )σ(v30 )σ(v43 )σ(v56 )...σ(v13m−9 ), P48 = σ(v5 )σ(v18 )σ(v31 )σ(v44 )σ(v57 )...σ(v13m−8 ), P49 = σ(v6 )σ(v19 )σ(v32 )σ(v45 )σ(v58 )...σ(v13m−7 ), P50 = σ(v7 )σ(v20 )σ(v33 )σ(v46 )σ(v59 )...σ(v13m−6 ), P51 = σ(v8 )σ(v21 )σ(v34 )σ(v47 )σ(v60 )...σ(v13m−5 ), P52 = σ(v9 )σ(v22 )σ(v35 )σ(v48 )σ(v61 )...σ(v13m−4 ), P53 = σ(v10 )σ(v23 )σ(v36 )σ(v49 )σ(v62 )...σ(v13m−3 ), P54 = σ(v11 )σ(v24 )σ(v37 )σ(v50 )σ(v63 )...σ(v13m−2 ), P55 = σ(v12 )σ(v25 )σ(v38 )σ(v51 )σ(v64 )...σ(v13m−1 ), P56 = σ(v13 )σ(v26 )σ(v39 )σ(v52 )σ(v65 )...σ(v13m ), P10 = σ(v1 ) and v13m−12 , v13m−11 , v13m−10 , v13m−9 , v13m−8 , v13m−7 , v13m−6 ,v13m−5 , v13m−4 , v13m−3 , v13m−2 , v13m−1 , v13m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 14 is two. This completes the proof. Theorem 2.6. For n ≥ 29, GQFG with girth 15 is decomposable into 1, if n is even , HFCs. 2, otherwise
Proof. Let G be a GQFG on n(n ≥ 29) vertices with girth 15. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is even, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in quartic fuzzy graphs with girth 15 is one. Case 2: Assume that n is odd, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C7 : P57 P58 P59 P60 P61 P62 P63 P64 P65 P66 P67 P68 P69 P70 P10 or P57 P70 P69 P68 P67 P66 P65 P64 P63 P62 P61 P60 P59 P58 P10 or P57 P60 P63 P66 P69 P58 P61 P64 P67 P70 P59 P62 P65 P68 P10 or P57 P62 P67 P58 P63 P68 P59 P64 P69 P60 P65 P70 P61 P68 P10 or P57 P59 P61 P63 P65 P67 P69 P58 P60 P62 P64 P66 P68 P70 P10 or P57 P68 P65 P62 P59 P70 P67 P64 P61 P58 P69 P66 P63 P60 P10 , where P57 = σ(v1 )σ(v15 )σ(v29 )σ(v43 )σ(v57 )...σ(v14m−13 ), P58 = σ(v2 )σ(v16 )σ(v30 )σ(v44 )σ(v58 )...σ(v14m−12 ), P59 = σ(v3 )σ(v17 )σ(v31 )σ(v45 )σ(v59 )...σ(v14m−11 ), P60 = σ(v4 )σ(v18 )σ(v32 )σ(v46 )σ(v60 )...σ(v14m−10 ), P61 = σ(v5 )σ(v19 )σ(v33 )σ(v47 )σ(v61 )...σ(v14m−9 ), P62 = σ(v6 )σ(v20 )σ(v34 )σ(v48 )σ(v62 )...σ(v14m−8 ), P63 = σ(v7 )σ(v21 )σ(v35 )σ(v49 )σ(v63 )...σ(v14m−7 ), P64 = σ(v8 )σ(v22 )σ(v36 )σ(v50 )σ(v64 )...σ(v14m−6 ), P65 = σ(v9 )σ(v23 )σ(v37 )σ(v51 )σ(v65 )...σ(v14m−5 ), P66 = σ(v10 )σ(v24 )σ(v38 )σ(v52 )σ(v66 )...σ(v14m−4 ), P67 = σ(v11 )σ(v25 )σ(v39 )σ(v53 )σ(v67 )...σ(v14m−3 ), P68 = σ(v12 )σ(v26 )σ(v40 )σ(v54 )σ(v68 )...σ(v14m−2 ), P69 = σ(v13 )σ(v27 )σ(v41 )σ(v55 )σ(v69 )...σ(v14m−1 ), P70 = σ(v14 )σ(v28 )σ(v42 )σ(v56 )σ(v70 )...σ(v14m ), P10 = σ(v1 ) and v12m−11 , v12m−10 , v12m−9 , v12m−8 , v12m−7 , v12m−6 , v12m−5 , v12m−4 , v12m−3 , v12m−2 , v12m−1 , v12m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 15 is two. This completes the proof. Theorem 2.7. For n ≥ 31, GQFG with girth 16 is decomposable into 1, if n is multiple of 15 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 31) vertices with girth 16. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is multiple of 15, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 16 is one. Case 2: Assume that n is not a multiple of 15, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C8 : P71 P72 P73 P74 P75 P76 P77 P78 P79 P80 P81 P82 P83 P84 P85 P10 or P71 P85 P84 P83 P82 P81 P80 P79 P78 P77 P76 P75 P74 P73 P72 P10 or P71 P84 P82 P80 P78 P76 P74 P72
P85 P83 P81 P79 P77 P75 P73 P10 or P71 P52 P78 P74 P85 P81 P77 P73 P84 P80 P86 P72 P83 P79 P75 P10 or P71 P75 P79 P83 P72 P76 P80 P84 P73 P77 P81 P85 P74 P78 P82 P10 or P71 P78 P85 P77 P84 P76 P83 P75 P82 P74 P81 P83 P80 P72 P79 P10 , where P71 = σ(v1 )σ(v16 )σ(v31 )σ(v46 )σ(v61 )...σ(v15m−14 ), P72 = σ(v2 )σ(v17 )σ(v32 )σ(v47 )σ(v62 )...σ(v15m−13 ), P73 = σ(v3 )σ(v18 )σ(v33 )σ(v48 )σ(v63 )...σ(v15m−12 ), P74 = σ(v4 )σ(v19 )σ(v34 )σ(v49 )σ(v64 )...σ(v15m−11 ), P75 = σ(v5 )σ(v20 )σ(v35 )σ(v50 )σ(v65 )...σ(v15m−10 ), P76 = σ(v6 )σ(v21 )σ(v36 )σ(v51 )σ(v66 )...σ(v15m−9 ), P77 = σ(v7 )σ(v22 )σ(v37 )σ(v52 )σ(v67 )...σ(v15m−8 ), P78 = σ(v8 )σ(v23 )σ(v38 )σ(v53 )σ(v68 )...σ(v15m−7 ), P79 = σ(v9 )σ(v24 )σ(v39 )σ(v54 )σ(v69 )...σ(v15m−6 ), P80 = σ(v10 )σ(v25 )σ(v40 )σ(v55 )σ(v70 )...σ(v15m−5 ), P81 = σ(v11 )σ(v26 )σ(v41 )σ(v56 )σ(v71 )...σ(v15m−4 ), P82 = σ(v12 )σ(v27 )σ(v42 )σ(v57 )σ(v72 )...σ(v15m−3 ), P83 = σ(v13 )σ(v28 )σ(v43 )σ(v58 )σ(v73 )...σ(v15m−2 ), P84 = σ(v14 )σ(v29 )σ(v44 )σ(v59 )σ(v74 )...σ(v15m−1 ), P85 = σ(v15 )σ(v30 )σ(v45 )σ(v60 )σ(v75 )...σ(v15m ), P10 = σ(v1 ) and v15m−14 , v15m−13 , v15m−12 , v15m−11 , v15m−10 , v15m−9 , v15m−8 , v15m−7 , v15m−6 , v15m−5 , v15m−4 , v15m−3 , v15m−2 , v15m−1 , v15m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 16 is two. This completes the proof. Theorem 2.8. For n ≥ 33, GQFG with girth 17 is decomposable into 1, if n is even , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 33) vertices with girth 17. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is even, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 17 is one. Case 2: Assume that n is odd, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C9 : P86 P87 P88 P89 P90 P91 P92 P93 P94 P95 P96 P97 P98 P99 P100 P101 P10 or P86 P101 P100 P99 P98 P97 P96 P95 P94 P93 P92 P91 P90 P89 P88 P87 P10 or P86 P97 P92 P87 P98 P93 P88 P99 P94 P89 P100 P95 P90 P101 P96 P91 P10 or P86 P99 P96 P93 P90 P87 P100 P97 P94 P91 P88 P101 P98 P95 P92 P89 P10 , or P86 P95 P88 P97 P90 P99 P92 P101 P94 P87 P96 P89 P98 P91 P100 P93 P10 , where P86 = σ(v1 )σ(v17 )σ(v33 )σ(v49 )σ(v65 )...σ(v16m−15 ), P87 = σ(v2 )σ(v18 )σ(v34 )σ(v50 )σ(v66 )...σ(v16m−14 ), P88 = σ(v3 )σ(v19 )σ(v35 )σ(v51 )σ(v67 )...σ(v16m−13 ), P89 = σ(v4 )σ(v20 )σ(v36 )σ(v52 )σ(v68 )...σ(v16m−12 ), P90 = σ(v5 )σ(v21 )σ(v37 )σ(v53 )σ(v69 )...σ(v16m−11 ), P91 = σ(v6 )σ(v22 )σ(v38 )σ(v54 )σ(v70 )...σ(v16m−10 ),
P92 = σ(v7 )σ(v23 )σ(v39 )σ(v55 )σ(v71 )...σ(v16m−9 ), P93 = σ(v8 )σ(v24 )σ(v40 )σ(v56 )σ(v72 )...σ(v16m−8 ), P94 = σ(v9 )σ(v25 )σ(v41 )σ(v57 )σ(v73 )...σ(v16m−7 ), P95 = σ(v10 )σ(v26 )σ(v42 )σ(v58 )σ(v74 )...σ(v16m−6 ), P96 = σ(v11 )σ(v27 )σ(v43 )σ(v59 )σ(v75 )...σ(v16m−5 ), P97 = σ(v12 )σ(v28 )σ(v44 )σ(v60 )σ(v76 )...σ(v16m−4 ), P98 = σ(v13 )σ(v29 )σ(v45 )σ(v61 )σ(v77 )...σ(v16m−3 ), P99 = σ(v14 )σ(v30 )σ(v46 )σ(v62 )σ(v78 )...σ(v16m−2 ), P100 = σ(v15 )σ(v31 )σ(v47 )σ(v63 )σ(v79 )...σ(v16m−1 ), P101 = σ(v16 )σ(v32 )σ(v48 )σ(v64 )σ(v80 )...σ(v16m ), P10 = σ(v1 ) and v16m−15 , v16m−14 , v16m−13 , v16m−12 , v16m−11 , v16m−10 , v16m−9 , v16m−8 , v16m−7 , v16m−6 , v16m−5 , v16m−4 , v16m−3 , v16m−2 , v16m−1 , v16m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 17 is two. This completes the proof. Theorem 2.9. For n ≥ 35, GQFG with girth 18 is decomposable into 1, if n is multiple of 17 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 35) vertices with girth 18. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is multiple of 17, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 18 is one. Case 2: Assume that n is not a multiple of 17, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C10 : P102 P103 P104 P105 P106 P107 P108 P109 P110 P111 P112 P113 P114 P115 P116 P117 P118 P10 or P102 P118 P117 P116 P115 P114 P113 P112 P111 P110 P109 P108 P107 P106 P105 P104 P103 P10 or P102 P104 P106 P108 P110 P112 P114 P116 P118 P103 P105 P107 P109 P111 P113 P115 P117 P10 or P102 P106 P110 P114 P118 P105 P109 P111 P113 P117 P104 P108 P112 P116 P103 P107 P105 P10 or P102 P107 P112 P117 P105 P110 P115 P103 P108 P113 P118 P106 P111 P116 P104 P109 P104 P10 or P102 P105 P108 P111 P114 P117 P103 P106 P109 P112 P115 P118 P104 P107 P110 P113 P116 P10 or P102 P110 P118 P109 P117 P108 P116 P107 P115 P114 P105 P113 P104 P112 P103 P111 P106 P10 , or P102 P111 P103 P112 P104 P103 P105 P114 P106 P115 P107 P116 P108 P117 P109 P118 P110 P10 , where P102 = σ(v1 )σ(v18 )σ(v35 )σ(v52 )σ(v69 )...σ(v17m−16 ), P103 = σ(v2 )σ(v19 )σ(v36 )σ(v53 )σ(v70 )...σ(v17m−15 ), P104 = σ(v3 )σ(v20 )σ(v37 )σ(v54 )σ(v71 )...σ(v17m−14 ), P105 = σ(v4 )σ(v21 )σ(v38 )σ(v55 )σ(v72 )...σ(v17m−13 ), P106 = σ(v5 )σ(v22 )σ(v39 )σ(v56 )σ(v73 )...σ(v17m−12 ), P107 = σ(v6 )σ(v23 )σ(v40 )σ(v57 )σ(v74 )...σ(v17m−11 ), P108 = σ(v7 )σ(v24 )σ(v41 )σ(v58 )σ(v75 )...σ(v17m−10 ), P109 = σ(v8 )σ(v25 )σ(v42 )σ(v59 )σ(v76 )...σ(v17m−9 ), P110 = σ(v9 )σ(v26 )σ(v43 )σ(v60 )σ(v77 )...σ(v17m−8 ),
P111 = σ(v10 )σ(v27 )σ(v44 )σ(v61 )σ(v78 )...σ(v17m−7 ), P112 = σ(v11 )σ(v28 )σ(v45 )σ(v62 )σ(v79 )...σ(v17m−6 ), P113 = σ(v12 )σ(v29 )σ(v46 )σ(v63 )σ(v80 )...σ(v17m−5 ), P114 = σ(v13 )σ(v30 )σ(v47 )σ(v64 )σ(v81 )...σ(v17m−4 ), P115 = σ(v14 )σ(v31 )σ(v48 )σ(v65 )σ(v82 )...σ(v17m−3 ), P116 = σ(v15 )σ(v32 )σ(v49 )σ(v66 )σ(v83 )...σ(v17m−2 ), P117 = σ(v16 )σ(v33 )σ(v50 )σ(v67 )σ(v84 )...σ(v17m−1 ), P118 = σ(v17 )σ(v34 )σ(v51 )σ(v68 )σ(v85 )...σ(v17m ), P10 = σ(v1 ) and v17m−16 , v17m−15 , v17m−14 , v17m−13 , v17m−12 , v17m−11 , v17m−10 , v17m−9 , v17m−8 , v17m−7 , v17m−6 , v17m−5 , v17m−4 , v17m−3 , v17m−2 , v17m−1 , v17m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 18 is two. This completes the proof. Theorem 2.10. For n ≥ 37, GQFG with girth 19 is decomposable into 1, if n is even and multiple of 39 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 37) vertices with girth 19. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is even and a multiple of 39, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 19 is one. Case 2: Assume that n is odd and not a multiple of 39, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C11 : P119 P120 P121 P122 P123 P124 P125 P126 P127 P128 P129 P130 P131 P132 P133 P134 P135 P136 P10 or P119 P136 P135 P134 P133 P132 P131 P130 P129 P128 P127 P126 P125 P124 P123 P122 P121 P120 P10 or P119 P132 P127 P122 P135 P130 P125 P120 P133 P128 P123 P136 P131 P126 P121 P134 P129 P124 P10 or P119 P130 P123 P134 P120 P131 P124 P135 P128 P121 P132 P125 P136 P129 P122 P133 P126 P121 P10 or P119 P124 P129 P134 P121 P126 P131 P136 P123 P128 P133 P120 P125 P130 P135 P122 P127 P132 P10 or P119 P126 P133 P122 P129 P136 P125 P132 P121 P128 P135 P126 P131 P120 P127 P134 P123 P130 P10 , where P119 = σ(v1 )σ(v19 )σ(v37 )σ(v55 )σ(v73 )...σ(v18m−17 ), P120 = σ(v2 )σ(v20 )σ(v38 )σ(v56 )σ(v74 )...σ(v18m−16 ), P121 = σ(v3 )σ(v21 )σ(v39 )σ(v57 )σ(v75 )...σ(v18m−15 ), P122 = σ(v4 )σ(v22 )σ(v40 )σ(v58 )σ(v76 )...σ(v18m−14 ), P123 = σ(v5 )σ(v23 )σ(v41 )σ(v59 )σ(v77 )...σ(v18m−13 ), P124 = σ(v6 )σ(v24 )σ(v42 )σ(v60 )σ(v78 )...σ(v18m−12 ), P125 = σ(v7 )σ(v25 )σ(v43 )σ(v61 )σ(v79 )...σ(v18m−11 ), P126 = σ(v8 )σ(v26 )σ(v44 )σ(v62 )σ(v80 )...σ(v18m−10 ), P127 = σ(v9 )σ(v27 )σ(v45 )σ(v63 )σ(v81 )...σ(v18m−9 ), P128 = σ(v10 )σ(v28 )σ(v46 )σ(v64 )σ(v82 )...σ(v18m−8 ), P129 = σ(v11 )σ(v29 )σ(v47 )σ(v65 )σ(v83 )...σ(v18m−7 ), P130 = σ(v12 )σ(v30 )σ(v48 )σ(v66 )σ(v84 )...σ(v18m−6 ), P131 = σ(v13 )σ(v31 )σ(v49 )σ(v67 )σ(v85 )...σ(v18m−5 ), P132 = σ(v14 )σ(v32 )σ(v50 )σ(v68 )σ(v86 )...σ(v18m−4 ),
P133 = σ(v15 )σ(v33 )σ(v51 )σ(v69 )σ(v87 )...σ(v18m−3 ), P134 = σ(v16 )σ(v34 )σ(v52 )σ(v70 )σ(v88 )...σ(v18m−2 ), P135 = σ(v17 )σ(v35 )σ(v53 )σ(v71 )σ(v89 )...σ(v18m−1 ), P136 = σ(v18 )σ(v36 )σ(v54 )σ(v72 )σ(v90 )...σ(v18m ), P10 = σ(v1 ) and v18m−17 , v18m−16 , v18m−15 , v18m−14 , v18m−13 , v18m−12 , v18m−11 , v18m−10 , v18m−9 , v18m−8 , v18m−7 , v18m−6 , v18m−5 , v18m−4 , v18m−3 , v18m−2 , v18m−1 , v18m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 19 is two. This completes the proof. Theorem 2.11. For n ≥ 39, GQFG with girth 20 is decomposable into 1, if n is multiple of 19 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 39) vertices with girth 20. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is multiple of 19, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth 20 is one. Case 2: Assume that n is not a multiple of 19, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C12 : P137 P138 P139 P140 P141 P142 P143 P144 P145 P146 P147 P148 P149 P150 P151 P152 P153 P154 P155 P10 or P137 P155 P154 P153 P152 P151 P150 P149 P148 P147 P146 P145 P144 P143 P142 P141 P140 P139 P138 P10 or P137 P151 P146 P141 P155 P150 P145 P140 P154 P149 P144 P139 P153 P148 P143 P138 P152 P147 P142 P10 or P137 P150 P144 P138 P151 P145 P139 P152 P146 P140 P153 P147 P141 P154 P148 P142 P155 P149 P143 P10 or P137 P153 P150 P147 P144 P141 P138 P154 P151 P148 P145 P142 P139 P155 P152 P149 P146 P143 P140 P10 or P137 P139 P141 P143 P145 P147 P149 P151 P153 P155 P138 P140 P142 P144 P146 P148 P150 P152 P154 P10 or P137 P149 P142 P144 P147 P140 P152 P145 P138 P150 P143 P155 P148 P141 P153 P146 P139 P151 P144 P10 , where P137 = σ(v1 )σ(v20 )σ(v39 )σ(v58 )σ(v77 )...σ(v19m−18 ), P138 = σ(v2 )σ(v21 )σ(v40 )σ(v59 )σ(v78 )...σ(v19m−17 ), P139 = σ(v3 )σ(v22 )σ(v41 )σ(v60 )σ(v79 )...σ(v19m−16 ), P140 = σ(v4 )σ(v23 )σ(v42 )σ(v61 )σ(v80 )...σ(v19m−15 ), P141 = σ(v5 )σ(v24 )σ(v43 )σ(v62 )σ(v81 )...σ(v19m−14 ), P142 = σ(v6 )σ(v25 )σ(v44 )σ(v63 )σ(v82 )...σ(v19m−13 ), P143 = σ(v7 )σ(v26 )σ(v45 )σ(v64 )σ(v83 )...σ(v19m−12 ), P144 = σ(v8 )σ(v27 )σ(v46 )σ(v65 )σ(v84 )...σ(v19m−11 ), P145 = σ(v9 )σ(v28 )σ(v47 )σ(v66 )σ(v85 )...σ(v19m−10 ), P146 = σ(v10 )σ(v29 )σ(v48 )σ(v67 )σ(v86 )...σ(v19m−9 ), P147 = σ(v11 )σ(v30 )σ(v49 )σ(v68 )σ(v87 )...σ(v19m−8 ), P148 = σ(v12 )σ(v31 )σ(v50 )σ(v69 )σ(v88 )...σ(v19m−7 ), P149 = σ(v13 )σ(v32 )σ(v51 )σ(v70 )σ(v89 )...σ(v19m−6 ), P150 = σ(v14 )σ(v33 )σ(v52 )σ(v71 )σ(v90 )...σ(v19m−5 ), P151 = σ(v15 )σ(v34 )σ(v53 )σ(v72 )σ(v91 )...σ(v19m−4 ), P152 = σ(v16 )σ(v35 )σ(v54 )σ(v73 )σ(v92 )...σ(v19m−3 ),
P153 = σ(v17 )σ(v36 )σ(v55 )σ(v74 )σ(v93 )...σ(v19m−2 ), P154 = σ(v18 )σ(v37 )σ(v56 )σ(v75 )σ(v94 )...σ(v19m−1 ), P155 = σ(v19 )σ(v38 )σ(v57 )σ(v76 )σ(v95 )...σ(v19m ), P10 = σ(v1 ) and v19m−18 , v19m−17 , v19m−16 , v19m−15 , v19m−14 , v19m−13 , v19m−12 , v19m−11 , v19m−10 , v19m−9 , v19m−8 , v19m−7 , v19m−6 , v19m−5 , v19m−4 , v19m−3 , v19m−2 , v19m−1 , v19m ≤ vn , m = 1, 2, 3,... Therefore, the max. no. of HFCs in GQFGs with girth 20 is two. This completes the proof. Now, we focus the main theorem, that is to find the HFCs in GQFG on n(n ≥ 6) vertices with girth k(k ≥ 3). Theorem 2.12. For n ≥ 6, GQFG with girth k, k = 3, 4, 5, ... and k = 10 is decomposable into 1, if n is even, multiple of k − 1 and 2k + 1 , HFCs. 2, otherwise Proof. Let G be a GQFG on n(n ≥ 6) vertices with girth k = 3, 4, 5, ... and k = 10. Let v1 be an initial and a terminal vertex. By the definition of HFCs as following cases: Case 1: Assume that n is even, multiple of k − 1 and 2k + 1, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Therefore, the max. no. of HFCs in GQFGs with girth k = 3, 4, 5, ... and k = 10 is one. Case 2: Assume that n is odd and not a multiple of k − 12 and 2k + 1, Cycle C1 : σ(v1 )σ(v2 )σ(v3 )...σ(vn−1 )σ(vn )σ(v1 ). Cycle C2 : {Pi : 1 ≥ i ≤ k − 1}σ(v1 ), where {Pi = σ(vi )σ(vk−1+i )σ(v2k−2+i )σ(v3k−3+i )... σ(v(k−1)m−(k−(i+1)) ) for all 1 ≤ i ≤ k − 1 and σ(vi )σ(vk−1+i )σ(v2k−2+i )σ(v3k−3+i )...σ(v(k−1)m−(k−(i+1)) ≤ vn , m ≥ 1, k ≥ 3. Therefore, the max. no. of HFCs in GQFGs with girth k = 3, 4, 5, ... and k = 10 is two. This completes the proof. Remark 2.13. For girth k = 10, the maximum number of Hamiltonian fuzzy cycle is 1, if n is a multiple of 3; otherwise the Hamiltonian fuzzy cycle is 2.
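The theorem statements above flatten a piecewise (cases) expression during text extraction. Restored in LaTeX, and writing #HFC(G) for the number of HFCs into which G is decomposable (our notation, not the paper's), the main Theorem 2.12 reads:

\[
\#\mathrm{HFC}(G) =
\begin{cases}
1, & \text{if } n \text{ is even and a multiple of } k-1 \text{ and of } 2k+1,\\
2, & \text{otherwise,}
\end{cases}
\]

for a generalized quartic fuzzy graph $G$ on $n\ (n \ge 6)$ vertices with girth $k$, $k = 3, 4, 5, \ldots$ and $k \ne 10$. The earlier theorems substitute their own girth-specific conditions (for instance, "$n$ a multiple of 3" for girth 10) into the first case.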
3 Conclusion
The paper [3, Theorems 3.1–3.7] surveys the results for HFCs in quartic fuzzy graphs with girth k, k = 3, 4, 5, 6, 7, 8, 9. Combining those results with Theorems 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 2.10 and 2.11 of the present paper, we have completely proved the main Theorem 2.12. From the discussion above we conclude that the maximum number of HFCs in a GQFG on n (n ≥ 6) vertices with girth k (k ≥ 3) is at most two. This generalized result is applicable to network applications and graph theory.
Compliance with Ethical Standards. All authors declare that there is no conflict of interest. No humans/animals involved in this research work. We have used our own data.
References
1. Bondy, J.A., Murty, U.S.R.: Graph Theory with Applications. North-Holland, New York (1980)
2. Shan, E., Kang, L.: The ferry cover problem on regular graphs and small-degree graphs. Chin. Ann. Math. Ser. B 39(6), 933–946 (2018)
3. Jayalakshmi, N., Vimal Kumar, S., Thangaraj, P.: Hamiltonian fuzzy cycles in quartic fuzzy graphs with girth K (K = 3, 4, 5, 6, 7, 8, 9). Int. J. Pharm. Technol. 8, 17619–17626 (2016)
4. Lyle, J.: A structural approach for independent domination of regular graphs. Graphs Comb. 1, 1–22 (2014)
5. Kohila, S., Vimal Kumar, S.: Fuzzy if-then rule in k-regular fuzzy graph. Karpagam Int. J. Appl. Math. 7, 19–23 (2016)
6. Mohanapriya, N., Vimal Kumar, S., Vernold Vivin, J., Venkatachalam, M.: Domination in 4-regular graphs with girth 3. Proc. Nat. Acad. Sci. India Sect. A 85, 259–264 (2015)
7. Mordeson, J.N., Nair, P.S.: Fuzzy Graphs and Fuzzy Hypergraphs. Springer, Heidelberg (1998)
8. Nagoor Gani, A., Radha, K.: On regular fuzzy graphs. J. Phys. Sci. 12, 33–40 (2008)
9. Parimelazhagan, R., Sulochana, V., Vimal Kumar, S.: Secure domination in 4-regular planar and non-planar graphs with girth 3. Utilitas Math. 108, 273–282 (2018)
10. Rosenfeld, A., Zadeh, L.A., Fu, K.S., Shimura, M.: Fuzzy Graphs, Fuzzy Sets and their Applications, pp. 77–95. Academic Press, New York (1975)
11. Sunitha, M.S., Mathew, S.: Fuzzy graph theory: a survey. Ann. Pure Appl. Math. 4, 92–110 (2013)
12. Vimal Kumar, S., Mohanapriya, N., Vernold Vivin, J., Venkatachalam, M.: On dynamic chromatic number of 4-regular graphs with girth 3 and 4. Asian J. Math. Comput. Res. 7, 345–353 (2016)
Effective Model Integration Algorithm for Improving Prediction Accuracy of Healthcare Ontology
P. Monika1 and G. T. Raju2
1 VTU, Dayananda Sagar College of Engineering, Bengaluru, Karnataka, India, [email protected]
2 VTU, RNS Institute of Technology, Bengaluru, Karnataka, India, [email protected]
Abstract. In this digital era, the web is filled with enormous data in various formats and sizes, resulting in decreased utilization of that data. Healthcare is one of the major domains that contributes data to the web daily, in large volumes and numerous formats. The semantic web deals intelligently with its ontological data. Machine learning coupled with statistical techniques for good prediction accuracy includes tree-based algorithms such as ID3 and C4.5. While resolving tree under-fitting and over-fitting issues, which arise depending on dataset quality, a few strong rules may be lost during tree pruning, which decreases the prediction accuracy. An Effective Model Integration Algorithm (EMIA) has been proposed as a solution, based on the concepts of the Hidden Markov Model (HMM) and the Stochastic Automata Model (SAM). The presented approach generates improved decision rules by integrating the rules of chosen models based on a threshold in a healthcare ontology, resulting in an increased prediction accuracy of 86%.
Keywords: Semantic web · Ontologies · Ontology agents · Ontology integration · Health care · Diabetology · Domains · Decision support · Decision tree · SPARQL · Rule-based reasoning
1 Introduction
The web, in a general sense, is a huge collection of data in raw, semi-processed, and processed formats across several domains. The semantic web deals with the facts and meaning associated with the web. Healthcare is one such domain that contributes a huge volume of data daily. However, the generated data is not guaranteed to be successfully utilized when needed, even within the same domain, due to semantic limitations of the data representation and processing formats. Discovering knowledge from the vast semantic web poses many challenges, such as varied data patterns, missing data, uncertainty, and differing class labels. Tabular representation of data no longer supports automated knowledge discovery, whereas data represented as graphs unlocks many opportunities for machine-automated decisions and better prediction accuracy. A way of representing web data is the ontology, formally defined by Gruber [24] as an "explicit
specification of a shared conceptualization"; this is the form to which today's web is being transformed. Operating on ontological graph data helps achieve automated decision making by constructing decision rules and applying those rules while querying graphical representations such as .rdf, .ttl, .owl, and .trig file formats using SPARQL Protocol and RDF Query Language (SPARQL) queries. Decision tree algorithms, the most common predictive modeling approach, learn from the decision tree built while training the model. The trained model works on the testing datasets, predicting results based on the rules and outputting the confusion matrix for accuracy scores. Researchers often prefer decision tree algorithms for data mining on the semantic web due to their advantages, such as simplicity, the ability to handle various types of data, statistical validation support, the ability to handle large datasets, and robustness against collinearity, despite limitations such as NP-completeness and over-fitting. An attempt is made here to produce improved decision rules by following the steps of the proposed Effective Model Integration Algorithm (EMIA) and to use the generated rules for constructing an ontology and deriving superior automated instances. Running SPARQL queries on the ontology so created demonstrates the improved relevance and accuracy of the search results, benefitting medical researchers and practitioners. The proposed work has been experimented on a dataset from Diabetology, a subdomain of healthcare. The work includes the proposal of EMIA for generating better decision rules, and methodologies for creating an ontology based on the generated enhanced rules for improved prediction accuracy. The rest of the paper is organized as follows: Sect. 2 describes the algorithms and techniques employed by other researchers towards efficient rule generation. Section 3 presents the proposed EMI Algorithm. Section 4 elaborates on the experimental testing and results of the suggested methodology. Applying the enhanced rules to the ontology and extracting the semantic information is explained in Sect. 5, followed by the conclusion in Sect. 6.
2 Related Work
During the learning phase for knowledge extraction, the extracted rules can be applied to the ontological data. AllegroGraph and OWLIM [1, 2] are a few such rule-based reasoners capable of inferring large-scale instances of an ontology. Ontology-based classification (OBC) is applied mostly to document classification [3, 4], as it is best suited to representing contextual knowledge with the advantage of integrating heterogeneous data [5]. The OntoStar [6] ontology works by applying the C4.5 learning algorithm to the training dataset and extracting the rules in the Semantic Web Rule Language (SWRL). A predictive analysis method for diabetes prediction using the C4.5 algorithm [7] builds a decision tree from the training dataset and, with the learnt experience, predicts future cases with a prediction accuracy of approximately 76% for the given test dataset [8]. Healthcare system working models respond by predicting patient risk based on regularly monitored features [9], supported by the wrapper feature selection concept with bagging [10]. The random forest method shows a promising
result of 80.84% in diabetes prediction when all attributes were used [11]. Similarity measures such as the Euclidean distance between classification response vectors enhance classification accuracy [12]. In medical practice, the concept of ontology is extensively used for semantic description and for data interaction and integration to achieve automated knowledge discovery [13]. Overall, ontologies can be summarized as knowledge representation tools in decision support systems [14]. On the other hand, for remote assessment of patients' health, the Hidden Semi-Markov Model has been employed [15], where the observations are compared relatively for the judgment. Remote prediction of diabetes and hypotension using HMMs coupled with neural networks and fuzzy sets has yielded good accuracies [16, 17], concluding that HMMs work well with high prediction accuracies.
3 Effective Model Integration Algorithmic Approach Based on the C4.5 Decision Tree Algorithm
Decision tree models whose target variable takes a discrete set of values are called classification trees. The most famous decision-tree-based classification technique extensively used today, C4.5 [18], is a supervised learning algorithm proposed by Quinlan that uses a pessimistic method during decision rule generation. The algorithm begins with the computation of measuring criteria such as SplitInfo, Entropy, Gain, and Gain Ratio based on the given training dataset (T) and attributes (A), followed by the selection of the best attribute a_best as the candidate for the root node of the decision tree. The process continues on the subsets of the training dataset induced by a_best until all the attributes are processed or the stopping criterion is reached. At the end, the algorithm outputs the constructed decision tree and the learnt decision rules, which are used for further predictions with better accuracy. Many techniques, such as Bagging, Boosting, and AdaBoost, are noted in the literature to enhance the prediction accuracy of classification algorithms, and two or more machine learning algorithms can themselves be hybridized to improve prediction accuracy. From the related literature survey, it is understood that few contributions directly improve the quality of the model rules generated. The well-known C4.5 decision tree algorithm shows less prediction accuracy in some cases due to the loss of quality decision rules while fixing the over-fitting and under-fitting issues of the decision trees. For modeling stochastic sequences with an underlying finite automata structure, the Hidden Markov Model (HMM) shown in Fig. 1 is most extensively used. Generally, an HMM can be viewed as a Non-deterministic Finite Automata (NFA) generalization of SAM [19], in which state reductions can be made as part of generalization, or as a probabilistic generalization of a Markovian finite state structure, forming Markov chains that depend on their immediate predecessors with hidden states during emissions.
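To make the selection criteria concrete, the following is a minimal sketch (not the authors' implementation) of the entropy and gain-ratio computations that C4.5 uses to pick a_best; the data layout, a list of attribute-value dictionaries plus a parallel label list, is an assumption made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute):
    """Information gain of `attribute`, normalized by its split information,
    as used by C4.5 to rank candidate attributes. `rows` is a list of dicts."""
    n = len(labels)
    # Partition the labels by the value the attribute takes in each row.
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attribute], []).append(label)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0
```

The attribute with the largest gain ratio over the current subset of the training data becomes the next split node.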
Fig. 3. Proposed EMIA models’ states
Fig. 1. Hidden Markov Model
The state variables of an HMM depend only on their immediate predecessors, thereby forming a Markov chain following a greedy strategy. The famous state merging hypothesis [19] of automata theory is as follows: Hypothesis 1: "Two states can be integrated if they are indistinguishable." Considering states as models, to integrate any two models an initial model is generated, an algorithm is identified to integrate the models, the resultant models are evaluated, and the process halts with generalization. Based on the theory of SAM and the Markov chain perspective of HMM, and on the fact that the decision accuracy of the C4.5 algorithm depends on the quality of the decision rules generated from the training dataset (though the reported model prediction accuracy depends on the test dataset predictions), the effective model integration algorithmic approach represented in Fig. 2 has been proposed. Figure 3 depicts the generalized representation of the suggested algorithm with the model status at various levels. Given,
μ = (1/n) Σᵢ₌₁ⁿ aᵢ   (1)

σ = √((1/n) Σᵢ₌₁ⁿ (μ − aᵢ)²)   (2)

Threshold = μ + kσ   (3)

Sim(x, y) = q / (q + r + s)   (4)
where m is the number of sets, n is the number of models in each set, ∀pᵢ ∈ P, i < n (level 1), ∀qᵢ ∈ Q, i < n (level 2), ∀sᵢ ∈ S, i < n (level p−2), ∀rᵢ ∈ R, i < n (level p−1), aᵢ is the model accuracy of the i-th model in that particular set, and k ∈ N. μ (1) is the mean and σ (2) is the standard deviation of the resultant values.
Fig. 2. Proposed Effective Model Integration Algorithm
The proposed idea starts with m sets, each set having n models, equivalent to states of an HMM in SAM form. Instead of the probability functions of an HMM, the suggested approach considers threshold values for integrating models that are found similar, as established by computing binary similarity scores between the models for model generalization. To begin with, the accuracies of all models in all the sets are documented at level 1. The intra-set models are selected for integration at further levels by the following steps, for every model a among the n in each set:
• At level 1: calculate the accuracy and apply the threshold (3) to select the model a for further processing.
• From level 2 to level p: using the asymmetric binary similarity measure, the concept of clustering is applied to determine the pairs of rules to be integrated, and the process continues till the last level to obtain the final set of rules, so as to attain better decision accuracy.
The asymmetric binary similarity measure (BSM), whose coefficient over x and y is called the Jaccard coefficient, is expressed as in (4) using the counts documented in Table 1. The contingency table for binary attributes in Table 1 holds the quantitative values measuring the interrelation between the two models. Let x and y be two models under comparison. Equation (4) computes the asymmetric binary
similarity value between the rule sets of two different models. Sim(x, y) = 0 implies no match between the models x and y, whereas Sim(x, y) nearest to 1 indicates a better match between the models. From the resultant matrix of models and their respective similarity measures, and from the observations of prediction similarity, the models crossing the threshold based on the similarity scores are integrated at each level within the respective sets.

Table 1. Binary attributes contingency matrix

                      Object y
  Object x        1        0        Sum
  1               q        r        q + r
  0               s        t        s + t
  Sum             q + s    r + t    p
The process is repeated recursively until no further combinations are possible. At the last level, inter-set model combinations are attempted for integration, resulting in the r models of level p.
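A small sketch of the two quantities that drive this integration, the threshold of Eqs. (1)–(3) and the similarity of Eq. (4), is given below; representing a model's decision rules as a Python set of hashable rule descriptions is an assumption made for illustration.

```python
import statistics

def threshold(accuracies, k=1):
    """Threshold = mu + k*sigma over the model accuracies in a set (Eqs. 1-3)."""
    mu = statistics.mean(accuracies)
    sigma = statistics.pstdev(accuracies)  # population std. dev., matching Eq. (2)
    return mu + k * sigma

def binary_similarity(rules_x, rules_y):
    """Asymmetric binary (Jaccard) similarity of Eq. (4).
    q = rules present in both models, r/s = rules present in only one."""
    q = len(rules_x & rules_y)
    r = len(rules_x - rules_y)
    s = len(rules_y - rules_x)
    return q / (q + r + s) if (q + r + s) else 0.0
```

Pairs of models whose accuracy crosses the threshold and whose similarity score satisfies the threshold condition are merged by taking the union of their rule sets, and the loop repeats on the merged models.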
4 Experimental Testing and Results
The proposed approach has been experimented on the Pima Indian Diabetes dataset of 768 records collected from the UCI machine learning repository [20]. The dataset contains diabetic and non-diabetic readings for women above 21 years of age. After preprocessing the data with the Cluster-based Missing Value Imputation (CMVI) algorithm [21], the attributes plasma glucose concentration, diastolic blood pressure, diabetes pedigree function, and age were finalized as the most prominent attributes for enhancing the accuracy of the prediction results using the proposed EMIA. As a proof of concept, sample computation steps are as follows. The input dataset was split into 5 sets with varying sizes of 50, 100, 150, 200, and 250 randomly chosen entries. Each set consists of 10 different models. Initially, at level 1, the prediction accuracies of all the models based on the C4.5 algorithmic procedure were considered for selecting the models eligible for integration based on the threshold. At levels 2 and 3, the BSM was computed among the models selected from the previous level, and all the models satisfying the threshold condition were integrated and fed as input to the next level. This procedure is followed until no further integrations are possible at the intra-set level. At level 4, the last level in the present example, the BSM was computed at the inter-set level, since the previous level wound up the intra-set combinations; the models of set 4 and set 5 from level 3 satisfied the threshold condition, and were hence integrated and selected as the final enhanced model. Further inter-set combination of these models could not be done, as the threshold rule was not satisfied.
Precision = TP / (TP + FP)   (5)

Recall = TP / (TP + FN)   (6)

Accuracy = (TP + TN) / (TP + FP + FN + TN)   (7)
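These metrics follow directly from the confusion-matrix counts; a minimal sketch:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, and accuracy as defined in Eqs. (5)-(7)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```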
The final model so obtained was evaluated using the precision (5), recall (6), and accuracy (7) metrics. The observed precision, recall, and accuracy scores of all the models for a sample dataset of size 150 are recorded in Table 2. Similar observations were tabulated for the rest of the dataset sizes.
Table 2. Sample precision, recall, and accuracy scores of existing and proposed techniques (dataset size = 150)

Model no.   Existing method                       Enhanced method
            Precision  Recall  Accuracy (%)       Precision  Recall  Accuracy (%)
1           0.46       0.86    73.33              0.78       0.34    80.68
2           0.46       0.86    73.33              0.69       0.33    82.88
3           0.29       0.57    56.66              0.60       0.23    78.04
4           0.50       0.71    76.66              0.83       0.50    83.85
5           0.50       0.71    76.66              0.80       0.50    83.85
6           0.45       0.71    73.33              0.79       0.33    85.68
7           0.50       0.57    76.66              0.70       0.49    84.33
8           0.50       0.57    76.66              0.70       0.48    84.69
9           0.50       0.57    76.66              0.78       0.39    85.09
10          0.50       0.57    76.66              0.48       0.40    76.92
Average     0.47       0.67    73.66              0.72       0.40    82.60
Observing the mean scores of precision, recall, and accuracy against the various data sizes (Fig. 4), it is evident that the proposed model's precisions are higher, recalls are lower, and accuracies are higher in all cases compared to the existing approach. The prediction accuracy of the proposed EMIA-enhanced model for the complete dataset was compared with other standard algorithms, as shown in Fig. 5, confirming that the accuracy of the proposed EMIA approach is better, at 86%, compared to K-means (71%), CART (73%), and C4.5 (82%).
Fig. 4. Metrics comparison between existing and proposed methods
Fig. 5. Accuracy scores
5 Enhanced Rules and Ontology
The enhanced rules obtained using EMIA can be utilized efficiently by following a sequence of steps: enhanced rule generation, followed by building the ontology by applying the rules defined in a .pie file using tools like GraphDB [22] or the Cellfie tool available in Protégé [23], and querying the constructed knowledge graph using SPARQL queries for efficient knowledge retrieval. Observations show that the ontology built using the stated methodology yields up to 10% more derived instances than those derived by well-known decision algorithms, resulting in a comparatively good prediction rate.
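As an illustration of the final retrieval step, the following is a hedged sketch using the rdflib library; the ontology file name, namespace, and class/property names (Patient, hasDiagnosis, Diabetic) are hypothetical, since the actual vocabulary depends on the GraphDB/Cellfie build described above.

```python
from rdflib import Graph

# Hypothetical ontology file and vocabulary, for illustration only.
g = Graph()
g.parse("diabetes_ontology.owl", format="xml")

query = """
PREFIX ex: <http://example.org/healthcare#>
SELECT ?patient WHERE {
    ?patient a ex:Patient ;
             ex:hasDiagnosis ex:Diabetic .
}
"""
# Each row binds ?patient to an individual inferred by the enhanced rules.
for row in g.query(query):
    print(row.patient)
```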
6 Conclusion
The web today is termed the semantic web, as it operates on the semantics of the graph data rather than the stored syntax, to achieve high prediction accuracy. In spite of much technological advancement in the field of machine learning, existing algorithms still fail to achieve good prediction accuracy due to rule reductions during tree pruning. To overcome this drawback, an Effective Model Integration Algorithm (EMIA) has been proposed, based on the concepts of the Hidden Markov Model and the Stochastic Automata Model (SAM). Depending on the accuracy scores and binary similarity measures, a threshold is set. The models from various sets with different dataset sizes undergo an integration process that merges their respective generated decision rules at both intra- and inter-set levels while the threshold condition is satisfied. The experimental results reveal that the proposed algorithmic technique achieves enhanced precision, recall, and accuracy scores compared to existing techniques. The suggested model outperforms with an overall accuracy score of 86% for the present dataset. The observations conclude that the proposed approach performs well irrespective of domain and hence can be applied extensively to various datasets and domains to
enhance the prediction accuracy. Future work will extend this approach to enhance prediction accuracy across interacting ontologies by connecting the semantic data at the schema level for efficient utilization of the available datasets.
References
1. Chang, W.W., Miller, B.: AllegroGraph RDF – triplestore evaluation. Technical report, joint study with Adobe Advanced Technology Labs (2009)
2. Thakker, D.A., Gohil, S., Osman, T., Lakin, P.: A pragmatic approach to semantic repositories benchmarking. In: Proceedings of the 7th Extended Semantic Web Conference 2010, pp. 379–393 (2010)
3. Song, M., Lim, S., Kang, D., Le, S.: Ontology-based automatic classification of web documents. In: International Conference on Intelligent Computing, Computational Intelligence, pp. 690–700. Springer (2006)
4. Zhang, X., Hu, B., Chen, J., Moore, P.: Ontology-based context modeling for emotion recognition in an intelligent web. World Wide Web 16(4), 497–513 (2013)
5. Gómez-Romero, J., Serrano, M.A., García, J., Molina, J.M., Rogova, G.: Context-based multi-level information fusion for harbor surveillance. J. Inf. Fusion 21, 173–186 (2014)
6. Liu, B., Yao, L., Han, D.: Harnessing ontology and machine learning for RSO classification. SpringerPlus 5, 1655 (2016)
7. Jain, R.: Rule generation using decision trees. Indian Agricultural Statistics Research Institute (IASRI) (2012)
8. Kalyankar, G.D., Poojara, S.R., Dharwadkar, N.V.: Predictive analysis of diabetic patient data using machine learning and Hadoop. In: IEEE International Conference on IoT in Social, Mobile, Analytics and Cloud (I-SMAC) (2017)
9. Lakshmi, B.N., Indumathi, T.S., Nandini, R.: A study on C4.5 decision tree classification algorithm for risk predictions during pregnancy. Procedia Technol. 24, 1542–1549 (2016)
10. Lee, S.-J., Xu, Z., Li, T., Yang, Y.: A novel bagging C4.5 algorithm based on wrapper feature selection for supporting wise clinical decision making. J. Biomed. Inf. 88, 144–155 (2018)
11. Zou, Q., Qu, K., Luo, Y., Yin, D., Ju, Y., Tang, H.: Predicting diabetes mellitus with machine learning techniques. Frontiers Genet. 9, Article 515 (2018)
12. Wang, H., Wang, J.: An effective image representation method using kernel classification. In: IEEE 26th International Conference on Tools with Artificial Intelligence (2014)
13. Gangemi, A., Pisanelli, D.M., Steve, G.: An overview of the ONIONS project: applying ontologies to the integration of medical terminologies. Data Knowl. Eng. 31, 183–220 (1999)
14. Zhang, X., Hu, B., Ma, X., Moore, P., Chen, J.: Ontology driven decision support for the diagnosis of mild cognitive impairment. J. Comput. Methods Programs Biomed. 113, 781–791 (2014)
15. Capecci, M., Ceravolo, M.G., Ferracuti, F., Iarlori, S., Kyrki, V., Monteriù, A., Romeo, L., Verdini, F.: A hidden semi-Markov model based approach for rehabilitation exercise assessment. J. Biomed. Inf. 78, 1–11 (2018)
16. Singh, A., Tamminedi, T., Yosiphon, G., Ganguli, A., Yadegar, J.: Hidden Markov models for modeling blood pressure data to predict acute hypotension. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2010)
17. Gill, N.S., Mittal, P.: A novel hybrid model for diabetic prediction using hidden Markov model, fuzzy based rule approach and neural network. Indian J. Sci. Technol. 9(35), 192–199 (2016)
18. Quinlan, J.R.: Improved use of continuous attributes in C4.5. ACM J. Artif. Intell. Res. 4(1), 77–90 (1996)
19. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Boston (1979)
20. Dua, D., Graff, C.: UCI machine learning repository. University of California, School of Information and Computer Science, Irvine (2019). http://archive.ics.uci.edu/ml
21. Monika, P., Raju, G.T.: Data pre-processing and customized onto-graph construction for knowledge extraction in healthcare domain of semantic web. Int. J. Innovative Technol. Exploring Eng. (IJITEE) 8(11) (2019)
22. Guting, R.H.: GraphDB: modeling and querying graphs in databases. In: Proceedings of the 20th International Conference on Very Large Databases, pp. 297–308 (1994)
23. Musen, M.A.: The Protégé project: a look back and a look forward. AI Matters 1(4), 4–12 (2015)
24. Gruber, T.R.: A translation approach to portable ontology specifications. J. Knowl. Acquis. Curr. Issues Knowl. Model. 5(2), 199–220 (1993). https://doi.org/10.1006/knac.1993.1008
Detection of Leaf Disease Using Hybrid Feature Extraction Techniques and CNN Classifier
Vidyashree Kanabur1, Sunil S. Harakannanavar1, Veena I. Purnikmath1, Pramod Hullole1, and Dattaprasad Torse2
1 S. G. Balekundri Institute of Technology, Belagavi, Karnataka, India, [email protected], [email protected]
2 KLS Gogte Institute of Technology, Belagavi, Karnataka, India
Abstract. Identification of leaf disease using multiple descriptors is presented. Initially, the images are resized to 256 × 256 to maintain uniformity throughout the experiment. The Histogram Equalization (HE) technique is employed on the resized leaf images to improve their quality. Segmentation is performed using k-means clustering. The contour tracing technique is applied to the leaf images to trace the boundaries of affected areas. Prominent features of the leaf image are extracted using DWT, PCA, and GLCM techniques. The performance of the system is evaluated using three different classifiers, viz. SVM, KNN, and CNN, using Matlab on a bean leaf database. The performance of the proposed model with the CNN classifier is better when compared with other existing methodologies.
Keywords: Acquisition · Discrete wavelet transform · Grey level co-occurrence matrix · Support vector machine · Convolutional neural network
1 Introduction
The main objective of research in agriculture is to increase the productivity of food with reduced expenditure and enhanced quality of goods. Agricultural production and its quality depend on soil, seed, and agrochemicals. Agricultural products (vegetables and fruits) are essential for the survival of human beings. Agricultural productivity decreases when crops are affected by various diseases. Plant diseases are impairments that modify the normal plant in its vital functions, viz. photosynthesis, pollination, fertilization, and germination. Under adverse environmental conditions, diseases in plants are caused by various fungi, bacteria, and viruses. If a disease is not detected at an early stage, the yield decreases; bacterial diseases on plants can reduce production by twenty to twenty-five percent. Figure 1 shows a sample of bacterial disease on a plant. In this paper, identification of leaf disease using multiple descriptors is presented. Initially, the images are captured using physical devices, and a database is created containing healthy and unhealthy leaves. HE is employed on the resized images to improve leaf quality. The significant features of the leaf images are extracted using DWT, PCA, and
GLCM techniques. Classifiers such as SVM, CNN, and KNN are employed to classify and recognize the leaf images.
Fig. 1. Bacterial disease on plant [6]
In this paper, the rest of the content is arranged as follows. Related work on leaf disease detection is discussed in Sect. 2. Section 3 discusses the proposed methodology for leaf disease detection. The experimental results of the proposed model are explained in Sect. 4. The conclusion of the proposed work is given in Sect. 5.
2 Related Work
Here, the work related to existing methodologies for detection of leaf disease is discussed. Kaur et al. [1] detected leaf disease at an early stage of the plant. The RGB image of the leaf was converted into an HSI image in the preprocessing stage to enhance the image quality. The significant details of the image were extracted using a k-means clustering approach, and the features of the leaf image were classified using neural network, SVM, fuzzy, and K-nearest neighbor classifiers. Sujatha et al. [2] adopted the k-means clustering algorithm for segmentation of the leaf image, which divides the leaf into different clusters; the diseases of different plant species were recorded using an SVM classifier. Kumar et al. [3] applied a color transformation approach to convert RGB to HSI. Segmentation was then performed on the converted image by masking pixels and removing unwanted colors using a specific threshold. Statistics from SGDM matrices evaluate various plant leaf diseases using neural network, K-nearest neighbor, and fuzzy logic classifiers. Khairnar et al. [4] considered an input RGB color image of the leaf and converted it into other color spaces such as HSI and CIELAB. Segmentation of the leaf image was performed by the k-means clustering algorithm. After segmentation, the infected regions were described by extracting features; features based on color, texture, and shape are normally used for describing a region. Classification was performed using the minimum distance criterion and a support vector machine. Batule et al. [5] applied conversion of RGB into the HSI color space, in addition to converting the leaf image from RGB to HSV format. The model includes image color transformation, image smoothing, and disease spot segmentation; the features of the leaf image were classified using neural networks and SVM. Kamlapurkar et al. [6] adopted the Gabor filter to segment the leaf image and improve its
quality; the features obtained were matched using an ANN. Chaware et al. [7] matched the features using various classifiers: SVM, KNN, ANN, and fuzzy logic. Rathod et al. [8] converted the leaf image from RGB to CIELAB color components; the features were extracted using the K-medoids algorithm, which provides the color and shape descriptions, and the produced features were classified using a neural network classifier. Varshney et al. [9] converted the color leaf image from the RGB model into the HSI model. The color co-occurrence method (CCM) was employed on the segmented image, from which significant details of the leaf image were extracted; the obtained features were classified using a neural network. Baskaran et al. [10] developed a model to calculate the level of RGB content in an image; the features were extracted using a k-means clustering approach and matched using SVM. Mainkar et al. [11] adopted a combination of GLCM and k-means clustering to fetch the useful information of the leaf image; features extracted from the GLCM were matched using neural networks and radial basis functions. Al-Hiary et al. [12] converted the color leaf image from RGB into HSI format; features were extracted using k-means clustering and the co-occurrence approach for texture analysis of the leaf image, and a neural network was employed to classify the leaf image. Megha et al. [13] applied a method to convert RGB into a grayscale model for resizing and image filtration; the primary details of the leaf were extracted using k-means and FCM clustering approaches, and further features of the leaf image (color, shape, and texture) were extracted and matched using SVM. Sanjana et al. [14] adopted HE, AHE, and CLHE to improve image quality; median filter, Gaussian filter, and speckle-reducing anisotropic diffusion approaches were applied to the leaf image to fetch useful features, and the extracted features were matched using SVM and ED classifiers. Jagtap et al. [15] presented a model for conversion of the defected image to the HSI color space; classification based on the histogram of the intensity image was carried out using an artificial neural network.
3 Proposed Model
In this section, identification of leaf disease using multiple descriptors is presented. Initially, the images are resized to 256 × 256 to maintain uniformity throughout the experiment. HE is employed on the resized leaf images to improve their quality. Segmentation is performed using k-means clustering, which results in a partitioning of the data space into cells. The contour tracing technique is employed to extract the boundaries of the leaf images. The multiple descriptors DWT, PCA, and GLCM are applied to the enhanced leaf images to extract their useful details. Finally, the generated features are classified using SVM, CNN, and KNN. Figure 2 shows the proposed model. The main objectives of the model are to:
• detect the leaf disease,
• increase the accuracy of the model,
• decrease the error.
Preprocessing includes cropping, resizing, image enhancement, and segmentation. In this paper, HE is employed on the database leaf images to improve their quality, as shown
Fig. 2. Proposed leaf disease detection model
in Fig. 3. Segmentation of the images is carried out using the k-means clustering algorithm, as shown in Fig. 4. The boundaries of the leaf images are obtained using contour tracing, shown in Fig. 5; a minimal sketch of this preprocessing chain follows.
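The sketch below illustrates the chain (resize, HE, k-means segmentation, contour tracing) using OpenCV; the cluster count (k = 3) and the choice of equalizing only the luminance channel are assumptions, not the exact settings used in the experiments.

```python
import cv2
import numpy as np

img = cv2.imread("leaf.jpg")
img = cv2.resize(img, (256, 256))

# Histogram equalization on the luminance channel only.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
img = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

# K-means clustering of pixel colours (k = 3 assumed: leaf, lesion, background).
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(img.shape)

# Contour tracing on the binarized segmentation to outline affected areas.
gray = cv2.cvtColor(segmented, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
```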
Fig. 3. Histogram equalization
Fig. 4. K-mean clustering
Fig. 5. Contour tracing
Fig. 6. DWT decomposition
The leaf images are decomposed into four sub-bands, viz. LL, LH, HL, and HH, using the DWT approach, as shown in Fig. 6. The LL component of the DWT carries the significant information, and hence the other sub-bands are eliminated. Reduction of the features obtained from the wavelet decomposition is carried out using PCA. Further, GLCM is used to derive homogeneity, autocorrelation, dissimilarity, entropy, and several other properties; the higher-order distribution of the gray values of the pixels is exploited using a specific distance or neighborhood criterion, and each pixel value is normalized. In order to classify the detected disease, the classifiers SVM, KNN, and CNN are used in the proposed model. The performance of the model is evaluated for the CNN classifier as shown in Fig. 7, and it is found that as the number of training iterations increases, the detection accuracy increases.
Fig. 7. Performance of CNN classifier
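Returning to the feature-extraction step, a hedged sketch is given below using PyWavelets and scikit-image (≥ 0.19 naming); scikit-image's GLCM properties do not include autocorrelation or entropy directly, so close analogues are used here, and the PCA layout over the LL band is one possible choice, not the paper's exact configuration.

```python
import numpy as np
import pywt
from skimage.feature import graycomatrix, graycoprops
from sklearn.decomposition import PCA

def extract_features(gray_img):
    """DWT approximation band reduced with PCA, plus GLCM texture properties."""
    # Single-level 2-D DWT; only the LL (approximation) band is kept.
    ll, (lh, hl, hh) = pywt.dwt2(gray_img.astype(float), "haar")

    # PCA over the rows of the LL band to reduce the wavelet features.
    wavelet_feats = PCA(n_components=8).fit_transform(ll).flatten()

    # GLCM texture descriptors at distance 1, angle 0.
    glcm = graycomatrix(gray_img.astype(np.uint8), distances=[1],
                        angles=[0], levels=256, normed=True)
    texture_feats = [graycoprops(glcm, p)[0, 0]
                     for p in ("homogeneity", "dissimilarity",
                               "correlation", "energy")]
    return np.concatenate([wavelet_feats, texture_feats])
```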
4 Performance Evaluation
The performance of the proposed model for detection of leaf disease is described here. The database was created using bean leaves, and experiments were performed in MATLAB on this database to measure the accuracy of the model. Figure 8 shows the approach employed for accurate detection of the diseased leaf. The performance of the proposed model can be understood using the similarity score: if the distance is large, it yields a high percentage of non-similar values, and if the distance is small, it yields a low percentage of non-similar values. The similarity score thus depends on the distance, i.e., the length of the feature database. Figure 9 shows the variation of FAR values with increasing similarity score. FAR gives the percentage false acceptance rate for the database; FAR values depend on the database and the neural network. If more training iterations are considered in the neural network, FAR decreases for such a database. In the FAR versus similarity score graph, at a score of 4000 the value of FAR is 1. It is seen that as the similarity score increases, the value of FAR increases and approaches unity; as the distance value of the similarity score decreases, the FAR value also reduces and approaches zero.
Fig. 8. Overall output of CNN
Fig. 9. FAR vs similarity score
The results obtained using the different classifiers, SVM, KNN, and CNN, for detection of leaf disease are shown in Fig. 10.
Fig. 10. Comparison of SVM, KNN and CNN
5 Conclusion
In this paper, leaf images are cropped and resized to maintain uniform dimensions. The HE technique is adopted to increase image quality. Segmentation is performed using k-means clustering, and the contour tracing technique is used to extract the boundaries of the leaf images. The multiple descriptors DWT, PCA, and GLCM are used to extract useful coefficients of the leaf. Lastly, the extracted coefficients are classified using SVM,
KNN, and CNN. The proposed method shows a satisfactory accuracy of 99.09% with the CNN classifier. In future, a mobile application can be developed so that the specific remedy for a detected disease is sent to the farmer through short message service or email.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Kaur, R., Kaur, M.: A brief review on plant disease detection using image processing. Int. J. Comput. Sci. Mob. Comput. 6(2), 101–106 (2017)
2. Sujatha, R., Kumar, S., Akhil, G.: Leaf disease detection using image processing. J. Chem. Pharm. Sci. 10(1), 670–672 (2017)
3. Kumar, S., Kaur, R.: Plant disease detection using image processing - a review. Int. J. Comput. Appl. 124(16), 6–9 (2015)
4. Khairnar, K., Dagade, R.: Disease detection and diagnosis on plant using image processing - a review. Int. J. Comput. Appl. 108(13), 36–38 (2014)
5. Batule, V., Chavan, G., Sanap, V., Wadkar, K.: Leaf disease detection using image processing and support vector machine. J. Res. 2(2), 74–77 (2016)
6. Kamlapurkar, S.: Detection of plant leaf disease using image processing approach. Int. J. Sci. Res. Publ. 6(2), 73–76 (2016)
7. Chaware, R., Karpe, R., Pakhale, P., Desai, S.: Detection and recognition of leaf disease using image processing. Int. J. Eng. Sci. Comput. 7(5), 11964–11967 (2017)
8. Rathod, A., Tanawala, B., Shah, V.: Leaf disease detection using image processing and neural network. Int. J. Adv. Eng. Res. Dev. 1(6), 1–10 (2014)
9. Varshney, S., Dalal, T.: Plant disease prediction using image processing techniques - a review. Int. J. Comput. Sci. Mob. Comput. 5(5), 394–398 (2016)
10. Baskaran, S., Sampath, P., Sarathkumar, P., Sivashankar, S., Vasanth Kumar, K.: Advances in image processing for detection of plant disease. SIJ Trans. Comput. Sci. Eng. Appl. 5(3), 8–10 (2017)
11. Mainkar, P., Ghorpade, S., Adawadkar, M.: Plant leaf disease detection and classification using image processing techniques. Int. J. Innov. Emerg. Res. Eng. 2(4), 139–144 (2015)
12. Al-Hiary, H., Bani-Ahmad, S., Reyalat, M., Braik, M., AlRahamneh, Z.: Fast and accurate detection and classification of plant diseases. Int. J. Comput. Appl. 17(1), 31–38 (2011)
13. Megha, S., Niveditha, C.R., Soumyashree, N., Vidhya, K.: Image processing system for plant disease identification by using FCM-clustering technique. Int. J. Adv. Res. Ideas Innov. Technol. 3(2), 445–449 (2017)
14. Sanjana, Y., Sivasamy, A., Jayanth, S.: Plant disease detection using image processing techniques. Int. J. Innov. Res. Sci. Eng. Technol. 4(6), 295–301 (2015)
15. Jagtap, S., Hambarde, S.: Agricultural plant leaf disease detection and diagnosis using image processing based on morphological feature extraction. IOSR J. VLSI Sig. Process. 4(5), 24–30 (2016)
Fish Detection and Classification Using Convolutional Neural Networks
B. S. Rekha1, G. N. Srinivasan2, Sravan Kumar Reddy3, Divyanshu Kakwani3, and Niraj Bhattad3
1 Bharathiar University, Coimbatore, Tamil Nadu, India, [email protected]
2 Department of Information Science and Engineering, R.V. College of Engineering, Bengaluru, India, [email protected]
3 R.V. College of Engineering, Bengaluru, India, [email protected], [email protected]
Abstract. About fifty percent of the world relies on seafood as its main protein source. Because of this, illegal and unregulated fishing activities are proving to be a threat to marine life. This paper discusses a novel technique that automatically detects and classifies various species of fishes, such as dolphins and sharks, to help protect endangered species. Images captured through boat cameras suffer various hindrances, such as fluctuating degrees of luminous intensity and opacity. The implemented system aims at helping investigators and nature conservationists analyze images of fishes captured by boat cameras, and to detect and classify the fishes into species based on their features; the system adapts to variations of illumination, brightness, etc. during the detection process. The system incorporates a three-phase methodology. The first phase is augmentation, which applies data augmentation techniques to the real-time image dataset captured by boat cameras before passing it to the detection module. The second phase is detection, which finds fishes in the image by searching for regions with a high probability of containing a fish. The third phase is classification of each detected fish into its species: the segmented fish images are passed to the classifier model, which specifies the species to which the detected fish belongs. A CNN (convolutional neural network) is used in the detection and classification phases, with different architectures, to extract and analyze features. The system provides confidence quotients on each image, expressed on a 0–1 scale, indicating the likelihood of the image belonging to each of the following eight categories: ALB, BET, YFT, LAG, DOL, Shark, Other, and None. The system provides detection and classification with accuracies of 90% and 92%, respectively.
Keywords: Object detection · Classification · Computer vision · Deep learning · Convolutional neural network
1 Introduction
In order to preserve the marine ecosystem, real-time inspection of fisheries is necessary. In coastal areas, where most fishes are caught, unregulated practices are proving detrimental to the marine ecosystem. For the conservation of fish species, the study of the size and diversity of each such species is vital. There has been a lack of automatic methods to perform real-time detection and classification of captured fishes. Traditionally, this task has been achieved by manual inspection, which requires having an expert on board who determines the fish species at the time of capture. Other methods involve broadcasting boat snapshots to remote experts who determine the species of the captured fish. Each of these methods has its drawbacks. First, having a fish inspector on every boat is not feasible. Second, it is difficult to find experts in the field of ichthyology. Third, transmitting boat snapshots to remote experts is vulnerable to failures. With human activity proving destructive to the marine ecosystem, the conservation of fishes becomes an important concern; thus a system that analyzes images of fishes captured by boat cameras and automatically detects and classifies them into species is key to solving this problem. The current system receives real-time input images from a camera installed aboard boats. Its task is to detect and classify the fishes in these images into one of the following 8 categories: ALB, BET, YFT, LAG, DOL, Shark, Other, and no fish. The images in the available dataset were shot under varying imaging conditions; they are highly distorted and cluttered, and in several of them the fishes are partly occluded. This makes the task of detection and classification challenging. Current state-of-the-art algorithms based on traditional image processing techniques do not perform very well on such images and have also proved to be very specific to the type of environment.
2 Related Work
The system has to be independent of the environment and reliable in the presence of noise. This can be achieved by using deep learning techniques involving convolutional neural networks (CNNs) for the detection and classification tasks. This approach was first introduced by Krizhevsky et al. as AlexNet in the ImageNet competition, where it gave remarkable results compared to previous approaches [4]. As with all machine learning algorithms, the method has a training phase and a testing phase. In the training phase, the pipeline consists of three parts: data augmentation, a convolutional network, and a fully connected layer at the end combined with a softmax activation. Each part of this design has a specific function. The main function of the convolutional layers is to extract features from the image. Each convolutional layer has a certain number of filters; these filters are weighted kernels that are passed throughout the image to generate corresponding feature maps, which are passed on to the fully connected network. The fully connected layer weights the features in the final feature maps and produces a final probability distribution over all the possible classes. Several hyperparameters are used to tune the network model, including the error correction method, the learning rate, the number of filters in each convolutional layer, and the number of convolutional
layers, and the input image size. With deeper networks, better accuracy can be achieved when combined with techniques like dropout and skip networks, which are introduced in [5, 6]. Dropout is a technique in which a fraction of the connections in the fully connected layers is deliberately removed in a random order; a clear explanation is given by the AlexNet team in [6]. Overfitting is the condition in which the model drifts from the point of least validation error towards higher error rates. This generally happens due to over-training a network with the same dataset; a simple indication is that the training accuracy keeps increasing whereas the validation accuracy shows a gradual decrease. Skip networks use a similar approach, in which a few convolutional layers are deliberately skipped and the feature maps are passed on to later layers. The main idea is to compute a feature map based on two feature maps, one from the direct parent and another from an ancestor; experiments showed that this approach reduces overfitting of the network. Using an ensemble of different models is another popular technique, used in most modern deep learning problems, that gives an edge over single-model prediction [4, 6–8]. Different approaches, such as parallel convolutional neural networks, were used by the Google research team [7], showing improved results but with a very complex architecture. Since the success of AlexNet [4], several detection approaches based on CNNs have been proposed. One successful technique combined region proposals with CNNs, giving rise to R-CNN; the initial version of R-CNN is described in [9]. Several variants of the R-CNN in [9] have been proposed since then, the most successful among them being Fast R-CNN [10], Faster R-CNN [11], DeepBox [12], and R-FCN [13]. The R-CNN described in [9] first generates candidate regions using Selective Search [14], then generates a feature vector for each region using a CNN, and finally uses one SVM per category to assign category scores to each region proposal. R-CNNs were later improved by changing the region proposal generation algorithm; for instance, EdgeBoxes [15] is used in [16] (for a comparison of region proposal algorithms, see [17]). DeepBox [12] refined the region proposal algorithm by adding a shallow CNN network to filter the candidates. A better version of R-CNN, called Fast R-CNN, was introduced in [10]; it reduced the time by training in a single stage and by sharing the computation of the feature vector corresponding to each region proposal. Techniques based on region proposal generation, however, fail in the case of cluttered images: it is difficult to extract meaningful regions out of such an image, mainly because of high object density and occlusion rates.
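As a concrete illustration of the pipeline sketched above (stacked convolutional feature extractors, a fully connected head with dropout, and a softmax over the 8 fish categories), the following is a minimal Keras sketch, not any of the surveyed architectures; the filter and unit counts are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),   # randomly drops half the FC connections per step
    layers.Dense(8, activation="softmax"),
])
```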
3 System Architecture
The proposed system consists of three phases: data augmentation, detection, and classification. As the initial dataset is small and highly imbalanced, the images were augmented using various techniques to produce a sizeable balanced dataset. In the detection stage, the input image is segmented to obtain image patches that contain fish. The obtained patches are passed to the classifier, which produces a probability distribution over the fish classes of the detected fish (Fig. 1).
Fig. 1. System architecture.
3.1 Augmentation
The augmentation phase is used to enhance the training dataset in terms of both quality and quantity. Simple affine transforms and preprocessing techniques are used to achieve that; a minimal sketch is given below.
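The sketch uses OpenCV to apply rotation and Gaussian blur, the two operations named in Sect. 4; the specific angles and kernel size are assumptions.

```python
import cv2

def augment(img):
    """Yield simple variants of a training image: rotations and Gaussian blur."""
    h, w = img.shape[:2]
    for angle in (90, 180, 270):
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        yield cv2.warpAffine(img, m, (w, h))
    yield cv2.GaussianBlur(img, (5, 5), 0)
```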
3.2 Detection
The fishes present in the input image are detected in this phase. Different techniques were explored to find an optimal approach for extracting the fishes from the image. Initially, a localization technique was applied, in which the input image is fed to a localization network that regresses the coordinates of the area within which a fish is present. The limitation of regression bounding is that it cannot detect multiple fishes in a single image. As the dataset contains multiple fishes per image, a better method to find more than one fish was required; hence a custom detection algorithm that best matches the dataset was designed and implemented (Fig. 2).
Fig. 2. Output of the localization algorithm [6]
The method used is a top-down image search, where the fishes are detected at two levels: the initial level produces a loose bound of the fish and the latter a tight bound. At each level, a different network is used, whose input is a patch that is classified into fish or no-fish. The patches are extracted from the raw images in the dataset with different window sizes, selected in one of two ways: applying k-means clustering on the set of window shapes containing the fishes in the annotated dataset, or selecting a range of windows with a constant interval. In the proposed system architecture, the latter method is chosen to generalize the system. The obtained window sizes are used to extract the patches for the detection phase (Fig. 3).
Fig. 3. Detection algorithm flow diagram.
The detection pipeline consists of a large patch extractor and a binary classifier, followed by a small patch extractor and another binary classifier. The initial large patch extractor extracts square patches ranging in scale from 300 × 300 to 600 × 600 with an interval of 50. These patches are classified with a binary classifier trained with similar image patches manually extracted from the dataset; the classifier also outputs the probability of each image patch containing a fish. Based on the classifier output, the top 10 most probable image patches are selected and forwarded to the next segmentation level. The smaller patch extractor extracts patches ranging in scale from 100 × 100 to 300 × 300 with an interval of 50, which are classified with a binary classifier trained with similar fish patches manually extracted from the dataset. Computation time depends on the model and image crop size, but precision is also affected; usually, time and precision have a trade-off relation (Fig. 4).
Fig. 4. Architecture of the detection network
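A minimal sketch of the multi-scale sliding-window extraction for the large-patch stage is shown below; the scale range and 50-pixel scale interval come from the text, while the spatial stride is an assumption.

```python
def extract_patches(img, min_size=300, max_size=600, size_step=50, stride=50):
    """Slide square windows of several scales over the image and yield
    (x, y, size, patch) tuples for the binary fish/no-fish classifier."""
    h, w = img.shape[:2]
    for size in range(min_size, max_size + 1, size_step):
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                yield x, y, size, img[y:y + size, x:x + size]
```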
The detection phase uses the top-down image search technique with a sliding-window approach in order to initially detect the presence of fish or no-fish. Figure 5 shows the presence of a fish.
Fig. 5. Output of the detection algorithm.
3.3 Classification
All the input patches obtained in the detection phase are resized appropriately and passed to the classifier network. For each input patch, the classifier returns a probability distribution over the fish classes. The final probability distribution is computed as follows:
P(x) = max_i p_i(x), if x ≠ 'NoF'; avg_i p_i(x), otherwise   (1)

The classification network consists of several convolutional layers followed by a fully connected (FC) layer. A VGG-16 network with an input size of (224, 224, 3) is used for the classifier. This network was chosen because pre-trained VGG networks are available, and since our dataset is small, a pre-trained model is indispensable (Fig. 6).
Fig. 6. Architecture of the classification network.
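Equation (1) can be implemented directly over the per-patch outputs; a minimal sketch (the returned scores are left unnormalized, mirroring the equation as stated):

```python
import numpy as np

def aggregate(patch_probs, nof_index):
    """Combine per-patch class distributions as in Eq. (1): take the maximum
    over patches for every fish class and the average for the 'NoF' class."""
    probs = np.asarray(patch_probs)        # shape: (num_patches, num_classes)
    out = probs.max(axis=0)
    out[nof_index] = probs[:, nof_index].mean()
    return out
```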
Once the images pass through the detection phase, they are passed to the classification phase, where, if a fish is detected, the classifier assigns it to one of the eight categories. The class with the highest probability is conjectured to be the class of the fish contained in the image. Figure 7 shows the classification of a fish detected in the input image as category BET with a high probability of 1.0; Fig. 8 shows that there is no fish on the board, with a high probability of 1.0.
Fig. 7. Output of the classification algorithm identifying BET fish
Fig. 8. Output of the classification algorithm identifying No fish [7]
4 Dataset
The original dataset is provided by the National Conservancy Organization, which extracted it from camera feeds obtained from boat cameras. The dataset contains a total of 3777 images divided into 8 categories, namely ALB, BET, YFT, SHARK, LAG, DOL, NoF, and Other. Since the dataset is relatively small and contains plenty of variance in imaging conditions, the classification task is challenging. The following are some of the challenges faced while processing such images:
• Different boat environments: different environments have different camera positions and orientations.
• Different capture times: some images are taken in daylight, while some are taken at night.
• Distortion: many images are distorted due to the general hustle typically present in fisheries.
• Occlusion: several images contain fishes occluded by other objects or by fishermen.
Prior to training the networks, the available dataset was heavily augmented with rotation and Gaussian blur. The initial dataset contained 3777 images, and the augmentation process resulted in 16000 images. The augmented dataset is fully balanced across all the categories and is split into training and validation sets in the ratio 8:2. Data augmentation increases the dataset size, which provides better training and also helps keep the neural network from overfitting.
5 Training
The detection and classification networks are trained with the following parameters:
• Optimizer: Adam
• Learning rate: 1e−3
• Objective: categorical cross-entropy
• Epochs: 40
Fish Detection and Classification Using Convolutional Neural Networks
1229
The classifier uses a VGG-16 network pre-trained on the ImageNet dataset, and fine-tuned on tight patches of fishes extracted from our training data. 2000 patches are extracted for each category from the training dataset. For categories that do not have enough samples, data augmentation techniques are used.
6 Results

Initially, a single network was used to perform the classification job. The training accuracy was 95%, but the validation accuracy was poor. When this initial model was tested, a dependency on the environment was found, which led us to introduce a detection phase before classification. The initial approach for detection was localization of the fish in the image. This approach failed when there were multiple fishes in a single image, since the localization task was designed to find only one object of interest per image. This led us to develop a new method of detecting all the fishes in an image using an image search algorithm. With this final design of network architectures and training parameters, the following results were obtained:

Detection: training accuracy 94%, validation accuracy 90%, training time about 6 h, test time 5 s.
Classification: training accuracy 96%, validation accuracy 92%, training time about 3 h, test time 50 ms.
7 Conclusion

The usage of CNNs for detection and classification produced remarkable results, significantly better than methods based on manual feature extraction and traditional image processing techniques. The final networks used are derived from VGGNet. The CNNs used in the detection module and the classification module achieved validation accuracies of about 90% and 92% respectively. While the accuracy of the networks has been remarkable, the whole process is a little slow: the detection module takes about 5 s to process an image, while the classification module is almost instantaneous, taking well under a second. The training of the networks has been done on limited datasets; the usage of better aggregation tools could produce much better results in production. The future scope includes working on R-CNN for better results in the detection and classification phases. The project could also be extended to the detection and classification of other species of fishes.
Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Lowe, D.G.: Distinctive image features from scale-invariant key-points. IJCV 60, 91–110 (2004)
2. Clara Shanthi, G., Saravanan, E.: Background subtraction techniques: systematic evaluation and comparative analysis. Int. J. Mod. Eng. Res. (IJMER) 3, 514–517 (2013)
3. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition (2001)
4. Krizhevsky, A., et al.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
5. He, K., et al.: Deep residual learning for image recognition, arXiv (2015)
6. Srivastava, N., et al.: Dropout: a simple way to prevent neural networks from overfitting. JMLR (2014)
7. Szegedy, C., Liu, W., et al.: Going deeper with convolutions. In: CVPR. Google Research (2015)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large scale image recognition. In: ICLR. Visual Geometry Group, Department of Engineering Science, University of Oxford (2015)
9. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587 (2014)
10. Girshick, R.: Fast R-CNN. In: The IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448 (2015)
11. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems 28, NIPS (2015)
12. Kuo, W., Hariharan, B., Malik, J.: DeepBox: learning objectness with convolutional networks. In: The IEEE International Conference on Computer Vision (ICCV), pp. 2479–2487 (2015)
13. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems 29, NIPS 2016 (2016)
14. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
15. Lawrence Zitnick, C., Dollár, P.: Edge boxes: locating object proposals from edges. In: European Conference on Computer Vision, ECCV 2014, pp. 391–405 (2014)
16. Tang, S., Yuan, Y.: Object detection based on convolutional neural network. Stanford project report (2016)
17. Hosang, J., Benenson, R.: How good are detection proposals, really? Computer Vision and Pattern Recognition, arXiv (2014)
18. Gidaris, S., Komodakis, N.: LocNet: improving localization accuracy for object detection, arXiv (2016)
19. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks, arXiv (2014)
20. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks, arXiv (2016)
21. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR (2015)
22. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: ICLR (2015)
23. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
24. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift, arXiv (2015)
25. Liu, W., Anguelov, D., Erhan, D., Szegedy, C.: SSD: Single Shot MultiBox Detector, arXiv (2016)
26. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: ICLR. Google DeepMind (2016)
27. Liang, M., Hu, X.: Recurrent convolutional neural network for object recognition. In: CVPR (2015)
Feature Selection and Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) for Chronic Kidney Disease (CKD) Prediction

S. Belina V. J. Sara and K. Kalaiselvi

Department of Computing Science, School of Computing Science, Vels Institute of Science, Technology and Advanced Studies (VISTAS), (Formerly Vels University), Chennai, India
[email protected], [email protected]
Abstract. Early prediction and appropriate medication are the ways to cure Chronic Kidney Disease (CKD) in the early stages of its progression. The accuracy of classification algorithms depends on the use of exact feature-selection algorithms to reduce the dimension of the dataset. The accuracy relies not only on the feature-selection algorithms but also on the classification methods, which predict severities that are useful for medical experts in clinical diagnosis. To minimize the computation time and maximize the accuracy of the classifier, this study proposes Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) classification to predict Chronic Kidney Disease. The accuracy of the EEAw-DNN is examined with the help of feature selection using data reduction. Hence, Hybrid Filter Wrapper Embedded (HFWE) based Feature Selection (FS) is formulated to choose the optimal subset of features from the CKD dataset. This HFWE-FS technique fuses filter, wrapper and embedded algorithms. Finally, the EEAw-DNN based prediction algorithm is used to diagnose CKD. The database used for the study is "CKD", and the system is implemented on the MATLAB platform. The results prove that the EEAw-DNN classifier combined with the HFWE algorithm renders a higher level of prediction when compared to other classification algorithms such as Naïve Bayes (NB), Artificial Neural Network (ANN) and Support Vector Machine (SVM) in predicting the severity of CKD. Datasets were taken from the University of California Irvine (UCI) machine learning repository.

Keywords: Chronic Kidney Disease (CKD) · Classification · Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) · Feature selection (FS) · Hybrid Filter Wrapper Embedded (HFWE) · University of California Irvine (UCI)
1 Introduction

Chronic kidney disease is a common disease that threatens public health, with high incidence, high prevalence and very expensive treatment. Around 2.5–11.2% of the adult population across regions such as Europe, Asia, North America and Australia is likely to be affected by chronic kidney disease [1]; in the USA alone, 27 million individuals are affected by CKD [2]. To attain new knowledge from the huge amount of data describing CKD victims, data mining techniques are advantageous [4]. Data mining techniques based on machine learning, database management and statistics have been used in the proposed work [5, 6]. With the emphasis on electronic devices and software solutions for patient databases in the medical industry, large amounts of data are extracted every day [7, 8]. At present, there are no broadly accepted diagnostic tools or instruments for the progression of CKD. Hence, medical experts should take ideal decisions about patients' treatment, distinguishing patients at risk of progressing towards renal failure from patients who do not require higher clinical attention. Staging of CKD has been recommended in order to help in treatment-focused actions [9]. Decision making in medical diagnosis is a challenging task because of the heterogeneity of kidney diseases, the variability in stages of disease progression, and the coverage of cardiovascular risk [10]. Accurately predicted risk could help individualized decision making, enhancing early and adequate care of patients [11]. Hence, for effective clinical diagnosis, data mining techniques provide appropriate information from the large medical databases that are collected routinely [12]. Major studies have also been carried out on medical databases related to the prediction of cancer. Many classification methods have been applied to disease prediction and to support the diagnoses of medical experts. These classification techniques can reduce the prediction errors made by under-experienced experts, and the outputs can be extracted in very little time [12]. However, there is no single machine learning algorithm or decision-making model that can predict all varieties of diseases. High-dimensional datasets can lead to low classification accuracy and a risk of overfitting, and the computational effort may be greater and more expensive. Usually, datasets of low dimension yield a higher level of classification accuracy with minimal computational cost and overfitting risk [13]. This work proposes a better prediction system based on computer vision and machine learning techniques to aid in the prediction of CKD at its various phases. Novel features and the Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) were used for rapid detection. In this study, many evaluations were carried out on a variety of classes and correlated with the estimated glomerular filtration rate (GFR). The results prove that the system can yield a consistent prediction process and support the medical treatment of CKD victims. Datasets were extracted from the University of California Irvine (UCI) machine learning repository.
2 Literature Review

Anandanadarajah and Tharmarajah [14] examined the parameters of a variety of clinical examinations in order to identify the data that are helpful for predicting CKD. A repository with many parameters of healthy patients and of patients suffering from CKD was investigated using different methods. Initial recognition of the dominant parameters, carried out by a common spatial pattern (CSP) filter and linear discriminant analysis, could yield positive results in the prediction of CKD. Classification techniques are then utilized to recognize the dominant feature. When there are inadequate data on hypertension and diabetes mellitus, random blood glucose and blood pressure can be utilized as parameters.

Salekin and Stankovic [16] take into account 24 predictive attributes and construct a machine learning classifier for the diagnosis of CKD. Utilizing machine learning methods, they attain a prediction accuracy of 0.993 according to the F1-measure, with a root mean square error of 0.1084, which is a 56% reduction of the mean square error compared with the state of the art. Their feature-identification process recognizes the most reliable parameters for the prediction of CKD and assesses them for the detection process. Finally, an analysis of the cost-accuracy tradeoff is carried out in order to arrive at an innovative CKD prediction process with greater accuracy at a lower cost.

Luck et al. [17] constructed a concise and easy-to-understand approach, which identifies discriminative local phenomena using an exhaustive rule-mining algorithm, to distinguish two groups of subjects: (1) subjects in mild to moderate CKD stages without renal failure, and (2) subjects in moderate to established CKD stages with renal failure. The prediction algorithm maps the m-dimensional variable space to local over-densities of the two categories of subjects in the form of rules that are easy to interpret. Moreover, basic knowledge of the concentration of urinary metabolites classifies the CKD phase of the subjects exactly.

Huang et al. [18] formulated a prediction model for chronic disease prognosis and a detection system combining data mining techniques with case-based reasoning (CBR). The major steps of the system are: (1) applying sensible data mining techniques to identify implicitly meaningful rules from the health data, (2) using the generated rules for the diagnosis of the particular chronic disease, (3) implementing CBR to fix the chronic disease prediction and medications, and (4) extending these processes to work within a system for the convenient generation, arrangement, cleaning and distribution of chronic disease knowledge.

Di Noia et al. [19, 20] formulated a software tool exploiting the potential of Artificial Neural Networks (ANNs) in order to classify subjects' health with respect to end-stage kidney disease (ESKD). ESKD affects human kidneys, demanding renal replacement therapy with dialysis or even kidney transplantation. The constructed tool was made available both as an online Web application and as an
Android mobile application; its development focused on the greatest worldwide availability of its medical utility.

Polat et al. [21] formulated a Support Vector Machine (SVM) classification algorithm for the prediction of CKD. In the wrapper approach, a classifier subset evaluator with the greedy stepwise search engine and a wrapper subset evaluator with the Best First Search (BFS) engine were utilized.

Norouzi et al. [22] formulated an adaptive neuro-fuzzy inference system (ANFIS) for predicting the renal failure timeframe of CKD based on real medical data. A Takagi-Sugeno type ANFIS technique was used to identify GFR values, and several variables were identified for the prediction model. This model can accurately predict variations in GFR, in spite of the several uncertainties within the human body and the dynamic nature of CKD progression, for a future period of time.

Tangri et al. [23] constructed and validated a prediction model for CKD progression. The development and validation stages of the prediction models utilized demographic, medical and laboratory information gathered from two independent Canadian cohorts of subjects with CKD stages 3 to 5, who were referred to nephrologists between April 1, 2001 and December 31, 2008. A prediction model utilizing routinely conducted laboratory investigations can accurately predict progression to kidney failure in subjects with CKD stages 3 to 5.
3 Proposed Methodology

This work formulates a detection system based on computer vision and machine learning techniques that helps in the prediction of CKD and its various phases. Novel features and the Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) are used for a quick prediction process. To diagnose CKD, three significant varieties of FS algorithms, namely wrapper, filter and embedded methods, are used in the selection and reduction of features from the CKD dataset. In the proposed work, many validations are carried out on various classes according to the eGFR. The output proves that the system can yield consistent prediction and support the medication involved in CKD cases. The datasets were obtained from the University of California Irvine (UCI) machine learning repository. Figure 1 depicts the overall layout of the formulated system, with the EEAw-DNN classifier used for the classification process and CKD identification.
[Block diagram: CKD database → HFWE-based feature selection (filter, wrapper, embedded) → EEAw-DNN prediction → CKD prediction]

Fig. 1. Proposed EEAw-DNN classifier architecture for CKD prediction
CKD is categorized into stages I–V according to the estimated Glomerular Filtration Rate (GFR) [24, 25]. The UCI CKD dataset consists of 24 attributes and one additional (binary) class attribute [26]. It contains 400 samples in two distinct classes ("CKD": 250 cases; "NOTCKD": 150 cases). Of the 24 attributes, 11 are numeric and 13 are nominal. The dataset contains some missing values; after removing the tuples with missing values, around 160 samples were utilized in this proposed work.
3.1 Feature Selection
The filter algorithm selects the features that are ranked highest, and the selected feature subset can be used with any prediction algorithm.

Relief-F. Relief-F is an instance-based attribute selection algorithm which identifies the finest features, i.e., those whose values distinguish instances drawn from groups that can be correlated with each other.

One-R. One-R is a powerful algorithm [27]; it constructs a single rule for every feature in the training data and selects the rule with minimal error. It treats each numerically valued parameter as continuous and uses a basic algorithm to discretize the values into a range of disjoint intervals.

Gain Ratio. The Gain Ratio is a non-symmetrical measure constructed to compensate for the bias of Information Gain (IG) [28].

Gini Index (GI). The Gini index [29] is a supervised multivariate FS technique of the filter type, which evaluates the capability of a feature to distinguish between the various classes.
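As a concrete illustration of the filter idea, the snippet below ranks attributes with a mutual-information filter from scikit-learn. This is a stand-in analogous to the Relief-F/One-R/Gain-Ratio/Gini scores described above, not the authors' exact implementation; the file and column names are hypothetical.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical CKD feature matrix X and binary label y ("ckd"/"notckd");
# assumes the nominal attributes have already been numerically encoded.
df = pd.read_csv("ckd.csv")                      # assumed file name
X, y = df.drop(columns=["class"]), df["class"]

scores = mutual_info_classif(X, y, random_state=0)
ranking = sorted(zip(X.columns, scores), key=lambda t: -t[1])
top_features = [name for name, _ in ranking[:10]]  # keep the 10 best-ranked
print(top_features)
```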
Wrapper Algorithm. The wrapper technique evaluates the scores of feature subsets based on their predictive power, estimated using the classification algorithm as a black box. The Bat Algorithm (BA) is a potential algorithm still under development; it shows certain shortcomings in exploration [30], and hence can converge prematurely to a local minimum on multimodal objective functions. To resolve the issues of standard BA, two major changes are introduced to improve its exploration and exploitation. Commonly, the exploration and exploitation of BA are controlled by the pulse emission rate r, which increases through the iterative process. Steps 8 and 9 of BA carry out its local search, while Step 7 performs exploration, and the algorithm slowly loses its exploitation ability as iteration continues. To resolve this issue, the ability of BA is improved by introducing a linearly decreasing inertia weight factor [31, 32]; the inertia weight factor balances the exploration and exploitation of BA. A randomly created frequency value is allotted to each feature-selection solution within BA, and this frequency value has the same impact on all dimensions of the feature-selection solution. Treating all dimensions identically regardless of their variation makes no sense in this respect, and this design weakens BA's local search results.

Embedded Algorithm. The embedded algorithm is carried out depending on the SVM (SVM-t) to select the attributes that are most analytical from the CKD dataset. The support vectors of the SVM are used in the creation of the largest-margin separating hyperplane and the identification of each class of the CKD dataset. The positions of the nearest CKD data points of the two classes in the SVM play an important role in selecting the features.

Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) Prediction. Customarily, there are around 10,000 iterations, and a huge number of neural network components arise in the learning process of DNNs. Considering the computational complexity of the testing system, one goal is to rapidly choose a small subset of neural network components from different training cycles. Meanwhile, considering the high accuracy requirement, another goal is to adaptively combine this subset of neural network components and construct a final classification framework. In the following, the unified framework of EEAw-DNN is first formulated; next, the detailed procedure of EEAw-DNN is described. Regarding ensemble selection [33], the individual classifier decisions can be combined by majority voting, which aggregates the votes for each class and chooses the class that receives the majority of the votes. While majority voting is the most popular combination rule, its major limitation is that only the decision of each model is taken into account, without considering the distribution of decisions. Specifically, all the conceivable models in the hypothesis space could be exploited by considering their individual decisions and their connections with other hypotheses.
Therefore, BM assigns the optimal label y* according to the following decision rule:

P(y*) = argmax_y P(y | H, x)    (1)
where W(y, h_i(x)) is a function of y and h_i(x). By multiplying by a scaling factor k > 0, W(y, h_i(x)) can take a different range in R. There are two key issues in optimizing the decision rule of Eq. (1). The first is the calculation of W(y, h_i(x)). As mentioned above, P(y | h_i, x) is the distribution describing the correlation between the decision y and h_i(x). Thus, W(y, h_i(x)) can be derived from y, h_i(x) and the distance between y and h_i(x). Here, W(y, h_i(x)) is assumed to be computed as

W(y, h_i(x)) = I(y = h_i(x)) + U(y) · V(y, h_i(x))    (2)
where both U(y) and V(y, h_i(x)) are functions, and I(y = h_i(x)) returns 1 when y = h_i(x) and 0 otherwise. In this work, the frequencies of classes at the same location from the label and the hypothesis are collected on the validation set, and the cost of two different classes (a and b) is calculated as

Cost(a, b) = 1 − P(a | b)    (3)
Note that if both y and h_i(x) are from the given CKD sample, then they will have a competitive relationship with each other. Thus, V(y, h_i(x)) can be calculated as

V(y, h_i(x)) = F(CLD(y, h_i(x))),  h_i(x) ∈ positive
V(y, h_i(x)) = F(CLD(y, h_i(x))),  h_i(x) ∈ negative    (4)

where F is a function of the Cost Levenshtein Distance (CLD) between y and h_i(x). By a heuristic approach, the values of F can be empirically assigned at multiple integral points, and the values at other points can be calculated by piecewise linear interpolation. The second issue concerns generating voting candidates (more probable labels of the hypotheses). Obviously, the ground truth does not always appear in the decisions made by H. It is necessary to find an efficient way to generate high-quality candidates from all the decisions, i.e., to find a more probable label y_i(x) from the existing initial label y_i^0(x) of hypothesis h_i. Generally speaking, a good candidate has a small edit distance to most of the hypotheses. Following this idea, an algorithm is proposed to semantically generate voting candidates (see Algorithm 1).
Algorithm 1: Generating Voting Candidates
Input: H = (h1, h2, …, hL): the base classifier set, |H| = L; Y0: the initial decisions made by H; ED: the measurement function of the pairwise distance; the upper bound of the distance between the candidate and the hypothesis.
Output: Y: the voting candidate set.
In Algorithm 1, the searching process over H is an implicit computational way of obtaining P(D|h_i). In the experiments, a special simple case of Algorithm 1 is used: during the voting-candidate generation process, Y0 is initialized only by H, the upper bound is relaxed to infinity, and P(D|h_i) is assumed to be constant.

EEAw-DNN. Within the above framework, the EEAw-DNN procedure for CKD includes three major steps, i.e., base classifier generation, classifier combination and ensemble pruning. Ensembles work best if the base models have high prediction quality and do not overlap in the set of examples they misclassify. Deep Neural Networks (DNNs) are commonly used as base classifier generators for ensembles. On the one hand, EEAw-DNN is composed of many processing layers that learn representations of the data at multiple levels of abstraction. On the other hand, during the training of one individual deep neural network, two snapshots taken with different minibatch orderings will converge to different solutions; those snapshots often have similar error rates but make different errors. This diversity can be exploited by ensembling, in which multiple snapshots are sampled during training and then combined with a majority-voting output layer that estimates the classification probability conditioned on the input, e.g., P(h|x), where x is the input CKD sample and h represents a class label. The core of
EEAw-DNN is to compute y* (by Eq. (1)), i.e., the computation of F, which is a function of the distance between y and h_i(x). Here, F is represented by the set of its values at multiple integral points; these values are assigned so as to obtain the highest prediction rate on the validation set. In classifier ensembles, pruning can generally improve the ensemble performance. The attribute weight function based on entropy is given by

G(aw) = 1 + {Σ_j p(aw, c_j) log p(aw, c_j)} / log(n_samples)    (1.6)
Actually, this equation represents an entropy ratio, i.e.,

G(aw) = 1 − H(c | aw) / H(c)    (1.7)
where H(c) is the entropy of the (uniform) distribution of the classes and H(c | aw) is the entropy of the conditional distribution given that the attribute 'aw' appeared, belonging to either the positive or the negative class. The last tested global function is the actual entropy of the conditional distribution:

G(aw) = H(c | aw) = −Σ_c p(aw, c) log p(aw, c)    (1.8)
In EEAw-DNN pruning, a population of binary weight vectors is first created from the entropy function, where a 1 means the corresponding classifier is retained. Second, the population is evolved iteratively, where the fitness of a vector Aw is measured on the validation set V, i.e., f(Aw) = P_V^Aw (P stands for the prediction rate). Finally, the ensemble is pruned according to the best evolved weight vector Aw. The detailed procedure of the EEAw-DNN ensemble is shown in Algorithm 2.
Algorithm 2: EEAw-DNN (classifier combination)
Input: H = (h1, h2, …, hL): the base classifier set, |H| = L; F: a function of the distance between y and h_i(x); the attribute weights.
Parameter: Y: the voting candidate set generated by Algorithm 1.
Output: y*: the label of the CKD prediction.
Procedure:
1. Initialize Y by H.
2. Generate the attribute weights by Eq. (1.6).
3. For y ∈ Y:
4.   Calculate P(y | H, x) through Eqs. (1)–(4).
5. End
6. Calculate y* through Eq. (1).
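The combination step of Algorithm 2 can be approximated by an entropy-weighted vote, as in the sketch below. This is a minimal interpretation under our own assumptions: base classifiers that output hard labels, and weights derived from an entropy ratio analogous to Eq. (1.7); it does not reproduce the authors' exact scoring of Eqs. (1)–(4).

```python
import numpy as np

def entropy_weight(p_class_given_attr):
    """Weight of the form 1 - H(c|aw)/H(c), cf. Eq. (1.7)."""
    p = np.clip(np.asarray(p_class_given_attr, dtype=float), 1e-12, 1.0)
    h_cond = -np.sum(p * np.log(p))      # conditional entropy H(c|aw)
    h_uniform = np.log(len(p))           # entropy of the uniform class distribution
    return 1.0 - h_cond / h_uniform

def weighted_vote(predictions, weights):
    """Combine hard labels from base classifiers h_1..h_L by weighted voting."""
    tally = {}
    for label, w in zip(predictions, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

# Example: three network snapshots vote on one CKD sample
labels = ["ckd", "ckd", "notckd"]
weights = [entropy_weight([0.9, 0.1]),
           entropy_weight([0.8, 0.2]),
           entropy_weight([0.6, 0.4])]
print(weighted_vote(labels, weights))    # -> "ckd"
```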
3.2 Experimental Results
Classification accuracy shows the ability of the classifier algorithm to predict the dataset classes:

Accuracy = (TP + TN) / (TP + FP + TN + FN) × 100    (1.9)

Sensitivity shows the accuracy of detecting occurrences of the target class (Eq. (1.10)):

Recall = Sensitivity = TP / (TP + FN) × 100    (1.10)

Specificity relates to the ability of the test to correctly identify patients without the condition:

Specificity = TN / (TN + FP) × 100    (1.11)

Precision, also termed positive predictive value, is the fraction of relevant instances among the retrieved instances:

Precision = TP / (TP + FP) × 100    (1.12)

F-measure, the harmonic mean of precision and recall, is measured as follows:

F-measure = 2 × (Recall × Precision) / (Recall + Precision) × 100    (1.13)
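All five measures follow directly from the four confusion-matrix counts. A small helper, with values expressed as percentages to match Eqs. (1.9)–(1.13), is given below; the example counts are hypothetical.

```python
def evaluation_metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + fp + tn + fn) * 100        # Eq. (1.9)
    sensitivity = tp / (tp + fn) * 100                         # Eq. (1.10)
    specificity = tn / (tn + fp) * 100                         # Eq. (1.11)
    precision   = tp / (tp + fp) * 100                         # Eq. (1.12)
    f_measure   = 2 * sensitivity * precision / (sensitivity + precision)  # Eq. (1.13)
    return accuracy, sensitivity, specificity, precision, f_measure

# e.g. evaluation_metrics(tp=36, tn=9, fp=2, fn=1) on hypothetical counts
```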
Figure 2 compares the sensitivity performance of the relevant classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS with the formulated EEAw-DNN classifier. The formulated EEAw-DNN algorithm yields a higher sensitivity of 97.30%, whereas the classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS yield sensitivities of 55.56%, 62.50%, 87.50% and 95.45%. Figure 3 shows that the proposed EEAw-DNN technique produces a higher specificity of 87.50%, whereas the classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS produce specificities of 55.56%, 62.50%, 87.50% and 87.50%.
Methods        Results (%)
               Sensitivity  Specificity  Precision  F-measure  Accuracy  Error rate
ANN            76.19        55.56        80.00      78.05      70.00     30.00
SVM            77.27        62.50        85.00      80.95      73.33     26.67
SVM-HWFFS      90.91        87.50        95.24      93.02      90.00     10.00
SVM-HFWE-FS    95.45        87.50        95.45      95.45      93.33     6.67
EEAw-DNN       97.30        87.50        95.58      96.43      94.70     5.30

Fig. 2. Sensitivity performance comparison vs. classifiers
Fig. 3. Specificity performance comparison vs. classifiers
Figure 4 shows that the formulated EEAw-DNN algorithm produces a higher precision of 95.58%, whereas the classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS produce precisions of 80%, 85%, 95.24% and 95.45%.
Fig. 4. Precision performance comparison vs. classifiers
Figure 5 shows that the proposed EEAw-DNN technique yields a higher F-measure of 96.43%, whereas the classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS produce F-measure results of 78.05%, 80.95%, 93.02% and 95.45%.

Fig. 5. F-measure performance comparison vs. classifiers
Figure 6 shows that the proposed EEAw-DNN technique yields a higher accuracy of 94.70%, whereas the classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS produce accuracies of 70%, 73.33%, 90% and 93.33%.
Fig. 6. Accuracy performance comparison vs. classifiers
Figure 7 shows that the formulated EEAw-DNN algorithm produces a reduced error rate of 5.30%, whereas the classifiers ANN, SVM, SVM-HWFFS and SVM-HFWE-FS produce error rates of 30%, 26.67%, 10% and 6.67%.

Fig. 7. Error rate performance comparison vs. classifiers
4 Conclusion and Future Work

Chronic kidney disease (CKD) is considered to be a serious health problem. Prediction of CKD needs to be accurate, which is essential for minimizing treatment cost and mortality. The accuracy of the classification algorithms depends on the use of exact feature-extraction techniques to reduce the dataset dimension. The feasible feature set is
recognized from the CKD data with the use of Hybrid Filter Wrapper Embedded (HFWE) based feature selection, and the quality of the methods was validated with the help of the selected features. In the CKD prediction process, wrapper, filter and embedded FS algorithms are utilized to minimize the dimensionality of the features, and the EEAw-DNN classifier is used for classifying them. The current study formulates the Ensemble Entropy Attribute Weighted Deep Neural Network (EEAw-DNN) for detecting CKD from the data. The EEAw-DNN prediction system is formulated to adaptively identify deep classifier components at several iterations from the entire neural network. Moreover, the ensemble is proposed as a Bayesian framework for classifier attribute weighting and combination. The output proves that the EEAw-DNN classifier utilizing the HFWE-FS method produces a high level of accuracy in the prediction of CKD when compared to other methods. The dataset contains noisy and missing data; therefore, a classification algorithm with the potential to deal with noisy and missing data is essential, and this is considered the future scope.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Chen, P., Zhang, Q.-L., Rothenbacher, D.: Prevalence of chronic kidney disease in population-based studies: systematic review. BMC Public Health 8(1), 117 (2008)
2. Baumgarten, M., Gehr, T.: Chronic kidney disease: detection and evaluation. Am. Fam. Physician 84(10), 1138 (2011)
3. Moyer, V.A.: Screening for chronic kidney disease: US preventive services task force recommendation statement. Ann. Intern. Med. 157(8), 567–570 (2012)
4. Frimat, L., Pau, D., Sinnasse-Raymond, G., Choukroun, G.: Data mining based on real world data in chronic kidney disease patients not on dialysis: the key role of early hemoglobin levels control. Value Health 18, A508 (2015)
5. Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco (2011)
6. Jiawei Han, M.K., Jian, P.: Data Mining: Concepts and Techniques, 3rd edn. Elsevier (2012)
7. Larose, D.T.: Discovering knowledge in data: an introduction to data mining (2009)
8. Rodriguez, M., Salmeron, M.D., Martin-Malo, A., Barbieri, C., Mari, F., Molina, R.I., et al.: A new data analysis system to quantify associations between biochemical parameters of chronic kidney disease-mineral bone disease. PLoS One 11, e0146801 (2016). https://doi.org/10.1371/journal.pone.0146801
9. Levey, A.S., Coresh, J., Balk, E., et al.: National kidney foundation practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Ann. Intern. Med. 139(2), 137–147 (2003)
10. Keane, W.F., Zhang, Z., Lyle, P.A., et al.: Risk scores for predicting outcomes in patients with type 2 diabetes and nephropathy: the RENAAL study. Clin. J. Am. Soc. Nephrol. 1(4), 761–767 (2006)
11. Taal, M.W., Brenner, B.M.: Predicting initiation and progression of chronic kidney disease: developing renal risk scores. Kidney Int. 70(10), 1694–1705 (2006)
12. Akay, M.F.: Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst. Appl. 36, 3240–3247 (2009)
13. Özçift, A., Gülten, A.: Genetic algorithm wrapped Bayesian network feature selection applied to differential diagnosis of erythemato-squamous diseases. Digital Signal Process. 23, 230–237 (2013)
14. Anandanadarajah, N., Tharmarajah, T.: Identifying important attributes for early detection of chronic kidney disease. IEEE Rev. Biomed. Eng. 1–9 (2017)
15. Varma, B.P., Raman, L.K., Ramakrishnan, L.S., Singh, L., Varma, A.: Prevalence of early stages of chronic kidney disease in apparently healthy central government employees in India. Nephrol. Dial. Transplant. 3011–3017 (2010)
16. Salekin, A., Stankovic, J.: Detection of chronic kidney disease and selecting important predictive attributes. In: IEEE International Conference on Healthcare Informatics (ICHI), pp. 262–270 (2016)
17. Luck, M., Bertho, G., Bateson, M., Karras, A., Yartseva, A., Thervet, E., et al.: Rule-mining for the early prediction of chronic kidney disease based on metabolomics and multi-source data. PLoS One 11, e0166905 (2016)
18. Huang, M.-J., Chen, M.-Y., Lee, S.-C.: Integrating data mining with case-based reasoning for chronic diseases prognosis and diagnosis. Expert Syst. Appl. 32, 856–867 (2007)
19. José, N., Rosário Martins, M., Vilhena, J., Neves, J., Gomes, S., Abelha, A., Machado, J., Vicente, H.: A soft computing approach to kidney diseases evaluation. J. Med. Syst. 39, 131 (2015)
20. Di Noia, T., Claudio, V., Ostuni, F.P., Binetti, G., Naso, D., Schena, F.P., Di Sciascio, E.: An end stage kidney disease predictor based on an artificial neural networks ensemble. Expert Syst. Appl. 40, 4438–4445 (2013)
21. Polat, H., Mehr, H.D., Cetin, A.: Diagnosis of chronic kidney disease based on support vector machine by feature selection methods. J. Med. Syst. 41(4), 55 (2017)
22. Norouzi, J., Yadollahpour, A., Mirbagheri, S.A., Mazdeh, M.M., Hosseini, S.A.: Predicting renal failure progression in chronic kidney disease using integrated intelligent fuzzy expert system. Comput. Math. Methods Med. (2016)
23. Tangri, N., Stevens, L.A., Griffith, J., Tighiouart, H., Djurdjev, O., Naimark, D., Levin, A., Levey, A.S.: A predictive model for progression of chronic kidney disease to kidney failure. JAMA 305(15), 1553–1559 (2011)
24. Levey, A.S., Coresh, J.: Chronic kidney disease. Lancet 379, 165–180 (2012)
25. National Kidney Foundation: K/DOQI clinical practice guidelines for chronic kidney disease: evaluation, classification, and stratification. Am. J. Kidney Dis. 2002(392 Suppl 1), S1–S266 (2002)
26. Anderson, J., Glynn, L.G.: Definition of chronic kidney disease and measurement of kidney function in original research papers: a review of the literature. Nephrol. Dial. Transplant. 2011(26), 2793–2798 (2011)
27. Novakovic, J., Strbac, P., Bulatovic, D.: Toward optimal feature selection using ranking methods and classification algorithms. Yugoslav J. Oper. Res. 21(1), 119–135 (2011)
28. Komarasamy, G., Wahi, A.: An optimized K-means clustering technique using bat algorithm. Eur. J. Sci. Res. 84(2), 263–273 (2012)
29. Kumar, M.: Prediction of chronic kidney disease using random forest machine learning algorithm. Int. J. Comput. Sci. Mob. Comput. 5(2), 24–33 (2016)
30. Yang, X.S.: A new metaheuristic bat-inspired algorithm. In: Gonzalez, J.R. et al. (eds.) Nature Inspired Cooperative Strategies for Optimization (NISCO 2010), vol. 284, pp. 65–74. Springer Press (2010)
31. Yang, X.S.: Bat algorithm for multi-objective optimization. Int. J. Bio Inspired Comput. 3(5), 267–274 (2011)
32. Nakamura, R., Pereira, L., Costa, K., Rodrigues, D., Papa, J., Yang, X.S.: BBA: a binary bat algorithm for feature selection. In: Proceedings 25th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 22–25 August 2012, pp. 291–297 (2012)
33. Zhou, X., Xie, L., Zhang, P., Zhang, Y.: An ensemble of deep neural networks for object tracking. In: IEEE International Conference on Image Processing (ICIP), pp. 843–847 (2014)
Swarm Intelligence Based Feature Clustering for Continuous Speech Recognition Under Noisy Environments

M. Kalamani, M. Krishnamoorthi, R. Harikumar, and R. S. Valarmathi

Department of ECE, Bannari Amman Institute of Technology, Coimbatore, Tamilnadu, India
[email protected], [email protected]
Department of CSE, Dr.N.G.P. Institute of Technology, Coimbatore, Tamilnadu, India
[email protected]
Department of ECE, Vel Tech Rangarajan Dr.Sagunthala R&D Institute of Science and Technology, Chennai, Tamilnadu, India
[email protected]
Abstract. Swarm Intelligence based Feature Clustering using the Artificial Bee Colony (ABC) technique is proposed and implemented in this research work. It is used to group and label the features of continuous speech sentences. This algorithm performs unsupervised classification of the feature vectors into different clusters and enhances the quality of clustering. Simulation is carried out for various numbers of clusters for different speech recognition algorithms under the clean and noisy environmental conditions of NOIZEUS database signals. Experimental results reveal that the proposed hybrid clustering provides substantial enhancements in various performance measures compared with the existing algorithms, and a cluster number of 5 is found to produce the optimal result. For this optimal value, the ABC clustering technique provides optimal enhancement in the recognition accuracy of Continuous Speech Recognition compared with the existing clustering techniques under different speech signal environments.

Keywords: Feature clustering · Labelling · K-means · Fuzzy c-means · Artificial Bee Colony · Centroid · Continuous Speech Recognition
1 Introduction

Today, a major challenging problem for the human-machine interface is Continuous Speech Recognition (CSR), and more sophisticated recognizers are required to sort out these problems. In CSR, before modelling, the extracted features are labelled using efficient clustering. Recently, clustering algorithms have played a major role in various signal processing applications. The clustering methods mentioned in the literature use different principles to classify objects into clusters. During the clustering process, the properties of whole clusters are used instead of
individual features, and this can be useful when handling large amounts of feature vectors. Conventional clustering is mostly used to handle low-dimensional features and is mainly categorized into two methods. The first is partitional clustering, in which the feature vectors are grouped into some specified number of clusters. The second is hierarchical clustering, in which the feature vectors are clustered into a structure of partitions [1]. The partitional clustering techniques fall into two classes: in hard clustering, each feature vector is assigned to a single group, while in soft clustering, feature vectors can be assigned to more than one cluster with different membership levels. Soft clustering assigns membership values and data values to one or more clusters. The most widely used partitional clustering examples are fuzzy c-means (FCM) and k-means (KM). KM is a well-known approximate algorithm whose objective is to find K mean vectors as the K cluster centroids; the global minimum is not achieved due to discrete values, and the result depends on the initial values [2]. KM with its unsupervised approach is the simplest clustering technique and is widely used because it is simple and efficient with low cost; it produces clusters of uniform size [3]. The FCM clustering technique is frequently used for feature-clustering applications: it minimizes the intra-cluster distance, but it too depends on the initial values. FCM is an unsupervised and widely used soft clustering method, applied to label and cluster the extracted features in speech recognition systems with a low error rate; fuzzy clustering has also been applied successfully in segmentation applications. The limitations of the FCM algorithm are that the number of clusters is required in advance and that it mostly converges to a local minimum or saddle point. Hierarchical clustering is broadly divided into agglomerative and divisive methods; it does not require the number of clusters in advance, but due to its complexity it has lower execution efficiency than partitional clustering. Both clustering families have pros and cons regarding the number and shape of clusters, degeneracy and overlapping of clusters [4, 5]. In order to overcome these issues in conventional clustering techniques, hybrid clustering algorithms are used nowadays. The conventional techniques fail to produce efficient results when the size of the dataset increases, so clustering with swarm-based optimization algorithms has emerged as an alternative to the traditional clustering algorithms, and many hybrid schemes combining traditional clustering with optimization have been developed recently. Optimization techniques are population based and can find near-optimal solutions; they avoid being trapped in a local optimum when combined with partitional clustering algorithms and perform even better [6–8]. The ABC clustering technique implemented in this research work is such a hybrid clustering scheme, used to group the extracted feature vectors into centroids and label them before modelling in CSR. This paper is organized as follows: Sect. 2 presents the FCM feature clustering technique, Sect. 3 describes the proposed ABC clustering technique for Continuous Speech Recognition, Sect. 4 presents the performance measures of the proposed and existing clustering under different speech signal environments, and Sect. 5 concludes this research work.
2 Fuzzy C-Means Clustering for Continuous Tamil Speech Recognition

FCM is an unsupervised and commonly used clustering technique. In FCM clustering, each data point has membership values between 0 and 1 for all clusters; these membership values are assigned based on distance, and all values are updated iteratively [4]. It is summarized as follows:

1. Let x_i, i = 1, 2, …, NT be the data to be clustered.
2. Assume the cluster number CL lies in the range 2 ≤ CL ≤ NT.
3. Choose the fuzziness of the cluster f, with f > 1.
4. Initialize the membership matrix randomly, such that U_ij ∈ [0, 1] and Σ_{j=1}^{CL} U_ij = 1 for each i.
5. Compute the center of the jth cluster as:

cc_j = (Σ_{i=1}^{NT} U_ij^f · x_i) / (Σ_{i=1}^{NT} U_ij^f)    (1)
6. Compute the Euclidean distance D_ij between the ith feature and the jth centroid as:

D_ij = ||x_i − cc_j||    (2)
7. Based on D_ij, update the fuzzy membership matrix U. If D_ij > 0, then

U_ij = 1 / Σ_{C=1}^{CL} (D_ij / D_iC)^{2/(f−1)}    (3)

8. If D_ij = 0, then U_ij = 1, because the data points have the same value.
9. Repeat Steps 5–7 until the termination criterion is reached.
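A direct NumPy rendering of Steps 1–9 is given below, assuming the feature vectors are the rows of X; the convergence tolerance and random seed are illustrative choices.

```python
import numpy as np

def fcm(X, CL=5, f=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means following Eqs. (1)-(3); X has shape (NT, dim)."""
    rng = np.random.default_rng(seed)
    NT = X.shape[0]
    U = rng.random((NT, CL))
    U /= U.sum(axis=1, keepdims=True)           # rows sum to 1 (Step 4)
    for _ in range(max_iter):
        Uf = U ** f
        centers = (Uf.T @ X) / Uf.sum(axis=0)[:, None]               # Eq. (1)
        D = np.linalg.norm(X[:, None, :] - centers[None], axis=2)    # Eq. (2)
        D = np.fmax(D, 1e-12)                   # guards the D_ij = 0 case (Step 8)
        U_new = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2 / (f - 1))).sum(axis=2)  # Eq. (3)
        if np.abs(U_new - U).max() < tol:       # termination criterion (Step 9)
            U = U_new
            break
        U = U_new
    return centers, U
```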
3 Proposed ABC Clustering for Continuous Tamil Speech Recognition

The proposed ABC algorithm aims to locate the cluster centers of the feature vectors based on a minimum objective function. The intelligent searching characteristics of honey bee swarms are used to develop the new Artificial Bee Colony (ABC) approach. This approach contains three kinds of bees: employed bees, onlooker bees and scout bees.
Among these three, the first two phases are deterministic and the third one is random. This ABC clustering algorithm contains four phases, and the step-by-step process of each phase is described in detail in the subsequent sections [8].

3.1 Initialization Phase
In this phase, a population size 'S' is assumed and the position of each swarm member is generated randomly. The entire population is divided into two equal halves, namely onlooker bees and employed bees, and the count of current scout bees is initialized to zero. After this initial process, the solution proceeds through repeated cycles of the three phases. Consider the feature vector D consisting of 'n' elements, and select two features from D as centroids C1 and C2 with distance values i and j respectively. After the distance calculation between the centroids and the feature vectors, each feature value is moved to the cluster with the minimum distance value; thus, the feature vectors are classified into these two initial clusters based on their minimum distance values. Then, the fitness value f(z_i) of each employed bee is calculated by adding all the respective distance values.

3.2 Employed Bee Phase
Here, based on the fitness value, the new position of each employed bee is updated as:

y_{i,j} = z_{i,j} + U_{i,j} (z_{i,j} − z_{k,j})    (4)
where y_{i,j} is the new position and z_{i,j} the old position of the employed bee; U_{i,j} is a random number whose value is assumed to lie in [−1, 1]; and k is a randomly selected neighbourhood solution of solution i. The optimal position of the employed bee is selected by a greedy selection strategy: if the value of the new position is better than that of the previous position, the new position is taken; otherwise the previous position is kept for further processing. After evaluating the new position, the fitness value of each employed bee is calculated as:

fit(z_i) = 1 / (1 + f(z_i))   if f(z_i) ≥ 0
fit(z_i) = 1 + |f(z_i)|       if f(z_i) < 0        (5)
The algorithm proceeds to the onlooker phase only when all the employed bees have been processed.

3.3 Onlooker Bee Phase
Here, the onlooker bee position is selected based on the fitness likelihood supplied by the previous phase. The likelihood of the position of each employed bee is calculated as follows:

P_i = fit(z_i) / Σ_{i=1}^{S/2} fit(z_i)    (6)
The position of the onlooker bee is selected based on the highest fitness value, and a suitable new position is found in its neighbourhood and updated using Eq. (4) to improve its quality. Finally, this bee compares the old position with the new one: it retains the new position value when it is better than the old position value; otherwise the old position value is retained.

3.4 Scout Bee Phase
This phase is the last step in the conventional ABC method. If a solution from the first phase is not improved within a finite number of cycles, it is abandoned, and the scout bee searches the solution space randomly for a new solution. The next iteration then continues with the newly created population. In each iteration, all phases of the proposed approach are executed until the stopping criterion is reached; finally, the centroids are updated based on the new positions.
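A minimal sketch of the update rules of Eqs. (4)–(6) is shown below, with the clustering objective abstracted into a user-supplied cost function f (e.g., the sum of distances of the features to the nearest centroid encoded in a position z). The per-dimension perturbation and the choice of random generator are our own illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(1)

def abc_fitness(cost):
    """Eq. (5): map a raw objective value to a fitness value."""
    return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

def employed_step(Z, f):
    """One employed-bee pass over the positions Z (Eq. (4) + greedy selection)."""
    S, dim = Z.shape
    for i in range(S):
        k = rng.choice([m for m in range(S) if m != i])  # random neighbour
        j = rng.integers(dim)                            # dimension to perturb
        phi = rng.uniform(-1.0, 1.0)                     # U_{i,j} in [-1, 1]
        y = Z[i].copy()
        y[j] = Z[i, j] + phi * (Z[i, j] - Z[k, j])       # Eq. (4)
        if abc_fitness(f(y)) > abc_fitness(f(Z[i])):     # greedy selection
            Z[i] = y
    return Z

def onlooker_probs(Z, f):
    """Eq. (6): selection likelihood of each employed-bee position."""
    fits = np.array([abc_fitness(f(z)) for z in Z])
    return fits / fits.sum()
```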
4 Results and Discussions

The speech sentences with different noise environments are obtained from the NOIZEUS database, which is used throughout the evaluation of the existing and proposed algorithms. In this research work, the KM, FCM and ABC approaches are implemented for labeling the extracted speech features before the modeling phase under different speech signal environments. For the ABC algorithm, the swarm size is assumed to be 30, the number of iterations is fixed at 100, and the upper bound value is 5. The performance measures of the existing and proposed clustering techniques for the sp01 speech sentence of the NOIZEUS database under clean conditions and 5 dB airport noise are described in Tables 1 and 2. From these experimental results, it is found that the proposed ABC clustering technique reduces the average intra-cluster distance, DB index and distance index by 9.4–12.9%, 14.3–20.9% and 34.3–50% respectively, and increases the average beta index by 9.6–25.1%, when compared with the existing KM and FCM algorithms under clean signal environments. In addition, the proposed ABC clustering technique reduces the average intra-cluster distance, DB index and distance index by 3.35–6%, 13.6–22.5% and 28.9–44.9% respectively, and increases the average beta index by 17.3–31.4%, when compared with all existing algorithms under the 5 dB airport noise environment. Hence, it is observed that the proposed ABC clustering provides significant improvement in intra-cluster distance, beta index, DB index and distance index when compared to the existing clustering techniques. The recognition accuracy (%) versus cluster number of the Expectation-Maximization with Gaussian Mixture Modeling (EM-GMM) technique with K-means, FCM and the proposed ABC clustering algorithms, for the sp01 speech sentence of the NOIZEUS database under various speech signal environments, is shown in Figs. 1 and 2. From these
experimental results, it is revealed that all the clustering techniques, for the various modeling techniques, provide considerable enhancement in recognition accuracy for the optimal cluster number of 5 compared with other values.

Table 1. Performance comparison of existing and proposed clustering for sp01 speech sentence (Clean)

Performance measures     Existing clustering                 Proposed ABC-FCM
                         K-Means    FCM       ABC            technique
Intra-cluster distance   356.46     342.75    310.48         286.15
Beta index               7.23       8.72      9.65           11.43
DB index                 0.91       0.84      0.72           0.59
Distance index           0.46       0.35      0.23           0.12
Table 2. Performance comparison of existing and proposed clustering for sp01 speech sentence (5 dB airport noise)

Performance measures     Existing clustering                 Proposed ABC-FCM
                         K-Means    FCM       ABC            technique
Intra-cluster distance   412.57     401.29    387.84         314.73
Beta index               6.15       7.42      8.97           10.26
DB index                 0.98       0.88      0.76           0.63
Distance index           0.49       0.38      0.27           0.18
Fig. 1. Average recognition accuracy (%) versus number of clusters of the existing and proposed clustering algorithms used in EM-GMM modelling for sp01 speech sentence (Clean)
Fig. 2. Average recognition accuracy (%) versus number of clusters of the existing and proposed clustering algorithms used in EM-GMM modelling for sp01 speech sentence (5 dB Airport Noise)
In addition, the proposed ABC clustering algorithm provides a major improvement in recognition accuracy compared with the existing clustering techniques under both clean and noisy environments. Hence, ABC clustering with this optimal cluster number is used for labeling the features of the speech signal in real-time Continuous Speech Recognition under noisy environments.
5 Conclusion

The extracted feature vectors are labelled through a clustering technique before modelling in CSR. For this purpose, ABC clustering is implemented in this research work. From the simulation results, the proposed clustering provides improvement in all the performance metrics compared with the existing clustering techniques under clean and noisy environments. All clustering approaches give better performance improvement for the cluster number 5, which is considered the optimal value for further processing. From these experimental results, it is observed that ABC clustering with the optimal cluster number of 5 yields a better improvement in recognition accuracy when compared to the existing clustering algorithms for the EM-GMM modelling technique under various signal environments of continuous speech recognition.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References

1. Xu, R., Wunsch, D.C.: Clustering algorithms in biomedical research: a review. IEEE Rev. Biomed. Eng. 3, 120–154 (2010)
2. Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
3. Li, X.G., Yao, M.F., Huang, W.T.: Speech recognition based on k-means clustering and neural network ensembles. In: 7th IEEE International Conference on Natural Computation, vol. 2, pp. 614–617 (2011)
4. Chattopadhyay, S., Pratihar, D.K., Sarkar, S.C.D.: A comparative study of fuzzy C-means algorithm and entropy-based fuzzy clustering algorithms. Comput. Inf. 30(4), 701–720 (2011)
5. Kalamani, M., Krishnamoorthi, M., Valarmathi, R.S.: Continuous Tamil speech recognition technique under non stationary noisy environments. Int. J. Speech Technol. 22(1), 47–58 (2019)
6. Fister Jr., I., Yang, X.S., Fister, I., Brest, J., Fister, D.: A brief review of nature-inspired algorithms for optimization. Elektrotehniski Vestnik 80(3), 1–7 (2013)
7. Zhang, C., Ouyang, D., Ning, J.: An artificial bee colony approach for clustering. Expert Syst. Appl. 37(7), 4761–4767 (2010)
8. Karaboga, D., Ozturk, C.: A novel clustering approach: artificial bee colony (ABC) algorithm. Appl. Soft Comput. 11(1), 652–657 (2011)
Video-Based Fire Detection by Transforming to Optimal Color Space

M. Thanga Manickam, M. Yogesh, P. Sridhar, Senthil Kumar Thangavel, and Latha Parameswaran

Department of Computer Science Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
{cb.en.u4.cse17161,cb.en.u4.cse17168,cb.en.d.cse17011}@cb.students.amrita.edu, {t_senthilkumar,p_latha}@cb.amrita.edu
Abstract. With the increase in the number of fire accidents, the need for fire detection systems is growing every year. Detecting fire at an early stage can prevent both material loss and loss of human lives. Sensor-based fire detection systems are commonly used for detecting fire, but they have drawbacks such as time delay and the need for close proximity. Vision-based fire detectors are cost efficient and can potentially detect fire in its early stages. Here we propose a lightweight pixel-based fire detection model to extract frames from videos and identify the frames with fire in them. We use matrix multiplication to transform the input frame to a new color space in which the separation of fire pixels from non-fire pixels is easier. The optimal value of the matrix to be multiplied is obtained using fuzzy c-means clustering and particle swarm optimization. Otsu thresholding is applied on the transformed image in the new color space to classify the fire pixels in the frame. Our results show high accuracy on forest fire videos with very low inference time.
1 Introduction

In recent years various techniques have been used in fire detection. Sowah et al. [1] proposed a multiple-sensor based fire detection method in which fuzzy logic is applied on data received from a smoke sensor, a flame sensor and a temperature sensor to raise a fire alarm. The main drawbacks of these sensors are time delay [2] and high false positive rates; the distance between the source of fire and the sensors is also an important factor. Several pixel-based fire detection methods have been proposed. Toreyin et al. [3] proposed a method based on spatial and temporal analysis. They detected the moving pixels using the difference between frames, identified the fire-colored pixels among the moving pixels using a Gaussian mixture, and then applied temporal and spatial wavelet analysis to detect frames with fire. This method is not fast enough to identify fire in its early stages. Chen et al. [4] proposed a fire detection method based on the chromatic and dynamic features of a video. First, moving regions of the image sequences are identified using image differencing; then fire and smoke pixels are extracted using the chromatic features of the image, and dynamic features are used to validate the results. This is a completely rule-based model, and it can lead to a high false positive rate. Celik et al. [5]
proposed a pixel-based fire detection using the CIELAB [6] color space. The original image in RGB color space is transformed to the CIELAB color space, and heuristic rules are then applied to detect fire. This method works well only when the fire grows gradually with time, which might not be the case with sudden explosions. Demirel et al. [7] proposed a fire pixel classification method using a statistical color model and fuzzy logic. The RGB image is transformed to the YCbCr color space, and a set of fuzzy rules is applied in the YCbCr color space to detect fire in the image. This method may not work well when the intensity of the fire changes. We propose an efficient method that can detect fire at early stages. We also aim to ensure the proposed model fits into memory so that it can be deployed in real time. This led us to choose color space transformation followed by thresholding, as it requires only a 3×3 matrix multiplication, which is computationally inexpensive and has low inference time. To measure the optimality of the matrix we use fuzzy c-means, and the matrix is optimized through particle swarm optimization.
2 Methodology

The speed of the model is very important in fire detection, as early identification of fire can prevent material and life loss. Hence, we propose a lightweight model that can classify the fire pixels quickly. We extracted frames from the videos and used them for training the model, taking frames with fire and without fire as training examples. We extracted 25×25 pixel patches from the positive and negative images and flattened them into a matrix with three columns containing the RGB values of the fire and non-fire pixels. The pixel values are initially in the RGB color space, the color space commonly used in daily life. There are other color spaces such as YIQ, HSV [8] and grey-scale, which can be obtained by performing matrix multiplication with the original pixel values. Let M be a 3×3 matrix with which the original pixel values are multiplied. Then the color space transformation is obtained by: new color space pixel value = original pixel value * M, where * is matrix multiplication. Our aim is to find the optimal value of M such that fire pixels and non-fire pixels are separable in the new color space. We start with a random 3×3 matrix M and project the original image to the new color space using M. The fuzzy c-means (FCM) technique is used to measure how separable the fire and non-fire pixels are in the new color space: FCM is applied to cluster the image into fire and non-fire pixels, and the error in this step is calculated as the total number of wrongly classified pixels. Particle swarm optimization (PSO) is used to obtain the matrix M such that the error of the FCM step is minimal. After obtaining the best value of M that gives the minimal cost, the test image frame is matrix-multiplied with the optimal M. During prediction, Otsu's thresholding takes advantage of the fact that fire pixels are more separable from non-fire pixels in the new optimal color space. Otsu's thresholding is suitable here as it works well on clustered pixels (Figs. 1 and 2).
Fig. 1. Proposed system
2.1 Fuzzy C Mean Clustering (FCM)

Fuzzy C mean clustering is a soft clustering method [9]. It is used to assign each data point to one of the clusters: each data point is assigned membership degrees for belonging to each cluster, so data points on the boundary between two classes are not fully assigned to a single class [10]. Data points closer to the centroid of a particular class have a higher membership degree for that class; hence the method handles outliers effectively. Here FCM is used to separate fire pixels from non-fire pixels in the new color space.

Algorithm for FCM:
Input: Fire image converted to the new color space. Here C = 2.
1. Let $x_1, x_2, x_3, \ldots, x_n$ be the pixel values to be clustered.
2. Initialize C random cluster centroids $c_j$ and a membership degree $w_{ij}$ for each data point $x_i$ to belong to cluster j.
3. Find the cost to be minimized through the following equation (the standard FCM objective):
$$J = \sum_{i=1}^{n} \sum_{j=1}^{C} w_{ij}^{m} \,\lVert x_i - c_j \rVert^2$$
4. Update the centroids and membership degrees through the following equations:
$$c_j = \frac{\sum_{i=1}^{n} w_{ij}^{m} x_i}{\sum_{i=1}^{n} w_{ij}^{m}}, \qquad w_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{2/(m-1)}}$$
Here m is the fuzziness index. Repeat step 4 till the cost function converges.
Output: Pixels clustered into fire and non-fire.
We have used the skfuzzy toolbox [11] to apply fuzzy c-means, with parameters m = 2 and random seed = 1.
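A minimal sketch of this clustering step (an assumed reconstruction, not the authors' code) using the skfuzzy call with the stated parameters m = 2 and seed = 1 is given below; the `pixels` array layout and the 1e-5 stopping tolerance are assumptions:

```python
import numpy as np
import skfuzzy as fuzz

def cluster_fire_pixels(pixels, m=2.0, seed=1):
    """Soft-cluster color-space pixel values into C = 2 groups (fire / non-fire).
    pixels: N x 3 array of pixel values in the candidate color space."""
    # skfuzzy expects data of shape (features, samples)
    data = pixels.T
    cntr, u, _, _, _, _, _ = fuzz.cluster.cmeans(
        data, c=2, m=m, error=1e-5, maxiter=300, seed=seed)
    # hard label for each pixel = cluster with the highest membership degree
    labels = np.argmax(u, axis=0)
    return cntr, labels
```

Comparing `labels` against the ground-truth fire mask then yields the misclassification count used as the FCM error in the PSO cost below.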
2.2 Particle Swarm Optimization (PSO)

We have used particle swarm optimization to find the optimal value of M such that the cost of the algorithm is minimal. Particle swarm optimization is a useful optimization technique [12] for problems in which the gradient of the cost function to be optimized cannot be determined. It is initialized with random solutions called particles, and in each iteration the particles move towards the best solution with a velocity v [13].

Fig. 2. Workflow

We are trying to minimize the following cost function for a matrix M:
Cost(M) = number of actual fire pixels incorrectly classified as non-fire + number of non-fire pixels incorrectly classified as fire.

Algorithm for PSO:
Input: Images and ground truth.
1. Initialize each particle with a random position and velocity v.
2. Find the cost function for each particle in the swarm. Maintain the local best solution of each particle and the global best solution of all the particles in the swarm.
3. The velocity and position of each particle i are updated by the following equations (the standard PSO update, reconstructed to match the symbols named in the text):
$$v_i \leftarrow w\,v_i + c_1 (p_i - x_i) + c_2 (g - x_i), \qquad x_i \leftarrow x_i + v_i$$
where $c_1$ and $c_2$ are random numbers, w is the velocity weight, $p_i$ is the particle's local best and g is the global best. At the end of each iteration, the global best and local best solutions are updated.
4. Step 3 is repeated until the cost function converges. The global best solution at the end provides the optimal matrix M.
Output: Optimal matrix M.
We have used w = 2 and random seed = 1 while applying particle swarm optimization. The optimal solution obtained by the PSO is used to transform the input image to the new color space.
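A minimal sketch of the PSO loop under the conventions stated above (c1 and c2 drawn at random each iteration, w as a velocity weight) follows. The swarm size, the search bounds, and the use of a damping weight below 1 (rather than the paper's reported w = 2) are assumptions made for a stable illustration; `cost` is any callable implementing Cost(M):

```python
import numpy as np

def pso_optimize(cost, dim=9, n_particles=20, n_iter=100, w=0.7, seed=1):
    """Minimal PSO over a flattened 3x3 matrix M (dim = 9)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3, 3, (n_particles, dim))   # particle positions
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # per-particle local bests
    pbest_cost = np.array([cost(p.reshape(3, 3)) for p in x])
    gbest = pbest[pbest_cost.argmin()].copy()     # global best
    for _ in range(n_iter):
        c1, c2 = rng.random(2)                    # random numbers, as in the paper
        v = w * v + c1 * (pbest - x) + c2 * (gbest - x)
        x = x + v
        costs = np.array([cost(p.reshape(3, 3)) for p in x])
        improved = costs < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], costs[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest.reshape(3, 3)
```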
2.3 Otsu Thresholding
Otsu is a thresholding technique to binarize an image [14]. Otsu thresholding works well on clustered values [15]. It assumes the image to be bimodal (foreground pixels and background pixels) and calculates the optimal threshold value by minimizing the within-class variance:

$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t)$$

where the weights $\omega_{0,1}(t)$ are the probabilities of the two classes separated by a threshold t and $\sigma_{0,1}^2(t)$ are the variances of these two classes. The class probabilities $\omega_{0,1}(t)$ are calculated from the L histogram bins:

$$\omega_0(t) = \sum_{i=0}^{t-1} p(i), \qquad \omega_1(t) = \sum_{i=t}^{L-1} p(i)$$

The variances $\sigma_0^2(t)$ and $\sigma_1^2(t)$ are calculated using the class means:

$$\mu_0(t) = \sum_{i=0}^{t-1} \frac{i\,p(i)}{\omega_0}, \qquad \mu_1(t) = \sum_{i=t}^{L-1} \frac{i\,p(i)}{\omega_1}$$

The threshold t that gives the minimal within-class variance is the value used for Otsu thresholding. We apply Otsu's thresholding to find a threshold value for each of the three channels in the new color space, using the skimage library [16]. A pixel is marked as a fire pixel only if it is greater than the threshold in each channel. In this way, we achieve a fast way to classify the fire pixels. After training on 150 images we got the optimal value of M as:

$$M = \begin{pmatrix} 0.5905 & 1.2407 & 0.483 \\ 1.7695 & 2.803 & 2.719 \\ 2.2408 & 1.0292 & 2.5853 \end{pmatrix}$$
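A sketch of the prediction path described above, assuming `frame_rgb` is an H×W×3 array and M the learned matrix; the per-channel rule (a pixel is fire only if above the Otsu threshold in every channel) follows the text:

```python
import numpy as np
from skimage.filters import threshold_otsu

def fire_mask(frame_rgb, M):
    """Project a frame into the learned color space and threshold it."""
    transformed = frame_rgb.reshape(-1, 3).astype(float) @ M
    transformed = transformed.reshape(frame_rgb.shape)
    mask = np.ones(frame_rgb.shape[:2], dtype=bool)
    for ch in range(3):
        t = threshold_otsu(transformed[..., ch])   # per-channel Otsu threshold
        mask &= transformed[..., ch] > t           # fire only if above t in all channels
    return mask
```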
3 Dataset

We used Cair's [17] dataset of fire images for training. It consists of a variety of scenarios and different fire situations. We extracted 150 patches of 25×25 pixels from this dataset for the training process. We used Visifire's fire dataset [18], consisting of forest fire videos, for testing our model. Our algorithm works well on images with a dark background, and this dataset was used to verify this. To test our method in an open environment we used a fire dataset collected by the Smart Space Lab in our institution, which had 15300 frames with fire and 6300 non-fire frames. We also tested this model on 480 non-fire
images to evaluate the performance of the method in reducing false positives. This dataset consisted of different images under different lighting conditions and included many fire-like images.
4 Result

Our results on 5 video clips from the Visifire dataset showed high accuracy in detecting frames with fire, which shows that the algorithm works well on forest fire datasets. Here are our results on this dataset:

Video | Resolution | True positives | False positives | Accuracy
(1) Forest1 | 400×256 | 200 | 0 | 1
(2) Forest2 | 400×256 | 245 | 0 | 1
(3) Forestfire1 | 400×256 | 213 | 0 | 0.977
(4) Forest4 | 400×256 | 219 | 0 | 1
(5) Forest5 | 400×256 | 216 | 0 | 1
To test the method's performance against bright backgrounds we used the Smart Space Lab dataset. The results showed high accuracy with a low false positive rate on this dataset:

Video | Resolution | True positive rate | False positive rate | Accuracy
Smart space lab | 640×480 | 0.943 | 0.061 | 0.941
Our method showed good accuracy in identifying the non-fire images. It predicted fire-like objects, such as a glowing lamp, as non-fire with a small number of false positives, which reduces false alarms significantly.

Total number of images | False positive rate | Accuracy
480 | 0.085 | 0.9145

The above results show that the model can detect fire efficiently with a low false positive rate. Here are examples of our results (Fig. 3):
Fig. 3. (a) Forest1, (b) Forest2, (c) Smart space lab, (d) Forestfire1. The images at the left show the original images and the binary images at the right show the output of our model. The results on the forest fire images show that our model is capable of detecting forest fire accurately [11]
Results on Non-fire Images: Figure 4(a)–(d) shows our sample results on the non-fire image dataset prepared by us. This shows that our model can potentially avoid false alarms.

Fig. 4. Sample results on non-fire images. The images at the left show the original images and the binary images at the right show the output of our model. The output shows that fire-like images are correctly classified as non-fire
For inferring the decision on a single image, our method took 0.15 s. This shows that our method can detect fire at an early stage with high speed. The above results were obtained using Python version 3.6, numpy version 1.15.4 [19] and opencv version 3.4.1 [20].
5 Conclusion

We have proposed here a fast and accurate method to detect fire. We projected the input image from the RGB color space to an optimal color space using a linear transformation with a matrix M. We used fuzzy c-means clustering and particle swarm optimization to
find the optimal value of M such that the fire pixels are separable from the non-fire pixels in the transformed color space. Then we applied Otsu's thresholding to detect the fire pixels in the image. Experimental results showed that our method is capable of detecting forest fires efficiently with a low false positive rate.

Acknowledgment. This proposed work is a part of the project supported by DST (DST/TWF Division/AFW for EM/C/2017/121) titled A framework for event modelling and detection for Smart Buildings using Vision Systems to Amrita Vishwa Vidyapeetham, Coimbatore.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Sowah, A., Ofoli, R., Fiawoo, K.: Hardware design and web-based communication modules of a real-time multisensor fire detection. IEEE Trans. Ind. Appl. 53(1), 559 (2017)
2. Paresh, P.A., Parameswaran, L.: Vision-based algorithm for fire detection in smart buildings. In: Lecture Notes in Computational Vision and Biomechanics, vol. 30, pp. 1029–1038. Springer, Netherlands (2019)
3. Töreyin, B., Dedeoglu, Y., Gudukbay, U., Cetin, A.: Computer vision based method for real-time fire and flame detection. Pattern Recogn. Lett. 27, 49–58 (2006). https://doi.org/10.1016/j.patrec.2005.06.015
4. Chen, T., Wu, P., Chiou, Y.: An early fire-detection method based on image processing. In: Proceedings – International Conference on Image Processing, ICIP, vol. 3, pp. 1707–1710 (2004). https://doi.org/10.1109/ICIP.2004.1421401
5. Celik, T.: Fast and efficient method for fire detection using image processing. ETRI J. 32 (2010). https://doi.org/10.4218/etrij.10.0109.0695
6. Buckley, R.R., Giorgianni, E.J.: CIELAB for Color Image Encoding. Encyclopedia of Color Science and Technology. Springer, New York (2016)
7. Celik, T., Ozkaramanlt, H., Demirel, H.: Fire pixel classification using fuzzy logic and statistical color model. In: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP 2007, Honolulu, pp. I-1205–I-1208 (2007)
8. Raval, K., Shah, R.S.K.: Color image segmentation using FCM clustering technique in RGB, L*a*b, HSV, YIQ color spaces. Eur. J. Adv. Eng. Technol. 4(3), 200 (2017). www.ejaet.com
9. Yuhui, Z., Byeungwoo, J., Danhua, X., Jonathan Wu, Q.M., Hui, Z.: Image segmentation by generalized hierarchical fuzzy C-means algorithm. J. Intell. Fuzzy Syst. 28, 961–973 (2015)
10. Senthil Kumar, T., Suresh, A., Kiron, K., Chinnaswamy, P.: Survey on predictive medical data analysis. J. Eng. Res. Technol. 3, 2283–2286 (2014)
11. Warner, J.D.: Scikit-Fuzzy. Zenodo. https://doi.org/10.5281/zenodo.1002946
12. Marini, F., Walczak, B.: Particle swarm optimization (PSO), a tutorial. Chemometr. Intell. Lab. Syst. 149, 153 (2015)
13. Aerospace Design Laboratory Stanford: Gradient-Free Optimization. Aerospace Design Laboratory, Stanford, April 2012
14. Kalavathi, P.: Brain tissue segmentation in MR brain images using multiple Otsu's thresholding technique. In: 2013 8th International Conference on Computer Science & Education, Colombo, pp. 639–642 (2013)
15. Bangare, S.L., Dubal, A., Bangare, P.S., Patil, S.T.: Reviewing Otsu's method for image thresholding. Int. J. Appl. Eng. Res. 10(9) (2015)
16. van der Walt, S., Schönberger, J.L., Warner, J.D.: Scikit-image: image processing in Python. PeerJ 2, e453 (2014). https://doi.org/10.7717/peerj.453
17. Cair: Fire-Detection-Image-Dataset. GitHub. https://github.com/cair/Fire-Detection-Image-Dataset
18. Bilkent SPG: VisiFire fire dataset. http://signal.ee.bilkent.edu.tr/VisiFire/Demo/FireClips/
19. van der Walt, S., Chris Colbert, S., Varoquaux, G.: The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011)
20. Bradski, G.: The OpenCV library. Dr. Dobb's J. Softw. Tools, 2236121 (2008)
Image Context Based Similarity Retrieval System

Arpana D. Mahajan and Sanjay Chaudhary

Madhav University, Pindwara, Rajasthan, India
[email protected], [email protected]
Abstract. With the rapid development of multimedia and imaging technology, the number of images uploaded and shared on the internet has increased. This calls for highly effective image retrieval systems to satisfy human needs. Content-text image retrieval (CTIR) systems retrieve images based on high-level features such as tags, which are not sufficient to describe the user's low-level perception of images; reducing this semantic gap is therefore a challenging task in image retrieval. Some of the most important notions in image retrieval are keywords, terms or concepts: terms are used by humans to describe their information need, and they are also used by the system as a way to represent images. In this paper different types of features, with their advantages and disadvantages, are described.

Keywords: Context retrieval · Flickr · CISS · High level features · Distances · WordNet
1 Introduction

In this section, various semantic similarity measures used to quantify the semantic similarity between tags associated with images are described: WordNet distance, Flickr distance, context-based image semantic similarity (CISS) and the fuzzy string matching approach. All of the above methods operate on high-level features; high-level features are not extracted from a mathematical expression on the pixels but require context data. Finally, a comparison between these methods is given in terms of the advantages and disadvantages of each [1, 2]. Tags of the images in the query class are extracted from the respective annotation files, as these tags are used by the context-based similarity techniques to measure the similarity between the tags of the query image and the tags of the images in the query's class (Fig. 1).
Fig. 1. Tag system
2 Related Works

2.1 Word-Net Distance (WD)
WordNet distance is a similarity measure between two keywords or concepts that are available in the WordNet database. The WordNet database [6], developed by the Cognitive Science Laboratory of Princeton University, is a large semantic lexicon for the English language, designed for use under program control. English nouns, verbs, adjectives and adverbs are organized into sets of synonyms that are in turn linked through semantic relations that determine word definitions [6]. WordNet groups English words into sets of synonyms, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members; it is a combination of dictionary and thesaurus. WordNet includes the following semantic relations: synonymy, antonymy, hyponymy, meronymy, troponymy and entailment, and it lists the alternative forms of words in different contexts from which choices must be made. The semantic similarity between two concepts C1 and C2 lexicalized in WordNet, known as the WordNet distance [11], is given by:

$$sim(c_1, c_2) = \frac{2 \log P(C_0)}{\log P(C_1) + \log P(C_2)} \qquad (1.1)$$
where C0 is the most specific class that subsumes both C1 and C2, i.e. the most specific concept which is an ancestor of both C1 and C2 [7]. Using the WD between the keywords or tags of two images, we can measure the semantic relatedness between the two images for retrieving similar images in an image retrieval system.
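As a minimal illustration of Eq. (1.1), the sketch below computes the measure from corpus probabilities; the probability arguments are hypothetical values for illustration, not values from the paper:

```python
import math

def wordnet_similarity(p_c0, p_c1, p_c2):
    """Eq. (1.1): information-content similarity of two concepts,
    given corpus probabilities of C1, C2 and their most specific
    common subsumer C0 (all probabilities in (0, 1))."""
    return 2 * math.log(p_c0) / (math.log(p_c1) + math.log(p_c2))

# e.g. wordnet_similarity(0.1, 0.02, 0.03) -> a value in [0, 1]
```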
2.2 Flickr-Distance (FD)
Flickr distance is a novel measure of the relationship between different concepts in the visual domain [4]. In this method, for each concept a collection of images is obtained from Flickr, based on which a latent-topic visual language model is built to capture the visual properties of the concept. The FD between two concepts is then measured
by the square root of the Jensen–Shannon (JS) divergence between the corresponding visual language models of the concepts [4, 5]. Flickr distance is based on the visual information of images. FD can easily scale up with the increasing number of concepts on the web, and it is also more consistent with human cognition, therefore giving better measurements. FD measures four types of conceptual correlations between given concepts: synonymy, similarity, metonymy and concurrence; examples of these conceptual correlations are shown in Fig. 2. Synonymy denotes the same semantic concept with different names, like "table-tennis" and "ping-pong". Similarity denotes that two concepts are visually similar, like "table" and "desk". Meronymy denotes that one concept is part of another concept, e.g., "building" and "window". Concurrence denotes the situation where two concepts appear simultaneously in daily life, like "desk" and "chair" [5] (Fig. 2).
Fig. 2. The four different types of mutual conceptual associations [7]
Although FD is effective, it also has some limitations. The first limitation is that we do not know which topic the user is considering for each concept. The second limitation is that FD's accuracy mainly depends on the quality of the web images: if there is only a limited number of images related to a concept, the concept may not be well captured and the FD is not estimated correctly [12].

2.3 Context-Based Image Semantic Similarity (CISS)
CISS is a scheme for discovering and evaluating image similarity in terms of the associated groups of concepts [8, 9]. The main idea of this method is to extract the text-based group of concepts which characterizes an image. In CISS, all the concepts related to image Ii are compared with all the concepts related to image Ij, and then an appropriate scheme, mutual confidence (CM), is used to combine the elementary distances D between the elements of the first and second groups of concepts. Let Ii and Ij be the pair of images to be compared, Ti1, Ti2, …, Tim the group of concepts related to image Ii, and Tj1, Tj2, …, Tjn the group of concepts related to Ij. Then DIij, the context-based image semantic similarity of images Ii and Ij, is defined as [10]:
$$D_{I_{ij}} = AG_2\{ AG_1[DEL(d_{T_{im} \to jn})],\; AG_1[DEL(d_{T_{jn} \to im})] \} \qquad (1.2)$$
where DEL = MAX is the most accurate group combination scheme, d = mutual confidence (CM) is used as the elementary metric, and AG1, AG2 are average metrics. For measuring similarity, the CISS method considers that co-occurrence between tags takes place when both tags are used with nearly similar resources. It gives more effective and accurate similarity measurements than methods based on image low-level features, both in terms of deep concept similarity and in terms of computational cost [13].
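A sketch of one plausible reading of Eq. (1.2), with DEL = MAX and AG1, AG2 as averages as stated above; the elementary metric d (e.g. mutual confidence) is passed in as a callable, the function names are illustrative, and both tag groups are assumed non-empty:

```python
def group_similarity(tags_i, tags_j, d):
    """Eq. (1.2) sketch: combine tag-to-tag similarities d(t1, t2)
    between two groups of concepts into one image-level score."""
    def directed(a, b):
        # DEL = MAX: each tag in a takes its best match in b; AG1 = average
        return sum(max(d(t1, t2) for t2 in b) for t1 in a) / len(a)
    # AG2 = average of the two directed comparisons
    return (directed(tags_i, tags_j) + directed(tags_j, tags_i)) / 2
```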
2.4 Fuzzy String Matching
In fuzzy string matching, the patterns in words are matched approximately rather than exactly when measuring the similarity between words. This overcomes the drawback of hit/miss search in text-based image retrieval systems [9]. The Levenshtein approach is used for fuzzy string matching, in which strings are matched using an edit distance. First, the number of edits, such as insertions and deletions, required to transform one string into the other string is calculated; this number is called the edit distance [9]. The number of edits obtained from fuzzy string matching is converted to a percentage of matching using the following equation:

$$StDist = 1.0 - \frac{LDist(K1, K2)}{\max(K1, K2)} \qquad (1.3)$$

where LDist(K1, K2) is the edit distance between K1 and K2, and K1 and K2 are the lengths of string 1 and string 2, respectively.
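Equation (1.3) can be illustrated with the classic dynamic-programming edit distance; a minimal sketch (function names are illustrative):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions/deletions/substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def st_dist(k1: str, k2: str) -> float:
    """Eq. (1.3): percentage of matching between two strings."""
    return 1.0 - levenshtein(k1, k2) / max(len(k1), len(k2))

# e.g. st_dist("forest", "forist") -> 0.833...
```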
2.5 Proposed Distance Method
Fig. 3. Proposed distance method
We apply the context-based similarity techniques between the high-level features of the identified query class and the query tags given by the user. From the .mat file containing the whole dataset annotations, only the tags belonging to the query class are used. The CISS and WordNet distance methods are applied between the query tags and the tags of the query class using the group similarity algorithm given in [3]. In this algorithm, each tag of one image is compared with all the tags of the second image and the semantic distance is calculated (Fig. 3).
3 Proposed Context Retrieval Method

Tags of the images in the query class are extracted from the respective annotation files, as these tags are used by the context-based similarity techniques to measure the similarity between the tags of the query image and the tags of the images in the query's class. After the extraction of tags, the semantic similarity between the tags of the query image and the tags of the images in the query class is calculated. For finding the semantic similarity, we decided to use CISS and WordNet distance in our system: CISS only captures the co-occurrence of tags, so WordNet distance is used for calculating other relationships such as hypernymy and synonymy (Fig. 4).
Fig. 4. Proposed block diagram
The most relevant images are retrieved from the database by considering the smallest semantic distance. Our proposed work retrieves images using high-level tag features, so the resulting images accurately reflect the user's perception. The proposed work thus effectively bridges the semantic gap between humans and machines in image retrieval tasks.
Here we apply the context-based similarity methods on the images of the query class only, so our system requires less computation than applying the context similarity methods on the whole image database. Therefore our approach can also be used efficiently for large databases.
4 Results and Discussion

Using classification we identify the class of the query image; the next step is the high-level feature extraction of the query image. The high-level features of the images in the identified query class are also extracted. The following figures show examples of high-level feature extraction for query images (Fig. 5).
Fig. 5. GUI of the proposed system [12]
For calculating the WordNet distance we installed the most recent version of WordNet, i.e., WordNet 2.1. For calculating this semantic distance we used Java WordNet Similarity, an open-source Java project, together with a hybrid approach for context-based image retrieval (Table 1 and Figs. 6, 7).

Table 1. Analysis

Categories | Precision (N=10) | Recall (N=10)
Coast | 0.82 | 0.12
Forest | 0.94 | 0.11
Highway | 0.82 | 0.13
Inside city | 0.80 | 0.13
Mountain | 0.86 | 0.14
Open country | 0.88 | 0.12
Street | 0.82 | 0.14
Tall building | 0.98 | 0.19
Symbols | 1.0 | 1.0
Average | 0.867 | 0.227
Fig. 6. Precision graph
Fig. 7. Recall graph (recall versus number of retrieved images: 10, 30, 50)
5 Conclusion

The aim of this paper is to discuss all the phases of automatic image retrieval. This research provides a broad study of feature extraction methods, classification methods and similarity measurement distances. Automatic image retrieval is the process of perceiving scene features from a photo and assigning the photo to the corresponding class. The feature extraction, classification and similarity measurement methods are chosen based on the size of the dataset. To provide accurate retrieval of images, we have proposed a context image retrieval system based on high-level image features.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Cao, X., Wang, S.: Research about image mining technique. In: ICCIP, pp. 127–134. Springer, Heidelberg (2012)
2. Franzoni, V., Milani, A., Pallottelli, S., Leung, C.H.C., Li, Y.: Context-based image semantic similarity. In: Proceedings of the IEEE Twelfth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1280–1284 (2015)
3. Franzoni, V., Leung, C.H.C., Li, Y., Mengoni, P., Milani, A.: Set similarity measures for images based on collective knowledge. In: ICCSA, pp. 408–417. Springer, Heidelberg (2015)
4. Zarchi, M.S., Monadjemi, A., Jamshidi, K.: A concept-based model for image retrieval systems. Comput. Electr. Eng. (2015). https://doi.org/10.1016/j.compeleceng.2015.06.018
5. Goel, N., Sehgal, P.: Weighted semantic fusion of text and content for image retrieval. In: Proceedings of the IEEE International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 681–687 (2013)
6. Li, X., Uricchio, T., Ballan, L., Bertini, M., Snoek, C., Del Bimbo, A.: Socializing the semantic gap: a comparative survey on image tag assignment, refinement and retrieval. ACM Comput. Surv. 49(1), 14 (2016)
7. Wu, L., Hua, X., Yu, N., Ma, W., Li, S.: Flickr distance: a relationship measure for visual concepts. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 863–875 (2012)
8. Wu, L., Hua, X.-S., Yu, N., Ma, W.-Y., Li, S.: Flickr distance. In: Proceedings of the 16th ACM International Conference on Multimedia, pp. 31–40 (2008)
9. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
10. Budanitsky, A., Hirst, G.: Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures. In: Proceedings of WordNet and Other Lexical Resources (2001)
11. Wu, L., Jin, R., Jain, A.K.: Tag completion for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 716–727 (2013)
12. Chen, L., Xu, D., Tsang, I.W., Luo, J.: Tag-based image retrieval improved by augmented features and group-based refinement. IEEE Trans. Multimedia 14(4), 1057–1067 (2012)
13. Lu, D., Liu, X., Qian, X.: Tag-based image search by social re-ranking. IEEE Trans. Multimedia 18(8), 1628–1639 (2016)
Investigation of Non-invasive Hemoglobin Estimation Using Photoplethysmograph Signal and Machine Learning

M. Lakshmi1, S. Bhavani1, and P. Manimegalai2

1 Department of ECE, Karpagam Academy of Higher Education, Coimbatore, India
[email protected], [email protected]
2 Department of BME, Karunya Institute of Technology and Sciences, Coimbatore, India
[email protected]
Abstract. Non-invasive haemoglobin (SpHb) estimation using the photoplethysmograph signal has gained enormous attention among researchers and can provide earlier diagnosis of polycythaemia, anaemia, various cardiovascular diseases, etc. The primary goal of this study is to evaluate the efficiency of SpHb monitoring using PPG and a generalized linear regression technique. The PPG signal was acquired from outpatients and SpHb was calculated from the characteristic features of the PPG. The haemoglobin value obtained through a venous blood sample was compared with SpHb. The absolute mean difference between SpHb and Hbref was 0.735 g/dL (SD 0.41). For the statistical analysis of the correlation between SpHb and Hblab, the IBM SPSS Statistics software was used. Bland–Altman analysis, the t-test and linear regression analysis were further used for finding the limits of agreement.

Keywords: Photoplethysmography · PPG · Non-invasive Haemoglobin · Machine learning · Linear regression
1 Introduction

Haemoglobin (Hb) is a complex protein molecule in red blood cells (RBC) that transports oxygen from the lungs to the rest of the body. Haemoglobin measurement is one of the most frequently performed laboratory tests. The haemoglobin test is performed during a routine physical examination or when there is a sign of an RBC disorder such as anaemia or polycythaemia [1, 2], and it is one of the mandatory steps in making decisions during blood transfusions. Haemoglobin measurement is generally performed by the traditional "fingerstick method", i.e., by invasively drawing blood from the body. Although the conventional laboratory measurement is accurate, it has its own limitations such as time delay, inconvenience to the patient, exposure to biohazards and the lack of real-time monitoring in critical situations. These limitations can be overcome by non-invasive haemoglobin (SpHb) monitoring.
SpHb monitoring has gained enormous attention as a point-of-care test that allows haemoglobin concentration to be monitored in a continuous, accurate and non-invasive fashion. Various technologies and methods have been employed by researchers all over the globe to develop systems and devices for SpHb monitoring [4–12]. Among the various methods adopted, measurement of SpHb from the statistical features of the photoplethysmograph signal using a machine learning technique has shown an excellent correlation with the haemoglobin (Hb) measured using the invasive method [3]. That study [3] was carried out on 33 healthy subjects using various machine learning techniques. The main goal of the investigation conducted here is to evaluate the efficiency of calculating SpHb among the outpatients of a hospital using the photoplethysmograph (PPG) and a regression-based machine learning technique.
2 Methodology

2.1 PPG Signal Collection
The subject database acquisition was done in Sree Abirami Hospitals, Coimbatore after obtaining the ethical clearance for collecting the PPG signals of outpatients. A formal consent was obtained from the subjects before enrolment. Subjects aged between 18 and 55 years were enrolled. The subject data was stored in a spreadsheet, Microsoft Excel. The IR Plethysmograph transducer and Labchart software (version 7) of ADInstruments were used for signal acquisition.
Fig. 1. Study flow
The sensor was placed on the forefinger of the left arm of the subjects. While the venous blood sample was collected by trained professionals for calculating Hblab, the corresponding PPG signal was acquired for a 15 period sample. The study flow of the work is presented in Fig. 1. In total, 155 subjects were approached, of whom 127 were enrolled in the study with prior consent received. The enrolled subjects were aged between 21 and 50, with a mean (m) age of 36 and standard deviation (SD) of 7.8. The remaining 28 subjects were excluded due to lack of co-operation and the inability to record a proper PPG signal.
2.2 SpHb Calculation
The original PPG signal acquired using Labchart was saved along with a spreadsheet containing the subject details. Along with the original signal, derivatives of the signal were also recorded in the Labchart software and exported to MATLAB for subsequent signal processing. Although the signals obtained using the Powerlab kit are mostly clean, i.e., without baseline wandering or added noise, denoising using wavelets was performed to eliminate the insignificant noise. Seven time-domain features, as described in [3], were extracted from the PPG signal and its derivatives. The selected features were trained using the curve fitting tool in MATLAB: the seven selected features of the PPG signal were given as input to the regression model with Hblab as the target output, and the linear model type used in [3] was incorporated. The SpHb values predicted using the linear regression model and the Hblab values were stored for further analysis. For the statistical analysis, the Bland–Altman plot, regression analysis, t-test, dispersion plots and Pearson product-moment correlation coefficient were produced in the IBM SPSS Statistics package. Figure 2 depicts the flow diagram of the work performed in the study.
Fig. 2. Acquisition of principal components (the PPG data and its first and second derivatives provide the feature inputs; the ML algorithm is trained with Hblab as the target to produce the SpHb prediction)
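The training step above was performed with MATLAB's curve fitting tool; a minimal equivalent sketch in Python (an assumption for illustration, not the authors' code) using scikit-learn's ordinary linear regression would be:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# X: one row per subject, columns = the seven time-domain PPG features
# y: the reference laboratory haemoglobin values Hblab (g/dL)
def train_sphb_model(X, y):
    model = LinearRegression().fit(X, y)  # linear model, as in the paper
    return model

# SpHb prediction for a new subject's seven-feature vector:
# sphb = model.predict(np.asarray(features).reshape(1, -1))
```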
3 Results and Discussion

A total of 127 outpatient volunteers were enrolled, comprising 76 female and 51 male subjects. The mean age of the female subjects was 34, ranging from 21 to 48, with a standard deviation (SD) of 8.9. The mean age of the male subjects was 37, ranging from 23 to 50, with an SD of 7.7. The mean Hblab recorded was 14.09 g/dL, ranging from 10.1 to 16.6 g/dL, with an SD of 1.75. The mean SpHb recorded was 13.81 g/dL, ranging from 10.1 to 16.4 g/dL, with an SD of 1.55. The recorded Hblab and the predicted SpHb values are given in Table 1.

Table 1. Hblab recorded and predicted SpHb

Subject | Hblab (g/dL) | SpHb (g/dL) | Hbbias (g/dL) | Absolute error (g/dL) | Squared error
1 | 12 | 12.4 | −0.4 | 0.4 | 0.16
2 | 12.8 | 13.5 | −0.7 | 0.7 | 0.49
3 | 11.8 | 11.3 | 0.5 | 0.5 | 0.25
4 | 13.8 | 12.2 | 1.6 | 1.6 | 2.56
5 | 10.2 | 11 | −0.8 | 0.8 | 0.64
6 | 12.1 | 11.3 | 0.8 | 0.8 | 0.64
7 | 12.2 | 13 | −0.8 | 0.8 | 0.64
8 | 11.4 | 12.4 | −1 | 1 | 1
9 | 13.9 | 12 | 1.9 | 1.9 | 3.61
10 | 12.8 | 11.3 | 1.5 | 1.5 | 2.25
11 | 12 | 11 | 1 | 1 | 1
12 | 14.1 | 12 | 2.1 | 2.1 | 4.41
13 | 13 | 12.5 | 0.5 | 0.5 | 0.25
14 | 11 | 12.3 | −1.3 | 1.3 | 1.69
15 | 11.1 | 11.5 | −0.4 | 0.4 | 0.16
16 | 10.1 | 10.3 | −0.2 | 0.2 | 0.04
17 | 13.1 | 13.7 | −0.6 | 0.6 | 0.36
18 | 11.7 | 11.2 | 0.5 | 0.5 | 0.25
19 | 12 | 13 | −1 | 1 | 1
20 | 13.7 | 13.3 | 0.4 | 0.4 | 0.16
21 | 12 | 12.2 | −0.2 | 0.2 | 0.04
22 | 10.7 | 13 | −2.3 | 2.3 | 5.29
23 | 10.6 | 10.2 | 0.4 | 0.4 | 0.16
24 | 10.2 | 11 | −0.8 | 0.8 | 0.64
25 | 11.9 | 11.3 | 0.6 | 0.6 | 0.36
26 | 11.1 | 10.8 | 0.3 | 0.3 | 0.09
27 | 13.7 | 13.4 | 0.3 | 0.3 | 0.09
28 | 13.5 | 14 | −0.5 | 0.5 | 0.25
29 | 12.7 | 13.1 | −0.4 | 0.4 | 0.16
30 | 13.1 | 14 | −0.9 | 0.9 | 0.81
31 | 10.4 | 10.1 | 0.3 | 0.3 | 0.09
32 | 12.2 | 13 | −0.8 | 0.8 | 0.64
33 | 12.9 | 11 | 1.9 | 1.9 | 3.61
34 | 12.6 | 12 | 0.6 | 0.6 | 0.36
35 | 13.3 | 13 | 0.3 | 0.3 | 0.09
36 | 11.8 | 11.4 | 0.4 | 0.4 | 0.16
37 | 11.8 | 11.3 | 0.5 | 0.5 | 0.25
38 | 12.8 | 12.5 | 0.3 | 0.3 | 0.09
39 | 11.2 | 12 | −0.8 | 0.8 | 0.64
40 | 12.1 | 11.8 | 0.3 | 0.3 | 0.09
41 | 11 | 11.3 | −0.3 | 0.3 | 0.09
42 | 13.9 | 14.3 | −0.4 | 0.4 | 0.16
43 | 12.3 | 11.9 | 0.4 | 0.4 | 0.16
44 | 11.5 | 11.9 | −0.4 | 0.4 | 0.16
45 | 11.7 | 11.3 | 0.4 | 0.4 | 0.16
46 | 13.1 | 14 | −0.9 | 0.9 | 0.81
47 | 12.6 | 13 | −0.4 | 0.4 | 0.16
48 | 11.8 | 14 | −2.2 | 2.2 | 4.84
49 | 14.1 | 14.3 | −0.2 | 0.2 | 0.04
50 | 11.6 | 11.3 | 0.3 | 0.3 | 0.09
51 | 10.5 | 10.1 | 0.4 | 0.4 | 0.16
52 | 10.5 | 11.4 | −0.9 | 0.9 | 0.81
53 | 10.5 | 10.4 | 0.1 | 0.1 | 0.01
54 | 10.9 | 10.4 | 0.5 | 0.5 | 0.25
55 | 13.2 | 12 | 1.2 | 1.2 | 1.44
56 | 14.7 | 15 | −0.3 | 0.3 | 0.09
57 | 14.3 | 14.1 | 0.2 | 0.2 | 0.04
58 | 11.4 | 11 | 0.4 | 0.4 | 0.16
59 | 12.1 | 13 | −0.9 | 0.9 | 0.81
60 | 12.7 | 11.9 | 0.8 | 0.8 | 0.64
61 | 14.9 | 14.8 | 0.1 | 0.1 | 0.01
62 | 15.5 | 14 | 1.5 | 1.5 | 2.25
63 | 14.3 | 15 | −0.7 | 0.7 | 0.49
64 | 13.3 | 14 | −0.7 | 0.7 | 0.49
65 | 10.7 | 10.4 | 0.3 | 0.3 | 0.09
66 | 12.6 | 12 | 0.6 | 0.6 | 0.36
67 | 13.1 | 14 | −0.9 | 0.9 | 0.81
68 | 12.4 | 13.2 | −0.8 | 0.8 | 0.64
69 | 11.6 | 12 | −0.4 | 0.4 | 0.16
70 | 10.8 | 11.3 | −0.5 | 0.5 | 0.25
71 | 13.4 | 12 | 1.4 | 1.4 | 1.96
72 | 13.3 | 13 | 0.3 | 0.3 | 0.09
73 | 10.2 | 11 | −0.8 | 0.8 | 0.64
74 | 10.1 | 12 | −1.9 | 1.9 | 3.61
75 | 14.2 | 13 | 1.2 | 1.2 | 1.44
76 | 12.4 | 11.3 | 1.1 | 1.1 | 1.21
77 | 14.1 | 13 | 1.1 | 1.1 | 1.21
78 | 14 | 13.2 | 0.8 | 0.8 | 0.64
79 | 10.9 | 11.7 | −0.8 | 0.8 | 0.64
80 | 14.9 | 14 | 0.9 | 0.9 | 0.81
81 | 14.1 | 13.8 | 0.3 | 0.3 | 0.09
82 | 16.2 | 15.5 | 0.7 | 0.7 | 0.49
83 | 15.8 | 14 | 1.8 | 1.8 | 3.24
84 | 16.1 | 15.4 | 0.7 | 0.7 | 0.49
85 | 15.9 | 15.1 | 0.8 | 0.8 | 0.64
86 | 14.5 | 15 | −0.5 | 0.5 | 0.25
87 | 15.3 | 14.6 | 0.7 | 0.7 | 0.49
88 | 13.3 | 13 | 0.3 | 0.3 | 0.09
89 | 15.4 | 15.9 | −0.5 | 0.5 | 0.25
90 | 14.7 | 14.2 | 0.5 | 0.5 | 0.25
91 | 15.1 | 16 | −0.9 | 0.9 | 0.81
92 | 14.2 | 14.5 | −0.3 | 0.3 | 0.09
93 | 14.7 | 14 | 0.7 | 0.7 | 0.49
94 | 15.2 | 15.6 | −0.4 | 0.4 | 0.16
95 | 15 | 14.6 | 0.4 | 0.4 | 0.16
96 | 15.6 | 16.3 | −0.7 | 0.7 | 0.49
97 | 14.3 | 14 | 0.3 | 0.3 | 0.09
98 | 15.8 | 14.8 | 1 | 1 | 1
99 | 14 | 13.5 | 0.5 | 0.5 | 0.25
100 | 13.6 | 12.3 | 1.3 | 1.3 | 1.69
101 | 15.6 | 14.6 | 1 | 1 | 1
102 | 14.8 | 14.4 | 0.4 | 0.4 | 0.16
103 | 15.3 | 14 | 1.3 | 1.3 | 1.69
104 | 14.7 | 14 | 0.7 | 0.7 | 0.49
105 | 15 | 14.9 | 0.1 | 0.1 | 0.01
106 | 15.8 | 15.1 | 0.7 | 0.7 | 0.49
107 | 16.4 | 16 | 0.4 | 0.4 | 0.16
108 | 15.1 | 15.6 | −0.5 | 0.5 | 0.25
109 | 16.3 | 15.8 | 0.5 | 0.5 | 0.25
110 | 16.5 | 16 | 0.5 | 0.5 | 0.25
111 | 16 | 16.4 | −0.4 | 0.4 | 0.16
112 | 11 | 12.6 | −1.6 | 1.6 | 2.56
113 | 13.7 | 13.9 | −0.2 | 0.2 | 0.04
114 | 15.1 | 14.6 | 0.5 | 0.5 | 0.25
115 | 15 | 14 | 1 | 1 | 1
116 | 15.3 | 13.3 | 2 | 2 | 4
117 | 15.9 | 14.4 | 1.5 | 1.5 | 2.25
118 | 12.6 | 13.5 | −0.9 | 0.9 | 0.81
119 | 13.1 | 12.2 | 0.9 | 0.9 | 0.81
120 | 16.6 | 16 | 0.6 | 0.6 | 0.36
121 | 15.9 | 15 | 0.9 | 0.9 | 0.81
122 | 13.1 | 13.7 | −0.6 | 0.6 | 0.36
123 | 15.7 | 16 | −0.3 | 0.3 | 0.09
124 | 14.2 | 14.1 | 0.1 | 0.1 | 0.01
125 | 14.9 | 14.8 | 0.1 | 0.1 | 0.01
126 | 15.6 | 14 | 1.6 | 1.6 | 2.56
127 | 16.3 | 15 | 1.3 | 1.3 | 1.69
For evaluating the performance, the Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) were used, as formulated below. In the following equations, $B_j$ denotes the Hblab value and $B'_j$ denotes the predicted SpHb value.

$$MAE = \frac{1}{n}\sum_{j=1}^{n} \left| B_j - B'_j \right|$$

$$MSE = \frac{1}{n}\sum_{j=1}^{n} \left( B_j - B'_j \right)^2$$

$$RMSE = \sqrt{MSE}$$
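The three error measures can be computed directly from the columns of Table 1; a short sketch (the function name is illustrative):

```python
import numpy as np

def evaluate(hb_lab, sp_hb):
    """MAE, MSE and RMSE between reference Hblab and predicted SpHb."""
    b = np.asarray(hb_lab, float)
    b_pred = np.asarray(sp_hb, float)
    mae = np.mean(np.abs(b - b_pred))
    mse = np.mean((b - b_pred) ** 2)
    rmse = np.sqrt(mse)
    return mae, mse, rmse
```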
MAE, MSE and RMSE values of 0.737 g/dL, 0.738 g/dL and 0.86 g/dL were obtained between Hblab and SpHb. A coefficient of determination R² of 0.768 was obtained, along with a Pearson coefficient of 0.876, in SPSS. The dispersion diagram with R² is plotted in Fig. 3. The low values of MAE and MSE and the good R² and Pearson coefficient show good correlation between SpHb and Hblab. The performance evaluation of SpHb is shown in Table 2.

Table 2. Performance evaluation

Performance error | n = 127 | Female subjects (n = 76) | Male subjects (n = 51)
MAE | 0.737 (SD 0.44) | 0.733 (SD 0.51) | 0.735 (SD 0.44)
MSE | 0.738 (SD 0.85) | 0.803 (SD 1.17) | 0.731 (SD 0.866)
RMSE | 0.859 | 0.896 | 0.86
R² | 0.768 | 0.564 | 0.634
Pearson coefficient | 0.876 | 0.751 | 0.796
To determine whether there is a significant difference between the two measurements, a one-sample t-test was performed under the null hypothesis; the resulting confidence interval is tabulated in Table 3.

Table 3. One-sample test (test value = 0) on Hbbias

t-statistic | Degrees of freedom | Sig. | Mean difference | 95% confidence interval, lower | 95% confidence interval, upper
2.15 | 126 | 0.03 | 0.165 | 0.013 | 0.032
Fig. 3. Dispersion diagram
Figure 3 depicts the dispersion diagram drawn between SpHb and Hblab. A regression value of 0.768 was obtained, which shows satisfactory results for the prediction of SpHb. To compare the correlation and agreement between Hblab and SpHb, a Bland–Altman analysis was performed with the mean on the x-axis and Hbbias on the y-axis, which is plotted in Fig. 4.
Fig. 4. Bland–Altman plot
To analyse whether the data fall within the expected agreement, the upper and lower limits of agreement are plotted; they are calculated as follows.
$$\text{Upper limit of agreement} = \text{mean } Hb_{bias} + 1.96 \times \text{SD of } Hb_{bias}$$
$$\text{Lower limit of agreement} = \text{mean } Hb_{bias} - 1.96 \times \text{SD of } Hb_{bias}$$

The Bland–Altman plot is shown in the figure, for which the upper and lower limits of agreement are 1.8664 and −1.5357, respectively, indicating the 95% confidence limits. It can be seen that most of the data points are clustered around the mean difference line of 0.165.
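A short sketch of this limits-of-agreement computation (the use of the sample standard deviation, ddof = 1, is an assumption about the SD convention used):

```python
import numpy as np

def bland_altman_limits(hb_lab, sp_hb):
    """Mean bias and 1.96-SD limits of agreement for Hblab vs. SpHb."""
    diff = np.asarray(hb_lab, float) - np.asarray(sp_hb, float)  # Hbbias
    mean_bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the bias
    return mean_bias, mean_bias - 1.96 * sd, mean_bias + 1.96 * sd
```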
4 Conclusion

The coefficient of determination R² value of 0.77 shows satisfactory results, with a mean difference of 0.74 g/dL. The R² value differs slightly between the female and male populations. The study was conducted on outpatient subjects rather than healthy volunteers. One limitation of the study is that it did not include any anaemic subjects: the lowest recorded Hb value in the study is 10.1 g/dL. It can therefore be concluded that SpHb measurement using PPG and generalized linear regression shows good correlation for Hb values greater than 10 g/dL. Further investigation of the method on an anaemic population is needed for additional validation. Although the method has many positive attributes, it also has to be validated in varied population set-ups with larger sample sizes.

Acknowledgements. The authors of this paper wish to thank the nursing and medical staff at the Sree Abhirami Hospitals, Coimbatore, without whose kind support the study could not have been performed.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References 1. Beutler, E., Waalen, J.: The definition of anemia: what is the lower limit of normal of the blood hemoglobin concentration? Blood 107(5), 1747–1750 (2006) 2. Vincent, J.L., et al.: Anemia and blood transfusion in critically ill patients. Jama 288(12), 1499–1507 (2002) 3. Kavsaoğlu, A.R., Kemal, P., Hariharan, M.: Non-invasive prediction of hemoglobin level using machine learning techniques with the PPG signal’s characteristics features. Appl. Soft Comput. 37, 983–991 (2015) 4. Kraitl, J., Ewald, H., Gehring, H.: An optical device to measure blood components by a photoplethysmographic method. J. Opt. A: Pure Appl. Opt. 7(6), S318 (2005)
5. Kraitl, J., Ewald, H., Gehring, H.: Analysis of time series for non-invasive characterization of blood components and circulation patterns. Nonlinear Anal. Hybrid Syst. 2(2), 441–455 (2008)
6. Timm, U., et al.: Sensor system concept for non-invasive blood diagnosis. Procedia Chem. 1(1), 493–496 (2009)
7. Saltzman, D.J., et al.: Non-invasive hemoglobin monitoring during hemorrhage and hypovolemic shock. Beckman Laser Inst and Medical Clinic, Irvine, CA (2004)
8. Suner, S., et al.: Non-invasive determination of hemoglobin by digital photography of palpebral conjunctiva. J. Emerg. Med. 33(2), 105–111 (2007)
9. Kim, O., et al.: Combined reflectance spectroscopy and stochastic modeling approach for noninvasive hemoglobin determination via palpebral conjunctiva. Physiol. Rep. 2(1) (2014)
10. Esenaliev, R.O., et al.: Continuous, noninvasive monitoring of total hemoglobin concentration by an optoacoustic technique. Appl. Opt. 43(17), 3401–3407 (2004)
11. Herlina, A.R., Fatimah, I., Mohd Nasir, T.: A non-invasive system for predicting hemoglobin (Hb) in dengue fever (DF) and dengue hemorrhagic fever (DHF). In: Sensors and the International Conference on New Techniques in Pharmaceutical and Biomedical Research, IEEE Asian Conference (2005)
12. Ibrahim, F., Ismail, N.A., Taib, M.N., Wan Abas, A.B.: Modeling of hemoglobin in dengue fever and dengue hemorrhagic fever using bioelectrical impedance. Physiol. Meas. 25(3), 607–616 (2004)
Sentiment Analysis on Tweets for a Disease and Treatment Combination

R. Meena1, V. Thulasi Bai2, and J. Omana1

1 Department of CSE, Prathyusha Engineering College, Chennai, India
[email protected], [email protected]
2 Department of ECE, KCG Engineering College, Chennai, India
[email protected]
Abstract. The proposed work retrieved tweets on a particular disease and treatment combination from Twitter and processed them to extract the sentiments. Initially the polarity values were set up in a range from weakly negative to strongly positive and the tweets were analyzed; the overall sentiment of the tweets related to breast cancer and chemotherapy was weakly positive. The Naïve Bayes algorithm was then applied on the tweets retrieved for the same disease and treatment combination: nearly 10,000 tweets were analyzed using PubMed and the Google Books search engine as a training corpus, and the plotted sentiments were neutral. Lastly, to find the most frequently occurring words in the tweets, bigrams were used and the co-occurrence of words was plotted using the Natural Language Toolkit in Python.

Keywords: Cancer · Chemotherapy · Polarity · Naïve Bayes · Bigrams
1 Introduction

The idea of extracting public health data and surveilling disease outbreaks using the Internet and social media as a source has a decade of history [1]. Social networks generate a humongous amount of shared content in all areas, and the number of social media users is ever increasing, predicted to reach 3.02 billion in 2021 [2]. The average time users spend on social media varies across the world, though it comes to roughly 2 h and 22 min per day [3]. Twitter is one of the popular micro-blogging sites. Presently, there is considerable growth in the study of Twitter data for public health monitoring in various ways, such as disease monitoring and disease outbreak prediction using public responses and trends [4]. A wide range of tools and methods is available to perform health-oriented analysis on the tweets extracted from the popular blogging service Twitter [5]. Tweets posted by users can be carefully analyzed for traces of keywords of a particular disease outbreak and its prevalence in a particular location, which is used for identifying
seasonal diseases like influenza, flu and dengue [6–8]. Public health monitoring on Twitter can be done by segregating the sentiments of users on many issues, such as the usage of e-cigarettes [9]. Twitter forums are also a great source of information which can be used to find adverse drug effects by applying sentiment analysis and other text mining approaches [10]. NLP is applied on the Twitter posts of individual users in discussion forums to understand the mental health of users through sentiment scores, thereby predicting various depression conditions, an analysis on which relatively little light has been shed in healthcare [11]. Geolocation features and tweets retrieved from Twitter can even be the best sources for identifying and managing the situation during a natural disaster like an earthquake [12]. Twitter analysis, with its tweets and images, can also be used for understanding the communication pattern, sentiment and severity of disasters like hurricanes [13]. The proposed work in this paper deals with the sentiment analysis of tweets retrieved in real time from users on a particular disease and its widely used treatment. Cancer is a dreadful disease that had affected over 17 million people around the world as of 2018 [14]. We have examined the sentiment distribution for the disease 'breast cancer', one of the most prevalent cancers around the world [15]. Other than surgery, chemotherapy and radiation therapy are the most recommended cancer treatments, used to increase the life expectancy of patients and help them lead a healthy life. So, chemotherapy was taken as the treatment and sentiment classes were discovered for it. For both the disease and the treatment, tweets were retrieved, the Naïve Bayes algorithm was applied on the data and text analysis was performed. The most common networks of words were also found, and the co-occurrences of words were plotted for the disease and treatment. The paper is structured as follows: Sect. 2 presents the literature review related to the work, Sects. 3, 4 and 5 present the proposed work, and Sect. 6 concludes.
2 Literature Review

Sentiment analysis is one of the major areas in text mining, where opinions are categorized based on the data and used to find the user's perspective. Its application in health care yields promising results and a better understanding of users' views on medications, drugs and treatments. There is a lot of previous related work on sentiment analysis of cancer patients using the social media data shared by users. The related works are tabulated below along with the data mining or machine learning techniques used by the researchers (Table 1).
Related Works in Literature Review:
Table 1. Related works Research work [16] [17]
[18]
[19] [20]
Problem addressed
Techniques used
Sentiment analysis – happiness of cancer patients Sentiment analysis, Temporal casuality, Probabilistic Kripke structure – Cancer Survivor Network Sentiment analysis tool (SentiHealthCancer) for analyzing the mood of cancer patients – online communities Psychosocial Benefits for Women with Breast cancer in Internet community Cancer information dissemination in social media
Hedonometric analysis
[21]
Analyzing breast cancer symptoms
[22]
Sentiment analysis – Cancer Survivors Network Sentiment analysis on cancer treatments Pap smear, mammogram Informatics and future research support on online social media cancer communities Web groups for reducing stress of Breast cancer patients Sentiment analysis and subjective attitude - Cancer Survivors Network Clustering of cancer tweets
[23] [24]
[25] [26] [27] [28] [29]
Breast cancer prediction Ranking, sentiment analysis of cancer drugs
Machine learning – Adaboost sentiment classifier Senti strength, F1, Accuracy, Harmonic mean, Recall and Precision Content analysis, Thematic analysis Social network analysis, content analysis, sentiment analysis, and multivariate analysis of variance Data mining classification algorithms – Decision tree and Bayesian classification Machine learning Content analysis Social support, member engagement
Mean scores and covariance comparing of web based blog Lexical approach NLP, feature selection and extraction, soft clustering algorithm Data mining algorithms Machine learning, SVM
3 Sentiment Analysis of Disease (Breast Cancer) and Treatment (Chemotherapy) 3.1
Objective and Applications of Sentiment Analysis
Sentiment analysis in health care is a way to understand patient interaction in social media. It paves a path in medical diagnosis, treatment feedbacks, drug effects and
1286
R. Meena et al.
mental health of patients. Social media interactions are the best way to predict the disease outbreaks. For complex diseases such as cancer, sentiment analysis on the social media interactions which has rich source of data on health conditions, medications and treatment details provides meaningful insights from the data. People spending larger time on the social media will be benefited from the analysis on their conversations which can give them positive effect on their mental health. Sentiment analysis and tools can be the source of inspiration for taking up medical procedures such as chemotherapy. These analyses will be useful to bring up more people on the discussion in social media related to their treatments and medications which can be a mode of helping new patients to understand the positive impacts of the treatments and can boost their mental health. 3.2
Analysis of Tweets
The tweets related to the keywords cancer and chemotherapy was retrieved. Since there are many language tweets are available in twitter, only English tweets were concentrated. After retrieval of tweets, the URLs or links, special characters are removed from
Fig. 1. Block diagram of the proposed work: tweet extraction and preprocessing (retrieving tweets for the keywords breast cancer and chemotherapy, extracting English tweets, preprocessing), sentiment analysis (polarity calculation using TextBlob in Python, sentiment class identification using the Naïve Bayes algorithm, overall class of sentiment), followed by plotting the graph of the polarity class of sentiment and bigram analysis for co-occurrence of words
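A minimal sketch of the preprocessing and polarity-calculation steps of this pipeline (the cleaning rules below are assumptions, not the authors' exact code, and the Naïve Bayes stage is omitted):

```python
import re
from textblob import TextBlob

def clean_tweet(text):
    # Remove URLs/links and special characters, keeping plain words only
    text = re.sub(r"http\S+|www\.\S+", "", text)
    return re.sub(r"[^A-Za-z\s]", " ", text).strip()

def polarity(text):
    # TextBlob's sentiment polarity lies in [-1, 1]
    return TextBlob(clean_tweet(text)).sentiment.polarity

tweets = ["Chemotherapy was rough today, but I am feeling hopeful!",
          "Dreading another round of chemo tomorrow..."]
for t in tweets:
    print(round(polarity(t), 3), t)
```

The resulting polarity scores can then be binned into the sentiment classes of Table 2 before the classification step.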
Table 2. Range of the polarity value for sentiment classes

| Polarity range | 0 | >0 to 0.3 | >0.3 to 0.6 | >0.6 to 1 | <0 to −0.3 | <−0.3 to −0.6 | <−0.6 to −1 |

Case II: If … <= D <= n/2, group them in cluster Ci, where i = 1, 2, 3.
Case III: If no term of 'n' matches, group them in cluster Ci, where i = 1, 2, 3.
6. Calculate the cosine similarity between Di, where i = 1, 2, 3, …, M, and Q, which are grouped into clusters.
7. Calculate the PageRank of the web pages grouped in the clusters.
8. Finally, the ranked web pages are retrieved.
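A minimal Python sketch of the grouping rule in the steps above, assuming Case I covers pages matching all n query terms and Case II those matching at least n/2 of them (the exact Case II bounds are an assumption, since only "<= D <= n/2" survives in the text):

```python
# Illustrative grouping of web pages by the number of matched query terms D
def assign_cluster(matched_terms, n):
    if matched_terms == n:       # Case I: all n query terms occur in the page
        return 1
    if matched_terms >= n / 2:   # Case II: at least half of the terms occur
        return 2
    return 3                     # Case III: too few (or no) matching terms

# Matched-term counts for the worked example below (query of n = 4 terms)
matches = {"A": 4, "B": 2, "C": 4, "D": 4, "E": 4, "F": 1, "G": 1}
print({page: assign_cluster(m, 4) for page, m in matches.items()})
# -> A, C, D, E in cluster 1; B in cluster 2; F and G in cluster 3
```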
To illustrate the proposed approach, an example is taken which consists of seven web pages A, B, C, D, E, F and G, having 500, 300, 200, 400, 300, 300 and 200 words respectively. The directed graph shown in Fig. 2 presents the seven web pages, which are interlinked with one another.
Fig. 2. Hyperlink structure of web pages (a directed graph over the nodes A–G)
The frequency of the terms of the specified query in the various web pages is shown in Table 1.

Table 1. Term frequency of web pages

| Term | A | B | C | D | E | F | G |
| Information | 20 | 8 | 8 | 10 | 20 | 0 | 25 |
| Retrieval | 10 | 12 | 13 | 15 | 10 | 0 | 0 |
| Evolutionary | 12 | 0 | 6 | 8 | 5 | 0 | 0 |
| Computation | 8 | 0 | 2 | 6 | 2 | 6 | 0 |
The hyperlink count for the respective nodes on the basis of hyperlink structure is calculated and shown in Table 2.
Table 2. Hyperlink count for the nodes

| Node | Inlink | Outlink | Hyperlink count |
| A | 3 | 2 | 5 |
| B | 2 | 3 | 5 |
| C | 3 | 4 | 7 |
| D | 3 | 2 | 5 |
| E | 3 | 1 | 4 |
| F | 1 | 3 | 4 |
| G | 3 | 3 | 6 |
For all the nodes the tf-idf score is calculated, and the nodes are clustered following the rules of the algorithm. It is found that nodes A, D, C and E contain all the words of the query; hence Case I is applied and they are clustered in one group. Case II is applied to node B, which is placed in the second cluster, while the two nodes F and G are left out according to Case III. After clustering the nodes, the cosine similarity is calculated for Cluster 1 and Cluster 2, as shown in Tables 3 and 4.

Table 3. Cosine similarity of cluster 1

| Cluster 1 | cos(Q,A) | cos(Q,D) | cos(Q,C) | cos(Q,E) |
| | 0.97755 | 0.925222 | 0.88167 | 0.842342 |
Table 3 presents the cosine similarity for nodes A, D, C and E, which are clustered together. Table 4 presents the cosine similarity for node B.

Table 4. Cosine similarity of cluster 2

| Cluster 2 | cos(Q,B) |
| | 0.48662 |
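A small Python sketch of the similarity computation (the uniform query vector is an assumption, and the paper applies tf-idf weighting before the cosine step, so these raw-frequency values will not reproduce Table 3 exactly):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Raw term-frequency vectors from Table 1, ordered as (Information,
# Retrieval, Evolutionary, Computation)
docs = {"A": [20, 10, 12, 8], "D": [10, 15, 8, 6],
        "C": [8, 13, 6, 2], "E": [20, 10, 5, 2], "B": [8, 12, 0, 0]}
q = np.ones(4)  # assume each of the four query terms occurs once in Q

for node, tf in docs.items():
    print(node, round(cosine(np.array(tf), q), 5))
```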
Figure 3 represents the cosine similarity of the final nodes clustered together.
Fig. 3. Cosine similarity of clustered documents
After finding the similarity among the nodes, the PageRank formula given in Eq. (3) is finally applied to the clusters, and the ranking of the nodes is done as shown in Tables 5 and 6.

PR(A) = (1 − d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))    (3)
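A minimal Python sketch of the iterative computation of Eq. (3). The damping factor d = 0.85 is an assumption (the paper does not state it explicitly), and the edge set below is hypothetical: it is chosen to be consistent with the in/out-link counts of Table 2, while the actual edges are those of Fig. 2:

```python
def pagerank(links, d=0.85, iters=15):
    # Eq. (3): PR(p) = (1 - d) + d * sum over in-linking pages t of PR(t)/C(t)
    nodes = set(links) | {t for ts in links.values() for t in ts}
    pr = {n: 1.0 for n in nodes}                   # iteration 0: all ranks = 1
    out = {n: len(links.get(n, ())) for n in nodes}
    for _ in range(iters):
        pr = {n: (1 - d) + d * sum(pr[s] / out[s]
                                   for s, ts in links.items() if n in ts)
              for n in nodes}
    return pr

# Hypothetical hyperlink structure consistent with Table 2's link counts
links = {"A": ["C", "E"], "B": ["A", "D", "G"], "C": ["A", "B", "D", "E"],
         "D": ["C", "G"], "E": ["G"], "F": ["A", "C", "E"], "G": ["B", "D", "F"]}
print({n: round(v, 4) for n, v in sorted(pagerank(links).items())})
```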
Table 5 shows the PageRank of the nodes which are clustered together in Cluster 1.

Table 5. PageRank of Cluster 1

| Iteration | PR(A) | PR(C) | PR(D) | PR(E) |
| 0 | 1 | 1 | 1 | 1 |
| 1 | 1.070833333 | 1.141666667 | 0.959270833 | 1.124356771 |
| 2 | 1.0836276 | 1.1772084 | 0.9668235 | 1.1275666 |
| 3 | 1.0943901 | 1.1817825 | 0.9677954 | 1.1279797 |
| 4 | 1.0957752 | 1.1823711 | 0.9679205 | 1.1280329 |
| 5 | 1.0959534 | 1.1824469 | 0.9679366 | 1.1280397 |
| 6 | 1.0959764 | 1.1824566 | 0.9679387 | 1.1280406 |
| 7 | 1.0959793 | 1.1824579 | 0.967939 | 1.1280407 |
| 8 | 1.0959797 | 1.182458 | 0.967939 | 1.1280407 |
| 9 | 1.0959797 | 1.1824581 | 0.967939 | 1.1280407 |
| 10 | 1.09597975 | 1.1824581 | 0.967939 | 1.1280407 |
| 11 | 1.09597975 | 1.1824581 | 0.967939 | 1.1280407 |
| 12 | 1.09597975 | 1.1824581 | 0.967939 | 1.1280407 |
| 13 | 1.09597975 | 1.1824581 | 0.967939 | 1.1280407 |
| 14 | 1.09597975 | 1.1824581 | 0.967939 | 1.1280407 |
| 15 | 1.09597975 | 1.1824581 | 0.967939 | 1.1280407 |
Table 6 shows the PageRank of the node clustered in Cluster 2.

Table 6. PageRank of cluster 2

| Iteration | PR(B) |
| 0 | 1 |
| 1 | 0.7875 |
| 2 | 0.7875 |
| 3 | 0.7875 |
| 4 | 0.7875 |
| 5 | 0.7875 |
| 6 | 0.7875 |
| 7 | 0.7875 |
| 8 | 0.7875 |
| 9 | 0.7875 |
| 10 | 0.7875 |
| 11 | 0.7875 |
| 12 | 0.7875 |
| 13 | 0.7875 |
| 14 | 0.7875 |
| 15 | 0.7875 |
After calculating the PageRank of the nodes, it has been found that after iteration 10 the values are the same in all subsequent iterations, so the process can be stopped. The final rank of the nodes is shown in Table 7.

Table 7. Ranked nodes

| Rank | Node |
| 1 | C |
| 2 | E |
| 3 | A |
| 4 | D |
| 5 | B |
It has been found that the proposed strategy helps the user perform fewer calculations when computing the PageRank of the nodes, thus reducing the computational complexity and overall time of the process. The result is obtained in a smaller number of iterations, thereby also minimizing cost and time.

Table 8. PageRank of all nodes

| Iteration | PR(A) | PR(B) | PR(C) | PR(D) | PR(E) | PR(F) | PR(G) |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1.070833333 | 0.817604167 | 1.120092014 | 0.9030074 | 1.100444812 | 0.433333333 | 1.446175421 |
| 2 | 1.003452212 | 0.814486743 | 0.930016212 | 0.988149391 | 1.102490972 | 0.559749703 | 1.44334152 |
| 3 | 0.99836318 | 0.771932797 | 0.951614393 | 0.979879115 | 1.133991137 | 0.558946764 | 1.474478775 |
| 4 | 0.987380975 | 0.771854973 | 0.946697406 | 0.987634427 | 1.145881868 | 0.567768986 | 1.486040666 |
| 5 | 0.989610073 | 0.77175748 | 0.95011678 | 0.99160929 | 1.153346683 | 0.571044855 | 1.494040539 |
| 6 | 0.991998383 | 0.773499129 | 0.952553442 | 0.994887179 | 1.157934579 | 0.573311486 | 1.499100253 |
| 7 | 0.994402744 | 0.775038772 | 0.954653739 | 0.99720331 | 1.160994733 | 0.574745072 | 1.502553879 |
| 8 | 0.996269645 | 0.776278519 | 0.956204617 | 0.998862661 | 1.163084667 | 0.575723599 | 1.504937134 |
| 9 | 0.997655692 | 0.77719715 | 0.957331215 | 1.000037597 | 1.16453652 | 0.576398855 | 1.506601934 |
| 10 | 0.998654721 | 0.77786114 | 0.958135255 | 1.000868279 | 1.165552575 | 0.576870548 | 1.507770086 |
| 11 | 0.99936675 | 0.77833461 | 0.958705664 | 1.001454617 | 1.166266392 | 0.577201524 | 1.508591819 |
| 12 | 0.999871306 | 0.778670258 | 0.959108977 | 1.001868246 | 1.166768785 | 0.577434349 | 1.509170524 |
| 13 | 1.000227902 | 0.778907516 | 0.95939372 | 1.002159943 | 1.16712269 | 0.577598315 | 1.509578308 |
| 14 | 1.000479604 | 0.779074997 | 0.959594604 | 1.002365623 | 1.1673721 | 0.577713854 | 1.50986573 |
| 15 | 1.000657159 | 0.779193146 | 0.959736276 | 1.00251064 | 1.167547904 | 0.57779529 | 1.510068343 |
In contrast to the proposed method above, if we follow the original PageRank we are unable to obtain stable values until iteration 15, whereas stability is achieved after iteration 10 when following the proposed methodology. Table 8 shows the PageRank of all the nodes computed by the standard PageRank method. By observing the computation, it has also been found that the PageRank of node G is high even though it contains only a single term of the query, showing the inadequacy of this measure to capture true relevance. This signifies that structural links alone are not sufficient for calculating the rank of the nodes.
7 Conclusion

The paper presents an approach for ranking web pages by using not only the structural links among the pages but also a content measure, for high-precision results. It has been found that a single measure for ranking documents is not sufficient to provide accurate results, and the precision can be enhanced if more than one measure is used for ranking the web pages. In the proposed work we applied not only document clustering and a similarity measure for ranking web pages but also the link structure among the pages. It has been observed that even when the hyperlink count for a particular node is high, a lack of content makes the node inefficient for ranking purposes. The proposed methodology helps reduce the overall computational complexity of PageRank. As the process of information retrieval is a complex task which needs sufficient time, this process considers only relevant nodes and excludes the nodes which lack content, thus decreasing the retrieval time for accessing relevant information. In future, further work can be done on this procedure by employing evolutionary computation techniques for clustering the documents, which will help in increasing the efficiency of the proposed approach.
Detection and Removal of Raindrop from Images Using Deep Learning

Y. Himabindu, R. Manjusha, and Latha Parameswaran

Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, India
[email protected], {r_manjusha,p_latha}@cb.amrita.edu
Abstract. Dynamic climatic conditions like rain affect the performance of vision algorithms used for surveillance and analysis tasks. Removal of raindrops is challenging for a single image, as the drops affect the entire image and make it difficult to identify the background regions covered by the rain. Thus, removal of rain from still pictures is a complex and challenging task. Raindrops affect the visibility and clarity of the image, making it difficult to read and analyze the information present in it. In this paper, we identify and restore the raindrop-affected regions using a deep learning architecture.

Keywords: ResNet · Rain removal · Denoising
1 Introduction

Digital image processing is one of the thrust areas of research. Image processing techniques are used to improve images or to draw out some information from them.

1.1 Applications of Image Restoration
A few applications of image restoration are discussed here: 1. In the field of medical imaging, such as brain tumor identification, X-ray imaging, MRI, etc. 2. Pattern recognition. 3. One of the major areas is surveillance. Traffic signals, shopping complexes and many other places where surveillance is essential have cameras deployed outside, and the videos or images captured from such cameras may be affected by noise such as rain or snow. Rain removal is similar to image enhancement. The main challenge involved is identifying the regions affected by the rain and restoring them without loss of information; differentiating raindrops from objects is another of the challenges involved. As mentioned, removing raindrops from images is more complex than removing raindrops from video. This is because a video is a collection of frames; using the frames, the correlation between consecutive frames can be calculated, and from it the direction of the raindrops and the regions affected by them can be identified and removed accordingly. Differentiating the rain streaks from the background with respect to object information like orientation and structure is complex and is a major challenge while trying to restore a rain-affected image.
2 Literature Survey

In [1], the indices of neighboring samples were represented using parameters p and p + 1, and the slope with respect to p was calculated from the magnitude |fp − fp+1|. The count of non-zero components that satisfy the parameter p is denoted by the operator {}. To extract the basic structure, i.e., to flatten the details and sharpen the fundamental edges, the strategy was combined with a general constraint that there exists a structural similarity between the input signal g and the result f. Pixel brightness was preserved and managed using contrast enhancement: histogram stretching was utilized to shift the pixel values to fill the brightness range, resulting in high contrast. In [2], the authors used a mixed strategy. The input image was converted to the YCbCr color space to acquire a precise gray component. The algorithm began from the centroids and iterated until a local optimum was found:

J = Σ_{j=1}^{k} Σ_{i=1}^{n} ||x_i^(j) − c_j||^2    (2)
Equation (2) measures the distances between the data points and the cluster centers formed. The clusters were discovered from the peaks and valleys of the histogram; from these clusters they acquired the sliced areas, and by consolidating the boundary extraction the image highlights were obtained. To get an accurate number of clusters, the elbow algorithm was considered, and the peak value of the histogram was obtained from the number of clusters. After the potential raindrop regions were recognized by the saliency method, a verification strategy was put in place to confirm whether a zone contains a raindrop or not. The details regarding the shape of the drop were acquired using the Hough circle transform. K-means was used for classifying the regions, and the edge and boundary information was acquired using Canny edge detection. By separating the background details the raindrops were identified, and they were evacuated using the median filter. In [3], a deep CNN was the base used to handle the raindrop problem. The mapping between the rainy and clean layers is learnt automatically. To yield a better quality of the restored image, both removal of rain and enhancement of the image were performed. The low-frequency layer was termed the base layer and the high-frequency layer was termed the detail layer. The input to the proposed architecture was the detail layer, i.e., the high-frequency layer. An image enhancement operation was performed to improve the quality of the image. The network was trained on the detail layer obtained after splitting the image, and the image free from raindrops was obtained by combining the base layer with the obtained detail layer.
In [4], layer decomposition was the proposed solution for raindrop removal. The image O is split into two layers, where B represents the rain-free layer and R represents the layer with raindrops:

O = B + R    (3)
The split layers are assigned patch-based priors. Using the priors traced by Gaussian mixture models, raindrops of various directions and sizes were handled. In [5], the proposed strategy does not need any kind of statistical information with respect to pixels. This strategy is suitable for both images and videos, as it does not require any alignment before recognition. A rain-free image was obtained by determining the intensity changes caused by rain on the pixels. The acquired reference picture gave draft contours but lacked those edges which were influenced by raindrops. The affected regions were restored using a guided filter, which preserves the edges, together with the acquired reference image. In [6], the authors introduced an approach that is free from explicit motion analysis. The technique probed the temporal changes and chose pixels affected by rain. In [7], a new model with two elements was created by combining a binary map with an existing model. One element represented the raindrops, and the other represented the various shapes and directions of raindrops. A multi-task convolutional network, JORDER (joint rain detection and removal), was constructed; it identified the raindrop-influenced regions and used a contextualized dilated network for separating the raindrops. In [8], the authors made use of an orientation filter for identifying the raindrops. Remaining raindrops were identified using entropy maximization followed by removal of background information. Here the decomposition depended on the direction of the raindrops, and a low-pass filter was used to estimate the background. The backdrop information in pictures [9] was acquired by high-frequency concealment and direction. It was noticed that at least one color channel of the pixels influenced by raindrops had a low intensity value. The thickness of mist was determined based on this prior, and the image was reconstructed. In [10], the image was split into two layers, one represented by B (rain-free layer) and the other by R (rainy layer). This separation was achieved by imposing three priors on layers B and R: smoothening the rain streaks in B, preserving the object details in B, and hiding the non-rainy details from R. From the obtained rainy layer, the regions populated by raindrops were identified using a histogram, and the direction of the rain was estimated by calculating the slope of the raindrop direction with a Canny edge detector. The rain patches were randomly selected and restored. In [11], a framework constituting deep learning and layer priors was introduced. The framework included a CNN of residual type for training the images; along with this, the concept of total variation energy was incorporated to enhance the recovered image. From the literature survey it is observed that rain removal and restoration is still an open challenge and hence can be explored.
3 Proposed Work

The first step performed is background subtraction, i.e., separating the raindrop layer and the detail layer. The objective of splitting the image into the two layers was partially achieved by using high-pass and low-pass filters. Image inpainting is used for filling the rain-affected regions by taking the mean of the surrounding pixels. Further, applying L0 gradient smoothening gave a partially rain-free image, but the result was not as expected, as the output images were blurred. Hence, using the ResNet architecture, the objective of removing raindrops from images is achieved. The number of layers which produced a good result was 26; the ResNet architecture helps in handling the vanishing gradient problem. Here we obtain the base layer using a low-pass guided filter and take the difference of the image and the base layer to obtain the detail layer. The obtained detail layer has the required raindrops and object structures, which makes sure that background information is not lost:

Image = Base layer + Detail layer

The detail layer is taken as input to the network and the training of images is performed. The objective function can be defined as

L = Σ_{i=1}^{N} ||f(X_{i,detail}; W, b) + X_i − Y_i||_F^2    (5)
N denotes the number of training images, f(.) denotes the ResNet, and W and b are the network parameters to be learned. To obtain the detail layer, the base layer is first calculated using a low-pass guided filter and then the image is separated into base and detail layers. As the dataset has color images, we take c, the number of channels, as 3. The filter size for the first layer is c × f1 × f1 × m1, where f represents the filter size and m the number of feature maps. The number of layers is represented by L. For the layers up to L − 1, the filter size is m1 × f2 × f2 × m2, and the size for the last layer is m2 × f3 × f3 × c. Using this, we calculate the residual value of the image. Figure 1 gives a brief description of the workflow used in the proposed method of identifying and removing the raindrops from images.
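A rough sketch of this decomposition step (not the authors' code; it assumes the opencv-contrib-python package, which provides cv2.ximgproc, and a hand-picked eps; the file name is a placeholder):

```python
import cv2
import numpy as np

# Guided-filter base/detail decomposition: Image = Base layer + Detail layer
img = cv2.imread("rainy.png").astype(np.float32) / 255.0
base = cv2.ximgproc.guidedFilter(guide=img, src=img, radius=15, eps=1e-2)
detail = img - base   # rain streaks + object edges; input to the ResNet
# At test time, the network's predicted residual is added back to the base
# layer to form the rain-free output.
```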
3.1 Results and Discussions
Fig. 1. Workflow of proposed method

Following [15], Stochastic Gradient Descent has been used to minimize the objective function. A dataset of 14,000 pairs of rainy images is trained, with testing and training in the ratio 13:7. The parameters are set as follows: the number of layers is taken as 26, the learning rate is 0.1, and the iterations are taken at 10k and 210k. The filter size is set to 3 and the mapping parameter to 16. The number of channels c is set to 3, and a guided filter of radius 15 is used. Table 1 gives the SSIM (structural similarity index) values for the figure pairs obtained as a result of the proposed method. SSIM basically requires two images, an original image and a processed image; using the visible structures, it calculates the perceptual difference between the images and gives a value accordingly. The SSIM value ranges between −1 and 1: the higher the SSIM, the more similar the images are, and the closer the image is to the ground truth. Figures 2 and 3 are the input and output images respectively. Table 2 gives the average SSIM for different network sizes; increasing the number of hidden layers gives a better SSIM value, and a good SSIM value means the images are quite similar.

Table 1. SSIM values

| Figure pair | SSIM value |
| Figures 2 and 3 | 0.87 |
| Figures 4 and 5 | 0.926 |
| Figures 6 and 7 | 0.948 |
| Figures 8 and 9 | 0.932 |

Fig. 2. Input rainy image [7]

Fig. 3. Output normal image
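A hedged sketch of the SSIM computation behind Table 1 (file names are placeholders; the paper's exact evaluation script is not given):

```python
import cv2
from skimage.metrics import structural_similarity as ssim

# Compare a derained output against a clean reference on grayscale images
ref = cv2.imread("clean.png", cv2.IMREAD_GRAYSCALE)
out = cv2.imread("derained.png", cv2.IMREAD_GRAYSCALE)
print(f"SSIM = {ssim(ref, out):.3f}")   # closer to 1.0 means more similar
```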
Table 2. Average SSIM for different network sizes

| Layers | m1 = m2 = 16 | m1 = m2 = 48 |
| 12 | 0.876 | 0.88 |
| 26 | 0.916 | 0.920 |
| 36 | 0.918 | 0.924 |
4 Results

Figures 4 and 6 are given as input to the proposed method, and the figures obtained as output are Figs. 5 and 7. In Figs. 2 and 3 we can see a rainy image given as input to the proposed method and the corresponding output. The above results are obtained for the proposed approach with 26 layers at a learning rate of 0.1. When the number of layers was 12, the outputs obtained were as shown in Fig. 9.
Fig. 4. Input rainy image [8]

Fig. 5. Output normal image

Fig. 6. Input rainy image [1]

Fig. 7. Output normal image

Fig. 8. Input rainy image [11]

Fig. 9. Output normal image
5 Conclusion

Using the proposed approach, splitting the input image into a detail layer and a base layer benefits the preservation of object details. In this paper, pairs of clean and rainy images are used for network learning, as the ground truth is not taken. Firstly, we tried to achieve the objective using classic techniques: we calculated the mean of the surrounding pixels and used the value to fill the raindrop-affected regions. In the resulting image the raindrops were removed only partially and the quality of the image was not as expected. Hence we used the deep learning architecture, where the input image has the object details and rain streaks. This approach helped us preserve the background information of the image. Though it produced good results, the architecture can still be improved for differentiating rain streaks and object details which have similar frequency.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Manu, B.N.: Rain removal from still images using L0 gradient minimization technique. In: 2015 7th International Conference on Information Technology and Electrical Engineering, ICITEE. IEEE (2015)
2. Kanthan, M.R., Sujatha, S.N.: Rain drop detection and removal using K-Means clustering. In: 2015 IEEE International Conference on Computational Intelligence and Computing Research, ICCIC. IEEE (2015)
3. Fu, X., et al.: Clearing the skies: a deep network architecture for single-image rain removal. IEEE Trans. Image Process. 26(6), 2944–2956 (2017)
4. Li, Y., et al.: Rain streak removal using layer priors. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
5. Xu, J., et al.: Removing rain and snow in a single image using guided filter. In: 2012 IEEE International Conference on Computer Science and Automation Engineering, CSAE, vol. 2. IEEE (2012)
6. Subhani, M.F., Oakley, J.P.: Low latency mitigation of rain induced noise in images, p. 13 (2008)
7. Yang, W., et al.: Deep joint rain detection and removal from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
8. Dodkey, N.: Rain streaks detection and removal in image based on entropy maximization and background estimation. Int. J. Comput. Appl. 164(11), 17–20 (2017)
9. He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011)
10. Zhu, L., et al.: Joint bilayer optimization for single-image rain streak removal. In: Proceedings of the IEEE International Conference on Computer Vision (2017)
11. Luo, Y., Xu, Y., Ji, H.: Removing rain from a single image via discriminative sparse coding. In: Proceedings of the IEEE International Conference on Computer Vision (2015)
12. Fu, X., et al.: Removing rain from single images via a deep detail network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
13. Schaefer, G., Stich, M.: UCID: an uncompressed color image database. In: Storage and Retrieval Methods and Applications for Multimedia 2004, vol. 5307. International Society for Optics and Photonics (2003)
14. Manjusha, R., Parameswaran, L.: Design of an image skeletonization based algorithm for overcrowd detection in smart building. In: Pandian, D., Fernando, X., Baig, Z., Shi, F. (eds.) International Conference on ISMAC in Computational Vision and Bio-Engineering, vol. 30, pp. 615–629. Springer, Cham (2018)
15. Bai, Z., et al.: Algorithm designed for image inpainting based on decomposition and fractal. In: 2011 International Conference on Electronic and Mechanical Engineering and Information Technology, EMEIT, vol. 3. IEEE (2011)
16. Liu, R., et al.: Deep layer prior optimization for single image rain streaks removal. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP. IEEE (2018)
A Review on Recent Developments for the Retinal Vessel Segmentation Methodologies and Exudate Detection in Fundus Images Using Deep Learning Algorithms

Silpa Ajith Kumar and J. Satheesh Kumar

Dayananda Sagar College of Engineering, Bangalore, India
[email protected], [email protected]
Abstract. Retinal image analysis is considered a well-known non-intrusive diagnosis technique in modern ophthalmology. The pathological changes which occur due to hypertension, diabetic retinopathy and glaucoma can be viewed directly from the blood vessels in the retina. The examination of the optic cup-to-disc ratio is the main parameter for detecting glaucoma in the early stages. The significant areas of the fundus images are isolated using segmentation techniques to decide the value of the cup-to-disc ratio. Deep learning algorithms, such as Convolutional Neural Networks (CNNs), are an often-used technique for the analysis of fundus images; algorithms built on CNNs can provide better accuracy for retinal images. This review explains the recent techniques in deep learning relevant to the analysis of exudates.

Keywords: Blood vessel segmentation · Convolutional neural networks · Deep learning · Diabetic retinopathy · Exudate detection · Fundus images · Machine learning
1 Introduction

The common cause of vision impairment in adults is diabetic retinopathy (DR) [1]. The presence of exudates on the retina plays a key role in vision loss, and the detection of exudates is an important diagnosis step for diabetic retinopathy. The automatic detection of exudates was analyzed in earlier literature [2, 3]. This review paper discusses exudates, which are considered the main indicator of macular edema and DR in retinal images. Deep learning methods have gained great significance in recent years, since they are able to solve complex problems efficiently [4]. Deep learning techniques learn features from raw data [5]. CNNs are implemented to detect DR in fundus images, which supports applications of deep learning in pixel-based classification and labeling. Automatic exudate detection methods fall into two major classes, namely morphological and machine learning methods [6–8].
Sopharak et al. [9] discussed exudate detection using morphological reconstruction techniques in fundus images. In the work done by Sanchez et al. [10, 11], a statistical mixture-model-based clustering method with dynamic thresholding was implemented for exudate detection. Fraz et al. [12] tried an ensemble classifier of bootstrapped (bagged) decision trees for exudate detection. This paper discusses the concepts of fundus image processing, the available public datasets and the analysis of performance measures for different extraction techniques in Sect. 2. The deep learning methods and the analysis of different methods used in exudate detection and segmentation for DR are outlined in Sect. 3. Section 4 concludes with the summary and enhancements for future research.
2 Fundus Image Processing

2.1 Fundus Photography

Fundus photography records images of the retina and the neurosensory tissues in our eyes, where incoming light is converted into the electrical impulses that the brain understands. Fundus photography uses three types of examination, namely color photography, red-free photography and angiography [13]. Techniques of adaptive optics are used to extend the capability of optical systems; AO retinal imaging helps in the classification of normal and diseased images of eyes, which is done using spacing, mosaic and cell-thickness measurements. The fundus imaging methods are based on fundus fluorescein angiography and Optical Coherence Tomography [14].
2.2 Publicly Available Retinal Image Datasets

In this section, the public-domain datasets like DRIVE, STARE, MESSIDOR, ORIGA, Kaggle and e-Ophtha are summarized in detail.

Table 1. Summary of datasets used for blood vessel segmentation and DR detection

| Databases | Total no. of images | Fundus camera | Resolution (pixels) | Field of view (FOV) | Tasks |
| Digital Retinal Images for Vessel Extraction (DRIVE) [15] | 40 (33 normal; 7 with exudates, hemorrhages and pigment epithelium changes) | Canon CR5 | 768 × 564 | 45° | Blood vessel extraction, exudates, hemorrhages |
| Structured Analysis of the Retina (STARE) [16] | 400 (20 for blood vessel segmentation; 80 with the ONH localized) | TopCon TRV-50 | 605 × 700 | 35° | Blood vessel segmentation, exudates, microaneurysms (MA) |
| MESSIDOR [17] | 1200 | Non-mydriatic 3CCD camera (TopCon TRC NW6) | 1440 × 960, 2240 × 1488, 2304 × 1536 | 45° | Diabetic retinopathy (DR) grading and macular edema risk analysis |
| Kaggle | 80000 | – | – | – | DR grading |
| ORIGA [18] | 482 normal & 168 glaucomatous | – | – | – | Optic disk, optic cup and cup-to-disc ratio (CDR) |
| e-ophtha | 463 (148 MAs, 233 normal; 47 exudates, 35 non-exudate normal) | OPHDIAT telemedical network | – | – | Grading of DR lesions |

2.3 Analysis of Performance Measures
Analysis of Performance Measures
The segmentation process produces a pixel based outcome. A pixel can be classified either as a vessel or surrounding tissue. Table 1 shows the four different distinctive areas namely (i) the true positive area (ii) the true negative area (iii) the false negative area (iv) the false positive area. In the four cases discussed above, two are considered as classifications and the other two are treated as misclassifications (Table 2). Table 2. Performance measures for assessment of feature extraction techniques Measurement parameters Sensitivity or True Positive Fraction or (TPF) or Recall Specificity or True Negative Fraction (TNF) Precision or Positive Predicted Value (PPV) Negative Predicted Value (NPV) Accuracy False Positive Fraction (FPF) False Negative Fraction (FNF) Receiver Operating Characteristic (ROC) Intersection Over Union (IOU)
F-score
Relevant formulas TP TP þ FN TN TN þ FP TP TP þ FP TN TN þ FN TP þ TN TP þ TN þ FP þ FN FP TP þ FN FN TP þ TN
–
Area S \ M Area S [ G
‘S’ indicates the segmentation of the output and ‘M’ represents the manual ground truth segmentation PPV TPF 2 PPV þ TPF
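As an illustrative sketch (not from the paper), the measures of Table 2 can be computed from binary masks as follows, where 1 marks vessel pixels, seg is the segmentation output S and gt the manual ground truth M:

```python
import numpy as np

def metrics(seg, gt):
    tp = np.sum((seg == 1) & (gt == 1)); tn = np.sum((seg == 0) & (gt == 0))
    fp = np.sum((seg == 1) & (gt == 0)); fn = np.sum((seg == 0) & (gt == 1))
    sens = tp / (tp + fn)                    # sensitivity / TPF / recall
    spec = tn / (tn + fp)                    # specificity / TNF
    ppv = tp / (tp + fp)                     # precision / PPV
    acc = (tp + tn) / (tp + tn + fp + fn)    # accuracy
    iou = tp / (tp + fp + fn)                # area(S∩M)/area(S∪M) for masks
    f = 2 * ppv * sens / (ppv + sens)        # F-score
    return dict(sensitivity=sens, specificity=spec, ppv=ppv,
                accuracy=acc, iou=iou, f_score=f)

seg = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(metrics(seg, gt))
```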
3 Deep Learning Methods

In this section a brief description of deep learning architectures and techniques is provided for the effective detection of exudates in fundus images. A brief summary of exudate detection is given in Table 3.

3.1 Machine Learning Algorithms

Machine learning algorithms are generally classified into supervised and unsupervised learning algorithms. In supervised learning, a model is presented with a dataset D = {x, y}_{n=1}^{N}, where 'x' represents the input features and 'y' the label pairs. The final output value for any correct input image is predicted with the help of a regression function. Unsupervised learning algorithms can be implemented using various loss functions, such as reconstruction loss. Principal component analysis and clustering methods are the traditional unsupervised learning algorithms [20].
Fig. 1. Various deep learning based architectures [20, p. 64]
3.2 Neural Network Architectures
The neural network architectures (Fig. 1) used for exudate detection are convolutional neural networks (CNNs), auto-encoders (AEs), deep belief networks (DBNs) and recurrent neural networks (RNNs). CNNs are used for various tasks and comprise mainly three types of layers, namely convolutional, pooling and fully connected layers. AlexNet, VGGNet, GoogLeNet and ResNet are the CNN models used for exudate detection. An auto-encoder (AE) is usually a single-hidden-layer network having identical input and output values; it is used to construct a stacked auto-encoder (SAE), whose training consists of two phases, pre-training and fine-tuning. The DBN architecture is constructed by cascading restricted Boltzmann machines. A DBN is a probabilistic model which is pre-trained in an unsupervised way and finally fine-tuned with the help of gradient descent and backpropagation algorithms.

Table 3. Summary of exudate detection and vessel segmentation algorithms
| Authors | Methodology | Datasets | Performance metrics |
| Parham et al. [21] | Exudate detection using a 9-layered CNN-based model and color space transformation | DIARETDB1 and e-Ophtha | Accuracy = 98.2%, Sensitivity = 0.99, Specificity = 0.96, Positive Prediction Value (PPV) = 0.97 |
| Chen et al. [22] | Diverse lesion detection using subspace learning | Kaggle and Messidor | AUC = 0.9956, MAP = 0.8236 |
| Long et al. [23] | Algorithm based on dynamic threshold and Fuzzy C-means clustering for hard exudate detection, and Support Vector Machine (SVM) for classification | e-ophtha EX database (47 images), DIARETDB1 database (89 images) | Sensitivity = 76.5%, PPV = 82.7% and F-score = 76.7% (e-ophtha EX database); overall average sensitivity = 97.5%, specificity = 97.8%, accuracy = 97.7% (DIARETDB1 database) |
| Badawi et al. [24] | CNN-based multi-loss-function-optimized deep encoder–decoder for segmentation of retinal vessels | DRIVE, AVRDB, and AV classification dataset | Accuracy for DRIVE = 96%, for the AVRDB dataset = 98%, for the AV classification dataset = 97% |
| Almotiri et al. [25] | Adaptive fuzzy thresholding and mathematical-morphology-based segmentation algorithms | Vessels (DRIVE and STARE datasets), optic disc (DRISHTI-GS) and exudate lesions (DIARETDB1) | Sensitivity = 93.15%, Specificity = 97.09%, PPV = 90.15% |
| Fu et al. [26] | Automatic glaucoma screening using the Disc-aware Ensemble Network (DENet) | SCES and SINDI | SCES dataset: AUC = 0.9183, B-Accuracy = 0.8429, Sensitivity = 0.8479, Specificity = 0.8380; SINDI dataset: AUC = 0.8173, B-Accuracy = 0.7495, Sensitivity = 0.7876, Specificity = 0.7115 |
| Geetha Ramani et al. [19] | Image preprocessing, supervised and unsupervised learning, and image post-processing techniques | DRIVE database | Accuracy = 95.36%, Sensitivity = 70.79%, Specificity = 97.78%, Positive Prediction Value (PPV) = 75.76% |
4 Conclusion and Future Work

The early detection of DR helps to avoid vision loss, and deep learning methods have enabled very accurate detection and diagnosis of diabetic retinopathy complications. This review paper gives an outline of the recent deep learning algorithms used in exudate detection and blood vessel segmentation. Much research has not yet been done on pathological images. Future work on the detection of hard exudates, microaneurysms and hemorrhages using deep learning can be implemented while preventing overfitting of the data.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Cheung, N., Mitchell, P., Wong, T.Y.: Diabetic retinopathy. Lancet 376, 124–136 (2010)
2. Philips, R., Forrester, J., Sharp, P.: Automated detection and qualification of retinal exudates. Graefes Arch. Clin. Exp. Ophthalmol. 1231, 90–94 (1993)
3. Gardner, G.G., Keating, D., Williamson, T.H., Elliott, A.T.: Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool. Br. J. Ophthalmol. 80, 940–944 (1996)
4. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
5. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR, vol. abs/1409.1556 (2014). https://arxiv.org/abs/1409.1556
6. Naqvi, S.A.G., Zafar, M.F., ul Haq, I.: Referral system for hard exudates in eye fundus. Comput. Biol. Med. 64, 217–235 (2015)
7. Walter, T., Klein, J.C., Massin, P., Erginay, A.: A contribution of image processing to the diagnosis of diabetic retinopathy-detection of exudates in color fundus images of the human retina. IEEE Trans. Med. Imaging 21(10), 1236–1243 (2002)
8. Niemeijer, M., van Ginneken, B., Russell, S.R., Suttorp-Schulten, M.S., Abramoff, M.D.: Automated detection and differentiation of drusen, exudates, and cotton-wool spots in digital color fundus photographs for diabetic retinopathy diagnosis. Invest. Ophthalmol. Vis. Sci. 48(5), 2260–2267 (2007)
9. Sopharak, A., Uyyanonvara, B., Barman, S., Williamson, T.H.: Automatic detection of diabetic retinopathy exudates from non-dilated retinal images using mathematical morphology methods. Comput. Med. Imaging Graph. 32(8), 720–727 (2008)
10. Sanchez, C.I., García, M., Mayo, A., Lopez, M.I., Hornero, R.: Retinal image analysis based on mixture models to detect hard exudates. Med. Image Anal. 13(4), 650–658 (2009)
11. Ali, S., Sidibe, D., Adal, K.M., Giancardo, L., Chaum, E., Karnowski, T.P., Mériaudeau, F.: Statistical atlas based exudate segmentation. Comput. Med. Imaging Graph. 37(5–6), 358–368 (2013)
12. Fraz, M.M., Jahangir, W., Zahid, S., Hamayun, M.M., Barman, S.A.: Multi scale segmentation of exudates in retinal images using contextual cues and ensemble classification. Biomed. Sig. Process. Control 35, 50–62 (2017)
13. Cassin, B., Solomon, S.A.B.: Dictionary of Eye Terminology, 2nd edn. Triad Publishing Company, Gainesville (1990)
14. Bouma, B.E., Tearney, G.J.: Handbook of Optical Coherence Tomography, 1st edn. Marcel Dekker, New York (2001)
15. Niemeijer, M., Staal, J.J., Ginneken, B.V., Loong, M., Abramoff, M.D.: DRIVE: digital retinal images for vessel extraction (2004). http://www.isi.uu.nl/Research/Databases/DRIVE/
16. Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans. Med. Imaging 19(3), 203–210 (2000)
17. MESSIDOR: Methods for Evaluating Segmentation and Indexing techniques Dedicated to Retinal Ophthalmology (2004). http://messidor.crihan.fr/index-en.php
18. Zhang, Z., Yin, F.S., et al.: ORIGA-light: an online retinal fundus image database for glaucoma analysis and research. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2010, Buenos Aires, Argentina, September, pp. 3065–3068. IEEE (2010)
19. Geetha Ramani, R., Balasubramanian, L.: Retinal blood vessel segmentation employing image processing and data mining techniques for computerized retinal image analysis. Biocybern. Biomed. Eng. 36, 102–118 (2016)
20. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A.W.M., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
21. Khojasteh, P., Aliahmad, B., Kumar, D.K.: A novel color space of fundus images for automatic exudates detection. Biomed. Sig. Process. Control 49, 240–249 (2019)
22. Chen, B., Wang, L., Sun, J., Chen, H., Fu, Y., Lan, S.: Diverse lesion detection from retinal images by subspace learning over normal samples. Neurocomputing 297, 59–70 (2018)
23. Long, S., Huang, X., Chen, Z., Pardhan, S., Zheng, D.: Automatic detection of hard exudates in color retinal images using dynamic threshold and SVM classification: algorithm development and evaluation. BioMed Res. Int. (2019). https://doi.org/10.1155/2019/3926930
24. Badawi, S.A., Fraz, M.M.: Multiloss function based deep convolutional neural network for segmentation of retinal vasculature into arterioles and venules. BioMed Res. Int. (2019). https://doi.org/10.1155/2019/4747230
25. Almotiri, J., Elleithy, K., Elleithy, A.: A multi-anatomical retinal structure segmentation system for automatic eye screening using morphological adaptive fuzzy thresholding. IEEE J. Transl. Eng. Health Med. 6, 1–23 (2018)
26. Fu, H., Cheng, J., Xu, Y., Zhang, C., Wong, D.W.K., Liu, J., Cao, X.: Disc-aware ensemble network for glaucoma screening from fundus image. IEEE Trans. Med. Imaging 37(11), 2493–2501 (2018)
Smart Imaging System for Tumor Detection in Larynx Using Radial Basis Function Networks

N. P. G. Bhavani (1,2), K. Sujatha (2), V. Karthikeyan (3), R. Shobarani (4), and S. Meena Sutha (5)

1 Department of EEE, Meenakshi College of Engineering, Chennai, India
2 Department of EEE, Dr. MGR Educational and Research Institute, Chennai, India, [email protected]
3 Department of EIE, Dr. MGR Educational and Research Institute, Chennai, India
4 Department of CSE, Dr. MGR Educational and Research Institute, Chennai, India
5 Department of ECE, Rajalakshmi Engineering College, Chennai, India

Abstract. Tumor in the larynx is a fatal disease which can leave patients unable to speak. Computed Tomography (CT) images are used to detect benign and malignant tumors in the larynx. The objective of this work is to develop an organized scheme to analyze such tumors and evaluate its capability with the help of a typical user-friendly simulation tool like MATLAB. The innovation in this scheme is a well-developed strategy for detection of malignant and benign tumors using image-based machine learning algorithms. The preliminary stage includes noise removal and feature extraction using Principal Component Analysis (PCA) from the region of interest in the larynx. This method uses images from an open-source database. The extracted feature set serves as the input for training the Radial Basis Function Network (RBFN) so as to identify the malignant and benign tumors present in the larynx. The Diagnostic Efficiency (DE) is found to be 94 to 95%.

Keywords: Larynx tumor · Principal Component Analysis · Radial Basis Function Network · Diagnostic efficiency
1 Introduction

The larynx, otherwise called the voice box, is a triangular structure responsible for voice generation and respiration, and it protects the trachea from aspiration. The larynx is made up of cartilages, the epiglottis, the thyroid gland, and the cricoid, arytenoid, corniculate and cuneiform cartilages; muscles and connective tissues help to connect the cartilages. The epiglottis is a piece of tissue which safeguards the trachea [1, 2]. Due to its critical importance, disorders like laryngeal cancer require meticulous
care. Treatment includes excision surgery, radiation, chemotherapy or even total laryngectomy. Computed tomography (CT) and magnetic resonance imaging (MRI) are used for pre-staging prior to surgery so as to identify vocal cord disorders. Unfortunately, other structures like nerves and blood vessels adjacent to the larynx may be affected because of complications such as nerve damage, inflammation, tissue breakdown, fibrosis and pain. Manual diagnostic reviews by radiologists, oncologists and surgeons are tedious, delayed and subject to inaccuracies in determining the exact stage and offering an appropriate treatment. Therefore, this automated smart imaging diagnostic for larynx tumor identification offers the best sites for radiotherapy, which will reduce complications and thereby produce a favorable outcome. The CT images of the larynx shown in Fig. 1 indicate a normal larynx, a subglottic tumor and a supraglottic tumor [3, 4].
Fig. 1. CT images of larynx [4]: (a) normal larynx, (b) subglottic tumor, (c) supraglottic tumor
The manuscript contains five sections: Sect. 1 introduces readers to larynx tumors, Sect. 2 reviews the work carried out so far related to laryngeal cancer, Sect. 3 describes the various steps involved in identifying laryngeal cancer, Sect. 4 depicts the results and the connected discussion, and finally Sect. 5 gives the concluding remarks.
2 Literature Survey Related to Laryngeal Cancer

Laryngeal cancer includes chondrosarcoma, lymphoma and paraganglioma among its commoner types. Endoscopy may not reveal such submucosal types except as a bulge. The presence of a submucosal mass may be confirmed if its boundaries are detected by imaging techniques; this guides the endoscopist to the most appropriate site for biopsy [5]. In pretherapeutic staging of SCC in the larynx, CT and MRI play an important role in diagnosis. The extent to which the cancer has spread is determined by the 'T' staging
method used for the treatment of laryngeal cancer in patients. The tumor spread in the different regions of the larynx can be interpreted from MRI and CT scans using standardized image processing algorithms, which identify the spread using key anatomical features and characteristic patterns [6, 7].
3 Methodology

Images of the larynx with benign and malignant tumors are taken from an open-source database for diagnosis. The database consists of nearly 25 benign and 25 malignant cases, accounting for a total of 50 images of the larynx. Noise removal is done using median filtering. The overall schematic is represented in Fig. 2.
Fig. 2. Overall schematic for identification of larynx tumor
3.1 Feature Extraction

For the detection to take place, the multi-dimensional image should be transferred to a two-dimensional feature space, which is done using Principal Component Analysis (PCA). PCA computes eigenvalues and eigenvectors followed by discriminant
vectors, which facilitate the mapping of the N-dimensional feature set onto a two-dimensional space, thereby reducing the computational complexity. The eigenvalues and eigenvectors are unique for each image of the larynx; hence they help to discriminate the images of the larynx affected by benign and malignant tumors [8, 9].
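A minimal sketch of this step (the image matrix below is a placeholder; in practice each denoised CT image would be flattened into one row):

```python
import numpy as np
from sklearn.decomposition import PCA

# X: 50 larynx images (25 benign + 25 malignant) flattened to pixel rows
X = np.random.rand(50, 64 * 64)     # placeholder for the real image matrix
pca = PCA(n_components=2)           # eigen-decomposition of the covariance
features = pca.fit_transform(X)     # two discriminant values per image
print(features.shape)               # (50, 2) -> training inputs for the RBFN
```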
Detection of Skin Lesion Using Radial Basis Function Networks (RBFN)
RBFN approach is more spontaneous than the Multi Layer Perceptron (MLP). The similarity measure is used by RBFN to classify the input samples used for training the RBFN. Every node of RBFN stores a sample as an example from the training set. When the real time classification has to take place with a new and unknown input, each node computes the Euclidean distance between the input and its sample. If the input resembles the first sample type then the new input is assigned to benign category else it is assigned to malignant category. The network architecture for RBFN is shown in Fig. 3. The flowchart in Fig. 4, depicts how the RBFN is used for detecting the benign and malignant tumors in larynx.
Fig. 3. Architecture for RBFN
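A hedged sketch of such a network (the Gaussian width, the stored prototypes and the output weights below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def rbf(x, c, beta=1.0):
    # Gaussian activation: similarity between input x and stored sample c
    return np.exp(-beta * np.sum((x - c) ** 2))

def classify(x, prototypes, weights, labels=("benign", "malignant")):
    acts = np.array([rbf(x, c) for c in prototypes])  # one RBF neuron per sample
    scores = weights.T @ acts                         # output layer scores
    return labels[int(np.argmax(scores))]

prototypes = np.array([[0.2, 0.1], [0.8, 0.9]])  # placeholder PCA features
weights = np.array([[1.0, 0.0], [0.0, 1.0]])     # one output column per class
print(classify(np.array([0.25, 0.15]), prototypes, weights))  # -> 'benign'
```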
Fig. 4. Flowchart for RBFN for diagnosing the larynx tumor
4 Results and Discussion

The RBF network uses the discriminant vectors from PCA as inputs. The neurons in the hidden layer are called RBF neurons and use a Gaussian activation function, while the output layer uses a threshold activation function. The input vector is the two-dimensional output of the PCA, which is subjected to classification. The entire input vector is connected to each of the
RBF nodes. Each RBF node stores a sample vector, one of the discriminant vectors from the training set, and compares it with the input to produce outputs in the range of the similarity index, so that the benign and malignant tumors present in the larynx are identified, as shown in Fig. 5. Images marked with red circles indicate larynx images with benign tumors, and those indicated by a blue cross denote larynx images with malignant tumors. The diagnostic efficiency is illustrated in Table 1.
Fig. 5. Classification of larynx tumor by RBFN
Table 1. Determination of diagnostic efficiency for RBFN

| S. No | Larynx images (benign) | Larynx images (malignant) | Diagnostic efficiency, benign (%) | Diagnostic efficiency, malignant (%) |
| 1 | 25 | 25 | 94.2 | 95.3 |
5 Conclusion

The RBFN, when used to develop an automated larynx tumor detection system, yielded diagnostic efficiencies of nearly 94.2% and 95.3% in identifying benign and malignant tumors respectively. The outputs were validated against the opinions of experts, demonstrating an artificially intelligent scheme for classifying larynx tumors with a competence level comparable to that of radiologists. An automated medical imaging system using the RBFN can also be developed as a mobile application, so that devices can efficiently comply with the requirements of radiologists outside the hospital. It is projected that nearly six billion smartphone subscriptions will exist in a couple of years, which can therefore potentially provide low-cost, universal access to vital diagnostic care. In future, the classification efficiency can be improved by using advanced filtering techniques and robust classifiers such as deep learning neural networks.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals involved in this research work.
✓ We have used our own data.
References
1. Hashibe, M., Boffetta, P., Zaridze, D., Shangina, O., Szeszenia-Dabrowska, N., Mates, D., et al.: Contribution of tobacco and alcohol to the high rates of squamous cell carcinoma of the supraglottis and glottis in central Europe. Am. J. Epidemiol. 165, 814–820 (2007)
2. Nikolaou, A.C., Markou, C.D., Petridis, D.G., Daniilidis, I.C.: Second primary neoplasms in patients with laryngeal carcinoma. Laryngoscope 110, 58–64 (2000)
3. Connor, S.: Laryngeal cancer: how does the radiologist help? Cancer Imaging 7, 93–103 (2007)
4. Hermans, R.: Staging of laryngeal and hypopharyngeal cancer. Eur. Radiol. 16, 2386–2400 (2006)
5. Yousem, D.M., Tufano, R.P.: Laryngeal imaging. Magn. Reson. Clin. N. Am. 10, 451–465 (2002)
6. Becker, M., Burkhardt, K., Dulguerov, P., Allal, A.: Imaging of the larynx and the hypopharynx. Eur. J. Radiol. 66, 460–479 (2008)
7. Curtin, H.D.: The larynx. In: Som, P.M., Curtin, H.D. (eds.) Head and Neck Imaging, 4th edn., pp. 1595–1699. Mosby, St Louis (2003)
8. Zbaren, P., Becker, M., Läng, H.: Pretherapeutic staging of laryngeal carcinoma: clinical findings, CT, MRI, histopathology. Cancer 77, 1263–1273 (1996)
9. American Joint Committee on Cancer: Larynx. In: Greene, F.L. (ed.) AJCC Staging Manual, 7th edn., pp. 57–68. Springer, New York (2010)
Effective Spam Image Classification Using CNN and Transfer Learning

Shrihari Rao and Radhakrishan Gopalapillai

Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India
[email protected], [email protected]
Abstract. Machine learning is emerging as one of the most active research areas in the field of Artificial Intelligence (AI). It is significantly beneficial for artificial systems to be able to find reusable characteristics that capture structure in the environment, in order to enable more flexible and general learning. In this work, we have developed an image classifier model which can identify spam images such as "good-morning", "good-night" and "quote messages with image" from a group of images and classify them accordingly. To accomplish our objective, we retrained pre-trained models such as VGG16, VGG19 and MobileNet with a fresh dataset. In the task of large-scale image classification, supervised learning using deep convolutional neural networks has demonstrated its promise, and it is now well placed as a building block of larger systems that address real-life multimedia tasks. Our model is capable of handling dynamic datasets and delivers excellent accuracy.

Keywords: Deep learning · Incremental model · Fine tuning model · CNN model · Spam image classification
1 Introduction

Object classification is a simple job for human beings, but for machines it has been viewed as a complicated problem. It has received increased attention due to high-capacity computers, the availability of high-quality, low-priced video cameras, and a growing need for automatic video and image analysis. Currently, the Internet has several effects on our daily life cycle. One of the challenging tasks for researchers is the accurate classification of field-captured photographs, social media pictures, WhatsApp or stored pictures, etc. Manual surveillance can be performed using field surveys; this method's constraints are high labor costs and long duration. Convolutional Neural Networks have been successful in classifying images. A CNN fundamentally has a convolution layer, a pooling layer and a fully connected layer. Filter weights are defined in the convolutional layers; while training the CNN model, the randomly initialized weights are updated over subsequent epochs. The ReLU function is applied in order to avoid vanishing-gradient issues; at the same time, it adds non-linearity, enabling the network to represent non-linear mappings [1, 5].
In addition, it is essential to create incremental training techniques in an era in which datasets grow frequently, with fresh classes and samples. A typical idea to mitigate the problem of building a model from scratch is to utilize pre-trained CNNs that already work on a particular set of data and adapt them to contemporary datasets or tasks, rather than training the full network from scratch [2]. One strategy for adapting a network to fresh data or tasks is fine tuning [3]. Here, the original network's output layer is adapted either by replacing it with classes that correspond to the fresh task, or by adding fresh classes to the current ones. The weights in this layer are then initialized randomly, and all network parameters are adjusted to the objective of the new task. While this works very well on the fresh classes, the performance on the old classes suffers drastically if the network is not trained jointly on all classes [4]. This issue, wherein a neural network forgets previously learned knowledge when it is adjusted to a new task, is called catastrophic interference or catastrophic forgetting; it can be seen as a form of overfitting to the new data.
2 Related Work

Our study endeavors to carry the promise of deep learning into the practical and more general paradigm of incremental learning. We organize the discussion of related work according to studies in the computer vision field.

Classification Using CNN: The classification step sorts detected objects into predefined classes [6, 7] using suitable techniques that compare image patterns with target patterns; the overall pipeline is visualized in Fig. 2(a). The key structure of a CNN comprises two kinds of layers: the first is for feature extraction, where every neuron in one layer is connected to a local window of the previous layer; the other is the feature-mapping layer. A CNN is used because it is robust to shifts and distortions in the image, has lower memory demands, is simpler, and offers a better training model [6]. The convolution layer convolves a tiny filter across the input picture; the filter weights may be random at the beginning, and the filter coefficients are then learned so that the dot product of weights and inputs produces responses matching the specified inputs. A ReLU layer is used to activate the next layer [7]. For effectiveness and precision, zero padding and striding are used.

Classification Using Deep Learning and Incremental Model: Noteworthy advancement has been made in sorting multi-class pictures over the previous decade, both in terms of the number of classes and the accuracy. ImageNet, for instance, has around 22 K categories [8]. A crude definition of incremental learning is that learning with fresh data is an ongoing process. Earlier research managed two critical circumstances, of which the second is relevant to this work. The first is concept drift in the training data stream, which lets the classifier learn in a non-stationary environment [6, 10]. The second is when current classifiers are associated with fresh classes to be learned.
Along these lines, Thrun [11] posed the question of whether learning the n-th classifier is any easier than learning the first, and since then researchers have tackled this issue using transfer-learning intuition, with strategies that use fewer samples to obtain new models or better generalization [11]. In the context of incremental learning or transfer learning, the task with a DCNN is to combine feature extractor and classifier in one architecture. Goodfellow et al. [8] studied the disastrous forgetting problem in gradient-based neural networks. However, that analysis was done on the small MNIST dataset with modest networks, and they recommend using an available deep convolutional neural network configuration. Nevertheless, it qualitatively reaffirms the difficulty of achieving excellent performance simultaneously on old and new tasks. Numerous works have focused on domain adaptation, for example one-shot learning by adapting attributes learned on the ImageNet 1 K dataset to different datasets [12]. The latest zero-shot learning work assumes that a semantic embedding space is available onto which DCNN outputs are projected. Our job differs in the objective, as the model has to transfer the learning to a bigger dataset within the same task. There were two different incremental models, i.e. clones and a flat increment [13]. Fine-tuning between classification and segmentation offers reasonable initializations of the network; even the worst model achieved almost 75% of the previous best results, and VGG16 appears to score 56.0, better than previous techniques. Although VGG and GoogleNet are equally accurate as classifiers, FCN-VGG19 is more accurate than the other pre-trained models [12].
3 Fine Tuned Incremental Model

To reuse a pre-trained model for our own needs, we start by removing the original classifier, then add a new classifier that fits our purpose, and finally fine-tune the model following one of three approaches:

Train the Whole Model: In this case the pre-trained model architecture is used and trained on our dataset. The model is learned from scratch, so a huge dataset (and a machine with high computational power) is required for good accuracy.

Train Certain Layers and Keep Other Layers Frozen: Lower layers capture general, problem-independent features, while higher layers capture specific, problem-dependent features. As a rule, with a small dataset and many parameters, more layers should be left frozen to avoid overfitting. Conversely, if the dataset is large and the number of parameters is small, the model can be improved by training more layers for the new task, since overfitting is not an issue.

Convolution Base Frozen: The main idea is to keep the convolutional base in its original form and use its outputs to feed the classifier. Using the pre-trained model as a fixed feature extractor needs less computational power and is helpful when the dataset
is small and the pre-trained model already solves a problem fundamentally similar to the one at hand. Figure 2(b) provides a schematic presentation of the three approaches mentioned above. The fine tuning algorithm for the second approach is given in Fig. 1, and a code sketch follows Fig. 2.
Fig. 1. Fine tuning algorithm using Keras
Fig. 2. (a) Basic steps followed for CNN; (b) various ways of incremental model building
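To make the second approach concrete, the following is a minimal Keras sketch of it (train certain layers, keep the others frozen), complementing the algorithm in Fig. 1. The layer split, input size and classifier head here are illustrative assumptions, not the exact configuration used in our experiments.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models, optimizers

# Load the VGG16 convolutional base pre-trained on ImageNet, without its classifier
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze everything except the last few layers (assumed split for illustration)
for layer in base.layers[:-4]:
    layer.trainable = False

# New classifier head for the binary spam / non-spam task
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```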
Indeed, freezing layers means optimizing only within a subspace of the feature space, since the frozen weights are never updated. If our training dataset is similar to the original image dataset, there is no concern about accuracy; if the datasets are very distinct, freezing will mean a reduction in precision. Unfreezing everything allows us to optimize the entire feature space and reach a better optimum; however, the computational time for this is huge.
Selection of New Samples for Training the Incremental Model: We propose two distinct memory arrangements to test the behavior of our incremental learning approach in various situations. The first arrangement assumes a limited capacity of K samples in the representative memory. The more classes are stored, the fewer samples per class are retained, as the memory capacity is independent of the number of classes: the number of samples per class m is therefore defined by m = K/c, where c is the number of classes stored in the memory and K is the memory capacity. The second arrangement retains a fixed number of copies for each class. Sample training pictures for the incremental model are shown in Fig. 3, and a small sketch of the first arrangement follows the figure.
Fig. 3. Sample image used for spam detection [8]
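The following is an illustrative sketch of the first memory arrangement (a fixed budget of K samples shared across all classes seen so far); the data structure and the random subsampling are assumptions, not the authors' exact bookkeeping.

```python
import random

def update_memory(memory, new_classes, K):
    """memory / new_classes: dict mapping class label -> list of samples."""
    memory.update(new_classes)
    m = K // len(memory)  # samples retained per class: m = K / c
    for cls, samples in memory.items():
        memory[cls] = random.sample(samples, min(m, len(samples)))
    return memory
```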
4 Implementation Details

Our models are implemented with Keras, TensorFlow and Python; 30 epochs of training plus an additional 30 epochs of fine tuning are performed for each incremental stage. In the initial 30 epochs the learning rate starts at 0.1 and is divided by 10 every 10 epochs. For fine tuning, the same decay schedule is used, except that the starting rate is 0.01. We train the networks using standard stochastic gradient descent with mini-batches of around 500 samples, a weight decay of 0.0001 and a momentum of 0.9; the weight decay has been set to handle overfitting. We apply L2-regularization and random noise (with parameters p = 0.5, p = 0.55) on the gradients. All of these parameters can be configured in Keras while building the model (a sketch of the decay schedule follows). Since class-incremental learning benchmarks are not readily available, we follow the standard setup [9, 14] of partitioning the classes of a conventional multiclass data collection into incremental groups.
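A hedged sketch of the step-decay schedule just described, expressed as a Keras callback; the placeholder training arrays and the callback wiring are illustrative.

```python
from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch, lr):
    # Start at 0.1 and divide by 10 every 10 epochs (use 0.01 when fine tuning)
    return 0.1 * (0.1 ** (epoch // 10))

# train_x / train_y stand in for the actual spam-image dataset
model.fit(train_x, train_y, epochs=30, batch_size=500,
          callbacks=[LearningRateScheduler(step_decay)])
```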
VGG16 and VGG19 Models [4]: In their 2014 article [8], Simonyan and Zisserman introduced the deep convolutional VGG network architecture. This network is characterized by its simplicity, using only 3 × 3 convolution layers stacked on top of one another in increasing depth. Max pooling is used to reduce the volume size. Two fully connected layers, each with 4,096 nodes, are followed by a softmax classifier. On top of this base we added our spam images. Figure 4(a) depicts the accuracy against epochs for the incremental model with fully frozen and partially frozen layers. It can be observed that the network with all convolutional layers frozen has lower accuracy, since it holds only the predefined features. The red graph depicts the accuracy of the network with only a few layers frozen, with weights transferred to the next layer using the new set of data.
Fig. 4. (a) Comparison of partially and fully frozen incremental model, (b) Loss and Accuracy against epochs for partially frozen VGG16
High accuracy is reached after 35 epochs in the case of the partially frozen network; the accuracy does not improve in later epochs and even drops within the 25–45 epoch interval. The graph in Fig. 4(b) shows the intersection between the accuracy on new data and the validation loss. The validation loss reflects the model's capacity to generalize to fresh data: if the training accuracy ("acc") continues to improve while the validation loss ("val_loss") deteriorates, the model is overfitting. For our dataset, around the 45th epoch the validation accuracy deteriorates while the training accuracy increases; from roughly the 32nd epoch onwards the model is likely in an overfitting scenario, i.e. it just begins memorizing the data. This means the model shows the same result after the 35th epoch, but it does not get better, so training for 45 epochs is sufficient for this model. However, to obtain the best accuracy, more training data is required. Our current experiment yields the results tabulated below; epoch-versus-loss and epoch-versus-accuracy plots were captured while building the partially frozen VGG16 model. As the graphs show, the loss decreases as accuracy rises; ideally the loss would be zero if the system understood the images perfectly, which is impossible in practice.
For the specific task at hand, however, the model can yield better results. Detailed numerical information on loss and accuracy is tabulated in Table 1 below.

Table 1. Accuracy and loss for 45 epochs on incremental learning

Epochs | Loss   | Accuracy || Epochs | Loss   | Accuracy || Epochs | Loss   | Accuracy
1      | 3.0904 | 0.3962   || 16     | 1.5459 | 0.5587   || 31     | 1.3981 | 0.6522
2      | 2.4273 | 0.3982   || 17     | 1.5134 | 0.5575   || 32     | 1.3988 | 0.6825
3      | 2.1312 | 0.4025   || 18     | 1.5376 | 0.5624   || 33     | 0.3987 | 0.6477
4      | 2.0602 | 0.4058   || 19     | 1.4886 | 0.5928   || 34     | 0.388  | 0.6492
5      | 2.0085 | 0.4568   || 20     | 1.5021 | 0.5728   || 35     | 0.3851 | 0.6583
6      | 1.8754 | 0.4275   || 21     | 1.5417 | 0.5792   || 36     | 0.3746 | 0.6588
7      | 1.8271 | 0.4712   || 22     | 1.5534 | 0.5876   || 37     | 0.3758 | 0.6592
8      | 1.7587 | 0.4952   || 23     | 1.4928 | 0.5881   || 38     | 0.3801 | 0.6631
9      | 1.7025 | 0.5002   || 24     | 1.5577 | 0.5993   || 39     | 0.3325 | 0.6691
10     | 1.6631 | 0.5093   || 25     | 1.4779 | 0.6025   || 40     | 0.3672 | 0.6694
11     | 1.6595 | 0.5174   || 26     | 1.4881 | 0.6482   || 41     | 0.2564 | 0.6718
12     | 1.6051 | 0.5237   || 27     | 1.492  | 0.6132   || 42     | 0.3456 | 0.6718
13     | 1.6069 | 0.5329   || 28     | 1.4561 | 0.6285   || 43     | 0.3782 | 0.6757
14     | 1.5683 | 0.5391   || 29     | 1.4365 | 0.6297   || 44     | 0.3473 | 0.6773
15     | 1.5974 | 0.5486   || 30     | 1.3926 | 0.6359   || 45     | 0.3684 | 0.6774
On analyzing the above table, it is clear that accuracy increases as the epochs increase, while loss travels in the opposite direction to accuracy; as the dataset grows, accuracy will drop. The overall accuracy of the various pre-trained models is listed in Table 2. With the small dataset used here, the merged models achieved better results.

Table 2. Accuracy for various incremental learning models

Model                              | Accuracy (45 epochs) | Validation accuracy (45 epochs) | Loss (45 epochs)
New data, model built from scratch | 31.15%               | 9%                              | 12.78%
New data model merged with VGG19   | 65.25%               | 52.41%                          | 38.42%
New data model merged with VGG16   | 67.44%               | 58.53%                          | 36.84%
5 Conclusion

In this paper, a convolutional neural network based methodology, together with an incremental approach, has been used to develop a binary classifier of spam pictures, for example "hello" and "goodbye" picture messages and a few other spam pictures with quotes. A more generalized model can be obtained by
gathering more spam pictures for the dataset, which will likewise improve the precision of the model. In future work, CNN and text classification techniques can be combined to make the model much more precise in recognizing spam pictures.

Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References

1. Sowmya, V., Soman, K.P., Deepika, J.: Image classification using convolutional neural networks. Int. J. Sci. Eng. Res. 5(6), 1661–1668 (2014)
2. Wu, Y., Qin, X., Pan, Y., Yuan, C.: Convolution neural network based transfer learning for classification of flowers. College of Computer and Information Engineering, Guangxi Teachers Education University, Nanning, China
3. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
4. Neena, A., Geetha, M.: Image classification using an ensemble-based deep CNN. In: Advances in Intelligent Systems and Computing, vol. 709, pp. 445–456 (2018)
5. Suresh, R., Prakash, P.: Deep learning based image classification on Amazon Web Service. J. Adv. Res. Dyn. Control Syst. 10, 1000–1003 (2018)
6. Kavitha, D., Hebbar, R., Harsheetha, M.P.: CNN based technique for systematic classification of field photographs. In: International Conference on Design Innovations for 3Cs Compute Communicate Control (2018)
7. Korshunova, K.P.: A convolutional fuzzy neural network for image classification. Moscow Power Engineering Institute in Smolensk, Smolensk, Russia
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
9. Shmelkov, K., Schmid, C., Alahari, K.: Incremental learning of object detectors without catastrophic forgetting. Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
11. Thrun, S.: Is learning the n-th thing any easier than learning the first? In: Advances in Neural Information Processing Systems, pp. 640–646 (1996)
12. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition (2009)
13. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017)
14. Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: CVPR (2017)
Unsupervised Clustering Algorithm as Region of Interest Proposals for Cancer Detection Using CNN

Ajay K. Gogineni 1, Raj Kishore 2, Pranay Raj 1, Suprava Naik 3, and Kisor K. Sahu 4

1 School of Electrical Sciences, IIT Bhubaneswar, Bhubaneswar 752050, India {akg11,pr16}@iitbbs.ac.in
2 Virtual and Augmented Reality Centre of Excellence, IIT Bhubaneswar, Bhubaneswar 752050, India [email protected]
3 Department of Radiodiagnosis, All India Institute of Medical Sciences, Bhubaneswar, Bhubaneswar 751019, India [email protected]
4 School of Minerals, Metallurgical and Materials Engineering, IIT Bhubaneswar, Bhubaneswar 752050, India [email protected]
Abstract. Deep learning methods are now getting a lot of attention due to their success in many fields. Computer-aided bio-medical image analysis systems act as a tool to assist medical practitioners in correct decision making. The use of deep learning algorithms to predict cancer at an early stage was highly promoted by the Kaggle Data Science Bowl 2017 competition and the Cancer Moonshot Initiative. In this article, we have proposed a novel combination of an unsupervised machine learning tool, a modularity-optimization based graph clustering method, and Convolutional Neural Network (CNN) based architectures for lung cancer detection. The unsupervised clustering method helps in reducing the complexity of the CNN by providing Region of Interest (ROI) proposals. Our CNN model has been trained and tested on the LUNA 2016 dataset, which contains Computed Tomography (CT) scans of the lung region. This method provides an approximate pixel-wise segmentation mask along with the class label of the ROI proposals.

Keywords: Machine learning · Graph clustering · Modularity · CNN · ROI · CT scan · Cancerous nodules
1 Introduction

Research in cancer detection using deep learning methods is increasing exponentially. Several software firms, including Google and Microsoft, are spending huge amounts of money on machine learning based research [1]. The main reasons behind this growth are the availability of a huge amount of labelled data as open source in various competitions such as LUNA 2016 and the Kaggle Data Science Bowl 2017 [2, 3], and the development of fast computing tools like GPUs, which
make computation several times faster than CPUs [4]. The Cancer Moonshot Initiative [5] promoted the idea of accelerating cancer research and predicting cancer at an early stage, which increased the number of researchers working on the problem. Around 9.6 million people died of cancer in 2018 [6], and around 33% of these deaths were caused by smoking and the use of tobacco, which directly affect the lung. The chance of survival of a lung cancer patient decreases as the size of the cancerous nodule increases, so the early detection of a cancerous nodule in the patient's lung helps in increasing his/her survival probability: doctors can make timely therapeutic interventions and reduce mortality. Several deep learning models have already been developed for computer-aided lung cancer detection; many deep learning algorithms, for both segmentation and classification, are being used to analyse CT scans of the lung region by taking advantage of the labelled data. The approach of one particular team in the Kaggle Data Science Bowl 2017 competition is described in [7]. The use of a 3D Dense U-Net for lung cancer segmentation from 3D CT scans is proposed in [8]. The purpose of the present article is to introduce a novel combination of an unsupervised machine learning tool, a modularity-optimization based graph clustering method, and a deep learning method for classification of cancerous nodules inside the lung. The clustering algorithm was developed for spatially embedded networks [9, 10].
2 Method

This section consists of three parts: A, B and C. In Part A, we discuss the type of data used and its pre-processing. In Part B, we discuss the unsupervised machine learning method that is used to detect the nodules in the lung. In Part C, we discuss the CNN architecture used for classification of the cancerous nodules detected in Part B, and how to obtain a rough segmentation mask of the cancerous nodules.
2.1 Part A: Data Type and Pre-processing
The data is obtained from the LUNA 2016 challenge [2]. It has 880 CT scans of clinically proven lung cancer cases. A CT scan is stored in DICOM format, which is converted to a 3D matrix of size 512 × 512 × z, where z is the number of 2D slices in the 3D volume and lies in the range 150 to 280. The 2D images are obtained by taking a particular slice along the z direction; each has a shape of 512 × 512 and contains grey-scale values indicating the electron density obtained from the CT scans. A significant fraction of the scan does not contain the lung region. Since we can find cancerous nodules only in the lung region, an initial pre-processing step is to mask the lung region from the entire scan (explained in the next section). The masked lung region is shown in Fig. 1(b), and further processing is done on the masked image. A separate annotation file contains the locations of cancer in all the scans; this information is used to extract the 2D slices containing cancer. A sketch of the volume-stacking step follows.
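A hedged sketch of stacking one scan's 2D slices into a 512 × 512 × z volume, assuming one DICOM file per slice; the folder layout and the choice of pydicom are illustrative.

```python
import glob
import numpy as np
import pydicom

def load_scan(folder):
    # Read every slice of the scan and order them along the z axis
    slices = [pydicom.dcmread(path) for path in glob.glob(folder + "/*.dcm")]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))
    # Stack into a 512 x 512 x z volume of grey-scale (density) values
    return np.stack([s.pixel_array for s in slices], axis=-1)
```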
2.1.1 Lung Segmentation
Lung segmentation is done using marker-controlled watershed segmentation [11], which is a morphological operation. A Sobel filter [12] is applied to the image along the x and y directions, and the results are used to compute the Sobel gradient. The pixel values of the image lie in the range −1024 to +1000. The portion surrounding the lung region has a very high positive intensity compared to the lung region. Inside the lung there are blood vessels, cancer nodules, etc., which also have high positive intensity values, while the remaining portion of the lung region has an intensity value around −750. Using these intensity values, we obtain internal and external markers corresponding to the inside and outside of the lung region. The watershed segmentation algorithm is then applied to the Sobel gradient image along with the markers; the markers are used to avoid over-segmentation. A minimal sketch of this step is given below.
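A minimal sketch of the marker-controlled watershed step using scikit-image; the exact marker thresholds are illustrative assumptions guided by the intensity ranges quoted above.

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

def lung_mask(img):
    """img: 2D slice with intensities roughly in [-1024, +1000]."""
    markers = np.zeros(img.shape, dtype=int)
    markers[img < -750] = 1   # internal marker: air-filled lung region (assumed cut)
    markers[img > 0] = 2      # external marker: dense tissue surrounding the lung
    gradient = sobel(img.astype(float))   # Sobel gradient image
    labels = watershed(gradient, markers) # markers prevent over-segmentation
    return labels == 1
```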
2.2 Part B: Unsupervised Machine Learning Method
The segmented portion is then converted into a network, which is clustered using the unsupervised clustering method discussed in Sect. 2.2.2 to detect the different nodules present. These steps are discussed in detail in the following sections.

2.2.1 Network Generation
A network is defined in terms of nodes and edges between these nodes. The segmented lung region is represented as a network by considering the centre of each (2D) pixel as a node; an edge is formed between each node and its first nearest neighbouring pixels. In 2D, either four or eight neighbours are possible, depending on the neighbourhood definition. Here, we have considered both edge-sharing and diagonal pixels as neighbours, so eight neighbours are possible for inner pixels. The generated network is unweighted, since the weight of each edge is unity; a sketch of this construction follows Fig. 1.
Fig. 1. Lung portion segmentation: (a) the DICOM image and (b) the segmented left lung portion (white region).
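An illustrative sketch of the network-generation step (each lung pixel becomes a node with up to eight edge- and corner-neighbours); the adjacency representation is our own choice, not the authors' code.

```python
def build_network(mask):
    """mask: 2D boolean lung mask; returns dict node -> set of neighbour nodes."""
    rows, cols = mask.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c]:
                continue  # only pixels inside the lung become nodes
            nbrs = set()
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc]:
                    nbrs.add((rr, cc))  # unweighted edge to each neighbour
            neighbors[(r, c)] = nbrs
    return neighbors
```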
2.2.2 Network Clustering
The generated network of the lung portion is partitioned into "communities", which in this case can also be called nodules, by maximizing the modularity function given in Eq. 1. The modularity function was developed by Raj et al. for spatially embedded networks [9]. They successfully applied this function on granular networks obtained from granular assemblies [10] for identifying the community structure which
has the highest associability with the naturally identifiable structures. The modularity function reads as:

$$Q(\sigma) = \frac{1}{2m}\sum_{i,j}\left[a_{ij}A_{ij} - \theta(\Delta x_{ij})\,b_{ij}\,J_{ij}\right]\left[2\delta(\sigma_i,\sigma_j) - 1\right] \qquad (1)$$

Here m is the total number of edges in the network, and $a_{ij}$ and $b_{ij}$ are the strengths of the connected and missing edges between the i-th and j-th nodes respectively ($J_{ij}$ encodes the missing edges); these edge strengths also measure the similarity between the i-th and j-th nodes. The connection matrix $A_{ij}$ contains the edge information between nodes, and its elements are either zero or one. The parameters $a_{ij}$ and $b_{ij}$ are the difference between the local average grey-scale value (a similarity parameter) of nodes i and j and the global average grey-scale value of the given network, as shown in Eq. 2:

$$a_{ij} = b_{ij} = \frac{k_i + k_j}{2} - \langle k \rangle, \qquad \langle k \rangle = \frac{1}{N}\sum_{r=1}^{N} k_r \qquad (2)$$

where N is the total number of nodes and k is the node degree.

$$2\delta(\sigma_i,\sigma_j) - 1 = \begin{cases} +1 & \sigma_i = \sigma_j \ \text{(i.e., in the same community)} \\ -1 & \sigma_i \neq \sigma_j \ \text{(i.e., in different communities)} \end{cases} \qquad (3)$$
The modularity function in Eq. 1 does not compare the actual connections ($A_{ij}$) with a predefined null model, so it is independent of any null-model comparison. It also takes into account the effect of inter-community edges while determining the most modular structure, as depicted in Eq. 3. It has another distinction over other existing methods of modularity calculation: it uses the Heaviside unit step function $\theta(\Delta x_{ij})$ to restrict the over-penalization of missing links between two distant nodes within the same community. It defines the neighborhood and penalizes only the missing links inside the neighborhood. Here $\Delta x_{ij}$ is the difference between the Euclidean distance between nodes i and j and a predefined cutoff distance $x_c$ for the neighborhood, chosen as $x_c = 1.05\,(R_i + R_j)$ for the present study (where R is the distance between two touching pixels). The unsupervised learning algorithm consists of three main steps. The first step is initialization: each node is considered to be in a different community. We then merge nodes to form communities: each node is selected iteratively and checked for possible merging, sequentially, with all its connected neighbor communities; the modularity is calculated after each candidate merge and the merge with the highest positive change is selected. Node merging continues until no further merging increases the modularity of the system. The final step is community merging. Node merging often converges at a local maximum of the modularity curve; in order to escape any local-maximum trap, we try merging any two connected communities, and such a merge is accepted if it increases the modularity. The output of community merging is the final clustered network, which has the highest value of modularity. A minimal sketch of this loop follows.
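A minimal sketch of the three-step greedy loop just described, with the modularity of Eq. 1 abstracted behind a callable Q; the data structures and helper names are illustrative, not the authors' implementation.

```python
from itertools import combinations

def greedy_cluster(nodes, neighbors, Q):
    """nodes: list of node ids; neighbors: dict node -> set of adjacent nodes;
    Q: callable mapping a partition {node: community_id} to its modularity."""
    comm = {n: n for n in nodes}          # Step 1: each node in its own community
    improved = True
    while improved:                        # Step 2: greedy node merging
        improved = False
        for n in nodes:
            base = Q(comm)
            best_gain, best_comm = 0.0, comm[n]
            for nb in neighbors[n]:
                trial = dict(comm)
                trial[n] = comm[nb]        # try moving n into a neighbour community
                gain = Q(trial) - base
                if gain > best_gain:
                    best_gain, best_comm = gain, comm[nb]
            if best_comm != comm[n]:
                comm[n] = best_comm
                improved = True
    merged = True
    while merged:                          # Step 3: community merging to escape
        merged = False                     # local maxima (the paper restricts this
        for c1, c2 in combinations(set(comm.values()), 2):  # to connected pairs)
            trial = {n: (c1 if c == c2 else c) for n, c in comm.items()}
            if Q(trial) > Q(comm):
                comm, merged = trial, True
                break
    return comm
```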
2.3 Part C: Supervised Machine Learning
Various semantic segmentation algorithms exist [13], among which Region-based Convolutional Neural Network (RCNN) algorithms [14] use a two-step process: the first step segments the image into various ROIs, and the second step performs semantic segmentation on the proposed ROIs. RCNN proposes ROIs from an image using selective search [15]: initially the image is segmented into multiple small regions, which are later combined (based on colour similarity, texture similarity, size similarity, etc.) to form larger regions. These regions are the Region of Interest proposals (ROIs). The ROIs are warped or resized to a specific size and fed to a ConvNet, whose output is used by a bounding box regressor and an SVM classifier. Around 2000 ROIs are generated per image, which requires the CNN to be run 2000 times on a single image. Fast R-CNN [16] is built to share the computation across all 2000 regions: the ROIs are generated using a CNN, resized to a specific shape using ROI pooling, and fed into a Fully Convolutional Network (FCN), with a softmax layer on top of the FCN to predict the output classes. RCNN-type architectures perform very well when precise bounding box coordinates and labels for each bounding box are available, but these are hard to obtain and expensive. An end-to-end learning process using Mask R-CNN [17], which performs all the tasks of interest at once (ROI proposal, bounding box regression, classification and semantic segmentation of all the ROIs), is very well suited when the dataset is richly labelled, which is not always the case. Some domains, such as the medical domain, do not have datasets with rich labels for all the problems of interest. For such sparsely labelled datasets, dividing the training process into sub-blocks can help in understanding the data better: we can deal with individual blocks separately and combine the results when required. Instead of using a CNN or selective search, we propose a clustering based algorithm to find regions in the 3D volume and in 2D images that are similar to each other based on the pixel values. Reducing the amount of data fed to the CNN reduces computation time and lets it learn features that are more relevant. The clustering method also stores cluster IDs (serial numbers such that pixels with the same cluster ID belong to the same cluster), which can be used to fine-tune the segmentation masks obtained as outputs from U-Nets [18].

2.3.1 Proposed Model
Using the clustering algorithm, we obtain fewer than 100 ROIs of fixed size, which are fed as input to a CNN that acts as a classifier and predicts whether each ROI is cancerous or not. The data consists of 8050 images of size 50 × 50 generated from the LUNA dataset; out of these, 5187 images are used for training, 1241 for validation and 1622 for testing. Training of the CNN was done on a K80 GPU, provided for free by Google Colaboratory. We used transfer learning, where the initial weights of the CNN are the weights of a pre-trained CNN model trained on ImageNet [19]; these weights are then fine-tuned to perform classification on a different dataset. We trained the network using
the fit_one_cycle and learning rate scheduler techniques [20]. The main idea is to train different regions (hidden layers) of the CNN with different learning rates. The initial layers of the CNN learn features that are general to any image classification task, such as horizontal and vertical edges, so we use a higher learning rate for the initial layers; as we go deeper into the network, the layers learn features that are more specific to the target dataset and also learn the discriminative features between the various classes [21], so these layers are trained at a lower learning rate. The learning rate is varied between specific upper and lower bounds, selected using a learning rate finder. Using a very low learning rate results in very small loss changes and hence very minor gradient updates, which take a lot of time to converge; a high learning rate results in large gradient steps, and we might overshoot the global minimum. We should pick a learning rate that is neither too high nor too low, to get better generalisation and quicker training [20]. Using a single learning rate might raise the problem of getting stuck at local minima or at saddle points, where the gradient is not enough to push the loss towards a global minimum or away from a local minimum in the loss landscape. We select upper and lower bounds on the learning rate that differ by a factor of 10 and train the CNN; typically the upper bound is 1e−4 and the lower bound is 1e−5. The higher learning rate helps in getting out of local minima or saddle points, and thus in moving to a stable point in the loss landscape. The algorithm consists of the following steps: initially, the lung region is masked from the image; then we perform clustering on the lung region, which divides the image into a number of smaller regions (clusters); a 50 × 50 region is cropped around each cluster; and these 50 × 50 regions are used to train the CNN to perform binary classification. A hedged sketch of this training recipe is given below.
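A hedged sketch of the recipe just described, using the fastai v1 API (cnn_learner, lr_find, fit_one_cycle); the data folder layout and the epoch count are illustrative assumptions.

```python
from fastai.vision import ImageDataBunch, cnn_learner, models, accuracy

# 50 x 50 ROI crops organised into train/valid folders (assumed layout)
data = ImageDataBunch.from_folder("rois/", train="train", valid="valid", size=50)

learn = cnn_learner(data, models.densenet161, metrics=accuracy)  # ImageNet weights
learn.lr_find()  # learning rate finder to pick the bounds
# Discriminative learning rates spread across layer groups between the bounds
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-4))
```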
2.4 Different CNN Architectures
We have used various pre-trained CNN architectures, e.g. SeResNext50, ResNet50 and DenseNet161, among which DenseNet161 provided the best classification results. These architectures are discussed below.

ResNet50: Deep neural networks were initially harder to train and frequently got stuck in poor local minima, which resulted in a poor training loss. Typically the mapping of inputs x to outputs y is done through a single neural network path, represented by a function H(x). The idea of ResNet [22] was to modify the mapping by adding an extra skip connection to the network, represented by H(x) + x. The skip connections result in a smoother loss landscape, due to which we do not get stuck at local minima and saddle points and get better training. We used a variant of ResNet named ResNet50, which has 50 convolutional layers [22].

SeResNext50: SeNets are proposed in [23]. Unlike many neural network architectures that focus on finding informative features along the spatial dimension, SeNets also focus on channel-wise features. The basic SeNet block, the SE block, maps the input X to a feature map U by convolution. In SeResNext [23, 24], a squeeze-and-excitation block is applied at the end of each non-identity branch of a ResNext block. The ResNext block is an extension of the ResNet block: a ResNet block has a single skip connection and a single convolution path, whereas a ResNext block has multiple convolution paths along with a single skip connection [24].
DenseNet161: The DenseNet architecture is similar to the ResNet architecture, but each layer is connected to every other layer in a feed-forward fashion [22]. The skip connections are not just added but concatenated after each layer, so the features computed at initial layers can be reused at later layers. The individual blocks of the various CNN architectures are stacked on top of one another, with some simple convolution layers in between to maintain the dimensionality while going from one layer to the next. DenseNet161 contains various dense blocks, as shown in Fig. 2(d).
Fig. 2. Building blocks of CNN based architectures (a) ResNet50 (b) ResNext (c) SeResNext50 (d) DenseNet161
3 Segmentation Mask of Cancerous Nodules

The cluster IDs can be used to get rough object localisation, similar to the object localisation in [25]. Segmentation masks of the various classes in an image can be obtained using a CNN trained to perform classification. The saliency map of a particular class captures the most discriminative part of an image responsible for the CNN predicting that class as output. Applying the GrabCut segmentation algorithm [26] on these saliency maps provides a corresponding mask for the particular class. Class Activation Maps (CAM) [27] can also be used to find the discriminative regions in the image used by the CNN to perform classification. A CAM can be obtained by performing global average pooling over the final activation layer of the CNN; the feature map obtained after global average pooling is extrapolated to the size of the original image. These maps show which regions in
the image were highly activated to produce a certain class. Instead of computing CAMs and saliency maps and performing GrabCut segmentation for semantic segmentation, we can use a combination of the clustering algorithm and a CNN to localise objects and get a segmentation mask. The clustering algorithm produces bounding boxes and cluster IDs; the bounding boxes are classified using a CNN, and the corresponding cluster IDs act as a segmentation map of the particular class predicted by the CNN.
4 Results

The results section is also divided into two parts. Part A contains the results of the unsupervised machine learning technique, and Part B has the results of the proposed neural network architecture.
4.1 Part A: Clustering
As discussed in Sect. 2.2, the segmented lung portion is clustered into different "communities" based on the grey-scale intensity of each pixel (Fig. 3). For the given system, these communities are called nodules. We have drawn a 50 × 50 square box around each nodule. The regions inside these boxes are called ROIs and are later used as input to different CNN based architectures for classification. We have generated ~8050 ROIs for training the CNN.
Fig. 3. Nodule detection in 2D. (a–c) Different CT scan slices at different height (Z locations) and (d–f) the clusters/nodules detected using clustering algorithm. The colour assigned to each nodule and the background is randomly selected. The red square boxes are the ROI’s which are used in different CNN architectures.
4.2 Part B: Cancerous Nodule Detection
For cancerous nodule detection, we have used approximately 8050 ROIs for training different CNN architectures. These ROIs (Fig. 4(a)) are classified as either cancerous (labelled 1) or non-cancerous (labelled 0), as shown in Fig. 4(b). Based on these classifications, the cancerous ROIs are selected and the cancerous nodule region (detected by the clustering algorithm) is masked by assigning the pixel values as 1 and the rest as 0. This creates a binary image, as shown in Fig. 4(c).
Fig. 4. Cancer segmentation (a) Different ROI’s generated by image clustering (b) Classification labels on each corresponding ROI’s in panel (a), obtained by CNN based architectures (c) The cancer mapping of the nodule area detected by clustering algorithm.
The confusion matrix obtained from each CNN based architecture (Fig. 5) is used for accuracy measurement. The matrix has four elements: 'True positive (TP)', 'True negative (TN)', 'False positive (FP)' and 'False negative (FN)'. The TP and TN counts are the correctly classified nodules, whereas FP and FN are the wrongly classified ones. An accurate model should have a very low FN value and a high TP value. A small sketch of the derived accuracy measure follows Fig. 5.
Fig. 5. Confusion matrices of different CNN based architectures: (a) SeResNext50, (b) DenseNet161 and (c) ResNet50.
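A simple sketch of the accuracy measure derived from these four confusion-matrix entries; the counts used in the example are illustrative, not the reported results.

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of correctly classified ROIs among all test ROIs
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=750, tn=750, fp=60, fn=62))  # ~0.92 on 1622 test images
```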
5 Conclusions

We have proposed a novel combination of an unsupervised machine learning tool with CNN based architectures for lung cancer detection/classification. The complexity of a CNN can be reduced by using techniques such as ROI proposals. We have shown that this method is helpful in providing an approximate pixel-wise segmentation mask along with the class label of the ROI proposals. We have tried different CNN based architectures, namely SeResNext50, DenseNet161 and ResNet50; from the false negative values of the different architectures, it is clear that DenseNet161 is better than the other two for the studied system. Obtaining pixel-wise segmentation masks is often time consuming and requires human expertise. This method can be used to generate a rough segmentation mask which can be modified later to get a good segmentation mask. An ensemble of various methods can be used to reduce the number of false positives: if any of the models predicts the image to be cancerous, we can make the overall prediction cancerous.

Acknowledgement. This project is funded by Virtual and Augmented Reality Centre of Excellence (VARCoE), IIT Bhubaneswar, India.

Compliance with Ethical Standards ✓ All authors declare that there is no conflict of interest. ✓ No humans/animals involved in this research work. ✓ We have used our own data.
References

1. https://www.techrepublic.com/article/report-the-10-most-innovative-companies-in-machine-learning/#ftag=CAD-00-10aag7f
2. LUNA. https://luna16.grand-challenge.org/
3. Kaggle Data Science Bowl (2017). https://www.kaggle.com/c/data-science-bowl-2017
4. https://medium.com/@andriylazorenko/tensorflow-performance-test-cpu-vs-gpu-79fcd39170c
5. Cancer Moonshot. https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative
6. Cancer Research UK. https://www.cancerresearchuk.org/health-professional/cancer-statistics/worldwide-cancer. Accessed January 2019
7. Kuan, K., Ravaut, M., Manek, G., Chen, H., Lin, J., Nazir, B., Chandrasekhar, V.: Deep learning for lung cancer detection: tackling the Kaggle Data Science Bowl 2017 challenge. arXiv preprint arXiv:1705.09435 (2017)
8. Kamal, U., Rafi, A.M., Hoque, R., Hasan, M.: Lung cancer tumor region segmentation using recurrent 3D-DenseUNet. arXiv preprint arXiv:1812.01951 (2018)
9. Kishore, R., Gogineni, A.K., Nussinov, Z., Sahu, K.K.: A nature inspired modularity function for unsupervised learning involving spatially embedded networks. Sci. Rep. 9(1), 2631 (2019)
10. Kishore, R., Krishnan, R., Satpathy, M., Nussinov, Z., Sahu, K.K.: Abstraction of mesoscale network architecture in granular ensembles using 'big data analytics' tools. J. Phys. Comm. 2(3), 031004 (2018)
11. Sankar, K., Prabhakaran, M.: An improved architecture for lung cancer cell identification using Gabor filter and intelligence system. Int. J. Eng. Sci. 2(4), 38–43 (2013)
12. Sharma, D., Jindal, G.: Identifying lung cancer using image processing techniques. In: International Conference on Computational Techniques and Artificial Intelligence (ICCTAI), vol. 17, pp. 872–880 (2011)
13. Guo, Y., Liu, Y., Georgiou, T., Lew, M.S.: A review of semantic segmentation using deep neural networks. Int. J. Multimedia Inf. Retrieval 7(2), 87–93 (2018)
14. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
15. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vision 104(2), 154–171 (2013)
16. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
17. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
18. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
19. Simon, M., Rodner, E., Denzler, J.: ImageNet pre-trained models with batch normalization. arXiv preprint arXiv:1612.01452 (2016)
20. Smith, L.N.: Cyclical learning rates for training neural networks. In: IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464–472 (2017)
21. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer, Cham (2014)
22. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
23. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
24. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
25. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)
26. Rother, C., Kolmogorov, V., Blake, A.: GrabCut: interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. (TOG) 23(3), 309–314 (2004)
27. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Assistive Communication Application for Amyotrophic Lateral Sclerosis Patients

T. Shravani, Ramya Sai, M. Vani Shree, J. Amudha, and C. Jyotsna

Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Bengaluru, India [email protected], [email protected], [email protected], {j_amudha,c_jyotsna}@blr.amrita.edu
Abstract. Eye tracking is one of the advanced techniques that enables people with Amyotrophic Lateral Sclerosis (ALS) and other locked-in diseases to communicate with others and make their life easier. The software application presented here is an assistive communication tool designed for ALS patients, especially people whose communication is limited to eye movements; such patients have difficulty communicating even their basic needs. The design of the system's Graphical User Interface is made in such a way that physically impaired people can convey their basic needs through pictures using their eye movements. The prototype is user friendly and reliable and performs simple tasks, so that a paralyzed or physically impaired person can have easy access to it. The application was tested and the results are evaluated accordingly.

Keywords: Amyotrophic lateral sclerosis · Eye tracker · Eye tracking data · Gaze · Fixation value · Graphical user interface
1 Introduction

In patients with ALS, the brain loses the ability to initiate and control muscle movement: as the motor neurons die, the patient becomes paralyzed and also loses the ability to speak. This condition is very frustrating for the patient. One application of eye tracking is using this mechanism for assistive communication for physically disabled people. In this paper we describe the assistive application we have developed for ALS patients, which helps them convey their basic needs through Human Computer Interaction (HCI) [1, 2]. An eye tracker is a device which tracks the movement of the eyes and determines the exact positions where the user is looking on the screen; our application works on the data collected by the eye tracker. The easiest way to communicate is to use an application on a traditional computer where the Graphical User Interface (GUI) is understood through pictures and the patient conveys his needs through his eye movements, which are tracked by an eye tracking device. This is an uncomplicated and painless way for patients to convey their basic needs, rather than typing a message with eye movements,
which is a stressful task. Pictures are also a comfortable way for the care taker to understand the needs of the patient. A further advantage is that once the user gets used to the application, he need not search for the position of the particular block of need on the screen, which makes it easier to convey his need to the care taker. Section 2 discusses the related work on establishing communication systems with the help of eye gaze features. Section 3 explains the system architecture, which comprises the creation of the GUI and its usage. Section 4 presents the results obtained; heat maps are used to visualize the various results. Section 5 draws the conclusion and future scope of this research study.
2 Related Work

Various interaction methods have been invented based on eye-tracking technology, combining information concerning the patient's needs with the limitations of eye trackers [3–5]. There are various approaches which use eye tracking as input to computers. The first approach uses eye gaze as a pointing device, combined with a touch sensor to work as a multimodal input [6]. The second approach uses eye gestures as input to the computer. The third approach provides a user interface which assists the user with eye tracking technology. An eye communication system is a graphical user interface with icons for the basic needs of the patient and alphabets for entering text, and it works with the help of an eye tracker [7]; the functionality of the communication system is assessed with usability tests. A cheap, reliable and portable communication support system for the disabled was developed using Electro-Oculography (EOG): four directional eye movements and blinks are tracked using a 5-electrode EOG configuration for communication using a visual board for reference [8]. The board was specifically designed based on modelling of the English language to increase speed by reducing the number of eye movements required; with this system, users have achieved an average speed of 16.2 letters/min. An eye tracking application has been developed using infrared sensors and emitters as the input device; the screen was divided into a number of grids, and a hierarchical approach was used for the selection of those grids [9]. The advantages of this system are low cost and user friendliness. An assistive communication tool has been developed which consists of a camera and a computer: the images are captured by the camera and the system can interpret the data contained in the images [10], so this tool focuses on camera-based gaze tracking. Various eye tracking applications built using different technologies, along with their advantages and limitations, are shown in Table 1.
Table 1. Comparison of different applications

S. No | Application | Eye tracking technology | Communication | Advantages | Limitation
1 | Eye communication system for disabled people | Eye tracker (Eye Tribe) | Communication via symbols or icons, texts | Can specify needs through images; if the desired option is not available, the user can convey it by text | Very small icons
2 | EOG based text communication | Electro-Oculography | Communication with texts | Average typing speed 16.2 letters/min; easy and invasive method for recording large eye movements | Not suitable for daily use due to close contact of electrodes
3 | Eye gaze tracking for people with locked-in diseases | Developed using infrared sensors | Communication with texts | Low cost, user friendliness, and eye strain reduction | Eye movements are limited to certain degrees
4 | Eye-GUIDE (eye gaze user interface design) | Built using a computer and a camera | Communication through texts | Low cost | Saccades, fixations and noise that contaminate the gaze data
3 System Architecture

Our system comprises a GUI which can help an ALS patient convey his basic needs. Our model supports communication using pictures: there is no limitation on eye movements, and the care taker can easily understand the patient's needs.
3.1 Application Program/Interface (API) Overview
Figure 1 shows the hardware and software components of the eye tracking system; an overview of the Application Program Interface (API) layers is given below.
Fig. 1. API layers (from top to bottom: Application, Application Program Interface, Eye Tracking Server, Eye Tracking Device)
Application: The first screen of the API is the welcome page, which is divided into nine blocks as shown in Fig. 2. Each block represents one of the user's basic needs: Thirst, Food, Switch on/off the lights, Switch on/off the fan, Washroom, Entertainment, Emergency, and Exit the Application. Our application is made in such a way that if the user gazes for 0.5 s or more at a particular block, an event is triggered and a new page is opened to specify the desired option, as described below.
Fig. 2. GUI of the welcome page [4, 6, 8]
When the user gazes at the Thirst block for 0.5 s or more, it leads to the next screen, which has specific options for what the person wants to drink, as shown in Fig. 3. It has options like tea, coffee, milk, coconut water, soft drinks, water, and a back option which takes the application back to the welcome page. Assume the user desires to drink milk and selects the bottom right block with his eye gaze; the GUI of the milk block shown in Fig. 4 is then opened. He can go back to the welcome page by selecting the top right back block, in case a wrong option was selected or he desires to convey another requirement. The other blocks perform similar functionalities. By gazing at the Exit block, which is at the bottom right corner of the welcome page, the user indicates that he has nothing to convey; the application then ends and the eye tracker stops recording the gaze data.

API: A programmable interface provides access to the eye tracking device. The iView X™ API which we have used is part of the iView X™ SDK (Software Development Kit), which can be downloaded from the SMI website. We have used the Python 2.7 programming language to build the eye tracking application.
Fig. 3. GUI of thirst block [6, 8, 9]
Eye Tracking Server: The eye tracking server application collects the data from the eye tracking device and provides it via the API [9, 11]; "eye tracking server" is used as a generic name for this software component.

Eye Tracking Device: The eye tracker used is the SMI RED-m eye tracker. The use of a laptop allows a mobile, practical and economic solution, applicable in the person's everyday life. A laptop with a USB 2.0 port is used to connect the eye tracker.
Fig. 4. GUI of milk block
Fig. 5. Positioning the eye tracker
Place the RED-m eye tracking device at the center point (located in step 1 above), in the hinge area of the laptop, as shown in Fig. 5, and angle it upwards towards the eyes of the user.
3.2 Working of the Application
Figure 6 shows the activities performed by the application. When the helper opens the application, the calibration and validation functions are performed, and once an appropriate accuracy is reached the application is opened. If the user gazes at the intended block for more than 0.5 s, the corresponding action is performed; this repeats until the user gazes at the exit block, at which point the eye tracker stops collecting data and the application is closed. A sketch of this dwell-time selection logic is given below.
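A hedged sketch of the dwell-time selection loop just described: a block is "selected" once the gaze stays inside it for the fixation threshold. The block layout and the gaze-sample source are illustrative, not the SMI SDK's actual interface.

```python
FIXATION_THRESHOLD = 0.5  # seconds, the optimal value found in Sect. 4.2

def block_at(x, y, blocks):
    """Return the name of the grid block containing gaze point (x, y), or None."""
    for name, (left, top, right, bottom) in blocks.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None

def dwell_select(gaze_samples, blocks):
    """gaze_samples: iterable of (timestamp, x, y); yields selected block names."""
    current, start = None, None
    for t, x, y in gaze_samples:
        b = block_at(x, y, blocks)
        if b != current:            # gaze moved to a different block: reset timer
            current, start = b, t
        elif b is not None and t - start >= FIXATION_THRESHOLD:
            yield b                 # dwell threshold reached: trigger the event
            current, start = None, None
```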
3.3 Steps Involved
Eye Tracker Set Up: The eye tracker device needs to be connected to the eye tracking server. Start the iView RED-m application, which automatically starts the eye tracking server and establishes the connection with the eye tracking device.

Calibration and Validation: Once the application is started, the calibration and validation processes take place, and their result, the accuracy, is displayed on the command window. Make sure that the reported deviation value is small, which means the gaze deviations are low. After the completion of calibration and validation, the application is started. It takes the gaze data and performs the preferred actions until the user gazes at the exit block.
Fig. 6. Activity diagram
4 Results and Analysis
4.1 Heat Map
Several test cases have been applied to the application and the results were analyzed using heat maps. The user was asked to follow a specified path, and the heat maps were generated from the gaze data, which consists of the timestamp, gaze x position, gaze y position and the duration of the event, as shown in Fig. 7; a sketch of this rendering step is given below.

Test Case 1: The path asked to follow: thirst → coffee → back → exit
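A hedged sketch of how such a gaze heat map can be rendered from the recorded samples; the bin count, screen size and plotting choices are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_heatmap(gaze_x, gaze_y, screen_w=1366, screen_h=768):
    # 2D histogram of gaze positions; hotter bins mean longer viewing
    heat, _, _ = np.histogram2d(gaze_x, gaze_y, bins=64,
                                range=[[0, screen_w], [0, screen_h]])
    plt.imshow(heat.T, cmap="jet", extent=[0, screen_w, screen_h, 0])
    plt.axis("off")
    plt.show()
```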
Fig. 7. Heat maps drawn with Test case 1 data
Test Case 2: The path to follow: emergency → back → food → snacks → back → exit, shown in Fig. 8.
Fig. 8. Heat maps drawn with Test case 2 data
From the heat maps, we conclude that the gaze data generated the correct flow of events. In the interface where the message is displayed, only the back-option block was enabled; when the user gazes at the remaining areas, no event is triggered.
4.2 Fixation Graphs
Fixation, or visual fixation, is the maintaining of the visual gaze on a single location. The application was tested using fixation values in the range 0.1 s to 1 s, and graphs were generated from the event data to determine an appropriate range for the fixation value. For the generation of the graphs, the user was asked to follow a specific path with different fixation values. The path to be followed is: switch on/off the fan → back → exit.

NOTE: Top of blue bar: start time of the event. Top of red bar: end time of the event. Total red block: event span. Series 1: time taken for the event to occur. Series 2: event duration.

Fixation Value 0.1 s: As the fixation value is very small, many events are triggered before the user intends to select the desired block. As shown in Fig. 9, since the user fixated for only 0.1 s, the corresponding page will not be opened. The y-axis shows the time in microseconds.
Fig. 9. Fixation graph for 0.1 s
Fixation Value 0.3 s: In this case, before the actual event is triggered, many events not specified in the path are triggered along the way; however, the number of spurious events triggered at 0.3 s is smaller than at 0.1 s, as shown in Fig. 10. The y-axis represents the time in microseconds.
Fig. 10. Fixation graph for 0.3 s (y-axis: time in microseconds)
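Fixation graphs of this kind can be reproduced as stacked bar charts, with Series 1 (time taken for the event to occur) forming the lower bar and Series 2 (event duration) stacked on top, matching the legend above. The timing values in the sketch below are invented for illustration, not measured data.

```python
import matplotlib.pyplot as plt

# Hypothetical event log: (label, time_to_trigger_us, event_duration_us)
events = [
    ("Turn ON/OFF fan",   9_000_000, 2_500_000),
    ("Looking back",     15_000_000, 2_500_000),
    ("Exit application", 21_000_000, 2_500_000),
]

labels = [e[0] for e in events]
starts = [e[1] for e in events]   # Series 1: time taken for event to occur
spans  = [e[2] for e in events]   # Series 2: event duration

plt.bar(labels, starts, color="tab:blue", label="Series 1 (time to event)")
plt.bar(labels, spans, bottom=starts, color="tab:red",
        label="Series 2 (event duration)")
plt.ylabel("time (microseconds)")
plt.legend()
plt.show()
```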
When the fixation value is between 0.5 s and 1 s, the user is able to gaze along the specified path and generate the events appropriately.
Fixation Value of 0.9 s: As the fixation value increases, the time taken to trigger an event also increases. This may cause the user discomfort, as the desired block must be gazed at for a longer time. It is shown in Fig. 11.
Fig. 11. Fixation graph for 0.9 s (x-axis: Turn ON/OFF fan, Looking back, Exit application; y-axis: time in microseconds)
Fixation Value of 1 s: The user has to gaze for a long time for the events to trigger. It is shown in Fig. 12.
Fig. 12. Fixation graph for 1 s (x-axis: Switching ON/OFF lights, Looking back, Exit application; y-axis: time in microseconds)
To summarize: with a fixation value between 0.5 s and 1 s, the user is able to gaze along the specified path and generate the events appropriately, but as the fixation value increases, so does the time taken to trigger each event. The 0.5 to 0.7 s interval triggers events precisely, as per the user's choice. If the fixation value is too small (0.1 to 0.4 s), unnecessary events are triggered; if it is too large, the correct block is triggered but only after a long wait. An optimum value of 0.5 s is therefore taken as the fixation value.
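The effect of the fixation threshold can be illustrated with a small simulation. The sketch below counts how many dwell-selections a given threshold would fire on a trace of which block the gaze falls on, sample by sample; the synthetic trace and the 60 Hz sampling rate are assumptions made for illustration, not recorded data.

```python
def count_triggers(samples, threshold, rate_hz=60):
    """Count dwell-selections a given fixation threshold would fire.

    samples: sequence of block names (or None), one per tracker sample.
    A block fires once the gaze stays on it for `threshold` seconds."""
    needed = round(threshold * rate_hz)   # consecutive samples required
    triggers, run, current = 0, 0, None
    for block in samples:
        if block == current and block is not None:
            run += 1
            if run == needed:             # dwell reached: event fires
                triggers += 1
        else:
            current, run = block, 1
    return triggers

# Synthetic 60 Hz trace: two brief glances at "food", then a steady gaze at "exit".
trace = ["food"] * 12 + [None] * 6 + ["food"] * 20 + ["exit"] * 45
for t in (0.1, 0.3, 0.5, 0.9):
    print(f"threshold {t:.1f} s -> {count_triggers(trace, t)} event(s)")
```

Sweeping the threshold over a trace like this shows the trade-off reported above: small values fire spurious events on passing glances, while large values can miss even a deliberate gaze.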
5 Conclusion and Future Scope

The advantage of this system is that no prior learning is required for any patient to use it. The figurative icons, represented by symbolic images, help establish communication so that patients can inform the medical staff about their symptoms, feelings and needs. Our analysis of the eye tracking data using heat maps gave the anticipated results, and the optimal fixation value was determined to be 0.5 s through the analysis of the graphs. The application can be advanced by incorporating text-to-speech conversion, by allowing caretakers or doctors to modify the GUI according to each patient's explicit needs, and by differentiating natural from intended eye movements. The application can thus be customized to the needs of each patient.
Acknowledgment. This study was conducted in the Eye Tracking and Computer Vision Research Lab at Amrita School of Engineering, Bengaluru.

Compliance with Ethical Standards
✓ All authors declare that there is no conflict of interest.
✓ No humans/animals were involved in this research work.
✓ We have used our own data.
Author Index
A Abinaya, V., 1085 Abujar, Sheikh, 1303 Aditya S, K., 364 Adriana, Sarmiento Rubiano, 697 Agarwal, Ritu, 384 Agrawal, Pankaj, 993 Agrawal, Piyush, 1031 Agrawal, Shakti, 555 Ahmed, Mohammed Riyaz, 1294 Alegavi, Sujata, 148 Alharbi, Fahd, 1179 Amudha, J., 312, 645, 921, 1397 Andrea, Carvajal Tatis Claudia, 697 Anitha, J., 555 Anitha, M. V., 219 Ansal Muhammed, A., 717 Anupama, M. A., 413 Anuradha, M., 843 Anusha, H. C., 606 Anusree, K., 921 Apoorva, S., 364 Asha, K. N., 1009 Aswin, S., 179 Ayyavaraiah, Monelli, 1125 B Bai, V. Thulasi, 1283 Balaji, S., 631 Basak, Setu, 1044 Basilia, Cynthia, 689 Batra, Kushal, 708 Beschi Raja, J., 257 Bhattad, Niraj, 1221 Bhavana, V., 402
Bhavani, N. P. G., 1371 Bhavani, S., 1273 Bhavanishankar, K., 563 Bhuvaneswari Amma, N. G., 422 Bijoy, M. B., 717 Biju, Vinai George, 689 Binoy, Sandra, 689 Biswas, Abhik, 964 Bose, Subash Chandra, 26 Bucci, Nunziatina, 140 C Challa, Manoj, 286, 321 Chandrakumar, T., 40 Chaudhari, Rajendra P., 193 Chaudhary, Sanjay, 1265 Chinnaiyan, R., 321 Chinnaswamy, Arunkumar, 902 Chowdhury, S. M. Mazharul Hoque, 1303 Churi, Prathamesh, 1023 D Dahikar, Pradeep, 993 Darshan, S., 768 Darshini, P., 364 Datta, Sujoy, 964 De, Souranil, 964 Deepankan, B. N., 384 Deshpande, Anuja, 993 Deshpande, Anuoama, 779 Devakumari, D., 109 Dhondkar, Ashutosh, 294 Dhongade, Sayali M., 193 Divya, C. D., 656
E Elangovan, K., 247 Elías, Parody Muñoz Alexander, 697 F Fernández, Claudia, 1 G Gagana, H. S., 983 Gaitán, Mercedes, 226, 268 Gambani, Simran, 909 Gangrade, Sakshi, 598 Garg, Amit Kumar, 1162 Gaurang, Desai Prutha, 902 Gayathri, M., 931 George, Joby, 1310, 1319 Ghole, Tanvi, 798 Ghosh, Subhajit, 1344 Gireesan, Ganga, 62 Gogineni, Ajay K., 1386 Goldar, Adarsh, 213 Gopalapillai, Radhakrishan, 1378 Gulati, Savy, 536 Gupta, Deepa, 871 Gupta, Jagrati, 345 Gurav, Aaditya, 1078 H Hadzieva, Elena, 26 Handore, Mahendra V., 779 Harakannanavar, Sunil S., 1146, 1213 Harikumar, R., 1248 Hema, A. S., 606 Hemaanand, M., 768 Himabindu, Y., 1355 Hossain, Syed Akhter, 1303 Hullole, Pramod, 1213 I Irfan, Shadab, 1344 J Jagadeeswaran, S., 768 Jain, Aastha, 1023 Jayachandran, A., 1154 Jayalakshmi, N., 1190 Jayaraj, P. B., 717 Jeeva Priya, K., 1335 Jeyakumar, Gurusamy, 355, 472, 480 Jha, Smriti, 371 Jose, Deepa, 624 Joseph, S. Iwin Thanakumar, 1327 Joy, Fancy, 894
Jyotinagar, Varshapriya, 161 Jyotsna, C., 312, 645, 1397 K Kakati, Munindra, 8 Kakwani, Divyanshu, 1221 Kalaiselvi, K., 751, 1232 Kalamani, M., 1248 Kamarasan, M., 1094 Kamthekar, Swati S., 338 Kanabur, Vidyashree, 1213 Kannan, Nithiyananthan, 1179 Karande, Shruti, 798 Karthika, R., 768 Karthikeyan, B., 204 Karthikeyan, C., 727 Karthikeyan, V., 1371 Karunakaran, V., 1327 Kaur, Manpreet, 541 Kaushik, Sajal, 1031 Kavitha, C. R., 583 Kavya, S. R., 606 Kawathekar, Seema, 1106 Keerthi, M., 555 Khadar Basha, S. K., 1137 Khatun, Fatema, 1303 Khopkar, Vishal, 161 Kiran, K. V. D., 504 Kishore, Raj, 463, 1386 Kolhare, Nilima R., 193 Koppad, Deepali, 345 Kothari, Raj, 974 Kousalya, R., 1067 Koushik, G. V. S. N., 1335 Krishnamoorthi, M., 1248 Krishnan, K. Raghesh, 431 Krishnarathinam, Anuradha, 843 Kulkarni, Sujata, 798 Kumar, Manish, 131 Kumar, Mukesh, 1171 Kumar, P. Nirmal, 624 Kumar, Ranjeet, 131, 204 Kumar, Silpa Ajith, 1363 Kumar, Udhaya, 827 Kunaraj, K., 631 Kurup, R. Vimal, 413 L Laaxmi, N. T., 40 Lakshmi, G. Durga, 431 Lakshmi, M., 1273 Lezama, Omar Bonerge Pineda, 1, 48, 187, 226, 268
Lis-Gutiérrez, Jenny Paola, 90 Lokesh, S. V., 204 Luna-Cardozo, Marisabel, 140 M Mahadevan, G., 1137 Mahajan, Arpana D., 1265 Maheswari, N., 843 Mahimai Don Bosco, F. P., 631 Malathi, P., 395 Malathy, C., 931 Mamatha, M. N., 1116 Manimegalai, P., 1273 Manju Priya, A. R., 871 Manjusha, R., 1355 Maria Wenisch, S., 631 María, Suarez Marenco Marianella, 697 Mariappan, Ramasamy, 727 Marina, Ninoslav, 26 Mathew, Blessy B., 689 Mathew, Linda Sara, 62, 171, 219 Mazharul Hoque Chowdhury, S. M., 1044 Meena Sutha, S., 1371 Meena, R., 1283 Meenakshi, B., 827 Meenakshi, M., 240 Mhatre, Apurva, 294, 1078 Mishra, Akshay Kumar, 213 Mobarak, Youssef, 1179 Mondkar, Harshita, 798 Monika, P., 1203 Monisha, 496 Moorthi, S. Manthira, 1085 Morgado Gamero, W. B., 697 Mubarakali, Azath, 26 Mudaliar, Navin Kumar, 1078 Mulla, Nikahat, 909 Murthy, Shrujana, 583 N Naga Rajath, S. M., 304 Naidu, Srinath R., 598 Naik, Gaurish M., 589 Naik, Suprava, 1386 Nair, Ajun, 294, 1078 Nair, Akash, 1078 Namboodiri, T. Sreekesh, 1154 Nandagopal, A. S., 942 Nandhana, K., 40 Narayanan, Mahadevan, 294, 1078 Narmadha, S., 71 Nidhya, R., 131
Nikkil Kumar, P. C., 395 Nirmalakumari, K., 442 Nishanth, K. N., 983 Nishanth, M., 480 Nivendkar, Swapnil, 161 Nivetha, S., 843 Nussinov, Zohar, 463 O Oliver, A. Sheryl, 843 Omana, J., 1283 Osheen, T. J., 171 Oza, M. P., 1085 P Palanisamy, Poonkodi, 496 Pamina, J., 257 Panicker, Akilesh, 294 Parab, Jivan S., 589 Parameswaran, Latha, 768, 788, 1256, 1355 Parida, D. K., 463 Parikh, Satyen M., 1059 Parikshith, H., 304 Parveen, Summia, 496 Patel, Ashok R., 1059 Patel, Meghna B., 1059 Patil, P. M., 779 Patil, Yogini B., 1106 Pavan Kumar, S. P., 304 Pérez, Doyreg Maldonado, 187, 268 Ponnusamy, R., 679 Poonkuntran, S., 1085 Prabhakar, Sunil Kumar, 529, 576, 615, 662 Prabhakaran, M., 931 Prabhu, L. Arokia Jesu, 1154 Prabhu, Shreyank, 161 Prabhu, Yogini, 589 Prajwalasimha, S. N., 606 Prakash, Abhishek, 213 Prakash, S., 247 Prashanth, B. U. V., 1294 Prashanth, C. M., 689 Prashanth, C. R., 1146 Prashanth, K. V. Mahendra, 179 Prashantkumar, K., 275 Pravin Kumar, S. K., 79 Priyadharsini, C., 1327 Priyangha, B. S., 56 Punithavathi, V., 109 Purnikmath, Veena I., 1213 Purushottam, P., 275 Pushpa, B., 1094
Q Quitian, Oscar Iván Torralba, 90 R Rabby, SK. Fazlee, 1303 Radha, R., 708 Radha, V., 514 Rai, Saloni, 1162 Raj, Kiran S., 480 Raj, Pranay, 1386 Raja, K. B., 1146 Rajaguru, Harikumar, 442, 488, 529, 576, 615, 662 Rajeevan, T. V., 827 Rajendra, A. B., 656 Rajendrakumar, S., 827 Rajeshwari, I., 1051 Rajkumar, P., 442 Rajkumar, R., 1009 Raju, G. T., 453, 1203 Rakesh Chowdary, P., 768 Rakshit, Soumik, 964 Ramalingam, Devakunchari, 863 Ramanujam, E., 40 Ramya, R., 79 Rangaswamy, C., 453 Ranjan, Ayush, 179 Rao, Shrihari, 1378 Ravalika, R., 645 Reddy, B. Girish K., 1335 Reddy, Sravan Kumar, 1221 Rekha, B. S., 1221 Revathy, R., 402 Romero, Ligia, 1 Roshini, A., 504 Rudra, Sudipta, 809 Rukkumani, V., 79 S Sachin, S., 275 Saha, Srinjoy, 964 Sahu, Kisor K., 463, 1386 Sai, Ramya, 1397 Sajan, Gill Varghese, 1310 Saketh, Konjeti Harsha, 472 Sam Peter, S., 257 SandhyaSree, V., 329 Sanjay Tharagesh, R. S., 788 Sanjay, H. A., 364 Sankar Ganesh, P. V., 735 Sannasi Chakravarthy, S. R., 488
Santhi, M., 98 Sara, S. Belina V. J., 1232 Saranya, N., 79 Sarma, Parismita, 8 Sasikala, T., 882 Sasikumar, P. K., 950 Satheesh Kumar, J., 1363 Sathiamoorthy, S., 679 Sathya Bama, S., 257 Satpathy, Manoranjan, 463 Satpute, Omkar, 161 Sedamkar, Raghvendra, 148 Sekhar, Sachin, 708 Selvakumar, S., 422 Selvaraj, Shanthi, 496 Seshikala, G., 453 Seth, Harsh, 863 Shah, Dhruvesh, 1023 Shahna, E., 1319 Shete, Pranoti S., 120 Shinde, Sonal Suhas, 338 Shobarani, R., 1371 Shravani, T., 1397 Shubha, P., 240 Shyamala, K., 1051 Shyry, S. Prayla, 523 Sidramappa, 606 Silva, Amelec Viloria, 140 Silva, Jesús, 1, 48, 226, 268, 697 Sindhu, A., 514 Sindu, S., 1067 Siva Nageswara Rao, G., 727 Sivakumar, D., 670 Snigdha, P., 364 Sofia, R., 670 Solano, Darwin, 1 Soman, K. P., 413 Sonkar, Ayush, 213 Soundarya, S., 257 Sowmiya, J., 751 Sowmya, V., 413 Sowmyasri, P., 645 Sravanthi, Tatiparthi, 1327 Sree, Katamneni Vinaya, 355 Sridhar, C. S., 1137 Sridhar, P., 788, 1256 Srinivasan, G. N., 1221 Srinivasan, K., 79 Srinivasan, Ramakrishnan, 902 Sripriya, P., 735 Srishilesh, P. S., 788 Srivanth, P., 523
Srividhya, S., 247 Sruthi, M. S., 257 Subramaniam, P., 827 Subramanian, Rama, 727 Sudha, B. G., 371 Sudhamani, M. V., 563 Sudir, P., 1137 Sujatha, K., 1371 Suneeta, V. B., 275 Sunitha, N. R., 983 Sunny, Ani, 942 Sunny, Leya Elizabeth, 98 Suny, Shams Shahriar, 1044 Supreet, M., 275 Surendran, K. T. Sithara, 882 Suresh, S., 204 T Tamuly, Sudarshana, 312 Thakkar, Tej, 909 Thakur, Amit, 213 Thanga Manickam, M., 1256 Thangaraj, P., 1190 Thangavel, Senthil Kumar, 788, 809, 1256 Thangavel, Senthilkumar, 827 Thangavelu, S., 329 Thengade, Anita, 120 Thiyaneswaran, B., 56 Thomson, Derrick, 523 Tiwari, Swapnil, 863 Torres-Samuel, Maritza, 140 Torse, Dattaprasad, 1213 Tumpa, Zerin Nasrin, 1303
U Uchil, Aditya P., 371 Uttarwar, Sayli, 909 V Valarmathi, R. S., 1248 Vani, A., 1116 Vani Shree, M., 1397 Varalakshmi, J., 624 Varela, Noel, 48, 187, 226, 268 Varghese, Tanya, 555 Varma, J. S. Mohith B., 1335 Vashist, Lokender, 1171 Vásquez, Carmen Luisa, 140 Vedamurthy, H. K., 19 Veer (Handore), Shubhangi S., 779 Veerasamy, Murugesh, 26 Venkata Achyuth Kumar, S., 402 Venkateswarlu, Bondu, 1125 Vignesh, N., 233 Vijay Bhaskar Reddy, V., 402 Vijayakumar, V., 71, 894 Viloria, Amelec, 90, 187 Vimal Kumar, S., 1190 Vinayakumar, R., 413 Vindya, N. D., 19 Vinothkanna, R., 950 Vinutha, D. C., 233 Vyas, Gargi, 974 Y Yoganathan, N. S., 56 Yogesh, M., 1256