Mukesh Saraswat · Harish Sharma · K. Balachandran · Joong Hoon Kim · Jagdish Chand Bansal Editors
Congress on Intelligent Systems Proceedings of CIS 2021, Volume 1
Lecture Notes on Data Engineering and Communications Technologies Volume 114
Series Editor Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data technologies and communications. It publishes the latest advances on the engineering task of building and deploying distributed, scalable, and reliable data infrastructures and communication systems. The series has a prominent applied focus on data technologies and communications, with the aim of promoting the bridge from fundamental research on data science and networking to data engineering and communications that lead to industry products, business knowledge, and standardisation. Indexed by SCOPUS, INSPEC, EI Compendex. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/15362
Editors
Mukesh Saraswat, Department of Computer Science & Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, India
Harish Sharma, Department of Computer Science and Engineering, Rajasthan Technical University, Kota, India
K. Balachandran, Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bengaluru, India
Joong Hoon Kim, Korea University, Seoul, Korea (Republic of)
Jagdish Chand Bansal, South Asian University, New Delhi, India
ISSN 2367-4512 ISSN 2367-4520 (electronic)
Lecture Notes on Data Engineering and Communications Technologies
ISBN 978-981-16-9415-8 ISBN 978-981-16-9416-5 (eBook)
https://doi.org/10.1007/978-981-16-9416-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This proceedings contains the papers presented at the 2nd Congress on Intelligent Systems (CIS 2021), organized by CHRIST (Deemed to be University), Bengaluru, India, and the Soft Computing Research Society during September 4–5, 2021. The Congress on Intelligent Systems (CIS 2021) invited ideas, developments, applications, experiences, and evaluations in intelligent systems from academicians, research scholars, and scientists. The conference deliberations covered the topics specified within its scope. The conference offered a platform for presenting extensive research and literature across the arena of intelligent systems and provided an overview of upcoming technologies. CIS 2021 provided a platform for leading experts to share their perceptions, provide supervision, and address participants' questions and concerns. CIS 2021 received 370 research submissions from 35 different countries, viz., Algeria, Bangladesh, Burkina Faso, China, Egypt, Ethiopia, Finland, India, Indonesia, Iran, Iraq, Kenya, Korea (Democratic People's Republic of), Madagascar, Malaysia, Mauritius, Mexico, Morocco, Nigeria, Peru, Romania, Russia, Serbia, Slovakia, South Africa, Spain, Switzerland, Ukraine, United Arab Emirates, UK, USA, Uzbekistan, and Vietnam. The papers covered advanced areas in technology, artificial intelligence, machine learning, and data science. After a rigorous peer review with the help of the program committee members and more than 100 external reviewers, 135 papers were approved. CIS 2021 is a flagship event of the Soft Computing Research Society, India. The conference was inaugurated by Fr. Dr. Abraham VM, Honorable Vice-Chancellor, CHRIST (Deemed to be University), Bangalore, and Chief Patron, CIS 2021. Other eminent dignitaries included Prof. Joong Hoon Kim, General Chair, CIS 2021; Fr. Joseph Varghese, Patron, CIS 2021; and Prof. K. Balachandran, General Chair, CIS 2021. The conference witnessed keynote addresses from eminent speakers, namely Prof. Xin-She Yang, Middlesex University, The Burroughs, Hendon, London; Prof. P. Nagabhushan, Indian Institute of Information Technology Allahabad; Prof. J. C. Bansal, South Asian University, New Delhi, India; Prof. Lipo Wang, Nanyang Technological University, Singapore; and Prof. Nishchal K. Verma, Indian Institute of
Technology Kanpur, India. The organizers wish to thank Mr. Aninda Bose, Senior Editor, Springer Nature, New Delhi, India, for the support and guidance.

Mukesh Saraswat, Noida, India
Harish Sharma, Kota, India
K. Balachandran, Bengaluru, India
Joong Hoon Kim, Seoul, Korea (Republic of)
Jagdish Chand Bansal, New Delhi, India
Contents
The Extraction of Automated Vehicles Traffic Accident Factors and Scenarios Using Real-World Data (MinHee Kang, Jaein Song, and Keeyeon Hwang)
Leaf Disease Identification in Rice Plants Using CNN Model (Allam Sushanth Reddy and Jyothi Thomas)
Twitter Sentiment Analysis Based on Neural Network Techniques (Ashutosh Singal and Michael Moses Thiruthuvanathan)
Enhanced Stock Market Prediction Using Hybrid LSTM Ensemble (Reuben Philip Roy and Michael Moses Thiruthuvanathan)
Classifying Microarray Gene Expression Cancer Data Using Statistical Feature Selection and Machine Learning Methods (S. Alagukumar and T. Kathirvalavakumar)
Pythagorean Fuzzy Information Measure with Application to Multicriteria Decision Making (Anjali Munde)
Developing an Improved Software Architecture Framework for Smart Manufacturing (Gareth A. Gericke, Rangith B. Kuriakose, and Herman J. Vermaak)
Intelligent Water Drops Algorithm Hand Calculation Using a Mathematical Function (Sathish Kumar Ravichandran, Archana Sasi, and Ramesh Vatambeti)
A Study of Decision Tree Classifier to Predict Learner's Progression (Savita Mohurle and Richa Pandey)
An Overview of Blockchain and IoT in e-Healthcare System (S. V. Vandana AryaSomayajula and Ankur Goyal)
Energy-Efficient ACO-DA Routing Protocol Based on IoEABC-PSO Clustering in WSN (M. Vasim Babu, C. N. S. Vinoth Kumar, B. Baranidharan, M. Madhusudhan Reddy, and R. Ramasamy)
Modelling Critical Success Factors for Smart Grid Development in India (Archana, Ravi Shankar, and Shveta Singh)
Stability Analysis of Emerged Seaside Perforated Quarter Circle Breakwater Using Soft Computing Techniques (Sreelakshmy Madhusoodhanan and Subba Rao)
A Risk-Budgeted Portfolio Selection Strategy Using Novel Metaheuristic Optimization Approach (Mohammad Shahid, Zubair Ashraf, Mohd Shamim, Mohd Shamim Ansari, and Faisal Ahmad)
An Optimization Reconfiguration Reactive Power Distribution Network Based on Improved Bat Algorithm (Thi-Kien Dao, Trinh-Dong Nguyen, Trong-The Nguyen, and Jothiswaran Thandapani)
Analyzing a Raga-Based Bollywood Song: A Statistical Approach (Lopamudra Dutta and Soubhik Chakraborty)
Security Prioritized Heterogeneous Earliest Finish Time Workflow Allocation Algorithm for Cloud Computing (Mahfooz Alam, Mohammad Shahid, and Suhel Mustajab)
Dropout-VGG Based Convolutional Neural Network for Traffic Sign Categorization (Inderpreet Singh, Sunil Kr. Singh, Sudhakar Kumar, and Kriti Aggarwal)
Internet-Based Healthcare Things Driven Deep Learning Algorithm for Detection and Classification of Cervical Cells (Shruti Suhas Kute, Amit Kumar Tyagi, Shaveta Malik, and Atharva Deshmukh)
Load Balancing Algorithms in Cloud Computing Environment—An Effective Survey (N. Priya and S. Shanmuga Priya)
Assessment of the Spatial Variability of Air Pollutant Concentrations at Industrial Background Stations in Malaysia Using Self-organizing Map (SOM) (Loong Chuen Lee and Hukil Sino)
An Approach for Enhancing Security of Data over Cloud Using Multilevel Algorithm (Binita Thakkar and Blessy Thankachan)
Advanced Spam Detection Using NLP and Deep Learning (Aditya Anil, Ananya Sajwan, Lalitha Ramchandar, and N. Subhashini)
A Systematic Literature Review on Image Preprocessing and Feature Extraction Techniques in Precision Agriculture (G. Sharmila and Kavitha Rajamohan)
A Comprehensive Study on Computer-Aided Cataract Detection, Classification, and Management Using Artificial Intelligence (Binju Saju and R. Rajesh)
Improving Black Hole Algorithm Performance by Coupling with Genetic Algorithm for Feature Selection (Hrushikesh Bhosale, Prasad Ovhal, Aamod Sane, and Jayaraman K. Valadi)
Brain Tumor Analysis and Reconstruction Using Machine Learning (Priyanka Sharma, Dinesh Goyal, and Neeraj Tiwari)
Ordered Ensemble Classifier Chain for Image and Emotion Classification (Puneet Himthani, Puneet Gurbani, Kapil Dev Raghuwanshi, Gopal Patidar, and Nitin Kumar Mishra)
A Novel Deep Learning SFR Model for FR-SSPP at Varied Capturing Conditions and Illumination Invariant (R. Bhuvaneshwari, P. Geetha, M. S. Karthika Devi, S. Karthik, G. A. Shravan, and J. Surenthernath)
Attention-Based Ensemble Deep Learning Technique for Prediction of Sea Surface Temperature (Ashapurna Marndi and G. K. Patra)
Political Optimizer-Based Optimal Integration of Soft Open Points and Renewable Sources for Improving Resilience in Radial Distribution System (D. Sreenivasulu Reddy and Varaprasad Janamala)
Face and Emotion Recognition from Real-Time Facial Expressions Using Deep Learning Algorithms (Shrinitha Monica and R. Roseline Mary)
A Real-Time Traffic Jam Detection and Notification System Using Deep Learning Convolutional Networks (Sedish Seegolam and Sameerchand Pudaruth)
Design of a Robotic Flexible Actuator Based on Layer Jamming (Kristian Kowalski and Emanuele Lindo Secco)
Sentiment Analysis on Diabetes Diagnosis Health Care Using Machine Learning Technique (P. Nagaraj, P. Deepalakshmi, V. Muneeswaran, and K. Muthamil Sudar)
Predicting the Health of the System Based on the Sounds (Manisha Pai and Annapurna P. Patil)
A Model Based on Convolutional Neural Network (CNN) for Vehicle Classification (F. M. Javed Mehedi Shamrat, Sovon Chakraborty, Saima Afrin, Md. Shakil Moharram, Mahdia Amina, and Tonmoy Roy)
A Transfer Learning Approach for Face Recognition Using Average Pooling and MobileNetV2 (F. M. Javed Mehedi Shamrat, Sovon Chakraborty, Md. Shakil Moharram, Tonmoy Roy, Masudur Rahman, and Biraj Saha Aronya)
A Deep Learning Approach for Splicing Detection in Digital Audios (Akanksha Chuchra, Mandeep Kaur, and Savita Gupta)
Multi-criteria Decision Theory-Based Cyber Foraging Peer Selection for Content Streaming (Parisa Tabassum, Abdullah Umar Nasib, and Md. Golam Rabiul Alam)
Visualizing Missing Data: COVID-2019 (K. Lavanya, G. Raja Gopal, M. Bhargavi, and V. Akhil)
Study of Impact of COVID-19 on Students Education (Deepali A. Mahajan and C. Namrata Mahender)
A Framework for Analyzing Crime Dataset in R Using Unsupervised Optimized K-means Clustering Technique (K. Vignesh, P. Nagaraj, V. Muneeswaran, S. Selva Birunda, S. Ishwarya Lakshmi, and R. Aishwarya)
Seed Set Selection in Social Networks Using Community Detection and Neighbourhood Distinctness (Sanjeev Sharma and Sanjay Kumar)
An Optimized Active Learning TCM-KNN Algorithm Based on Intrusion Detection System (Reenu Batra, Manish Mahajan, and Amit Goel)
Ensemble Model of Machine Learning for Integrating Risk in Software Effort Estimation (Ramakrishnan Natarajan and K. Balachandran)
A Questionnaire-Based Analysis of Network Forensic Tools (Rachana Yogesh Patil and Manjiri Ranjanikar)
On the Industrial Clustering: A View from an Agent-Based Version of Krugman Model (Smicha Ait Amokhtar and Nadjia El Saadi)
An Efficient Comparison on Machine Learning and Deep Neural Networks in Epileptic Seizure Prediction (R. Roseline Mary, B. S. E. Zoraida, and B. Ramamurthy)
Analysis of Remote Sensing Satellite Imagery for Crop Yield Mapping Using Machine Learning Techniques (M. Sarith Divakar, M. Sudheep Elayidom, and R. Rajesh)
Analysis of Lung Cancer Prediction at an Early Stage: A Systematic Review (Shweta Agarwal and Chander Prabha)
Framework for Estimating Software Cost Using Improved Machine Learning Approach (Sangeetha Govinda)
Deducing Water Quality Index (WQI) by Comparative Supervised Machine Learning Regression Techniques for India Region (Sujatha Arun Kokatnoor, Vandana Reddy, and K. Balachandran)
Artificial Ecosystem-Based Optimization for Optimal Location and Sizing of Solar Photovoltaic Distribution Generation in Agriculture Feeders (U. Kamal Kumar and Varaprasad Janamala)
Optimized Segmentation Technique for Detecting PCOS in Ultrasound Images (S. Jeevitha and N. Priya)
Implementation of Morphological Gradient Algorithm for Edge Detection (Mirupala Aarthi Vardhan Rao, Debasish Mukherjee, and S. Savitha)
Support Vector Machine Performance Improvements by Using Sine Cosine Algorithm (Miodrag Zivkovic, Nikola Vukobrat, Amit Chhabra, Tarik A. Rashid, K. Venkatachalam, and Nebojsa Bacanin)
French COVID-19 Tweets Classification Using FlauBERT Layers (Sadouanouan Malo, Thierry Roger Bayala, and Zakaria Kinda)
A Novel Weighted Extreme Learning Machine for Highly Imbalanced Multiclass Classification (Siddhant Baldota and Deepti Aggarwal)
An Enhanced Pixel Intensity Range-Based Reversible Data Hiding Scheme for Interpolated Images (Rama Singh and Ankita Vaish)
UAV Collaboration for Autonomous Target Capture (Lima Agnel Tony, Shuvrangshu Jana, V. P. Varun, Shantam Shorewala, B. V. Vidyadhara, Mohitvishnu S. Gadde, Abhishek Kashyap, Rahul Ravichandran, Raghu Krishnapuram, and Debasish Ghose)
A Leaf Image-Based Automated Disease Detection Model (Aditi Ghosh and Parthajit Roy)
Multi-agent Cooperative Framework for Autonomous Wall Construction (Kumar Ankit, Lima Agnel Tony, Shuvrangshu Jana, and Debasish Ghose)
Construction of a Convex Polyhedron from a Lemniscatic Torus (Ricardo Velezmoro-León, Robert Ipanaqué-Chero, Felícita M. Velásquez-Fernández, and Jorge Jimenez Gomez)
An Ant System Algorithm Based on Dynamic Pheromone Evaporation Rate for Solving 0/1 Knapsack Problem (Ruchi Chauhan, Nirmala Sharma, and Harish Sharma)
Ant System Algorithm with Output Validation for Solving 0/1 Knapsack Problem (Ruchi Chauhan, Nirmala Sharma, and Harish Sharma)
Fitness-Based PSO for Large-Scale Job-Shop Scheduling Problem (Kavita Sharma and P. C. Gupta)
Author Index
About the Editors
Dr. Mukesh Saraswat is an associate professor at Jaypee Institute of Information Technology, Noida, India. Dr. Saraswat obtained his Ph.D. in Computer Science and Engineering from ABV-IIITM Gwalior, India. He has more than 18 years of teaching and research experience. He has guided two Ph.D. students and more than 50 M.Tech. and B.Tech. dissertations, and is presently guiding four Ph.D. students. He has published more than 50 journal and conference papers in the areas of image processing, pattern recognition, data mining, and soft computing. He was part of a successfully completed DRDE-funded project on image analysis and a SERB-DST (New Delhi) funded project on histopathological image analysis. He is currently running a project funded under the Collaborative Research Scheme (CRS) of TEQIP III (RTU-ATU) on Smile. He has been an active member of many organizing committees of various conferences and workshops. He is also a guest editor of Array, the Journal of Swarm Intelligence, and the Journal of Intelligent Engineering Informatics. He is an active member of the IEEE, ACM, and CSI professional bodies. His research areas include image processing, pattern recognition, data mining, and soft computing.

Harish Sharma is an associate professor at Rajasthan Technical University, Kota, in the Department of Computer Science and Engineering. He has worked at Vardhaman Mahaveer Open University, Kota, and Government Engineering College, Jhalawar. He received his B.Tech. and M.Tech. degrees in Computer Engineering from Government Engineering College, Kota, and Rajasthan Technical University, Kota, in 2003 and 2009, respectively. He obtained his Ph.D. from ABV—Indian Institute of Information Technology and Management, Gwalior, India. He is secretary and one of the founder members of the Soft Computing Research Society of India. He is a lifetime member of the Cryptology Research Society of India, ISI, Kolkata. He is an associate editor of the International Journal of Swarm Intelligence (IJSI) published by Inderscience. He has also edited special issues of many reputed journals such as Memetic Computing, the Journal of Experimental and Theoretical Artificial Intelligence, and Evolutionary Intelligence. His primary area of interest is nature-inspired optimization techniques. He has contributed to more than 105 papers published in various international journals and conferences.

Dr. K. Balachandran is currently a professor and the head of CSE at CHRIST (Deemed to be University), Bengaluru, India. He has a total of 38 years' experience in research, academia, and industry. He served as a senior scientific officer in the Research and Development unit of the Department of Atomic Energy for 20 years. His research interests include data mining, artificial neural networks, soft computing, and artificial intelligence. He has published more than fifty articles in well-known SCI/SCOPUS indexed international journals and conferences and attended several national and international conferences and workshops. He has authored/edited four books in the area of computer science.

Prof. Joong Hoon Kim, a faculty member of Korea University in the School of Civil, Environmental, and Architectural Engineering, obtained his Ph.D. degree from the University of Texas at Austin in 1992 with the thesis "Optimal replacement/rehabilitation model for water distribution systems." Professor Kim's major areas of interest include optimal design and management of water distribution systems, application of optimization techniques to various engineering problems, and development and application of evolutionary algorithms. His publications include "A New Heuristic Optimization Algorithm: Harmony Search" (Simulation, February 2001, Vol. 76, pp 60–68), which has been cited over 2300 times by journals of diverse research areas. His keynote speeches include "Optimization Algorithms as Tools for Hydrological Science" at the Annual Meeting of the Asia Oceania Geosciences Society held in Brisbane, Australia, in June 2013; "Recent Advances in Harmony Search Algorithm" at the 4th Global Congress on Intelligent Systems (GCIS 2013) held in Hong Kong, China, in December 2013; and "Improving the convergence of Harmony Search Algorithm and its variants" at the 4th International Conference on Soft Computing For Problem Solving (SOCPROS 2014) held in Silchar, India, in December 2014. He hosted the 6th Conference of the Asia Pacific Association of Hydrology and Water Resources (APHW 2013) and the 2nd International Conference on Harmony Search Algorithm (ICHSA 2015), and he also hosted the 12th International Conference on Hydroinformatics (HIC 2016) in August 2016.

Dr. Jagdish Chand Bansal is an associate professor at South Asian University, New Delhi, and a visiting faculty member in Maths and Computer Science, Liverpool Hope University, UK. Dr. Bansal obtained his Ph.D. in Mathematics from IIT Roorkee. Before joining SAU, New Delhi, he worked as an assistant professor at ABV—Indian Institute of Information Technology and Management Gwalior and at BITS Pilani. He is the series editor of the book series Algorithms for Intelligent Systems (AIS) published by Springer. He is the editor-in-chief of the International Journal of Swarm Intelligence (IJSI) published by Inderscience. He is also an associate editor of IEEE Access published by IEEE. He is a steering committee member and the general chair of the annual conference series SocProS. He is the general secretary of the Soft Computing Research Society (SCRS). His primary area of interest is swarm intelligence and nature-inspired optimization techniques. Recently, he proposed a fission–fusion social structure-based optimization algorithm, Spider Monkey Optimization (SMO), which is being applied to various problems in the engineering domain. He has published more than 70 research papers in various international journals and conferences. He has supervised Ph.D. theses from ABV-IIITM Gwalior and SAU, New Delhi. He has also received Gold Medals at the UG and PG levels.
The Extraction of Automated Vehicles Traffic Accident Factors and Scenarios Using Real-World Data

MinHee Kang, Jaein Song, and Keeyeon Hwang
Abstract As automated vehicles (AVs) approach commercialization, it is uncontroversial that attention is increasingly concentrated on the safety problem. Given this issue, scenario research that can ensure safety and that is tied to vehicle safety assessment is essential. In this paper, based on the reports of traffic collisions involving AVs provided by the California DMV (Department of Motor Vehicles), we extract the major factors characterizing AV traffic accidents and derive basic AV traffic accident scenarios by employing random forest, a machine learning technique. As a result, we found the pre-collision movements of the vehicles neighboring the AVs to be the most important factors and inferred that they are related to time-to-collision (TTC). Based on these factors, we derived scenarios and confirmed that rear-end collisions by neighboring vehicles usually occur when the AV is ahead in passing, lane-changing, and merging situations. While most accident determinants and scenarios are expected to be similar to those between human-driven vehicles (HVs), AVs are expected to reduce accident rates because 'AVs do not cause accidents.'

Keywords Automated vehicle · Real datasets · Traffic accident · Road safety · Traffic scenario · Machine learning
1 Introduction

M. Kang (B) · J. Song · K. Hwang
Hongik University, Seoul 04066, Republic of Korea
e-mail: [email protected]
https://doi.org/10.1007/978-981-16-9416-5_1

AVs stand out in IT firms, unlike the traditional automobile manufacturing companies [1]. In particular, Waymo LLC has preemptively developed and occupied AV technology and is being chased by the existing automobile manufacturing
industry as a latecomer [2]. This suggests that the IT companies, which deal with data, occupied AV technologies first because of the software-centric character of AVs. However, the safety issue came under the spotlight after a fatal pedestrian crash during the operation of an Uber AV in 2018 [3–5]. Moreover, with the emergence of a study showing that AVs have a probability of collision even under ideal conditions [6], the commercialization of AVs has been delayed. Although AVs will clearly take root in our society, safety issues will ultimately draw even more attention in the future [7]. In response, the safety paradigm has been changing from the passive safety systems of the past to active safety systems that can prevent various accidents in advance [8]. One method of active safety is the prevention of traffic accidents in advance by learning from numerous situations [9, 10], given that countless traffic accidents occur in real traffic conditions [11]. Furthermore, a recently proposed method for AV assessment is scenario-based testing for ensuring safety. According to Ulbrich et al. [12], a scenario is the temporal development between several scenes in a sequence of scenes, actions, and events, as well as goals and values, specified to characterize this temporal development. That is, scenario studies directly involved in car safety assessment are essential [13–15], and ambiguity about scenarios should be minimized [18]. Along with these features, diverse studies that use real data for extracting scenarios are being carried out [16–18]. This is closely related to the new paradigm shift in testing and validation [19]. For instance, Erdogan et al. [19] pointed out that proposed scenarios mostly rely on hand-crafting or expert input for AV safety assessment. The authors stated that, because a myriad of scenarios is expected, this alone would not be suitable; to overcome the limitation, they proposed multiple classification algorithms and effectively extracted scenarios. Webb et al. [20] presented an approach and methodology for AV safety scenario assessment to reduce accident fatalities through safe and trusted driving. With regard to using real data, Schwall et al. [16] demonstrated the safety of AVs on urban roads based on actual Waymo driving data. In this paper, we propose a two-step scenario-building method that improves safety by alleviating this ambiguity: (1) extract the major factors determining AV collisions, and (2) based on the extracted factors, draw the expected AV traffic accident scenarios. The aim is to utilize AV traffic accident data and a machine learning methodology for the extraction of AV traffic accident scenarios. The contributions are that (1) scenarios can be derived from AV accident data containing actual accidents; (2) using real AV accident data, rather than human-driven vehicle (HV) data, strengthens AV safety in practice; (3) the main decision factors are identified through machine learning; and (4) the dependence on expert knowledge, which has been deemed a blind spot, can be relieved. This paper is organized as follows: In Sect. 2, we present the collection and preprocessing of AV accident data. Section 3 summarizes the machine learning method and describes the process of extracting the major decision factors. Based on this, we derive fundamental AV accident scenarios in Sect. 4. Finally, Sect. 5 concludes the paper.
2 Data Collecting and Preprocessing

In this section, we collect and preprocess the AVs traffic accident data.
2.1 Collecting AVs Accident Data

Currently, more than 13 manufacturers, including Waymo, GM, and Drive AI, are participating in autonomous driving demonstrations in California, USA. However, as the information about these test runs is confidential to each manufacturer, the manufacturers do not disclose the data containing their driving information on their own. Although there is thus a limit to collecting operating information, the Department of Motor Vehicles (DMV) in California releases data on the 'traffic accident' situations that have occurred with AVs [21]. The reports show the AV manufacturer, the accident time, the vehicle driving condition (moving or stopped), the autonomous driving mode, road users (pedestrians, etc.), weather, lighting, road conditions, information that can be inferred about the accident, and so on. Accordingly, we collected the data on AV accidents released by the State of California. Data exist from 2014, but the data currently disclosed cover 2019 onward, so we contacted the DMV and were provided the data for 2014–2018. On checking the data, about 300 records existed, and the report format was changed around April 2018.
2.2 Preprocessing AVs Accident Data

We collected the accident history information provided by the DMV and converted into data whatever was regarded as affecting the accident situation. As previously described, the report format changed in 2018, so there was a limit to converting all records. As a result, the data used date from April 2018 onward, accounting for a total of 219 cases. The data determined by the critical variables follow Kang et al. [10]. First of all, we extracted the accidents that occurred in autonomous driving mode through the 'Mode' variable. The corrected data consist of 'Time,' 'AV injury,' 'Weather,' 'Lighting,' 'Roadway surface (Rs),' 'Roadway conditions (Rc),' 'Vehicle 1,2-Movement Preceding Collision (MPC),' 'Vehicle 1,2-Type of Collision (TC),' and 'Other associated factors (Other),' as shown in Table 1. Among them, the extra variables such as MPC 1–1 and 2–1 had two marked values in the raw data, which were determined to affect the accident situation after reviewing the 'Accident Details' section. Consequently, we finally obtained 138 records consisting of 15 variables; a preprocessing sketch is given after Table 1.
Table 1 Features of AVs traffic accidents

Mode: The driving mode of AVs, including autonomous and conventional mode
Time: The time when the AVs accident occurred
AV injury: The degree of injury to the AVs in the accident, including Minor, Major, MOD, etc.
Weather: The weather at the time of the accident
Lighting: The illumination of the surrounding environment at the time of the accident
Roadway surface (Rs): The type of road surface at the time of the accident
Roadway condition (Rc): The condition of the road surface at the time of the accident
Movement preceding collision (MPC) of vehicle 1: Vehicle 1 movement before the accident
Movement preceding collision (MPC) of vehicle 1–1: Vehicle 1 (additional) movement before the accident
Movement preceding collision (MPC) of vehicle 2: Vehicle 2 movement before the accident
Movement preceding collision (MPC) of vehicle 2–1: Vehicle 2 (additional) movement before the accident
Type of collision (TC) of vehicles 1, 1–1, 2, and 2–1: Type of collision that caused the accident
Other associated factors (Other): Other factors that may be associated with the accident
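The filtering described in Sect. 2.2 can be summarized in a short script. The following is a minimal sketch only, assuming the transcribed reports sit in a CSV whose columns follow Table 1; the file name, the 'Mode' values, and the exact column labels are illustrative assumptions, not part of the DMV release.

```python
# Minimal preprocessing sketch for the DMV collision reports (assumed schema).
import pandas as pd

FEATURES = [
    "Time", "AV injury", "Weather", "Lighting",
    "Roadway surface (Rs)", "Roadway condition (Rc)",
    "MPC 1", "MPC 1-1", "MPC 2", "MPC 2-1",
    "TC 1", "TC 1-1", "TC 2", "TC 2-1",
    "Other associated factors",
]

def preprocess(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    # Keep only collisions that occurred in autonomous driving mode.
    df = df[df["Mode"] == "autonomous"]
    # Keep the 15 variables judged to affect the accident situation.
    return df[FEATURES].dropna(subset=["AV injury"])

# Example (hypothetical file name):
# accidents = preprocess("dmv_av_collisions_2018_2019.csv")
```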
3 Learning Method and Result

In this section, we extract the important factors of AVs accidents by utilizing a machine learning methodology.
3.1 Random Forest Random forests [22] are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest
The Extraction of Automated Vehicles Traffic …
and the correlation between them. In other words, while individual trees of a random forest may overfit, this is prevented by generating numerous trees. Therefore, among the important-variable extraction techniques, we utilize the random forest, which minimizes overfitting, to extract the main factors determining AVs traffic accidents. Among the random forest importance extraction techniques, we deploy the 'Drop Columns Importance' method, which derives importance from the difference in learning accuracy when the model is trained without a particular variable.
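The drop-column procedure lends itself to a compact implementation. The following is a minimal sketch of the idea under scikit-learn, not the authors' actual code; the hyperparameter values and variable names are illustrative.

```python
# Drop-column importance: retrain with one feature removed at a time and
# measure the drop in cross-validated accuracy relative to the full model.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drop_column_importance(X, y, n_estimators=500, cv=5):
    base = cross_val_score(
        RandomForestClassifier(n_estimators=n_estimators, random_state=0),
        X, y, cv=cv).mean()
    importances = {}
    for col in X.columns:
        score = cross_val_score(
            RandomForestClassifier(n_estimators=n_estimators, random_state=0),
            X.drop(columns=[col]), y, cv=cv).mean()
        # A large accuracy drop means the column carried important information.
        importances[col] = base - score
    return importances

# Usage sketch: y is the 'AV injury' column, X the remaining Table 1 features.
```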
3.2 Extraction of AVs Collision Factor Importance

Prior to applying the random forest, we discovered the following facts from the refined data: (1) the AVs did not crash into objects (e.g., pedestrians) in autonomous driving mode, and (2) in collision situations, the collision was caused by the neighboring vehicles (HVs), not by the AVs. Therefore, we derived the feature importance of the variables by setting the dependent variable to 'AV injury' (see Table 2; an encoding sketch follows the table). As a result of the extraction (see Fig. 1), MPC2 and MPC1 were found to be the most important variables, at 24% and 22%, respectively. These are the movements of the vehicles before the collision, and the movement of the HVs (MPC2) is more important than that of the AVs (MPC1). This is because the AVs do not cause accidents; most accidents are considered to be caused by the surrounding HVs. Furthermore, the MPC itself is deemed to reflect the interaction between vehicles on the road. It is widely used as a vehicle safety indicator and is considered to have been derived as an important variable because it is related to time-to-collision (TTC) [23, 24], the time remaining immediately before a collision with the other vehicle (for a following vehicle, TTC can be expressed as the gap distance divided by the closing speed). Next, the 'Time' and 'Lighting' variables ranked 3rd and 5th, which is deemed to be because the degree of injury in AV operation varies with environmental conditions (day and night, lighting). This relates to low-light street conditions at night, as in the fatal Uber pedestrian collision in 2018 [25].

Table 2 Learning environment of AVs traffic accidents
AV injury (output layer): classes 0–4
Input layer variables:
Time: classes 0–3
Weather: classes 0–7
Lighting: classes 0–5
Roadway surface (Rs): classes 0–4
Roadway condition (Rc): classes 0–7
Movement preceding collision (MPC): classes 0–19
Type of collision (TC): classes 0–8
Other associated factors (Other): classes 0–12
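Since the report fields are categorical, each feature has to be mapped to integer codes such as the class ranges above before training. A minimal sketch, assuming the features sit in a pandas DataFrame with the Table 1 columns (the paper does not specify its exact encoding), could look as follows.

```python
# Encode categorical report fields to integer classes (cf. Table 2 ranges).
from sklearn.preprocessing import LabelEncoder

def encode_features(df):
    encoders = {}
    out = df.copy()
    for col in out.columns:
        enc = LabelEncoder()
        out[col] = enc.fit_transform(out[col].astype(str))
        encoders[col] = enc  # e.g., 'MPC' ends up with codes 0-19
    return out, encoders
```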
Fig. 1 Factors for determining AVs collision
Finally, the variables for 'Type of Collision (TC)' ranked 4th and 6th, respectively. This is considered to be because these variables can have various effects depending on the seating position of the occupants in the vehicle. In this regard, studies of seating position in HVs have been conducted [26–28], and many studies have been carried out in the AV sector as well [29–33].
4 AVs Traffic Accident Scenarios

In this section, we derive scenarios for AVs accidents based on the factors previously extracted.
4.1 Scenario Combinations Based on Collision Situations

Scenarios are derived based on the most important factors identified above, Movement Preceding Collision 1 and 2. Although a wide variety of pre-collision movements is possible, we selected five types of movement in a highway environment with relatively few objects, such as traffic signals and pedestrians (Table 3). In addition, the scenarios assume that AVs do not cause injury to nearby vehicles (Sect. 3.2), and since mixed traffic between AVs and HVs is expected even after commercialization, the neighboring vehicles are set as HVs.
Table 3 Assumptions for extracting scenarios

Environment: a two-lane expressway with mixed AVs and HVs
Driving behavior: Proceeding straight (PS), Slowing (S), Passing other vehicle (POV), Changing lanes (CL), Merging (M)
Considering the MPC of the AVs and the MPC of the HVs, a combination of 25 scenarios (5 × 5) is derived; further considering the locations and lanes of the AVs and HVs, a myriad of scenarios (25 × locations × lanes) arises. Therefore, we simulate the conditions separated by the locational position of the AVs (ahead/rear) in the two-lane environment. Over a hundred scenarios [25 × 2 (ahead/rear) × 2 (two lanes)] are generated, presented in the order 'AVs (Ego)-HVs (Neighboring)' with each given a code (the movements preceding collision are abbreviated PS, S, POV, CL, and M; Table 3); a sketch of this enumeration follows.
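The scenario codes can be enumerated mechanically. The following sketch generates the full combination set; the field names are illustrative assumptions, not from the paper, and each generated record corresponds to one cell of Tables 4 and 5.

```python
# Enumerate the 5 x 5 movement combinations, doubled by AV position and by
# lane relation, giving the 100 'AV-HV' scenario codes.
from itertools import product

MOVES = ["PS", "S", "POV", "CL", "M"]  # Table 3 driving behaviors

scenarios = [
    {"av": av, "hv": hv, "av_position": pos, "lane": lane,
     "code": f"{av}-{hv}"}
    for av, hv, pos, lane in product(
        MOVES, MOVES, ("ahead", "rear"), ("same lane", "another lane"))
]
assert len(scenarios) == 100
```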
4.2 AVs Ahead Situation

If the AV is the forward vehicle, the surrounding vehicle likely to crash runs behind the AV. The cases where an accident may occur are shown in Table 4. As the scenario derivation shows, because the AV is the leading vehicle, the majority of scenarios involve rear-end collisions that depend on the behavior of the following neighboring vehicle. These collisions were found to occur frequently in passing, lane-changing, and merging situations, where there is substantial interaction between vehicles.
4.3 AVs Rear Situation

If the AV is the rear vehicle, the surrounding vehicle likely to crash runs in front of the AV. The cases where an accident may occur are shown in Table 5. Since the Ego vehicle is the rear vehicle, the AV is expected to maintain an environment in which it does not crash, according to the behavior of the preceding neighboring vehicle. Most of these traffic accidents occur in situations with substantial interaction, such as passing, lane changing, and merging, and they are frontal collisions.
Table 4 Scenarios in the AVs-ahead situation: the yellow car is the AV, the gray car is the HV, and a warning sign marks the crash probability. (The original table is a set of illustrations depicting, for both the same lane and the adjacent lane, each of the 25 movement codes: PS-PS, PS-S, PS-POV, PS-CL, PS-M, S-PS, S-S, S-POV, S-CL, S-M, POV-PS, POV-S, POV-POV, POV-CL, POV-M, CL-PS, CL-S, CL-POV, CL-CL, CL-M, M-PS, M-S, M-POV, M-CL, and M-M.)
Table 5 Scenarios in the AVs-rear situation: the yellow car is the AV, the gray car is the HV, and a warning sign marks the crash probability. (The original table illustrates the same 25 movement codes as Table 4, with the AV in the rear configuration.)
4.4 Comparison with HVs Traffic Accidents

NHTSA [34] states that 'The safety benefits of automated vehicles are paramount. Automated vehicles' potential to save lives and reduce injuries is rooted in one critical and tragic fact: 94% of serious crashes are due to human error.' Prior to the introduction of AVs, the physical accident factors between HVs have been speed, vehicle type, time, violation of the law, and so on [10, 35–39]. Even in mixed situations with HVs after AVs are introduced, the majority of accident factors will still be caused by HVs, so the factors that determine accidents are not expected to change much. However, the introduction of AVs is expected to create new accident factors. For example, a human driver may be aware of an AV and yet cause an accident because the driver himself judged the AV to be a risk factor [40]. Furthermore, an increasing market penetration rate (MPR) of AVs could lead to unavoidable accident situations between AVs. Although current data are limited for analyzing this, research needs to proceed in a way that increases safety by identifying and preventing such factors in advance through various methods.
4.5 Summary

As a result of the overall scenario derivation, accidents vary depending on whether the AV is in front of or behind the surrounding vehicles. In addition, different collisions may occur in the same situation depending on the lane of the vehicle being operated. In particular, various accidents are expected near passing lanes and ramp sections, where there are many interactions. Under the regulation that vehicles can only pass in the left lane, many side collisions are expected in the POV environment, leading to serious accidents at high operating speeds. For the CL environment, more varied accidents are expected than in the POV environment because of the free lane-change situations. The M environment is the only traffic environment for entering the highway, and more accidents are expected there because of the interaction between the entering vehicles and the mainline vehicles. Although these would be similar to the accident situations and accident types currently occurring between HVs, we derived only scenarios consisting of accidents caused by the surrounding vehicles, because the AVs did not cause accidents.
5 Conclusion and Future Research

While various studies are being conducted on the safety of AVs, research has recently focused on the development of active safety systems. Among them, the field of AV
scenarios that directly affect vehicle safety and testing is being actively studied. However, while countless scenarios are expected to be derived in the scenario-making process, the tendency to rely on expert knowledge and the lack of data utilization are emerging as problems. Accordingly, studies are being conducted to extract scenarios while minimizing these dependencies and utilizing real data. As part of this flow, we derived scenarios through machine learning techniques using about 300 records obtained from real AV traffic accidents. As a result of eliciting the determinants of AV accidents, the MPC of the HVs was identified as the most important factor, followed by the MPC of the AVs. Based on this, we drew the accident-situation scenarios that can occur in various ways from the extracted vehicle-to-vehicle movement variables. The scenarios arose where the most interaction occurs, resulting in a number of collisions at overtaking, lane changes, and merging on the expressway. Such collisions are expected to be similar to those between HVs, but the degree of the accidents is considered to differ because the AVs did not cause accidents. This work is expected to provide practical help in establishing scenarios by deriving basic scenarios through machine learning based on real-world AV accident data and to mitigate dependencies on expert knowledge. Machine learning and AI require large amounts of data, and this study faced the limitation of data scarcity; however, since more autonomous-vehicle accident data are expected to accumulate, this research can serve as basic data in the future. Similarly, we disregarded many situations among the scenarios derived under the assumptions of various situations, and since AVs do not interact with only one vehicle, further studies considering this are needed. We do not mean that AVs can prevent accidents only if they are behind the surrounding vehicles; rather, we imply that various basic scenarios can be established by extracting accident determinants based on accident data. In the future, we will present a study establishing scenarios on multilane expressways and in urban environments where various objects interact. Furthermore, beyond presenting simple scenarios, predicting where accidents will occur from the Ego vehicle's perspective will directly help improve safety, so we will conduct such studies later.

Acknowledgements This work is supported by Ministry of Land, Infrastructure and Transport of Korea (grant 21AMDP-C161754-01) and 2020 Hongik University Research Fund.
References

1. https://time.com/3719270/you-asked-how-do-driverless-cars-work/. Accessed 03.04.2021
2. Navigant Leaderboard (2020) https://www.greencarcongress.com/2020/03/20200324-navigant.html
3. Abraham H, Lee C, Brady S, Fitzgerald C, Mehler B, Reimer B, Coughlin JF (2017) Autonomous vehicles and alternatives to driving: trust, preferences, and effects of age. In: Proceedings of the transportation research board 96th annual meeting
4. Zhang T, Tao D, Qu X, Zhang X, Lin R, Zhang W (2019) The roles of initial trust and perceived risk in public's acceptance of automated vehicles. Transp Res Part C Emerg Technol 98:207–220
5. Hartwich F, Witzlack C, Beggiato M, Krems JF (2019) The first impression counts—a combined driving simulator and test track study on the development of trust and acceptance of highly automated driving. Transp Res Part F Traff Psychol Behav 65:522–535
6. Goodall NJ (2014) Ethical decision making during automated vehicle crashes. Transp Res Rec 2424(1):58–65
7. Riedmaier S, Ponn T, Ludwig D, Schick B, Diermeyer F (2020) Survey on scenario-based safety assessment of automated vehicles. IEEE Access 8:87456–87477
8. Kim K, Kim B, Lee K, Ko B, Yi K (2017) Design of integrated risk management-based dynamic driving control of automated vehicles. IEEE Intell Transp Syst Mag 9(1):57–73
9. Kang M, Song J, Hwang K (2020) For preventative automated driving system (PADS): traffic accident context analysis based on deep neural networks. Electronics 9(11):1829
10. Lee H, Kang M, Song J, Hwang K (2020) The detection of black ice accidents for preventative automated vehicles using convolutional neural networks. Electronics 9(12):2178
11. Wen M, Park J, Cho K (2020) A scenario generation pipeline for autonomous vehicle simulators. HCIS 10:1–15
12. Ulbrich S, Menzel T, Reschka A, Schuldt F, Maurer M (2015) Defining and substantiating the terms scene, situation, and scenario for automated driving. In: 2015 IEEE 18th international conference on intelligent transportation systems. IEEE, pp 982–988
13. Webb N, Smith D, Ludwick C, Victor T, Hommes Q, Favaro F, Ivanov G, Daniel T (2020) Waymo's safety methodologies and safety readiness determinations. arXiv:2011.00054
14. Stellet JE, Zofka MR, Schumacher J, Schamm T, Niewels F, Zöllner JM (2015) Testing of advanced driver assistance towards automated driving: a survey and taxonomy on existing approaches and open questions. In: 2015 IEEE 18th international conference on intelligent transportation systems. IEEE, pp 1455–1462
15. De Gelder E, Paardekooper JP, Saberi AK, Elrofai H, Ploeg J, Friedmann L, De Schutter B (2020) Ontology for scenarios for the assessment of automated vehicles. arXiv:2001.11507
16. Schwall M, Daniel T, Victor T, Favaro F, Hohnhold H (2020) Waymo public road safety performance data. arXiv:2011.00038
17. Elrofai H, Paardekooper JP, de Gelder E, Kalisvaart S, den Camp OO (2018) Scenario-based safety validation of connected and automated driving. Netherlands Organization for Applied Scientific Research, TNO, Technical Report
18. Pütz A, Zlocki A, Bock J, Eckstein L (2017) System validation of highly automated vehicles with a database of relevant traffic scenarios. Situations 1:E5
19. Erdogan A, Ugranli B, Adali E, Sentas A, Mungan E, Kaplan E, Leitner A (2019) Real-world maneuver extraction for autonomous vehicle validation: a comparative study. In: 2019 IEEE intelligent vehicles symposium (IV). IEEE, pp 267–272
20. Webb N, Smith D, Ludwick C, Victor T, Hommes Q, Favaro F, Ivanov G, Daniel T (2020) Waymo's safety methodologies and safety readiness determinations. arXiv:2011.00054
21. https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/autonomous-vehicle-collision-reports/
22. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
23. Nadimi N, Ragland DR, Mohammadian Amiri A (2020) An evaluation of time-to-collision as a surrogate safety measure and a proposal of a new method for its application in safety analysis. Transp Lett 12(7):491–500
24. Li Y, Wu D, Lee J, Yang M, Shi Y (2020) Analysis of the transition condition of rear-end collisions using time-to-collision index and vehicle trajectory data. Accident Anal Prevent 144:105676
25. https://time.com/5205767/uber-autonomous-car-crash-arizona/. Accessed 10.03.2021
26. Lund UJ (2005) The effect of seating location on the injury of properly restrained children in child safety seats. Accid Anal Prev 37(3):435–439
27. Choi J (2017) Multinomial logit framework to evaluate the impact of seating position on senior occupant injury severity in traffic accidents. J Korean Soc Safe 32(3):141–150
28. Viano DC, Parenteau CS, Edwards ML (2007) Rollover injury: effects of near- and far-seating position, belt use, and number of quarter rolls. Traffic Inj Prev 8(4):382–392
29. Koppel S, Jiménez Octavio J, Bohman K, Logan D, Raphael W, Quintana Jimenez L, Lopez-Valdes F (2019) Seating configuration and position preferences in fully automated vehicles. Traffic Inj Prev 20(sup2):S103–S109
30. Lopez-Valdes FJ, Bohman K, Jimenez-Octavio J, Logan D, Raphael W, Quintana L, Fueyo RSD, Koppel S (2020) Understanding users' characteristics in the selection of vehicle seating configurations and positions in fully automated vehicles. Traffic Inj Prev, pp 1–6
31. Forman J, Lin H, Gepner B, Wu T, Panzer M (2018) Occupant safety in automated vehicles—effect of seatback recline on occupant restraint. JSAE, Paper Number 20185234
32. Jin X, Hou H, Shen M, We H, Yang K (2018) Occupant kinematics and biomechanics with rotatable seat in autonomous vehicle collision: a preliminary concept and strategy. IRCOBI, Athens, Greece, Sept 12–14
33. Kitagawa Y, Hayashi S, Yamada K, Gotoh M (2017) Occupant kinematics in simulated autonomous driving vehicle collisions: influence of seating position, direction and angle. Stapp Car Crash J 61:101–155
34. US Department of Transportation (2018) Preparing for the future of transportation: automated vehicles 3.0
35. Harb R, Yan X, Radwan E, Su X (2009) Exploring precrash maneuvers using classification trees and random forests. Accid Anal Prev 41(1):98–107
36. De Oña J, Mujalli RO, Calvo FJ (2011) Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks. Accid Anal Prev 43(1):402–411
37. Rolison JJ, Regev S, Moutari S, Feeney A (2018) What are the factors that contribute to road accidents? An assessment of law enforcement views, ordinary drivers' opinions, and road accident records. Accid Anal Prev 115:11–24
38. Zeng Q, Gu W, Zhang X, Wen H, Lee J, Hao W (2019) Analyzing freeway crash severity using a Bayesian spatial generalized ordered logit model with conditional autoregressive priors. Accid Anal Prev 127:87–95
39. Tang J, Liang J, Han C, Li Z, Huang H (2019) Crash injury severity analysis using a two-layer Stacking framework. Accid Anal Prev 122:226–238
40. Kim K, Cho SA (2020) Lessons learned from crash types of automated vehicles: based on accident data of automated vehicles in California, USA. Korean Soc Transp 17(2):34–42 (in Trans)
Leaf Disease Identification in Rice Plants Using CNN Model

Allam Sushanth Reddy and Jyothi Thomas
Abstract Rice is a staple food crop for more than 10 countries, and high consumption of rice demands better crop yields. Fungal, bacterial, and viral diseases damage rice crops, lowering both the quality and the quantity of the yield. Some of the most common diseases affecting the plants are fungal blast, fungal brown spot, fungal sheath blight, bacterial blight, and viral tungro. In this paper, a deep learning CNN model with the ResNet50V2 architecture was used to identify disease on paddy leaves. The mobile application proposed in this paper helps farmers detect disease on the leaves during their regular field visits. Images were captured using this application and tested with the trained deep learning model embedded in it. The model predicts and displays the input image along with its probability for each disease. The mobile application also provides the necessary remedies for the identified disease through hyperlinks available in the application. The probability that the model truly classifies the input image in this project was 97.67%, and the obtained validation accuracy was 98.86%. This paper thus demonstrates a solution with which farmers can identify diseases in rice leaves and take the necessary actions for a better crop yield.

Keywords Deep learning · CNN · ResNet50V2 · Mobile application
1 Introduction

A. S. Reddy (B) · J. Thomas
Christ (Deemed-to-be-University), Bangalore, Karnataka, India
https://doi.org/10.1007/978-981-16-9416-5_2

Rice is a staple food crop for over half the world's population. The countries consuming rice as a staple food include China, India, Bangladesh, Indonesia, Vietnam, the Philippines, Thailand, Burma, Japan, and Brazil, with China at the top. Various diseases affect the crop yield; fungal blast, fungal brown spot, fungal sheath blight, bacterial blight, and viral tungro are the ones predicted in this research. The total average rice yield loss in a year is about 33% in India. Of the total yield loss, 35% is caused by blast, 25% by
sheath blight, 20% by bacterial blight, 10% by tungro and remaining 10% by other diseases. The diseases stated above affect leaves which help in the photosynthesis process. As the photosynthesis process is a key tool to have high yield, the attack of these diseases on leaves reduces the yield. Early detection of rice leaf diseases will help farmers to have healthy plant growth and reduce yield loss. This detection is done manually with naked eye and the gained knowledge of diseases from farmers’ experience which might not be accurate or with the help of experts who may charge. To detect diseases on paddy leaves, a deep learning model along with a mobile application and camera functionality could solve the problem by identifying and learning patterns in classified training images which are used to predict unseen diseased images. The remedies were provided using hyperlinks to web pages available online for further reading and control of diseases.
2 Related Work Naga Swetha R. and V. Shravani proposed an SVM- and KNN-based model that identifies leaf disease on a Raspberry Pi, with a cloud-based solution used to inform farmers of the remedies to be taken [1]. Lakshmana Rao Rowthu, M. Naveen Kumar and Ch. Narayana Rao converted RGB values to HSV values to reduce colour issues; image segmentation was done with the Otsu method, and the obtained data was used to classify images with various models, among which a CNN classifier gave the highest accuracy of 97.58% [2]. Swarup Rautaray, Manjusha Pandey, Mahendra Kumar Gourisaria, Ritesh Sharma and Sujay Das used a CNN with transfer learning and a machine learning approach on the VGG-16 model, which yielded a test accuracy of 90% on an unspecified dataset [3]. Md. Mafiul Hasan Matin, Amina Khatun, Md. Golam Moazzam and Mohammad Shorif Uddin applied the AlexNet technique to detect three prevalent rice leaf diseases and obtained 99% accuracy with 120 images augmented to a total of 900 images [4]. M. Shahriar Sazzad, Ayrin Anwar, Mahiya Hasan and Md. Ismile Hossain reported that a CNN model trained with 300 images achieved an accuracy of 97.43% [5]. Sharath N. Payyadi, Varun S. D., Satya Gururaj Kalluru and Archana R. Kulkarni developed paddy leaf disease identification using a CNN algorithm; a Raspberry Pi with a camera module provides the result, a GSM module sends farmers a message describing the disease and the remedies to be taken, and the system achieved an accuracy of 95% [6]. A. Sony used the R programming language to train a CNN model and achieved an accuracy of 86.6% with 3 classes of diseases [7]. Maadugundu Jyothi, Ajay Kumar and Sirisha extracted features from images using Haralick's texture features from the colour co-occurrence matrix and trained an ANN model to identify the pattern [7]. Hari Krishnan, Priyadharshini K., Gowsic M., Mahesh N., S. Vijayananth and P. Sudhakar developed a model using the k-means algorithm to form clusters, and classification of diseases was done with an SVM [8]. K. S. Archana and Arun Sahayadhas converted the leaf image from RGB to HSV format; the diseased spot was identified using the hue value and the k-means algorithm [9]. S. Ramesh and D. Vydeki developed a system where images captured by a camera are processed on a Raspberry Pi using the OpenCV Python module; the image is processed using five different algorithms, and the results are uploaded to the cloud and can be viewed in a mobile app [10]. Suresha M., Shreekanth K. N. and Thirumalesh B. V. proposed a method for identifying blast and brown spot diseases: a global threshold method was applied, and a KNN classifier was used to classify the data [10]. Amrita A. Joshi and B. D. Jadhav extracted and combined features per disease and classified the diseases using a minimum distance classifier (MDC) and a k-nearest neighbour (KNN) classifier [11]. S. Pavithra, A. Priyadharshini, V. Praveena and T. Monika used SVM classification to analyse the input image along with feature extraction techniques [12]. John William Orillo, Jennifer Dela Cruz, Leobelle Agapito, Paul Jensen Satimbre and Ira Valenzuela used a back-propagation neural network to enhance the accuracy and performance of image processing with MATLAB [13].
3 Proposed Method There are various methods available to identify leaf diseases in plants, each evolving different strategies around a CNN model. Figure 1 shows the flow diagram of the proposed study.
3.1 Dataset The dataset is an essential part of the process, used to train and test the model. The dataset collected from Kaggle consists of 2800 images, including data augmentation. It was appended with a new class of "healthy" images, augmented to match the rest of the dataset. The final dataset consists of 3360 images in 6 classes, namely fungal blast, fungal brown spot, fungal sheath blight, bacterial blight, viral tungro and healthy. Each class consists of 510 images in the training set and 50 images in the validation set.
3.2 Pre-processing The dataset used in this study was partially pre-processed. It contains rotated images, padded images, and left- and right-shifted images with the reflection property applied, but the images do not all share the same dimensions. The new class of images added to the dataset underwent the same data augmentation to keep the dataset consistent. The images in the dataset are resized to 256 × 256 pixels using the cv2 resize function and saved. All the images are stored in Google Drive for training and testing purposes.
Fig. 1 Workflow diagram: identify the dataset → pre-process the dataset → data normalization → create a model → train & validate the model → test the model → convert model to tflite format → develop & deploy using mobile app → test using mobile app
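As a minimal sketch of this resizing step, the following Python snippet (assuming OpenCV; the folder names and paths are illustrative, not taken from the paper) resizes every image to the fixed 256 × 256 input size:

```python
import os
import cv2  # OpenCV

SRC_DIR = "dataset/raw"       # hypothetical input folder
DST_DIR = "dataset/resized"   # hypothetical output folder
os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    img = cv2.imread(os.path.join(SRC_DIR, name))
    if img is None:  # skip non-image files
        continue
    # Resize to the fixed 256x256 input size used in the paper
    resized = cv2.resize(img, (256, 256))
    cv2.imwrite(os.path.join(DST_DIR, name), resized)
```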
3.3 Data Normalization Google Drive is mounted to Google Colab to develop and train the model. The images in the pre-processed dataset stored in Google Drive are converted to 3-dimensional image arrays, which are collected into a single array whose elements are the individual 3-dimensional image arrays. The class of each image is also identified and stored in another array for label mapping. By default, the images are in uint8 format with a range of 0–255; the image arrays were converted to float32 format in the range [0, 1] for faster arithmetic operations, faster model training, and faster execution.
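A minimal sketch of this conversion, assuming the resized images are loaded with OpenCV into NumPy arrays (the folder layout and label mapping below are illustrative):

```python
import os
import cv2
import numpy as np

CLASSES = ["fungal_blast", "fungal_brown_spot", "fungal_sheath_blight",
           "bacterial_blight", "viral_tungro", "healthy"]  # 6 classes from the paper

images, labels = [], []
for idx, cls in enumerate(CLASSES):
    cls_dir = os.path.join("dataset/resized", cls)  # hypothetical layout
    for name in os.listdir(cls_dir):
        img = cv2.imread(os.path.join(cls_dir, name))
        if img is None:
            continue
        images.append(img)   # each entry is a (256, 256, 3) uint8 array
        labels.append(idx)

# uint8 [0, 255] -> float32 [0, 1] for faster training
X = np.asarray(images, dtype=np.float32) / 255.0
y = np.asarray(labels)
```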
3.4 Create a Model A CNN with the ResNet50V2 architecture was used in this study. ResNet50V2 is a 50-layer variant of the ResNet family; although much shallower than the 152-layer ResNet used for training on thousands of classes, it can learn and classify patterns for a smaller number of classes and achieve results close to those of the 152-layer model. The main advantage of the ResNet design is that each convolutional block learns the difference (residual) between its input and output, with the input added back to the output. The version 1 architecture applies weights, batch normalization and ReLU one after the other, whereas the version 2 architecture applies batch normalization, ReLU and weights in that order, as shown in Fig. 2, which gives higher accuracy than version 1. A max pooling layer is applied only when the channel size needs to be reduced. In this study, the predefined ResNet50V2 model provided by the Keras framework was used. The channels were flattened, and dense and dropout layers were applied to classify the trained feature maps into various groups. The dropout function sets random activations to zero so as to prevent the model from overfitting.
Fig. 2 ResNet50 version 1 (weight → BN → ReLU) and version 2 (BN → ReLU → weight) residual units
Fig. 3 One cycle policy learning rate
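A minimal sketch of this model definition, assuming TensorFlow/Keras (the dense-head sizes and dropout rate are illustrative assumptions; the paper does not list them):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Predefined ResNet50V2 backbone from Keras; weights=None trains from scratch,
# weights="imagenet" would start from ImageNet features (an assumption).
base = tf.keras.applications.ResNet50V2(
    include_top=False, weights=None, input_shape=(256, 256, 3))

model = models.Sequential([
    base,
    layers.Flatten(),                        # flatten the feature maps
    layers.Dense(256, activation="relu"),    # illustrative head size
    layers.Dropout(0.5),                     # zero random activations against overfitting
    layers.Dense(6, activation="softmax"),   # 6 classes in this study
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```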
3.5 Training and Validation To train the model, the one cycle policy learning rate technique with predefined minimum and maximum learning rates was used. This approach sets the lower learning rate to 1/5th or 1/10th of the maximum learning rate, as shown in Fig. 3. Training starts at this lower rate and ramps up to the maximum learning rate within the first 30% of the epochs. The learning rate then decreases back to the minimum by 80–90% of the epochs, and the remaining epochs are trained with a very low learning rate so as to gain high accuracy and prevent overfitting.
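A minimal sketch of such a schedule, continuing the earlier Keras sketches and written as a LearningRateScheduler callback (the min/max rates and epoch fractions are assumptions for illustration; the paper does not state its exact values):

```python
import tensorflow as tf

MAX_LR, MIN_LR = 1e-3, 1e-4        # assumed values; min = max / 10
EPOCHS = 30
WARM = int(0.3 * EPOCHS)           # ramp up over the first 30% of epochs
COOL = int(0.85 * EPOCHS)          # ramp down until ~85% of epochs

def one_cycle(epoch, lr):
    if epoch < WARM:               # linear ramp MIN_LR -> MAX_LR
        return MIN_LR + (MAX_LR - MIN_LR) * epoch / WARM
    if epoch < COOL:               # linear ramp MAX_LR -> MIN_LR
        return MAX_LR - (MAX_LR - MIN_LR) * (epoch - WARM) / (COOL - WARM)
    return MIN_LR / 10             # finish with a very low rate

history = model.fit(X, y, epochs=EPOCHS, validation_split=0.1,
                    callbacks=[tf.keras.callbacks.LearningRateScheduler(one_cycle)])
```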
3.6 Test the Model The "evaluate" function was used to evaluate the model with the obtained weights; it returns the test loss and the test accuracy.
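For instance, continuing the Keras sketches above (the held-out arrays X_test and y_test are assumed):

```python
# evaluate() returns the loss followed by the compiled metrics
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test loss={test_loss:.4f}, accuracy={test_acc:.4f}")
```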
3.7 Convert the Model to .tflite Format To develop the mobile application, the model was converted to TensorFlow Lite format, which needs to stay below about 200 MB for mobile deployment. It is always recommended to quantize the model to reduce the size of the mobile application. Hence, post-training dynamic quantization was applied, where the model's weights are quantized at conversion time and the activations are dynamically quantized at inference. With this method, the size of the model was reduced to half of its original size; the accuracy after conversion to TensorFlow Lite with post-training dynamic quantization degrades slightly.
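A minimal sketch of this conversion with post-training dynamic-range quantization, using the standard TensorFlow Lite converter API (the output file name is illustrative):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Post-training dynamic-range quantization: weights are quantized now,
# activations are quantized dynamically at inference time.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("rice_disease.tflite", "wb") as f:  # file name is illustrative
    f.write(tflite_model)
```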
3.8 Develop and Deploy Using Mobile App To develop the mobile application, the tflite model is loaded into the ml folder in Android Studio. When the model is opened in Android Studio, it generates sample invocation code, which can be extended to obtain the desired probabilities. TensorFlow buffers, tensor image conversion and image normalization utilities make application development much easier, and the model can be executed with a single function call. The output values are normalized to probabilities and shown in the UI of the mobile application. The image can either be captured with the mobile camera by clicking the "camera" button or loaded from the mobile gallery by clicking the "Gallery" button. Remedies and further reading on the predicted disease are available on a website that can be accessed with one click through the link in the mobile app.
3.9 Test the Mobile App The developed mobile application was loaded onto an Android mobile device and tested. The accuracies predicted by the original model and the quantized model vary slightly due to quantization.
4 Experimental Results A tool that lets farmers predict diseases accurately with the help of technology is a step towards the advancement of agriculture. The problem of misclassification with the naked eye is addressed by a deep learning ResNet50V2 model that supports farmers in identifying leaf diseases accurately in the early stages of the crop. Figure 4 shows the training and validation accuracy obtained by the proposed model, and Fig. 5 shows the training and validation loss. The ResNet model started with very low accuracy and, as training progressed, achieved a training accuracy of 98.86% and a validation accuracy of 97.67%. Figure 5 shows how the training and validation loss decreased as the training and validation accuracy increased during training; the model achieved its best performance with a low error rate.
Fig. 4 Training and validation accuracy
Fig. 5 Training and validation loss
Figures 6, 7, 8, 9, 10 and 11 show the working functionality of the mobile application developed in this study. Figures 7 and 11 show probabilities of 94.86% and 98.8%, respectively, for the diseased images given to the model through the mobile application; the diseases on the two leaves, fungal sheath blight and bacterial blight, were predicted correctly. Figure 8 shows the remedies for the disease reached by clicking the available hyperlink.
5 Experimental Set-Up The hardware for the proposed study was a Windows 10 machine with 8 GB RAM and a 1 TB hard disk. A Samsung Galaxy A71 (API 30 or above) with a 16-megapixel (or better) camera was used to run the mobile application. Google Drive was used to store the dataset of 3360 images, and Android Studio 4.3 was used to develop the mobile application. The TensorFlow Lite file generated for this study was larger than 200 MB, which normally cannot be used for mobile application development; the post-training dynamic quantization compression technique was applied, reducing the file to 56.2 MB.
Fig. 6 Application UI
Validations were performed on 300 real-time images captured from paddy fields in Nallabelly, Warangal Rural District, Telangana. Results were verified and validated with the agricultural officer of the locality. The proposed study included both healthy and diseased images for better prediction.
Fig. 7 Image selected from gallery along with prediction
Fig. 8 Description of the identified disease
6 Discussion The trained model was compared with three models, namely a CNN classifier [2], AlexNet [4] and a CNN model with feature enhancement [5], as shown in Fig. 12. The test accuracies of these models were 97.58%, 99% and 97.43%, respectively. Although 99% accuracy was obtained using AlexNet with 120 images augmented to 900 images, the proposed study with the ResNet50V2 architecture used 3360 images, organized into 77 batches of 32 images each, and achieved an accuracy of 97.67%. In this study, the ResNet50V2 model's fully connected head was modified with dropout, dense, batch normalization and ReLU layers to avoid overfitting or underfitting.
Fig. 9 Direct capture using mobile camera
The proposed work could be extended to identify exact paddy leaf images and predict the disease, and to support the mobile application on iOS devices.
Fig. 10 Accepted image
Fig. 11 Predictions of captured image
7 Conclusions The dataset was identified, pre-processing techniques were applied, and a new class was added to the train and test sets. The images were normalized and passed to the model for training. The ResNet50V2 model was used with modified fully connected layers. Training and validation were accomplished successfully with an accuracy of 98.86%. The test set of 300 images was evaluated and found to have an accuracy of 97.67%. The trained model was quantized and converted to TensorFlow Lite (.tflite) format to develop a mobile application. The quantized model was deployed in the mobile application successfully, and the disease in the input image was predicted with probabilities. The proposed model helps farmers predict diseases using a mobile camera, which is cost effective.
Fig. 12 Comparative results: CNN classifier 97.58%, AlexNet 99%, CNN model with feature enhancement 97.43%, CNN ResNet50V2 97.67%
References
1. Naga Swetha R, Shravani V (2020) Monitoring of rice plant for disease detection using machine learning. Int J Eng Adv Technol 9(3):851–853
2. Rowthu LR et al (2020) Early identification of rice plant diseases using machine learning algorithms. J Inf Comput Sci:368–372
3. Rautaray SS et al (2020) Paddy crop disease prediction—a transfer learning technique. Int J Recent Technol Eng (IJRTE) 8(6):1490–1495
4. Matin MMH et al (2020) An efficient disease detection technique of rice leaf using AlexNet. Sci Res Publishing 8:49–57
5. Sazzad TMS et al (2020) An image processing framework to identify rice blast. In: International congress on human computer interaction, optimization and robotic applications (HORA), pp 1–5
6. Payyadi SN et al (2020) Disease detection in paddy crop using CNN algorithm. Int J Recent Technol Eng 8(6):5298–5304
7. Maadugundu J et al (2019) Oryza: an IOT application to detect rice crop diseases uses image processing. Int J Sci Dev Res 4(12):190–197
8. Krishnan H et al (2019) Plant disease analysis using image processing in MATLAB. In: International conference on system, computation, automation and networking (ICSCAN), pp 1–3
9. Archana KS, Sahayadhas A (2018) Automatic rice leaf disease segmentation using image processing techniques. Int J Eng Technol 7(3.27):182–185
10. Suresha M et al (2017) Recognition of diseases in paddy leaves using kNN classifier. In: 2nd international conference for convergence in technology (I2CT), pp 663–666
11. Joshi A, Jadhav BD (2016) Monitoring and controlling rice diseases using image processing techniques. In: International conference on computing, analytics and security trends (CAST), pp 471–476
12. Pavithra S et al (2015) Recognition of diseases in paddy leaves using kNN classifier. Int J Commun Comput Technol 3(1):16–20
13. Orillo JW et al (2014) Identification of diseases in rice plant (Oryza sativa) using back propagation artificial neural network. In: International conference on humanoid, nanotechnology, information technology, communication and control, environment and management (HNICEM), pp 1–6
Twitter Sentiment Analysis Based on Neural Network Techniques Ashutosh Singal and Michael Moses Thiruthuvanathan
Abstract Our whole world is changing every day due to the present pace of innovation. One such innovation is the Internet, which has become a vital part of our lives and is used everywhere. With the increasing demand to stay connected and relevant, there is a rapid increase in the number of social networking sites, where people shape and voice their opinions on daily issues. Aggregating and analysing these opinions regarding products and services, news, and so on is vital for today's businesses. Sentiment analysis, otherwise called opinion mining, is the task of detecting the sentiment behind an opinion. Analysing the sentiment around topics like products, services, movies and daily social issues has become very important for businesses, as it helps them understand their users. Twitter is the most popular microblogging platform where users give voice to their opinions, and sentiment analysis of Twitter data is a field that has gained a lot of interest over the past decade; it requires breaking up "tweets" to detect the sentiment of the user. This paper delves into various classification techniques for analysing Twitter data and extracting sentiment. Different features such as unigrams and bigrams are extracted to compare the accuracies of the techniques, and the features are represented in dense and sparse vector form, with the sparse representation further divided into presence and frequency feature types. This paper compares the accuracies of Naïve Bayes, decision tree, SVM, multilayer perceptron (MLP), recurrent neural network (RNN) and convolutional neural network (CNN) classifiers, whose validation accuracies range from 67.88% to 84.06%. Keywords Twitter sentiment analysis · Naïve Bayes · Decision tree · SVM · Multilayer perceptron (MLP) · Recurrent neural network (RNN) · Convolutional neural network (CNN) · Comparison of various classification techniques
A. Singal (B) · M. M. Thiruthuvanathan, Department of Computer Science and Engineering, School of Engineering and Technology, Christ (Deemed to be University), Bangalore 560074, India
1 Introduction Today, Twitter acts as the largest pool of personal opinions on the Web, where the opinions range from a remark on a movie someone saw or the phone they use, to their political and religious views, and sometimes even what they are thinking at the moment. This makes Twitter an excellent corpus and allows the applications of sentiment analysis to be diversified. Research by Rambocas and Gama [1] suggested that millions of people use social media to voice their opinions, their feelings, and their daily lives. People express anything from their social life to their experience with a product or service on social media sites. Such sites help people connect with like-minded people, where they interact to influence and inform others. Furthermore, social media sites act as a great platform for small and big businesses alike to know their users, advertise products and services they might like, or communicate with them directly to get their viewpoint on a particular product or service. Because of this public communication, the success or failure of a business cannot be hidden. Jose et al. [2] explained that 87% of Internet users are influenced by customer reviews in their buying decisions, so social networking can alter the decisions and behaviour of a customer. In other words, finding customer sentiment can play a huge part in designing a game plan to compete with other businesses and in improving a product to attract more customers. To date, there have been research projects that use sentiment analysis to extract the general opinion of the public on political issues. Recently, due to increasing acts of hostility and a lot of false communication on social networking sites like Twitter, Facebook and Instagram, the Government of India has shown concern over censorship of such sites, while social media users have continued to oppose any proposals restricting the posting of various types of content. Whether or not one supports censorship of social networking sites, this trend of sentiment analysis research on Twitter can be extended to practical uses in business (intelligent marketing, feedback) and technology (personal recommendation systems), and is not limited to politics.
2 Literature Survey 2.1 Related Work Pak and Paroubek [3] gave reasons why microblogging sites are a great corpus for sentiment analysis. They concluded:
• Microblogging platforms are used by millions of different people to express their opinions about everything; thus, they are an important source of people's sentiments.
• Since Twitter's userbase ranges from common people to celebrities and even country presidents, it is possible to get posts from users of different social and economic groups.
• Twitter users hail from different parts of the globe and hence show different traits.
Samsir et al. [4] used the Naïve Bayes algorithm to find out the general public's opinion of Indonesian President Jokowi's policies in the fight against COVID. The study was conducted during the early days of the COVID pandemic, and it found that 35% of the data had positive sentiment, 49% negative, and 20% neutral. Batra and Rao [5] used a dataset of tweets spanning 2 months from June 2009, totalling 60 million tweets. Entities were extracted using Stanford NER, with URLs and mentions used as arguments for the entities found. A dataset of 200,000 online product reviews labelled positive and negative was used to train the model; with it, the model computed the probability that a chosen unigram or bigram is used in a positive context, and the same computation was done for a negative context. Machuc et al. [6] used a machine learning approach to find the sentiment of people online regarding the COVID pandemic. Their model gave a classification accuracy of 78.5%, and the split between positive and negative tweets was 54% and 46%, respectively.
2.2 Opinion Mining Opinion mining is a branch of NLP involving the computational study of emotions, opinions and sentiments expressed in text, as suggested by Carpenter and Way [7]; Francesco and David [8] note that opinion mining has applications in many domains such as accounting, law, marketing, education and technology. In the early days, social media platforms gave Web users a place to express their opinions and thoughts, as suggested by Pak and Paroubek [3]. Munjal et al. [9, 10] proposed models for opinion dynamics based on Ostwald ripening [9] and natural phenomena [10].
2.3 Twitter Twitter is a microblogging platform where users share short posts called tweets, limited to 280 characters, in which they express their opinions about everything going on around them. This makes Twitter a great platform for extracting the opinions of the general population on a specific topic, and a collection of tweets makes an excellent dataset for sentiment analysis, otherwise called opinion mining.
2.4 Twitter Sentiment Analysis Sentiments can be found in tweets and in the comments on those tweets, and they provide useful indicators for many different purposes, as suggested by Annett and Kondrak [11]. Additionally, Saif et al. [12] and Prabowo and Thelwall [13] note that sentiments can be categorized into positive and negative. Sentiment analysis is a method to extract subjectivity and polarity from the semantic orientation, which shows us the sentiment of the text or phrase, as suggested by Taboada et al. [14]. Alswaidan and Menai [15] showed that there are two ways emotions are recognized, i.e. explicit and implicit. Munjal et al. [16] also proposed a framework for sentiment analysis, where explicit emotion recognition is keyword based and implicit emotion recognition is rule based or machine/deep learning based. From this, we can say that there are two main ways to extract sentiment: the lexicon-based approach and the machine learning-based approach.
Lexicon-Based Approach The lexicon-based approach uses a predefined list of words where each word is already assigned a specific sentiment. Sharma and Dey [17] state that a sentiment lexicon is used to detect opinion-carrying words in the dataset and to predict the opinion expressed in the text. Goncalves et al. [18] showed a flaw in the lexicon-based approach: it cannot adapt and create trained models for a specific context or purpose, which a learning-based approach can do.
Machine Learning-Based Approach Machine learning-based methods often need a supervised classification approach, where sentiment detection is done in a binary manner, i.e. either positive or negative. This approach also requires labelled data to train the models. Goncalves et al. [18] mention that the machine learning-based approach is more suitable for Twitter sentiment analysis than the lexicon-based approach.
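To make the contrast concrete, here is a minimal sketch of the lexicon-based idea (the tiny word list and scoring rule are illustrative assumptions, not a lexicon used in this paper):

```python
# Minimal lexicon-based polarity scoring: sum per-word sentiment scores.
LEXICON = {"good": 1, "great": 2, "love": 2,      # illustrative entries
           "bad": -1, "terrible": -2, "hate": -2}

def lexicon_score(text: str) -> str:
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_score("I love this phone, the camera is great"))  # positive
print(lexicon_score("terrible battery, I hate it"))             # negative
```

A machine learning-based approach would instead fit a classifier to labelled tweets, which is what the rest of this paper does.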
3 Data Description The training dataset used to train the models is a csv file in the format: tweet id, sentiment of tweet, tweet. Here, tweet id is a unique number assigned to distinguish each tweet; the sentiment is given as 1 (positive) or 0 (negative); and the tweet itself is enclosed in double quotes (""). Likewise, the test dataset is a csv file in the format: tweet id, tweet. The datasets are a combination of words, emojis, special symbols, links, and mentions (the username of a person the user wants to refer to, preceded by the @ symbol). The words and emojis play a part in predicting sentiment, whereas symbols, links and mentions do not and hence are overlooked. Due to this divide between important and unimportant parts of the dataset, the tweets are pre-processed to standardize the dataset. The datasets used to train and test the techniques have 1,600,000 and 300,000 tweets, respectively. After pre-processing the dataset as reported in the next section, an initial statistical analysis was done; the statistics are shown in Tables 1 and 2 for the training and test datasets, respectively.

Table 1 Stats of pre-processed train dataset

                Total        Unique      Average   Max   Positive   Negative
Tweets          1,600,000    –           –         –     800,000    800,000
User mentions   786,298      –           0.4914    12    –          –
Emoticons       13,549       –           0.0085    6     11,592     1957
URLs            77,480       –           0.0484    5     –          –
Unigrams        19,650,571   278,673     12.2816   40    –          –
Bigrams         18,054,919   3,263,017   11.2843   –     –          –

Table 2 Stats of pre-processed test dataset

                Total        Unique      Average   Max   Positive   Negative
Tweets          300,000      –           –         –     206,740    93,260
User mentions   222,935      –           0.7431    12    –          –
Emoticons       3173         –           0.0106    5     2845       328
URLs            8625         –           0.0287    4     –          –
Unigrams        3,196,551    86,302      10.6551   40    –          –
Bigrams         2,898,369    782,155     9.6612    –     –          –
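As a minimal sketch, the csv layouts described above can be loaded, for instance, with pandas (the file names and the absence of a header row are assumptions):

```python
import pandas as pd

# Training file: tweet id, sentiment (1 = positive, 0 = negative), tweet text
train = pd.read_csv("train.csv", header=None,
                    names=["tweet_id", "sentiment", "tweet"])

# Test file: tweet id, tweet text
test = pd.read_csv("test.csv", header=None, names=["tweet_id", "tweet"])

print(train["sentiment"].value_counts())  # 800,000 positive / 800,000 negative
```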
4 Proposed Methodology 4.1 Pre-processing Tweets are generally casual in nature and are a combination of words, emojis, special symbols, links, and mentions; hence, the dataset is noisy. As Prathap et al. [19] suggest, pre-processing helps improve the accuracy of the model, so the dataset needs to be standardized into a form the classifiers can use to build an optimized model. Keeping in mind Dawei et al. [20], URLs, mentions, and hashtags are processed, which also helps reduce the size of the dataset. The standardization process, with the regex rules summarized in Tables 3 and 4, goes as follows:
• All tweets are converted to lowercase.
• The quotes and spaces at the ends of the tweets are removed.
Table 3 Regex to handle tweets

Element                 Regex                             Replacement
URL                     ((www\.[\S]+)|(https?://[\S]+))   URL
Mentions                @[\S]+                            USER_MENTION
Hashtag                 #(\S+)                            Tag without #
More than 1 dot (.)     \.{2,}                            Space ( )
More than 1 space ( )   \s+                               Space ( )
Table 4 Regex to handle emojis

Emoji(s)                          Emotion   Regex                             Replacement
:D, : D, :-D, xD, x-D, XD, X-D    Laugh     (:\s?D|:-D|x-?D|X-?D)             EMO_POS
:), : ), :-), (:, ( :, (-:, :')   Smile     (:\s?\)|:-\)|\(\s?:|\(-:|:\'\))   EMO_POS
<3, :*                            Love      (<3|:\*)                          EMO_POS
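As a minimal sketch of how such substitution rules can be applied in Python (a hand-rolled approximation of Tables 3 and 4, not the authors' actual script):

```python
import re

RULES = [
    (r"((www\.[\S]+)|(https?://[\S]+))", " URL "),       # URLs (Table 3)
    (r"@[\S]+", " USER_MENTION "),                        # mentions
    (r"#(\S+)", r" \1 "),                                 # hashtag -> tag without '#'
    (r"(:\s?D|:-D|x-?D|X-?D)", " EMO_POS "),              # laugh emojis (Table 4)
    (r"(:\s?\)|:-\)|\(\s?:|\(-:|:\'\))", " EMO_POS "),    # smile emojis
    (r"(<3|:\*)", " EMO_POS "),                           # love emojis
    (r"\.{2,}", " "),                                     # runs of dots
    (r"\s+", " "),                                        # collapse whitespace
]

def preprocess(tweet: str) -> str:
    tweet = tweet.lower().strip().strip('"')
    for pattern, repl in RULES:
        tweet = re.sub(pattern, repl, tweet)
    return tweet.strip()

print(preprocess('@user Loving this... :) check www.example.com #rice'))
# -> 'USER_MENTION loving this EMO_POS check URL rice'
```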
A Comprehensive Study on Computer-Aided …
B. Saju and R. Rajesh

B. Saju (B) · R. Rajesh, CHRIST (Deemed to be University), Bangalore, India, e-mail: [email protected]; B. Saju, New Horizon College of Engineering, Bangalore, Karnataka, India

1 Introduction
…30 million by the year 2020, and these numbers are likely to grow in the future as lifespan, and thereby the aging populace, increases. Because cataract is one of the major diseases faced by elderly people, a group that is expanding worldwide, this age-related visual disease has attracted great interest. Research related to cataract has involved many studies using animals with induced cataracts as well as human lenses removed from cataract patients. Cataract reduces the transparency of the lens in the eye, interrupts the flow of light through it to the retina, and eventually turns the lens opaque. It is the clouding of the lens in the eye that affects vision. As a cataract grows in the eye, vision gradually decreases, since the amount of light that can penetrate the lens decreases. Commonly seen in the elderly, cataracts can develop in one or both eyes. Blurred vision is one of the earliest symptoms; as the lens becomes cloudier, sight is reduced, finally resulting in total blindness. Usually this condition does not induce physical pain. On the basis of where cataracts form, they can be categorized into anterior cortical cataract, posterior cortical cataract, anterior polar cataract, posterior polar cataract, anterior sub-capsular cataract, and posterior sub-capsular cataract. Clinical expertise is mandatory for identifying cataract through a manual process, and hence the shortage of trained ophthalmologists [1] poses a difficult challenge, mainly in developing countries and rural areas. Clinical grading scales have an inherent limitation in that the results of staging are compromised by inter-examiner variability [2]. In short, the traditional method of cataract identification, which requires an ophthalmologist's expertise, has a very limited reach for screening [3]. Hence, it is the need of the hour to search for novel or alternative methods to overcome the current limitations in cataract detection [4].
1.1 Classification of Cataract Cataract is mainly classified into six categories depending on the position in which it forms.
Capsular Cataract This type of cataract affects the opacity of the capsule of the lens and is also known as complicated or secondary cataract. Diminished or blurred vision is the first symptom; at this stage, the patient can identify an object only when it is close to him. In some patients, double vision may occur due to the continuous clouding of the lens.
Anterior capsular cataract: A type of capsular cataract that may be bounded by a basement membrane. Like capsular cataract in general, its initial symptom is blurred or diminished vision.
Posterior capsular cataract: A type of capsular cataract mainly present at the posterior portion of the lens. The affected person experiences clear vision only if the object is close to him; here also, blurred or diminished vision is the first symptom.
Sub-capsular Cataract This classification of cataract mainly appears at the back of the lens; corticosteroid-induced cataract is also included in this category. Reduced vision in the presence of light is the initial symptom. It is classified into two types as follows.
Anterior sub-capsular cataract (ASCC): A type of sub-capsular cataract that mainly appears at the anterior portion of the lens. It may cause a wrinkled appearance in the lens capsule due to the resolution of myofibroblasts. ASCC is dangerous in persons who have taken steroid medication.
Posterior sub-capsular cataract (PSCC): A type of cataract that generally occurs due to the irregular arrangement of the meridional rows of transitional cells; the presence of abnormal bladder-like cells in the meridional row region is the cause of this irregularity [5]. PSCCs are sometimes associated with diseases such as retinal detachment, diabetes, and surgical trauma, and are considered a serious problem in persons who have taken steroid medications. It is the fastest-progressing type compared to the other sub-capsular categories. At the initial stage, a good quality bifocal lens can be used to manage PSCC, but in some cases surgery is needed to cure the disease.
Cortical Cataract (CC) This type mainly occurs at the cortex of the lens, hence the name. The major symptoms of CC include blurred vision, difficulty driving at night, glare from artificial lighting, and occasional loss of vision [6]. It forms slowly and does not cause any disturbance at the initial stage. It can be treated with prescribed glasses or surgery; CC surgery, performed by a skilled surgeon, is a safe and very effective procedure.
Supranuclear Cataract (SNC) A type of cataract in which painless clouding occurs at the nuclear portion of the lens. It is considered a dangerous condition in most patients; a patient with a supranuclear cataract is not able to read properly. Double vision and blurred vision are the symptoms at the initial stage.
Nuclear Cataract (NC) This type develops in the nuclear portion of the lens and is commonly present in older adults. Persons who have NC have a thick and hard lens, and at the severe stage the lens appears brown in color [7]. Every object appears dull or blurred to the affected person. The major symptoms include difficulty seeing in bright light, faded color appearance, double vision, and difficulty with distant vision.
Polar Cataract (PC) Another classification of cataract, which may appear small, bilateral, and symmetric. It is classified into two categories, anterior and posterior polar cataract, briefly explained as follows.
Anterior polar cataract (APC): A special type of cataract that looks like a small dot in the eye. It may occur as a sporadic finding or, in association with ocular abnormalities, as an inherited disorder.
Posterior polar cataract (PPC): A type of polar cataract and a subtype of lens opacity. It represents an area of degenerative lens fibers that form an opacity in the central posterior sub-capsular area of the lens. In many cases, the lens opacity is adherent to the capsule. This type of cataract can be removed by surgery, though the surgery may be complicated in many situations.
2 Computer-Aided Cataract Detection, Classification, and Management Deep learning (DL) with ResNet was deployed to build a three-phase sequential artificial intelligence algorithm for the detection of cataracts [6]. The AI mechanism distinguishes slit-lamp pictures between mydriatic and non-mydriatic images and classifies the lens as normal, cataractous, or postoperative IOL. If a cataract is found, its severity and degree (or that of posterior capsular opacification) are evaluated based on the Lens Opacities Classification System II, and it is decided whether to follow up or refer the patient to another clinic. The application of retinal imaging in diabetic retinopathy (DR) screening has expanded, prompting researchers to consider color fundus photographs for automatic cataract evaluation technologies and retinal imaging as a groundbreaking cataract diagnostic method; for example, 5495 fundus pictures [7] were used to shape an AI algorithm. A feature extraction stage combined with a random forest (RF) has also been used as a machine learning model for predicting cataract presence [8]; this analysis used a total of 5409 photographs, but the split between training and testing data was not defined. A mix of a pretrained CNN and an SVM has likewise been applied: a CNN pretrained on millions of raw non-medical pictures was supplemented with 400 fundus images whose "perceptibility" was graded at four levels by ophthalmologists. A DL model based on a deep convolutional neural network trained with 4004 fundus images obtained an accuracy of 93.52% for diagnosis and 86.69% for grading [9]. The stage of nuclear cataract in a slit-lamp image can be predicted from its neighboring labeled images in a ranked list computed with an optimal ranking function; a new measure for evaluating ranking was proposed by Huang et al. [10] for learning the optimal ranking function through direct optimization.
Pratap and Kokil [11] proposed an effective computer-aided cataract diagnosis (CACD) technique based on network selection that is robust under additive white Gaussian noise (AWGN) for cataract classification. Independent support vector networks were trained locally and globally with features extracted at different levels, and the appropriate network is chosen according to the noise level of the input image. A pretrained convolutional neural network (CNN) was used for automatic feature extraction, and synthetic noisy fundus retinal images were generated by adding AWGN. Imran et al. [12] developed a hybrid model combining a support vector machine (SVM) and deep learning for 4-class cataract classification. AlexNet, ResNet, and VGGNet were the transfer learning-based models that performed feature extraction, and the SVM acted as the classifier. Data augmentation, normalization, green channel extraction, and resizing were the pre-processing steps. Youden's index, area under the curve (AUC), accuracy, precision, specificity, sensitivity, and F1-score were considered for the performance evaluation of this hybrid model. Mandelblum et al. [13] performed cataract classification based on the posterior nuclear color. The primary objective of this research was to validate the simplified preoperative nuclear classification score (SPONCS) for real-world and clinical trial settings. In a surgical setting, the grading score's reliability was determined from intra- and inter-observer validity for fifteen cataract cases. Petrella et al. [14] designed a novel prototype for the characterization and classification of cataracts. A-scan signals were acquired by the eye scan ultrasonic system (ESUS) at 20 MHz. Based on analysis of the signals' energy levels, the method automatically identified the lens interface. Two modules, a real-time acquisition module and a reconfiguration and validation module, performed the cataract classification, and preclinical data were used for the evaluation. Cao et al. [15] proposed an improved Haar wavelet feature extraction technique for cataract detection. A hierarchical mechanism transforms the 4-class classification problem into 2-class classification problems. Automatic grading was performed using the Haar transform to obtain accurate features from the retinal images; based on an ergodic search, the appropriate feature extraction threshold was obtained using the improved Haar wavelet and a backpropagation neural network (BP-net). Accuracy, sensitivity, specificity, and kappa coefficient were evaluated. Imran et al. [16] introduced a hybrid convolutional and recurrent neural network (CRNN) to classify cataracts. The combined benefits of the CRNN preserve the short- and long-term spatial correlation among patches. Multilevel feature representations were extracted by VGGNet, ResNet, GoogLeNet, and AlexNet coupled with transfer learning. This technique conserves medical resources and helps avoid visual impairment.
The implementation was performed on real-time fundus images, and sensitivity, specificity, and accuracy were considered for the performance evaluation. Hu et al. [2] evaluated the classification ability of ShuffleNet, a deep learning network, and an SVM classifier for cataract classification. The gray level co-occurrence matrix (GLCM) was utilized for grading feature extraction, and a YOLO v3 model was introduced to enhance the accuracy of the overall stacking technique. Ocular images captured by a smartphone-based slit lamp were used for the classification of cataract, and accuracy, AUC, specificity, precision, and F1 value were determined to assess the efficacy of the classification system. Cao et al. [17] proposed an adaptive enhancement approach for filtering retinal images. Rooting and low-pass filtering were developed to enhance the contrast of the retinal structure, and color restoration was performed by a grayscale adjustment method. The retinal image's clarity was improved efficiently without introducing color differences; a combination score, color difference, and standard deviation were computed for the performance evaluation. Pratap and Kokil [18] introduced an effective CACD system for cataract detection using transfer learning with a pretrained CNN. Features are extracted from fundus retinal images by the pretrained CNN and classified with an SVM classifier; accuracy and processing time were evaluated to establish system efficiency. Zhou et al. [19] developed a novel automatic grading and cataract detection approach in which features are extracted from retinal images using a multilayer perceptron with exponential discrete state transition (EDST-MLP) or discrete state transition (DST-MLP) and then classified by residual neural networks with EDST (EDST-ResNet) or DST (DST-ResNet). Zhang et al. [20] introduced a novel six-level cataract grading technique with multi-feature fusion: high-level and low-level features are extracted using ResNet18 and GLCM, respectively, and used for automatic grading. Two SVM classifiers served as base learners and a fully connected neural network (FCNN) as the meta-learner, producing the probability output and the classification output. Zhang et al. [21] utilized the synthetic minority oversampling technique (SMOTE), the Apriori algorithm, genetic feature selection, a Naïve Bayesian classifier, and random forest for cataract classification. SMOTE, an oversampling technique that improves dataset balance, was used in pre-processing; feature selection was performed by the Apriori and genetic algorithms to decrease the complexity of the classification approach. Post-operative complications of pediatric cataract patients were predicted with the Naïve Bayesian classifier and random forest, owing to the non-numeric attribute values in the dataset. Cheng [22] proposed a novel sparse range-constrained learning (SRCL) algorithm for grading medical images, combining the objectives of sparse representation and image grading into a single function. Based on atoms, the testing image's sparse representation is found.
The identified sparse representation is similar in terms of both the medical grading scores and the data/feature representation. Cataract grading from slit-lamp lens images and cup-to-disk ratio computation from retinal fundus images are the two applications to which SRCL was applied. Ozgokce et al. [23] used the Lens Opacities Classification System (LOCS) for cataract classification from ultrasound images; ultrasound elastography and B-mode ultrasound were performed to assess the efficiency of cataract detection. The primary objective was to compare the classification of lens density using LOCS II with shear-wave elastography and B-mode ultrasound of the lens, using a real-time dataset. Xiong et al. [24] proposed a morphological technique for detecting and removing lesions in retinal visible-structure segmentation, avoiding the wrong identification of vitreous opacity as retinal structure. Features from the retinal images were then classified into five categories by a decision tree algorithm. Kappa coefficient, accuracy, specificity, F1 score, and sensitivity were evaluated to measure the efficiency of the proposed system (Table 1).
3 Gap Identified and Proposed Method Cataract is identified as the main cause of visual impairment globally. Compared with other age-related eye diseases such as diabetic retinopathy, age-related macular degeneration, and glaucoma, machine learning-based progress in the area of cataract is still not well explored. Considering this, many studies have been done on developing algorithms for automated cataract management using color fundus photographs. Existing works have commonly used fundus image datasets for automatic cataract detection and grading with a predefined feature set; the challenge, however, is to detect cataract from other lens images at an early stage, thus allowing people to test for cataract themselves. Some works present artificial intelligence-based calculations for the identification of cataracts, and computer-aided cataract research is expected to improve considerably in the coming years. In machine learning procedures, image quality plays the most important role in identifying cataracts, because quality differences between images affect the performance of a classifier; it is therefore strongly recommended to use the same image quality level in both the training and testing datasets. Fundus and slit-lamp images of the retina usually have very low contrast and are noisy and often blurred, since many delicate blood vessels are present. An appropriate feature extraction stage is necessary because of its great impact on the performance of the classification algorithm. Furthermore, existing classification systems exhibit high execution complexity, and deep learning may be used to build an automatic cataract classification system. More advancement is needed, with emphasis on extracting blood vessel features from retinal images and on cataract stage classification to reduce processing time.
Table 1 Comparison of the various methodologies, image types, disadvantages, results, and identified gaps

Imran et al. [12]. Methodology: combination of SVM and deep learning models, with transfer learning-based models used for feature extraction. Imaging type and dataset: retinal images; largest local retinal dataset. Disadvantages: time complexity was high. Result: 95.65% accuracy. Research gap: only general cataract staging done.
Mandelblum et al. [13]. Methodology: SPONCS validation for cataract classification. Imaging type and dataset: slit-lamp images. Disadvantages: complex to apply in clinical settings; reference photographs required. Result: found an effective way for grading cataract hardness. Research gap: only cataract hardness was classified; type of cataract was not considered.
Pratap and Kokil [11]. Methodology: AWGN with CNN. Imaging type and dataset: fundus retinal images; EyePACS dataset. Disadvantages: computational complexity; high computational cost. Result: good quality images obtained based on NIQE score. Research gap: quality score calculation is done before classification.
Petrella et al. [14]. Methodology: new prototype developed for cataract classification. Imaging type and dataset: slit-lamp images; preclinical dataset. Disadvantages: probe-eye coupling less efficient. Result: 3 types of cataract classified with 94% accuracy. Research gap: classification of all cataract types is missing.
Imran et al. [16]. Methodology: novel hybrid CRNN for effective cataract classification. Imaging type and dataset: fundus images; real-time dataset. Disadvantages: low efficiency; high computational cost. Result: cataract detection with 98% accuracy. Research gap: only detection is done.
Hu et al. [2]. Methodology: GLCM, YOLOv3 model, and SVM classifier to detect cataract. Imaging type and dataset: ocular images; MSLPP dataset. Disadvantages: time complexity; computational complexity; less efficiency. Result: 93.5% accuracy. Research gap: only nuclear cataract grading is done.
Cao et al. [17]. Methodology: adaptive enhancement approach to filter retinal images. Imaging type and dataset: retinal images; real-time dataset. Disadvantages: image handling efficiency was low. Result: improved the clarity of the image. Research gap: grayscale adjustment can be done to improve the quality of the retinal image.
Pratap and Kokil [18]. Methodology: pretrained CNN and SVM classifier for automatic cataract detection. Imaging type and dataset: fundus retinal images; four datasets utilized. Disadvantages: high time consumption; limited accessibility. Result: 92.91% accuracy. Research gap: only stage/grade is classified; type is not identified.
Zhou et al. [19]. Methodology: novel automatic grading and cataract detection. Imaging type and dataset: retinal images; real-time dataset. Disadvantages: slow convergence speed; computational complexity. Result: detected cataract with 94% accuracy. Research gap: only detection done.
Zhang et al. [20]. Methodology: six-level cataract grading technique with multi-feature fusion. Imaging type and dataset: fundus images; real-time dataset. Disadvantages: less scalability; complex to apply in clinical settings. Result: 94.75% accuracy. Research gap: only identification of stage.
Zhang et al. [21]. Methodology: SMOTE, Apriori, genetic feature selection, Naïve Bayesian classifier, and random forest for cataract prediction. Imaging type and dataset: slit-lamp images; three datasets utilized. Disadvantages: low convergence rate; time complexity. Result: post-operative complications of pediatric cataract are predicted. Research gap: accuracy was different for different complications.
Cheng [22]. Methodology: new SRCL algorithm for grading the medical image. Imaging type and dataset: fundus retinal and slit-lamp images. Disadvantages: highly expensive; image quality was low; difficult in clinical settings. Result: CE-Net found the highest accuracy. Research gap: a general paper on optical image segmentation.
Xiong et al. [24]. Methodology: morphological technique with decision tree classification approach. Imaging type and dataset: retinal images; real-time dataset. Disadvantages: high complexity; not fit for clinical settings. Result: 92.8% accuracy. Research gap: classified according to universal cataract severity.
There is a need to use other improved machine learning algorithms for better accuracy and efficiency. Moreover, there is huge scope to increase the performance, which is the motivation for the present work.
Work Plan and Implications The feature extraction stage is very important because of its impact on the performance of the classification system, and an effective feature extraction-based methodology is essential for the accurate prediction of cataracts. Existing studies on automatic cataract detection and grading that depend on slit-lamp images use a well-defined set of image features that give an imperfect or even noisy representation; an effective feature extraction-based methodology for categorizing the severity of cataract using retinal images is needed. To obtain more appropriate features for the automatic categorization of cataracts, a new set of condensed hybrid features (structural and non-structural) is extracted from the slit-lamp images.
The quality of the images is selected using the hybrid naturalness image quality evaluator and perception-based image quality evaluator (hybrid NIQE-PIQE) approach. Here, the raw input image quality score is evaluated with the hybrid NIQE-PIQE approach, and based on the quality score a deep convolutional neural network (DCNN) categorizes the images into low, medium, and high quality; the low and medium quality images are processed again to improve their quality. The quality images are then pre-processed to remove noise: the original image and the objective function are given as input to LCM, the output of LCM is given as input to CLAHE, and CLAHE further enhances the image with ASO. The individual green channel (G-channel) is extracted from the selected quality RGB images for noise filtering, and hybrid modified histogram equalization and homomorphic filtering (hybrid G-MHE-HF) is utilized for enhanced noise filtering; the green-channel and CLAHE steps are sketched below.
After the pre-processing of images, features are extracted for cataract prediction. The structural features capture properties of the retinal image, mainly the retinal vessels, optic disk, and vitreous opacity, and include the disk damage likelihood scale (DDLS) and cup-to-disk ratio (CDR); the non-structural features include the gray level run length matrix (GLRM), improved gray level co-occurrence matrix (IGLCM), first-order statistics (FoS), higher-order spectra (HOS), higher-order cumulants (HOC), and improved wavelets. Afterward, optimal features are selected to reduce dimensionality using the modified red deer optimization algorithm. Finally, the different types of cataract, such as capsular, sub-capsular, cortical, supranuclear, nuclear, and polar, and the respective stages of cataract (normal, mild, moderate, and severe) are predicted using the deep capsule auto-encoder (DCAE) framework. The presented computer-aided automatic cataract detection methodology detects the various types and grades of cataract with improved accuracy.
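As a minimal sketch of the green-channel extraction and CLAHE contrast enhancement described above (assuming OpenCV; the clip limit and tile size are illustrative defaults, and the LCM/ASO and homomorphic-filtering stages are not reproduced here):

```python
import cv2

# Load a fundus/slit-lamp image (the path is illustrative)
img = cv2.imread("retina.jpg")

# Extract the green channel, which typically carries the best
# vessel contrast in retinal images
green = img[:, :, 1]

# Contrast Limited Adaptive Histogram Equalization (CLAHE)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(green)

cv2.imwrite("retina_green_clahe.jpg", enhanced)
```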
4 Conclusion

Overall, the studies listed above showed positive results, and the reported findings are significant. However, the algorithms are not evaluated on an external dataset, and therefore only a general overview can be drawn from all of the research mentioned above; further research is required to draw the complete picture. In this study, numerous techniques for cataract classification and staging were analyzed. Much research has been done in the field of cataract detection because of the benefits of its applications. There is a need to use other improved machine learning algorithms for better accuracy and efficiency.
Improving Black Hole Algorithm Performance by Coupling with Genetic Algorithm for Feature Selection Hrushikesh Bhosale, Prasad Ovhal, Aamod Sane, and Jayaraman K. Valadi
Abstract Feature selection is a very important preprocessing step in machine learning tasks. Selecting the most informative features provides several advantages, like removing redundancy, picking up important domain features, and improving algorithm performance. Recently, the black hole algorithm, mimicking the real-life behavior of stars and black holes, was proposed in the literature for solving several optimization tasks, which include feature selection. In this novel feature selection algorithm, each star represents a distinct subset, and the black hole represents the subset having the best fitness. The iterative movement of stars toward the black hole facilitates discovering the best subset. In this work, we present a hybrid feature selection algorithm coupling the existing binary black hole algorithm with an existing binary genetic algorithm. In this new algorithm, control switches between the black hole and genetic algorithms, and we introduce a switching probability parameter to facilitate the switching. Our hybrid algorithm, optimally tuned in terms of the switching probability, improves algorithm performance considerably. We have compared the results of the new algorithm with the existing algorithms on nine publicly available benchmarking datasets. The results indicate that the synergistic coupling, apart from improving accuracy, selects smaller subsets. The coupled algorithm has also been found to have a smaller variance in accuracies. Keywords Feature subset selection · Swarm intelligence · Random forest · Black hole optimization algorithm · Genetic algorithm · Switching probability
H. Bhosale · A. Sane · J. K. Valadi (B) Flame University, Pune 412115, India P. Ovhal Centre for Modeling and Simulation, Savitribai Phule Pune University, Pune 411007, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_26
1 Introduction

Feature selection is a very important preprocessing step which adds enormous value to the data. It effectively removes noisy features and picks informative domain features which are relevant to prediction tasks. This preprocessing step in predictive analytics tasks finds a top-ranked subset of the dataset, which provides multiple advantages: the computations become faster, and prediction accuracy increases with attribute selection. We also get very valuable domain information about the attributes that correlate with the task at hand. Filter, wrapper, and embedded methods are the three groups of feature selection methods. Filter methods are based on statistical algorithms which rank features based on their information content, correlation with the output, and correlation among themselves. Using filter methods, we can rank features without resorting to repeated use of a machine learning algorithm; from the features ranked by the calculated importance, the top k features can be readily selected based on a problem-dependent performance measure. Commonly used filter methods are Fisher score [1], relief [2], and correlation-based methods [3]. Statistical testing methods, for example chi-square [4] and ANOVA [5], can also be categorized as filter heuristics. Filter methods are simple, easy to implement, fast, and scalable to larger datasets, but they do not guarantee the best results (a minimal filter-method sketch in Python is given at the end of this section). Wrapper methods employ a machine learning classifier repeatedly to select the best subset of attributes. Wrapper methods, although time-consuming, hold an advantage over filter methods as they are more accurate. Commonly used wrapper methods are forward selection, backward selection, stepwise selection, etc. Embedded methods have feature selection embedded in the algorithm itself. Lasso [6] and random forest (RF) feature selection [7] are notable examples of embedded methods. Embedded methods take less time for computation and have smaller overfitting errors in comparison to wrapper algorithms. Nature-inspired algorithms are attractive alternatives to conventional feature selection methods. Different interesting nature-inspired algorithms and computational methods have been developed to solve complex optimization problems. Most nature-inspired algorithms begin with randomly created trial solutions; mimicking certain natural phenomena, they improve the solutions iteratively until convergence. Black hole (BH) is a novel heuristic algorithm inspired by the dynamic behavior of naturally occurring BHs in the universe. The algorithm, developed by Hatamlou [8], starts with a randomly selected set of initial solutions termed stars. The best solution at a given iteration is selected as the BH, and the other solutions remain stars. Similar to a naturally occurring BH, the best star, termed the BH in the feature selection solution space, attracts the other stars toward it; this amounts to improvement in the fitness of the current stars. The fitness in this work is a function of the accuracy of a star (representing the chosen subset at a given iteration), which is evaluated by a classifier. Any star which gets too close to the vicinity of the BH gets sucked in by the BH and loses its identity. This vicinity around the BH is, in the heuristic, the Schwarzschild radius, termed the event horizon, which means the point of no return. Once a star enters the event horizon, it is destroyed by the BH. After the destruction of
an old star by the BH, a new star is randomly generated in the search space. The process is repeated until the convergence criteria are satisfied. The BH heuristic was first implemented for clustering problems. The proposed modified algorithm is a hybrid algorithm which synergistically combines the BH algorithm with the genetic algorithm (GA).
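To make the filter approach above concrete, the following is a minimal Python sketch using scikit-learn; the dataset and the choice of k = 10 are placeholders for illustration, not settings from this work.

```python
# Minimal filter-method sketch: rank features by ANOVA F-score and keep
# the top k, with no repeated use of a learning algorithm.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)          # placeholder dataset
selector = SelectKBest(score_func=f_classif, k=10)  # keep the 10 best features
X_top = selector.fit_transform(X, y)
print(X_top.shape)                                  # (569, 10)
```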
2 Related Work

Nemati et al. [9] implemented modified BH optimization algorithms for binary search problems and obtained better performance than binary particle swarm optimization [10], the binary artificial fish swarm algorithm [11], and GA [12]. Kumar et al. [13] implemented the BH algorithm for various applications such as cluster analysis, classification problems, and short-term scheduling. Pashaei and Aydin [14] proposed the BH algorithm for selecting a subset of attributes for a classification algorithm; this algorithm also uses a number of variables, leading to a difficult parameter search problem. Wu et al. [15] implemented the BH algorithm for multi-objective optimization and found that its performance compares favorably against well-known algorithms like SPEA-II, PESA-II, NSGA-II, and MOEA/D. It is evident that the BH algorithm can be used to solve complex optimization problems similar to existing heuristic algorithms like GA [12], particle swarm optimization (PSO) [16], and ant colony optimization (ACO) [17]. Ovhal et al. [18] made modifications by replacing the single BH with multiple BHs and proposed three different algorithms on the basis of the selection of the second BH. Ovhal and Valadi [19] coupled the existing BH algorithm with a white hole (WH) and implemented it for dynamic optimization of chemically reacting systems. We have compared our proposed work with the recently proposed correlation-based hybrid ACO (HACO) algorithm [20] implemented for feature selection and with the forest optimization algorithm (FOA) [21]. A variant of differential evolution with a black hole-based approach has also been developed [22].
3 Algorithm

3.1 Existing BH Algorithm Employed for Feature Selection

Prior to discussing the proposed coupled algorithm, for the sake of clarity, it is appropriate to provide details of the BH algorithm employed in this study [14]. This algorithm uses BH heuristics with a binary-coded population, as shown in Fig. 2; each member of the population is termed a star. The stars are defined with bits having values of 0 or 1, representing the absence or presence of an attribute, respectively, in the dataset as shown in Fig. 1. The number of bits is equal to the number of features in the dataset. The subsets with the selected features are classified using the RF
Fig. 1 Representative star
Fig. 2 Black hole schema
algorithm and are evaluated with accuracy as the performance measure. Further, a fitness measure is used to penalize the selection of a higher number of features. The best star, with maximum fitness in the current generation, is called the BH. After this step, the rest of the stars are moved toward the BH. Any star lying in the vicinity of the BH is absorbed by it, and another star is randomly created in its place. The new stars with the new locations represent the next generation. The fitness of the stars is evaluated again, and a star with better fitness than the current BH replaces the current BH, which becomes a star. The procedure is repeated for a certain number of generations until convergence. The details of the algorithm are as follows:

i. Initialize the population size N and the maximum number of iterations. The members of this population are called stars. Each star consists of randomly generated binary bits, of size equal to the total number of features in the dataset; presence and absence of a feature are denoted by 1 and 0, respectively. An example of a random star is shown in Fig. 1, where F_i represents the ith feature. Essentially, each star in the generated population represents a random subset of attributes of the original dataset.

Loop:

ii. Employing the RF classifier, build the optimal model and evaluate the fitness of all stars using Eq. (2):

\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}    (1)

\text{Fitness} = \frac{\text{Accuracy}}{1 + \lambda \times \text{subset size}}    (2)

iii. Move the stars toward the BH using Eq. (3):

X_i^{\text{new}} = X_i^{\text{old}} + \mathrm{Uniform}(0, 1) \times d(X_i^{\text{old}}, X_{\text{BH}})    (3)

where the Hamming distance d is calculated as shown in Eq. (4):

d(i, j) = \sum_{k=0}^{n-1} [\, y_{i,k} \neq y_{j,k} \,]    (4)

iv. Employing the RF classifier, build the optimal model and evaluate the fitness of all new stars. If the BH fitness is less than the fitness of a new star, that star becomes the BH; otherwise, the current BH remains the BH.

v. Calculate the event horizon using Eq. (5):

\text{Event horizon} = \frac{\text{Fitness of Black Hole}}{\sum \text{Fitness of all stars}}    (5)

vi. If the Hamming distance between the BH and a star is less than the event horizon, replace that star with a new star in the search space.

vii. Repeat the above procedure from steps (ii)–(vi).
3.2 Existing Genetic Algorithm Employed for Feature Selection

Goldberg and Holland introduced GA [12] with the ultimate idea of developing self-adaptive software. GA is inspired by natural evolution and selection. For feature selection with binary-coded algorithms, a set of trial solutions is initially created with randomly generated bits of 0's and 1's. Each trial solution is called a chromosome. A one represents a selected feature, whereas a zero represents the absence of a feature. The total number of bits is equal to the total number of features available in the dataset. The first step in GA is to select better performing solutions/chromosomes using a selection algorithm. Crossover and mutation are the steps carried out after the selection procedure. The
fitness is calculated at each step for a certain number of generations until convergence. The steps in the algorithm can be written as follows (a Python sketch of these operators is given after Figs. 3–6):

i. Define parameters such as the population size, total number of generations, crossover probability, and mutation probability.

ii. Initialize the population: randomly generate a population of size N. Each member of the population is called a chromosome. Each chromosome consists of randomly generated bits 0 or 1, of size equal to the number of features available in the dataset. A chromosome in GA is similar to a star defined in the BH algorithm.

iii. Evaluate the accuracy and fitness of each chromosome using Eqs. (1) and (2), respectively. We have used tournament selection for selecting better performing chromosomes in each generation: two chromosomes are selected from the population at random, and the chromosome with the best fitness is selected. In this step, chromosomes with higher fitness may get selected multiple times, whereas the chromosome with the worst fitness will never get selected.

iv. Crossover is the next step in GA. We employed a two-point crossover method: two chromosomes are selected from the population at random and, on the basis of the crossover probability, this step is carried out as shown in Figs. 3 and 4.

v. Mutation is the final step in GA. We employed flip-bit mutation in our work, in which bits of a chromosome are mutated based on the mutation probability. This process is carried out as shown in Figs. 5 and 6.

vi. Repeat the procedure from steps (iii)–(v) for a predefined number of generations.
Fig. 3 Before crossover
Fig. 4 After crossover
Fig. 5 Before mutation
Fig. 6 After mutation
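The sketch below illustrates steps (iii)–(v) in Python: tournament selection, the two-point crossover of Figs. 3–4, and the flip-bit mutation of Figs. 5–6. The probabilities shown (0.7 and 0.01) are the values reported later in Sect. 4; the toy fitness used at the end is a placeholder.

```python
# Sketch of the GA operators: tournament selection, two-point crossover,
# and flip-bit mutation on binary chromosomes.
import numpy as np

rng = np.random.default_rng(0)

def tournament(pop, fit):
    # Pick two chromosomes at random; the fitter one wins.
    i, j = rng.integers(0, len(pop), size=2)
    return pop[i].copy() if fit[i] >= fit[j] else pop[j].copy()

def two_point_crossover(a, b, p_cross=0.7):
    a, b = a.copy(), b.copy()
    if rng.uniform() < p_cross:
        lo, hi = sorted(rng.integers(0, a.size, size=2))
        tmp = a[lo:hi].copy()
        a[lo:hi] = b[lo:hi]                       # swap the middle segments
        b[lo:hi] = tmp
    return a, b

def flip_bit_mutation(c, p_mut=0.01):
    c = c.copy()
    c[rng.uniform(size=c.size) < p_mut] ^= 1      # flip each bit with prob p_mut
    return c

# One toy generation: fitness here is just the bit count (placeholder).
pop = rng.integers(0, 2, size=(20, 30))
fit = pop.sum(axis=1)
parents = [tournament(pop, fit) for _ in range(20)]
next_pop = []
for a, b in zip(parents[::2], parents[1::2]):
    c1, c2 = two_point_crossover(a, b)
    next_pop += [flip_bit_mutation(c1), flip_bit_mutation(c2)]
next_pop = np.array(next_pop)
```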
3.3 Combined BH-GA Algorithm

In our combined algorithm, we initiate the algorithm by running the BH algorithm for one iteration following the method described in Sect. 3.1. After the iteration is over, the stars move toward the current BH employing the distance formula given in Eq. (4). We then evaluate the fitness of the new stars and switch to the GA with a switching probability SP. To carry out this step, we generate a random number from a uniform distribution between zero and one; if the random number lies between zero and SP, we switch to the GA. If the random number generated is equal to or greater than SP, we continue with the next iteration of the BH algorithm. If the decision based on SP is to switch to the GA, we use the locations of the stars as population members of the GA. With this population, we run the GA for one generation, which includes selection, crossover, and mutation. We evaluate the fitness and merge these new solutions of the GA population with their parent solutions (the solutions represented by the earlier stars and BH). We then select the top N solutions (N represents the size of the population in BH or GA). With these top N solutions as the stars and BH (the BH being the star with the best fitness), we run the BH algorithm for another iteration. The switching between BH and GA is done depending on the switching probability SP, and this process is repeated until the maximum number of generations. The algorithm can be described in the following steps:

i. Generate the initial BH population/stars as described in step (i) of Sect. 3.1. Let the population size be N.

ii. Employing the RF classifier, build the optimal model and evaluate the fitness of all stars using Eq. (2).

iii. Move the stars toward the BH using Eq. (3) and re-evaluate the fitness.

iv. Generate a random number between zero and one using a uniform distribution. If the random number lies between zero and SP, switch to the GA with the solutions represented by the stars and BH as the chromosomes of the GA.

v. Conduct selection, crossover, and mutation as described in Sect. 3.2.

vi. Merge the GA and BH solutions, evaluate the fitness, and select the top N solutions as stars.

vii. Switch back to the BH algorithm and repeat steps (iii)–(vi) until the maximum number of iterations is reached.
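The control flow of steps (i)–(vii) reduces to the short loop below. The helpers `evaluate`, `bh_iteration`, and `ga_generation` stand for the procedures of Sects. 3.1 and 3.2; trivial stubs are supplied only so the sketch executes, and SP = 0.65 is an illustrative value from the tuned range.

```python
# Sketch of the coupled BH-GA control loop with switching probability SP.
# evaluate, bh_iteration, and ga_generation are stand-ins (stubs) for the
# real procedures of Sects. 3.1 and 3.2.
import numpy as np

rng = np.random.default_rng(0)
SP, N, n_feat, max_iter = 0.65, 20, 30, 30

evaluate = lambda pop: pop.mean(axis=1)                    # stub for Eq. (2)
bh_iteration = lambda pop, fit: pop                        # stub: Sect. 3.1 move step
ga_generation = lambda pop, fit: pop[rng.permutation(N)]   # stub: Sect. 3.2 operators

stars = rng.integers(0, 2, size=(N, n_feat))               # step (i)
fit = evaluate(stars)                                      # step (ii)
for _ in range(max_iter):
    stars = bh_iteration(stars, fit)                       # step (iii)
    fit = evaluate(stars)
    if rng.uniform() < SP:                                 # step (iv): switch to GA
        offspring = ga_generation(stars, fit)              # step (v)
        merged = np.vstack([stars, offspring])             # step (vi): merge and
        merged_fit = np.concatenate([fit, evaluate(offspring)])
        keep = np.argsort(merged_fit)[::-1][:N]            # keep the top N
        stars, fit = merged[keep], merged_fit[keep]

print("best fitness:", float(fit.max()))
```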
4 Experimental Studies and Settings

Experiments are performed on publicly available benchmark datasets [23]; details of the datasets used are given in Table 1. Data transformation is one of the essential steps in data mining: some features in the data have higher values, whereas others may have lower values, and data transformation is useful to avoid this affecting the output performance. Zero mean-unit variance scaling is applied to the datasets before any performance measures are calculated.
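The zero mean-unit variance scaling mentioned above corresponds, for instance, to scikit-learn's StandardScaler; the sketch below uses synthetic data and fits the scaler on the training split only, so the test split stays unseen.

```python
# Zero mean-unit variance scaling: fit on training data, apply to test data.
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.rand(100, 4) * [1, 10, 100, 1000]  # features on mixed scales
X_test = np.random.rand(30, 4) * [1, 10, 100, 1000]

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
print(X_train_s.mean(axis=0).round(2), X_train_s.std(axis=0).round(2))
```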
Table 1 Dataset description

Dataset             No. of features   No. of instances   No. of classes
Biodegradation      42                1055               2
Breast cancer       31                568                2
Cardiotocography    22                2126               3
Dermatology         35                366                6
Heart               14                269                2
Ionosphere          34                350                2
Spambase            58                4601               2
Steel-plates-fault  34                1941               2
Wine                14                177                3
The random forest (RF) algorithm is used to train models on the training datasets. RF is a high-performance classifier with excellent generalization capabilities. RF is a tree-based algorithm which internally builds a number of decision trees with random subsets of features. Each tree in the forest is trained on bootstrapped samples (with replacement) from the training data and tested on test examples. The number of trees in a forest is a tunable parameter that varies with the dataset. Along with this, each tree is fed a random subset of features from the data; the number of random features is also a tunable parameter, whose maximum value is the total number of features available in the data. We have divided the examples of the main dataset randomly into 70% as training data and the remaining 30% as test data, unseen by the training algorithm; performance is estimated on the test data for the sake of comparison. Accuracy and fitness are calculated using Eqs. (1) and (2). To get an unbiased estimate for the existing and proposed algorithm parameters, we have used different random seed values to divide the main data into training data and test data. The existing BH algorithm, the GA, and the combined algorithm are each run five times, and average results are recorded. A feature selection algorithm is said to be effective if it maximizes the performance measure with a smaller selected feature subset. Basically, we have two objectives: to increase the classification accuracy and to select a smaller subset. These objectives are achieved using the fitness shown in Eq. (2), which is a function of the accuracy shown in Eq. (1), suitably penalized to control the selected feature subset sizes. We have used a value of 0.01 for λ in all algorithms for comparison purposes. The number of stars and the number of generations are two important parameters of any evolutionary algorithm; throughout the experiments, we have used a population size of 20 and 30 iterations. After finding the best subset from each algorithm for every dataset, the best tuned parameters of the RF algorithm are used for all algorithms to maximize performance. For the GA, we have used a crossover probability of 0.7 and a mutation probability of 0.01. The switching probability (SP) is an algorithm parameter that can be tuned; we used a grid-based method to optimize this
parameter so that it provides the best model performance. In this study, we have found values of SP in the range 0.6–0.7 to be optimal for the different datasets. We have used an Asus TUF FX505GT laptop with an i7 9th Gen processor and 16 GB RAM for all simulations.
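Under the protocol above (70/30 splits over several random seeds, RF classifier, Eq. (2) with λ = 0.01), the evaluation of one candidate subset can be sketched as follows; the dataset and the RF hyperparameters are placeholders, not the tuned values.

```python
# Evaluate one feature subset over several random 70/30 splits, reporting
# mean and standard deviation of the penalized fitness (Eq. 2, lambda=0.01).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)      # placeholder dataset
lam = 0.01

def subset_fitness(mask, seeds=(0, 1, 2, 3, 4)):
    idx = np.flatnonzero(mask)
    scores = []
    for s in seeds:                              # five runs, as in the experiments
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=s)
        clf = RandomForestClassifier(n_estimators=100, random_state=s)
        acc = clf.fit(Xtr[:, idx], ytr).score(Xte[:, idx], yte)
        scores.append(acc / (1 + lam * idx.size))
    return np.mean(scores), np.std(scores)

mask = np.zeros(X.shape[1], dtype=int)
mask[[0, 3, 7, 21]] = 1                          # an arbitrary example subset
print(subset_fitness(mask))
```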
5 Results and Discussion

In the experiments, we have evaluated the performance of all the algorithms on the publicly available benchmark datasets mentioned in Table 1 and recorded the best results. We ran the algorithms five different times and averaged the results. Experimental results for the BH algorithm, GA, and coupled BH-GA for feature selection are given in Tables 2 and 3.

Table 2 Black hole and genetic algorithm results

                    BH algorithm                    Genetic algorithm
Dataset             Subset size   Accuracy          Subset size   Accuracy
Biodegradation      15.2          86.81% ± 0.72%    16.8          86.88% ± 1.39%
Breast cancer       10.2          96.14% ± 0.67%    11.0          95.67% ± 1.06%
Cardiotocography    9.6           93.54% ± 1.20%    11.8          92.60% ± 1.42%
Dermatology         13.8          95.82% ± 1.99%    16.6          95.27% ± 1.75%
Heart               6             76.54% ± 2.62%    6.6           78.77% ± 2.95%
Ionosphere          12.2          94.29% ± 0.95%    12.4          93.71% ± 1.97%
Spambase            20.2          91.85% ± 1.49%    23            92.92% ± 1.07%
Steel-plates-fault  14.8          97.77% ± 1.29%    15.6          98.18% ± 1.47%
Wine                6.8           94.44% ± 1.85%    5.6           97.04% ± 1.01%
Table 3 Coupled BH-GA results

                    SP = 0.7                        SP = 0.6
Dataset             Subset size   Accuracy          Subset size   Accuracy
Biodegradation      14.6          86.62% ± 1.25%    12.6          87.13% ± 0.98%
Breast cancer       7.2           95.20% ± 0.96%    7.6           95.56% ± 0.67%
Cardiotocography    9.6           94.39% ± 0.34%    9             93.79% ± 0.83%
Dermatology         13.8          96.73% ± 1.65%    12.6          96.36% ± 0.64%
Heart               4.6           79.51% ± 2.56%    6.2           79.26% ± 2.03%
Ionosphere          9             93.33% ± 0.95%    11.2          94.29% ± 1.17%
Spambase            18.2          92.34% ± 0.81%    17.4          92.09% ± 1.57%
Steel-plates-fault  11.6          98.18% ± 1.97%    11.6          98.56% ± 1.00%
Wine                5             95.56% ± 2.48%    4.8           95.56% ± 1.66%
i. Biodegradation: The average subset sizes obtained with the standalone BH and the standalone GA algorithms are 15.2 and 16.8, with average accuracies of 86.81 and 86.88%, respectively, as shown in Table 2. The coupled BH-GA algorithm selected a smaller average subset size of 12.6 with an average accuracy of 87.13% and a smaller variance, as shown in Table 3. Compared to the HACO algorithm, the coupled algorithm has also selected a much smaller average subset size with better accuracy. The coupled algorithm takes 38% more computational time than the standalone BH algorithm and 17% more than the standalone GA.

ii. Breast cancer: The effectiveness of the coupled algorithm can be seen in this dataset as well; the proposed algorithm produced an average accuracy of 95.56% with less variance, as shown in Table 3, and an average subset size of 7.6, while the existing standalone BH algorithm produced a slightly better average accuracy of 96.14% with a larger variance and a larger average subset size, as shown in Table 2. The standalone GA selected 11 features with an average accuracy of 95.67%. There is a small difference between the accuracy of the coupled algorithm and the HACO, and the proposed coupled algorithm has selected a smaller subset size. The proposed algorithm takes 14% more computational time than the standalone BH algorithm and 7% less than the standalone GA.

iii. Cardiotocography: For this dataset, the accuracies produced by both the standalone BH algorithm and the coupled algorithm are around 93%, but the standalone BH algorithm has much higher variance, whereas the proposed coupled BH-GA algorithm shows much smaller variance. The average subset size selected is also smaller for the proposed algorithm, as shown in Tables 2 and 3. Among these three algorithms, the standalone GA selected more features with lesser accuracy, as shown in Table 2. The coupled algorithm outperforms the HACO algorithm, selecting a smaller subset size with higher accuracy. The coupled algorithm takes 31% more computational time than the standalone BH algorithm and 15% more than the standalone GA.

iv. Dermatology: There is a small difference between the average subset sizes selected by the standalone BH algorithm and the coupled BH-GA algorithm. The coupled algorithm shows the smaller variance of the two, with a better average accuracy of 96.36%, as shown in Tables 2 and 3. The standalone GA shows better accuracy than the standalone BH algorithm but lesser accuracy than the coupled algorithm, and it selected a larger average subset than both the standalone BH and the coupled algorithm, as shown in Table 2. For this dataset as well, the proposed algorithm performs better than the HACO algorithm both in terms of accuracy and subset size. There is a small difference between the computational times of the standalone BH algorithm, the standalone GA, and the coupled algorithm.

v. Heart: The coupled algorithm shows the smallest variance of all in terms of average accuracy, as shown in Tables 2 and 3. There is a small difference between the subset sizes of the standalone BH algorithm and the standalone GA, with the GA producing the better average accuracy of the two, as shown in Table 2. The coupled algorithm performs better in both average accuracy and average subset size than the standalone GA and the standalone BH algorithms. The FOA and HACO algorithms perform better in terms of average accuracy in comparison with the coupled algorithm, but the proposed algorithm holds the advantage of a smaller subset size compared to both of them. There is a small difference between the computational times of the standalone BH algorithm and the coupled algorithm; the standalone GA takes the most computational time among the three algorithms.

vi. Ionosphere: The subset size selected by the coupled algorithm is much smaller than those of the standalone BH algorithm and the standalone GA, with a small difference in average accuracies; the proposed algorithm holds the advantage in terms of variance in accuracies, as shown in Tables 2 and 3. The FOA and HACO algorithms perform slightly better in terms of average accuracy than the proposed algorithm, but the proposed algorithm has selected a smaller average subset size than both the FOA and the HACO. For this dataset as well, there is a small difference between the computational times of the standalone BH algorithm and the coupled algorithm; the standalone GA takes the most computational time among the three algorithms.

vii. Spambase: For the spambase dataset, the proposed coupled BH-GA algorithm holds the advantage in terms of the subset size selected. The standalone BH algorithm and the standalone GA selected average subset sizes of 20.2 and 23 with average accuracies of 91.85 and 92.92%, respectively, as shown in Table 2, whereas the proposed algorithm selected an average subset size of 17.4 with a better average accuracy than the standalone BH algorithm and a slightly lower average accuracy than the standalone GA, as shown in Table 3. For this dataset, the coupled algorithm takes 90% more computational time than the standalone BH algorithm and 70% more than the standalone GA.

viii. Steel-plates-fault: For this dataset, the proposed algorithm shows a better average accuracy of 98.56% over the standalone BH algorithm and the standalone GA, whose average accuracies are 97.77 and 98.18%, respectively, as shown in Table 2. The coupled algorithm also shows better performance in terms of selecting a smaller average subset size of 11.6, whereas the standalone BH algorithm selected 14.8 features, as given in Table 2; the standalone GA selected the largest average subset size among the three algorithms. The coupled algorithm is more efficient in terms of time complexity than the standalone GA and takes slightly less time than the standalone BH algorithm.

ix. Wine: The proposed coupled algorithm produced better results, with an average accuracy of 95.56% compared to the average accuracy of 94.44% produced by the standalone BH algorithm, as given in Tables 2 and 3. The standalone GA performs slightly better than the coupled algorithm in terms of average accuracy, and the standalone BH algorithm also shows a higher variance. The coupled algorithm has again selected a smaller average subset size of 4.8, compared to the average subset size of 6.8 selected by the standalone BH algorithm and 5.6 selected by the standalone GA, as given in Tables 2 and 3. The proposed algorithm has selected a slightly smaller subset size in comparison with the FOA and a much smaller subset size than the HACO algorithm; both the FOA and HACO algorithms perform better in terms of average accuracy than the proposed coupled algorithm. The run times are much smaller for the coupled algorithm than for both the standalone BH algorithm and the standalone GA.

x. Hybridizing the two algorithms has proven to be useful for enhancing performance [20]. In this work, we found that stochastically switching between BH and GA prevents the BH algorithm from getting stuck at local optima; a properly tuned switching probability carries out this task efficiently.
These results illustrate the fact that synergistic coupling of the binary BH algorithm with the binary GA provides considerably superior feature selection performance. The accuracies are improved along with the selection of a smaller subset size. The algorithm has a lower variance for the majority of datasets, which is a very important criterion for a good feature selection algorithm. The increase in computational complexity is compensated by much-improved algorithm performance. This improvement is due to the fact that coupling the BH algorithm with the standalone GA helps to break out of local optima efficiently and widens the search space.
6 Conclusion In this work, we synergistically coupled the binary black hole algorithm with the binary genetic algorithm for improving feature selection capabilities. We employed nine benchmarking datasets publicly available in the literature. Our results indicate that for most of the datasets the coupled algorithm exhibits improved performance in terms of all the metrics. While selecting smaller subsets with higher accuracies, the variance in accuracies is also very minimal. This improved algorithm can be employed for feature selection for carrying out machine learning tasks in many fields of science and engineering.
References

1. Gu Q, Li Z, Han J (2012) Generalized Fisher score for feature selection. arXiv preprint arXiv:1202.3725
2. Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore JH (2018) Relief-based feature selection: introduction and review. J Biomed Inform 85:189–203
3. Michalak K, Kwasnicka H (2010) Correlation based feature selection method. Int J Bio-Inspired Comput 2(5):319–332
4. Jin X, Xu A, Bie R, Guo P (2006) Machine learning techniques and chi-square feature selection for cancer classification using SAGE gene expression profiles. In: International workshop on data mining for biomedical applications. Springer, Berlin, Heidelberg, pp 106–115, Apr 2006
5. Johnson KJ, Synovec RE (2002) Pattern recognition of jet fuels: comprehensive GC×GC with ANOVA-based feature selection and principal component analysis. Chemom Intell Lab Syst 60(1–2):225–237
6. Ogutu JO, Schulz-Streeck T, Piepho HP (2012) Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. In: BMC proceedings, vol 6, no 2. BioMed Central, pp 1–6
7. Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 international joint conference on neural networks (IJCNN). IEEE, pp 1–8, June 2012
8. Hatamlou A (2013) Black hole: a new heuristic optimization approach for data clustering. Inf Sci 222:175–184
9. Nemati M, Momeni H, Bazrkar N (2013) Binary black holes algorithm. Int J Comput Appl 79(6)
10. Khanesar MA, Teshnehlab M, Shoorehdeli MA (2007) A novel binary particle swarm optimization. In: 2007 Mediterranean conference on control & automation. IEEE, pp 1–6, June 2007
11. Azad MAK, Rocha AMA, Fernandes EM (2014) Improved binary artificial fish swarm algorithm for the 0–1 multidimensional knapsack problems. Swarm Evol Comput 14:66–75
12. Goldberg DE, Holland JH (1988) Genetic algorithms and machine learning
13. Kumar S, Datta D, Singh SK (2015) Black hole algorithm and its applications. In: Computational intelligence applications in modeling and control. Springer, Cham, pp 147–170
14. Pashaei E, Aydin N (2017) Binary black hole algorithm for feature selection and classification on biological data. Appl Soft Comput 56:94–106
15. Wu C, Wu T, Fu K, Zhu Y, Li Y, He W, Tang S (2017) AMOBH: adaptive multiobjective black hole algorithm. Comput Intell Neurosci
16. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN'95 international conference on neural networks, vol 4. IEEE, pp 1942–1948, Nov 1995
17. Dorigo M, Di Caro G (1999) Ant colony optimization: a new meta-heuristic. In: Proceedings of the 1999 congress on evolutionary computation-CEC99 (Cat. No. 99TH8406), vol 2. IEEE, pp 1470–1477, July 1999
18. Ovhal PT, Valadi JK, Sane A (2020) Twin and multiple black holes algorithm for feature selection. In: 2020 IEEE-HYDCO. IEEE, pp 1–6, Sept 2020
19. Ovhal P, Valadi JK (2020) Black hole—white hole algorithm for dynamic optimization of chemically reacting systems. In: Congress on intelligent systems. Springer, Singapore, pp 535–546, Sept 2020
20. Joshi T, Lahorkar A, Tikhe G, Bhosale H, Sane A, Valadi JK (2021) An improved ant colony optimization with correlation and Gini importance for feature selection. In: Communication and intelligent systems. Springer, Singapore, pp 629–641
21. Ghaemi M, Feizi-Derakhshi MR (2016) Feature selection using forest optimization algorithm. Pattern Recogn 60:121–129
22. Sharma P, Sharma H, Kumar S, Sharma K (2019) Black-hole gbest differential evolution algorithm for solving robot path planning problem. In: Harmony search and nature inspired optimization algorithms. Springer, Singapore, pp 1009–1022
23. Dataset resources: UCI machine learning repository. https://archive.ics.uci.edu/ml/index.php
Brain Tumor Analysis and Reconstruction Using Machine Learning Priyanka Sharma, Dinesh Goyal, and Neeraj Tiwari
Abstract The enormous success of machine learning algorithms for image recognition in recent years coincides with a period in which electronic medical records and diagnostic imaging are used substantially. This chapter presents machine learning techniques for medical image analysis, concentrating on convolutional neural networks and highlighting clinical applications. Due to its record-breaking performance, deep learning has lately become the solution of choice for quantitative analysis. However, the examination of medical images is unique. Brain tumors are among the most prevalent and aggressive illnesses, leading at their highest grade to extremely short life expectancy. MRI images are utilized in this work to diagnose brain tumors, but the enormous amount of data produced by MRI scanning at any given time thwarts manual tumor versus non-tumor categorization. Automatic brain tumor identification by applying CNN classification is proposed in this paper. A deeper architecture design is achieved with smaller kernels, and the neuron weights are kept small. Experimental findings reveal that the CNN achieves an accuracy of 97.5% with low complexity compared to state-of-the-art methodology. Keywords Deep learning · Medical images · Convolutional neural network · Image reconstruction
1 Introduction

Medical images are one of physicians' most essential tools for diagnosing brain tumors. A highly accurate tool to automate this operation might be incredibly beneficial. However, the technology cannot replace the professional judgment of experienced doctors because of legal liability difficulties [1].

P. Sharma (B) · N. Tiwari Poornima University, Jaipur, India e-mail: [email protected] D. Goyal Poornima Institute of Engineering and Technology, Jaipur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_27
Medical imaging methods such as CT, MRI, and X-ray have been utilized in medicine for visualizing body extremities, organs, and other tissues. However, images obtained using these modalities may suffer from a low signal-to-noise ratio (SNR) and a low contrast-to-noise ratio (CNR), along with visual artifacts. To overcome these obstacles and increase picture quality, image reconstruction methods have been created to improve visual perception and analysis [2]. Deep learning methods have been employed effectively in medical imaging, for example in radiomics, computer-assisted detection and diagnosis, and medical image analysis. In recent times, machine learning has experienced rapid breakthroughs that have attracted a large amount of industrial, academic, and cultural interest. These are driven by advances in artificial neural networks, mostly referred to as deep learning, a combination of techniques and algorithms which allow computers to discover complex patterns in large data sets [3]. The advancement is fueled by increasing access to user-friendly ('big data') software frameworks and an explosion of available computing power, allowing deeper neural networks to be trained. These models nowadays constitute the state of the art for a broad range of computer vision, language modeling, and robotics challenges. In machine learning, strategies are developed and studied that enable computers to solve problems by learning from experience. The objective is to build mathematical models that give meaningful results when input data are provided [4]. Machine learning models gain experience from training data and are tailored to deliver exact predictions on the training data using an optimization process; the major objective is to generalize this knowledge and provide precise forecasts for fresh, unseen data. Various kinds of deep learning approaches from the computer vision field have been used for the analysis of medical images. Examples of supervised DL techniques include recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Unsupervised learning algorithms have also been investigated for medical images, including deep belief networks (DBNs), restricted Boltzmann machines (RBMs), autoencoders, and generative adversarial networks (GANs). DL is often applied to anomaly detection and classification of a certain illness kind [5]. Convolutional neural networks (CNNs) are appropriate for classification, segmentation, object recognition, registration, and other tasks when DL is applied to medical images. A CNN is an artificial neural network framework, inspired by the visual system, used for the identification of medical image patterns based on the convolution operation. Medical images in deep learning (DL) applications are shown in Fig. 1 [6].
2 Literature Review

Magadza and Viriri [7]: Quantitative brain tumor analysis gives vital information for improved understanding of tumor characteristics and treatment planning. Precise segmentation of lesions requires more than one image with different contrasts. Consequently,
Fig. 1 a X-ray image with pulmonary masses b CT image with lung nodule c digitized histopathological tissue image
manual segmentation, which is probably the most accurate segmentation approach, is not practicable for large-scale investigations. Due to its record-breaking performance, deep learning has lately become a solution for quantitative analysis. However, the examination of medical images is unique. This study gives a review of the latest deep learning algorithms for the segmentation of brain tumors, clearly showing the building blocks of these approaches and the different tactics involved; finally, the open issues in the analysis of medical images are examined.

Muhammad et al. [8]: Brain tumor is a serious malignancy for individuals of any age, and its grade detection is a challenge for radiologists in health monitoring and automated diagnostics. Several deep learning-based strategies have recently been introduced in the brain tumor classification (BTC) literature to help radiologists perform such studies. This survey gives an in-depth examination of previous and current deep learning algorithms for BTC, and also provides a description of various BTC evaluation benchmarks. The review not only looks at the past literature on the subject but also takes strides toward the future of this field, identifying several research avenues that must be pursued, particularly for customized and smart health care.

Razzak et al. [9]: Manually partitioning brain tumors in diagnostic MRI images is a challenging, hard, and time-consuming process. Accurate and robust segmentation of brain tumors is therefore critical for diagnosis, treatment, and assessment of therapeutic outcomes. To prevent instabilities and overfitting from parameter sharing, the model enforces equivariance in the two-pathway CNN model. Finally, the cascade architecture is incorporated into a two-pathway group of CNNs, where the output of a primary CNN is treated as an additional source and concatenated in the last layer. The BRATS 2013 and BRATS 2015 datasets showed that incorporating the CNN group in a two-pathway configuration improves overall performance compared to the currently published state of the art while retaining attractive computational complexity.

Havaei et al. [10]: We describe in this research a completely automated segmentation solution for brain tumors based on deep neural networks (DNNs). The
networks described are customized to the low- and high-grade glioblastomas seen in MR imaging. Our CNN concurrently leverages both local and global contextual features. Moreover, our networks use a final layer that is a convolutional version of a fully connected layer, distinct from most standard applications of CNNs, which allows a 40-fold speed-up. We also offer a two-phase training technique to address problems associated with tumor label imbalance, and we study a cascaded design in which the output of a primary CNN is treated as a data source for a subsequent CNN. Results on the BRATS 2013 test dataset show that our architecture is many times faster than the current state of the art.

He et al. [11]: It is harder to train deeper neural networks. We provide a residual learning framework to facilitate the training of networks that are far deeper than those previously employed. We analyze residual nets on the ImageNet dataset with depths of up to 152 layers, 8× deeper than VGG networks, yet still less complex. On the ImageNet test set, an ensemble of these residual networks achieves a 3.57% error rate. This result won first place in the ILSVRC 2015 classification challenge. CIFAR-10 with 100 and 1000 layers is also studied. For many visual recognition tasks, the depth of representation is crucial: we achieve a relative improvement of 28% on the COCO object detection dataset due solely to our extraordinarily deep representations. Our entries for the ILSVRC & COCO 2015 competitions were based on deep residual networks, with which we earned first place in the ImageNet detection, ImageNet localization, and COCO segmentation tasks.
3 Proposed System

Figure 2 shows the block diagram of the brain tumor categorization based on the convolutional neural network. The CNN-based categorization of brain tumors is separated into two phases: training and testing. The images are separated into different categories by the use of labels, such as tumor and non-tumor brain images. In the training phase, preprocessing, feature extraction, and classification with a loss function are conducted to create the prediction model [12]. First, the training images are labeled; then resizing is done in preprocessing to adjust the image size.
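A minimal sketch of this labeling-and-resizing preprocessing step is shown below; the folder names, target size, and use of OpenCV are assumptions for illustration, not the authors' exact pipeline.

```python
# Sketch of the training-phase preprocessing: label each MRI image by its
# folder (assumed names "no_tumor"/"tumor") and resize to a fixed size.
import os
import cv2
import numpy as np

def load_dataset(root, size=(256, 256)):
    images, labels = [], []
    for label, cls in enumerate(["no_tumor", "tumor"]):   # assumed layout
        for name in os.listdir(os.path.join(root, cls)):
            img = cv2.imread(os.path.join(root, cls, name), cv2.IMREAD_GRAYSCALE)
            if img is None:                               # skip unreadable files
                continue
            images.append(cv2.resize(img, size) / 255.0)  # scale to [0, 1]
            labels.append(label)
    return np.array(images), np.array(labels)

# X, y = load_dataset("brain_mri/")  # hypothetical dataset directory
```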
4 Convolution Neural Network

The CNN has proved effective for processing 2D and 3D images. The majority of CNN algorithms are trained using a gradient-based approach. Compared to other neural network designs, the number of parameters to be adjusted is smaller. There are
Fig. 2 Block diagram of proposed brain tumor classification using CNN
feature extractors and classification elements inside the CNN architecture. Each feature extraction layer receives the previous layer's output and feeds into the following layer [13]. There are three kinds of layers in the CNN design: convolution, max-pooling, and classification. The even-numbered layers are convolution layers, and the odd-numbered layers are max-pooling layers. The classification layer is a fully connected layer and the last level of the design. In the classification stage, backpropagation is utilized for improved accuracy. There are many forms of pooling: max-pooling, min-pooling, average pooling, and global average pooling. Feature maps are created at the convolution layer by convolving the inputs with kernels and applying a linear or nonlinear activation function; examples include the sigmoid, hyperbolic tangent, softmax, rectified linear, and identity functions [14]. The pooling layer is sometimes called the sub-sampling layer, and down-sampling occurs here. In certain configurations, two to four such layers are used, as in LeNet, AlexNet, and VGGNet, and the number of classification layers depends on the application. The architecture of the convolutional neural network is shown in Fig. 3.
Fig. 3 Convolution neural network architecture
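As a concrete illustration of the convolution/max-pooling/classification stack in Fig. 3, a small-kernel CNN could be defined in Keras as below; the layer counts and filter numbers are assumptions, since the exact configuration is not listed.

```python
# Illustrative small-kernel CNN for tumor vs. non-tumor classification.
# Layer counts and filter numbers are assumptions, not the authors' network.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(256, 256, 1)),          # grayscale MRI input
    layers.Conv2D(16, 3, activation="relu"),    # small 3x3 kernels
    layers.MaxPooling2D(2),                     # sub-sampling (pooling) layer
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),        # fully connected layer
    layers.Dense(2, activation="softmax"),      # classification layer
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```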
5 Results and Discussion

5.1 Image Acquisition

In the suggested method, MRI scanning is employed for image capture, and these scanned images are displayed as a two-dimensional matrix with pixels as elements. The matrices are determined by the matrix size and the field of view. The images are stored in MATLAB and shown as 256 × 256 grayscale images [15]. The gray values lie between 0 and 255, where 0 shows pure black and 255 shows pure white; pixel values between these extremes vary in intensity between black and white. This research examined 30 male and 30 female patients aged 20–60. The patients in the database were processed using MRI scans in the .jpeg image format. With the help of these MRIs, the tumor images are produced, and the scanned images are exhibited in the 2D matrix whose principal pixel components are examined. These matrices depend on the field of view and their size [16] (Fig. 4).
Fig. 4 Images of all kinds of tumors from the dataset
Table 1 Values obtained for benign tumor

Mean                0.0034    Skewness       0.6318
Standard deviation  0.0897    IDM            0.3816
Entropy             2.9949    Contrast       0.2433
RMS                 0.0898    Correlation    0.1294
Variance            0.0081    Energy         0.7606
Smoothness          0.9270    Homogeneity    0.9344
Kurtosis            7.6801    Accuracy       95.6
The details of the images are stored in MATLAB in 256 × 256 grayscale. The pixel intensity of these gray images ranges from black to white [17]. The research consisted of ten men and 20 women, aged between 20 and 60 years. The results of the tests have been maintained in JPEG format (Tables 1 and 2).

Table 2 Values obtained for malignant tumor
Mean                0.0062    Skewness       1.0883
Standard deviation  0.0896    IDM            0.4443
Entropy             2.9949    Contrast       0.3012
RMS                 0.0898    Correlation    0.1345
Variance            0.0080    Energy         0.7520
Smoothness          0.9585    Homogeneity    0.9294
Kurtosis            10.0334   Accuracy       95.6
5.2 Image Reconstruction

CNN- and RNN-based image reconstruction research, pioneered by Yang at NIPS 2016 and Wang at ISBI 2016, is fast-growing [18]. Recent applications include, for example, convolutional recurrent neural networks for the reconstruction of dynamic MR images and the reconstruction of high-quality cardiac MR images from highly undersampled and compressed data, with novel dynamic MRI reconstruction algorithms trading computational complexity against reconstruction accuracy; to improve the variability of instances seen by the network and reduce overfitting, Schlemper built a deep cascade of multiple CNNs for dynamic MR image reconstruction using both rigid and elastic data augmentation [19]. Chen enabled image reconstruction in real time (200 ms per section) that outperforms conventional parallel imaging and compressed sensing reconstruction, using variational networks for single-shot fast spin-echo MRI with variable-density sampling [20]. The authors explored transfer learning (pretrained models) and analyzed the generalization between k-space data from patient knee MRI records and k-space data generated from Berkeley Segmentation Data Set and Benchmark images, with respect to image contrast, SNR, sampling pattern, and image content [21]. A new compressed sensing framework has been constructed with the use of least-squares generative adversarial networks (GANs), which learn textural information and eliminate high-frequency noise (30 ms). The automated transform by manifold approximation (AUTOMAP) is a generic image reconstruction framework: it consists of a feed-forward deep neural network with fully connected layers followed by a sparse auto-encoder, and it treats image reconstruction as a data-driven supervised learning task that generates a mapping from the sensor to the image domain based on an approximation [22]. Figure 5 shows deep learning in the MR signal processing chain, from image acquisition to image reconstruction (in complex-valued k-space), image restoration (e.g., denoising), and image registration; multimodal brain MRI co-registration involves sMRI and functional BOLD fMRI [23]. Figure 6 shows deep learning for MR image analysis in the brain. There are many additional methodologies and reports on deep learning in MR image reconstruction, a fundamental subject that is advancing quickly [24]. (A minimal sketch of the classical inverse-Fourier baseline that these data-driven methods extend is given after Fig. 6.)
Fig. 5 Deep learning in the MR signal processing chain
Fig. 6 Deep learning for MR image analysis in brain
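For reference, the classical baseline that the data-driven reconstruction methods above extend is the inverse Fourier transform from fully sampled k-space to the image domain; the numpy sketch below uses a synthetic image as a stand-in for scanner data.

```python
# Classical MR reconstruction baseline: inverse 2D FFT from fully sampled
# k-space back to the image domain (synthetic data stands in for a scan).
import numpy as np

image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0                                # synthetic "anatomy"
kspace = np.fft.fftshift(np.fft.fft2(image))             # simulated acquisition
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))   # reconstruction
print(np.allclose(recon, image, atol=1e-10))             # exact when fully sampled
```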
Various modalities can be used to image the brain. MRI is the most commonly used modality for medical brain imaging; it has several advantages over other imaging modalities, including being safer and providing more information, as demonstrated in [25]. Figure 5 depicts examples of the four MRI modalities extracted from the BRATS 2013 brain database. However, MRI is expensive and not suitable for individuals with claustrophobia [26]. A group of 2D images acquired during MRI acquisition can represent a 3D brain volume. T1 images can reveal healthy tissues, and the differences in the images produced by the different MRI modalities can be used to generate various types of contrast images [27].
5.3 Deep Learning Model for Binary Segmentation

Deep learning refers to the learning of neural networks that contain numerous hidden layers and a large number of free parameters. In contrast to the usual multilayer neural network, a number of convolution layers (kernels), pooling layers, and fully connected (FC) layers are applied to each MRI input image, and a softmax function is eventually employed for decision-making. Several characteristics make deep learning more powerful than typical machine learning approaches: the many free parameters that can fit any relationship, the functions of the hidden layers, and the complex connections between layers [28]. Furthermore, the structure and the nature of each layer can serve distinct objectives. Despite the benefits of deep learning, there are typical downsides, such as the necessity to construct a complicated architecture with hidden layers, expensive computational costs, and the huge quantity of data necessary to achieve the required performance during the training process [29]; these drawbacks also increase the training time.
There are numerous forms of deep learning network structures and nomenclature, such as convolutional neural networks (CNNs), deep residual networks (DRNs), deep feed-forward networks, and deep inverse graphics networks (DNs). In the realm of image processing, the CNN architecture is the most frequent. Its structure mostly includes the input layer; feature extraction layers with convolutional layers; the rectified linear unit (ReLU) activation layer; pooling layers; and classification layers [30]. Apart from machine learning and deep learning-based models, some other algorithms have also been used for segmentation: Sharma et al. used the firefly algorithm [31–33], and Gupta et al. used a statistical approach [34, 35] (Fig. 7). For both tasks (segmentation and classification), the model is trained on fully and weakly annotated images. Three types of training images can be differentiated. Images with a tumor and a ground-truth segmentation are the costliest. The second group is free of tumors; therefore, no pixel is associated with a tumor. In this circumstance, the ground-truth segmentation is
Fig. 7 Architecture of our model for binary segmentation
Table 3 Segmentation accuracy of FCM-based algorithms

Images | FCM   | FCM_S | EnFCM
Im. 1  | 87.14 | 91.45 | 94.45
Im. 2  | 87.78 | 93.43 | 96.34
Im. 3  | 89.28 | 90.21 | 95.39
Im. 4  | 87.89 | 93.49 | 95.17
Im. 5  | 88.91 | 94.56 | 94.47
Table 4 Normalized GLCM features

Images | STD    | Mean   | Entropy | Homogeneity | Skewness | Kurtosis | Energy
Im-1   | 0.1839 | 0.0086 | 0.0711  | 0.3405      | 0.5757   | 0.2406   | 0.0621
Im-2   | 0.1112 | 0.0169 | 0.1235  | 0.2311      | 0.4411   | 0.2194   | 0.1224
Im-3   | 0.1292 | 0.0233 | 0.1593  | 0.2422      | 0.3448   | 0.8614   | 0.1687
Im-4   | 0.1211 | 0.0211 | 0.1416  | 0.3418      | 0.5282   | 0.4641   | 0.1454
Im-5   | 0.1808 | 0.0084 | 0.0701  | 0.2113      | 0.0826   | 0.0704   | 0.0611
just the zero matrix. Thirdly, the image is labeled as containing a tumor but has no segmentation. The enhanced fuzzy C-means (EnFCM) exhibits better visual segmentation results than the earlier algorithms, and the segmentation accuracy achieved for the selected images is given in Table 3. Table 3 shows that the EnFCM method gives higher segmentation accuracy for each selected image. For example, on image 2, FCM and FCM_S achieve 87.78% and 93.43%, respectively, while EnFCM improves the segmentation accuracy to 96.34%. The EnFCM results are found to be more accurate than those of FCM and FCM_S for every image. The normalized GLCM features extracted from the images are reported in Table 4.
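For reference, the following is a minimal NumPy sketch of baseline fuzzy C-means on grayscale pixel intensities; it illustrates plain FCM only (not FCM_S or EnFCM), and the fuzzifier m = 2, the cluster count, and the iteration budget are assumed values.

# Minimal sketch of baseline fuzzy C-means over grayscale pixel intensities.
import numpy as np

def fcm(pixels, c=2, m=2.0, iters=100, eps=1e-6):
    x = pixels.reshape(-1).astype(float)        # flatten the image to 1-D
    rng = np.random.default_rng(0)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                          # memberships sum to 1 per pixel
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)       # fuzzy-weighted cluster centers
        d = np.abs(x[None, :] - centers[:, None]) + eps
        u_new = (d ** -p) / (d ** -p).sum(axis=0)   # standard FCM membership update
        if np.abs(u_new - u).max() < eps:
            u = u_new
            break
        u = u_new
    return centers, u                           # hard labels: u.argmax(axis=0)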
6 Conclusion

In this work, we separated brain tissue in MR images into normal tissue and tumor-infested tissue. Machine learning algorithms play an indispensable role in disease diagnosis and treatment applications. Deep learning architectures have acquired considerable significance because of characteristics superior to those of other machine learning approaches and are commonly utilized to segment and classify tumors. The major objective of this study is to develop a highly accurate, high-performance automated brain tumor categorization with low complexity. In traditional brain tumor classification, segmentation and texture and shape feature extraction based on fuzzy C-means (FCM) are conducted, and SVM- and DNN-based classifiers are applied; their complexity is low, but the computation time is considerable. A convolutional neural network classification is implemented in the suggested
approach in order to increase precision and minimize computation time. Each image is reported as a tumor or a normal brain picture. CNN is a deep learning technique consisting of a succession of feed-forward layers. Python is used for the implementation. For classification, a model pre-trained on the ImageNet database is employed, so training is done for the last layer alone. In addition, the CNN operates on raw pixel values with depth, width, and height. Finally, a gradient-based loss function is used to achieve high precision. Training accuracy, validation accuracy, and validation loss are determined. The training accuracy is 97.5%; similarly, the validation accuracy is high and the validation loss is low.
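As a rough illustration of the last-layer-only training described above, the following is a minimal transfer-learning sketch assuming Keras/TensorFlow with an ImageNet-pretrained VGG16 base; the base-model choice, image size, and data variables are assumptions, not details given in the paper.

# Minimal transfer-learning sketch: an ImageNet-pretrained base is frozen
# and only the new final classification layer is trained.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # freeze all pretrained layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),   # tumor vs. normal brain image
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels,
#           validation_data=(val_images, val_labels))  # hypothetical data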
References 1. Amin J, Sharif M, Raza M, Yasmin M (2018) Detection of brain tumor based on features fusion and machine learning. J Ambient Intell Humanized Comput 1–17 2. Jain VK, Kumar S, Fernandes SL (2017) Extraction of emotions from multilingual text using intelligent text processing and computational linguistics. J Comput Sci 21:316–326 3. Shah JH, Chen Z, Sharif M, Yasmin M, Fernandes SL (2017) A novel biomechanics-based approach for person re-identification by generating dense color sift salience features. J Mech Med Biol 17:1740011 4. Liu J, Pan Y, Li M, Chen Z, Tang L, Lu C, Wang J (2018) Applications of deep learning to MRI images: a survey. Big Data Mining Anal 1:1–18 5. Masood S, Sharif M, Raza M, Yasmin M, Iqbal M, Younus Javed M (2015) Glaucoma disease: a survey. Curr Med Imaging Rev 11:272–283 6. Sharif M, Tanvir U, Munir EU, M. Khan A, Yasmin M (2018) Brain tumor segmentation and classification by improved binomial thresholding and multi-features selection. J Ambient Intell Humanized Comput 1–20 7. Magadza T, Viriri S (2021) Deep learning for brain tumor segmentation: a survey of state-ofthe-art. J Imaging 7:19. https://doi.org/10.3390/jimaging7020019 8. Muhammad K, Khan S, Ser JD, de Albuquerque VHC (2020) Deep learning for multigrade brain tumor classification in smart healthcare systems: a prospective survey. IEEE Trans Neural Netw Learn Syst 1–16 9. Razzak MI, Imran M, Xu G (2019) Efficient Brain tumor segmentation with multiscale twopathway-group conventional neural networks. IEEE J Biomed Health Inform 23:1911–1919 10. Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, Pal C, Jodoin PM, Larochelle H (2017) Brain tumor segmentation with deep neural networks. Med Image Anal 35:18–31 11. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778 12. Bakas S, Zeng K, Sotiras A, Rathore S, Akbari H, Gaonkar B, Rozycki M, Pati S, Davazikos C (2015) Segmentation of gliomas in multimodal magnetic resonance imaging volumes based on a hybrid generative-discriminative framework. In: Proceeding of the multimodal brain tumor image segmentation challenge, pp 5–12 13. Dvorak P, Menze B (2015) Structured prediction with convolutional neural networks for multimodal brain tumor segmentation. In: Proceeding of the multimodal brain tumor image segmentation challenge, pp 13–24 14. Chang J, Zhang L, Gu N et al (2019) A mix-pooling CNN architecture with FCRF for brain tumor segmentation. J Visual Comm Image Represent 58:316–322
15. Xiao Z, Huang R, Ding Y, Lan T, Dong R, Qin Z, Zhang X, Wang W (2016) A deep learning based segmentation method for brain tumor in MR images. In: 2016 IEEE 6th international conference on computational advances in bio and medical sciences (ICCABS) (IEEE 2016), pp 1–6 16. Rezaei M, Harmuth K, Gierke W, Kellermeier T, Fischer M, Yang H, Meinel C (2017) A conditional adversarial network for semantic segmentation of brain tumor. In: International MICCAI Brainlesion Workshop. Springer, Cham, pp 241–252 17. Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Hasan M, Van Essen BC, Awwal AA, Asari VK (2019) A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3):292 18. Pereira S et al (2016) Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 19. Yoo Y, Tang LYW, Brosch T, Li DKB, Kolind S, Vavasour I et al (2018) Deep learning of joint myelin and T1w MRI features in normal-appearing brain tissue to distinguish between multiple sclerosis patients and healthy controls. NeuroImage Clin 17:169–178 20. Wang X, Yang W, Weinreb J, Han J, Li Q, Kong X et al (2017) Searching for prostate cancer by fully automated magnetic resonance imaging classification: deep learning versus non-deep learning. Sci Rep 7:15415 21. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241 22. Dong H, Yang G, Liu F, Mo Y, Guo Y (2017) Automatic brain tumor detection and segmentation using U-net based fully convolutional networks. In: Annual conference on medical image understanding and analysis, pp 506–517 23. Sharif M, Amin J, Raza M, Yasmin M, Satapathy SC (2019) An integrated design of particle swarm optimization (PSO) with fusion of features for detection of brain tumor. Pattern Recogn Lett 24. Zhao X, Wu Y, Song G, Li Z, Zhang Y, Fan Y (2018) A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med Image Anal 43:98–111 25. Gao Y, Phillips JM, Zheng Y, Min R, Fletcher PT, Gerig G (2018) Fully convolutional structured LSTM networks for joint 4D medical image segmentation. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), pp 1104–1108 26. Nema S, Dudhane A, Murala S, Naidu S (2020) RescueNet: An unpaired GAN for brain tumor segmentation. Biomed Sig Process Control 55:101641 27. Pan Y, Huang W, Lin Z, Zhu W, Zhou J, Wong J, Ding Z (2015) Brain tumor grading based on neural networks and convolutional neural networks. In: 2015 37th annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp 699–702 28. Kistler M, Bonaretti S, Pfahrer M, Niklaus R, Büchler P (2013) The virtual skeleton database: an open access repository for biomedical research and collaboration. J Med Internet Res 15:1–14 29. Bakas S, Reyes M, Jakab A, Bauer S, Rempfler M, Crimi A, Shinohara RT, Berger C, Ha SM, Rozycki M (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. Comput Vis Pattern Recogn 1–49 30. Saba T, Mohamed AS, El-Affendi M, Amin J, Sharif M (2020) Brain tumor detection using fusion of hand crafted and deep learning features. Cogn Syst Res 59:221–230 31. 
Sharma A, Chaturvedi R, Dwivedi UK, Kumar S, Reddy S (2018) Firefly algorithm based effective gray scale image segmentation using multilevel thresholding and entropy function. Int J Pure Appl Math 118(5):437–443 32. Sharma A, Chaturvedi R, Kumar S, Dwivedi UK (2020) Multi-level image thresholding based on Kapur and Tsallis entropy using firefly algorithm. J Interdisc Math 23(2):563–571 33. Sharma A, Chaturvedi R, Dwivedi U, Kumar S (2021) Multi-level image segmentation of color images using opposition based improved firefly algorithm. Recent Adv Comput Sci Commun (Formerly: Recent Patents on Computer Science) 14(2):521–539
34. Gupta R, Kumar S, Yadav P, Shrivastava S (2018) Identification of age, gender, & race SMT (scare, marks, tattoos) from unconstrained facial images using statistical techniques. In: 2018 international conference on smart computing and electronic enterprise (ICSCEE), July 2018. IEEE, pp 1–8 35. Gupta R, Yadav P, Kumar S (2017) Race identification from facial images using statistical techniques. J Stat Manag Syst 20(4):723–730
Ordered Ensemble Classifier Chain for Image and Emotion Classification Puneet Himthani, Puneet Gurbani, Kapil Dev Raghuwanshi, Gopal Patidar, and Nitin Kumar Mishra
Abstract Ensemble techniques play a significant role in the enhancement of machine learning models; hence, they are highly applicable in multi-label classification, a more complex form of classification compared to binary or multi-class classification. The classifier chain is the most prevalent and oldest technique that utilizes correlation among labels for solving multi-label classification problems. The ordering of class labels plays a significant role in the performance of the classifier chain; however, deciding the order is a challenging task. A more recent method, the ensemble of classifier chains (ECC), solves this problem by using multiple CCs, with a different random order of labels for each CC as the base classifier. However, it requires at least ten CCs, and it is computationally expensive. Improving the prediction accuracy with fewer than ten CCs is a challenging task that this paper addresses by proposing a classifier chain ensemble model termed Ecc_Wt_Rase. It uses a weighted ensemble of only four classifier chains. The performance of Ecc_Wt_Rase is compared with the traditional CC and ECC over three standard multi-label datasets, belonging to the image and emotion (music) domains, using four performance parameters. On the one hand, Ecc_Wt_Rase reduces the computational cost and, on the other hand, improves the classification accuracy. The improvement in Hamming loss is approximately 6%, which is exceptional for multi-label classification; the training time is also reduced by approximately 40%, as the number of CCs in the proposed model is four, compared to ten in traditional ECC. Keywords Multi-label classification · Ensemble model · Classifier chain
P. Himthani (B) · P. Gurbani · K. Dev Raghuwanshi · G. Patidar Department of CSE, TIEIT, Bhopal, MP, India N. Kumar Mishra ABV-IIITM, Gwalior, MP, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_28
1 Introduction

The problem of multi-label classification (MLC) is quite common in many areas. These areas include text categorization, where a document can belong to multiple classes or labels simultaneously; sentiment or emotion analysis; and scene analysis, where a scenic image can be associated with multiple entities present in it [1]. MLC problems also exist in areas like classification of proteins, music categorization, and classification of content in e-learning environments. All these areas involve the classification of an instance into multiple classes simultaneously. For example, an image of a drawing room can be associated with a table, a sofa set, a television, and a flower vase [2].

Classification is a problem based on the principle of supervised learning, where labeled datasets are used for training the classifiers. Labeled datasets contain information about the outcomes of the target attribute based on features or independent attributes. In contrast, clustering is based on unsupervised learning, where unlabeled datasets, which do not have any information about the target attribute, are used. Machine learning techniques typically fall into two broad categories, namely supervised and unsupervised [3].

In traditional binary classification, an instance can either belong to the class under consideration or not. MCC is a more generalized form of classification, where the number of classes under consideration is more than two (m > 2); an instance can belong to any one of the m classes. In MLC problems, an instance can belong to one, two, or more class labels simultaneously; i.e., there can be t class labels (t ≤ m) associated with an instance at the same time. MLC is thus a more complex form of classification problem than binary classification or multi-class classification (MCC) [4].

MLC techniques are classified as first-order, second-order, and higher-order strategies, based on the amount of label dependence or correlation exploited by the MLC classifier. The first-order strategy does not consider any correlation or dependence among class labels. The second-order strategy considers pair-wise dependence among class labels. The higher-order strategy considers dependence among all classes while predicting the possible set of class labels to which an instance belongs [3]. MLC techniques are also categorized into algorithm adaptation (AA) approaches and problem transformation (PT) approaches. AA approaches enhance the capabilities of binary classifiers for predicting multi-label instances; these include ML-KNN, ML-SVM, ML-AdaBoost, etc. PT approaches break down the MLC problem into multiple binary classification or MCC problems; these include binary relevance, classifier chain, RAKEL, etc. [5].

This paper proposes an ensemble technique for MLC based on the classifier chain. The ensemble classifier chain (ECC) enhances classification performance and prediction accuracy compared to the traditional classifier chain (CC). In CC, the order of class labels in the chain plays a significant role in classification performance. By
default, the random ordering of labels is used in CC. ECC implements an ensemble of CCs, where each base classifier in ECC is a CC itself with a distinct random ordering of class labels. However, it requires at least ten CCs, which makes it computationally complex. The objective of this paper is to provide a solution that reduces the computational complexity of the ECC by reducing the number of CCs, thereby reducing training time while, at the same time, ensuring an improvement in classification accuracy and prediction performance. The contribution of this paper is Ecc_Wt_Rase, a technique that implements an ECC with only four CCs, each with a distinct class label ordering. The first CC follows the random ordering of class labels, as in the traditional CC (R); the second CC follows accuracy-based ordering based on Hamming loss (A); the third CC follows the sequential ordering of class labels, as given in the concerned dataset (S); and the fourth CC follows ordering based on entropy among class labels (E). Experimental results demonstrate that Ecc_Wt_Rase provides better classification accuracy and prediction performance, based on performance parameters such as Hamming loss, subset accuracy, F1-score, and Jaccard accuracy. Further, it reduces the computational cost, as it uses only four CCs, whereas in ECC there are ten CCs.
2 Related Work

Ensemble techniques are another class of machine learning methods that improve the classification accuracy and prediction performance of classifiers. They work on the principle of composition: multiple classifiers are implemented for carrying out the same task as an ensemble. Traditionally, an ensemble is a two-level classifier, where the Level-0 classifiers are called base classifiers and the Level-1 classifier is called the meta-classifier. Multiple base classifiers predict the class label for an instance, and all these predictions are used by the meta-classifier to predict the final class label for that instance. Ensemble techniques traditionally work on the principles of voting and weighting. In voting, all base classifiers have equal weight, and the class having the maximum votes in prediction is considered the class for that instance. In weighting, base classifiers having high accuracy are given more weight in prediction than classifiers having low prediction accuracy. Bootstrapping forms the basis of ensemble techniques. Bagging and boosting are the two most common ensemble techniques [6]. Figure 1 represents the block diagram of an ensemble model with ten base classifiers. D is the dataset under consideration for training and testing of the model. Bootstrapping generates the datasets used for the training of each base classifier in the ensemble; it ensures that each instance has an equal probability of being selected into the bootstrapped dataset for a base classifier, even if that instance is already being used in another bootstrapped dataset. Each D_i is a bootstrapped dataset sample used for training the classifier model M_i, and P_i represents the prediction generated by the model M_i. The meta-classifier uses all
Fig. 1 Block diagram of ensemble model
these predictions and generates the final predictions for an unknown instance using either the voting or the weighting approach [7]. Bagging is a common ensemble technique that works on the principle of voting. Boosting is another well-known ensemble technique that works on the principle of weighting; in boosting, more weightage is given to misclassified instances in the subsequent classifiers. Bagging and boosting significantly improve the accuracy of classification compared to a single classifier; however, bagging is more robust to noise and over-fitting, while boosting is more prone to over-fitting. Due to this, the classification accuracy of boosting is lower than that of bagging in most scenarios [7].

Let L = {L_j}, j = 1, ..., m, be the set of m distinct labels of a multi-labeled dataset and D = {D_i}, i = 1, ..., n, be the n instances present in the dataset; then, for MLC, the label set of each D_i is an element of the power set 2^L. Let X = {X_k}, k = 1, ..., r, be the set of r attributes of the multi-labeled dataset. Each training instance TR_i is characterized as [X_i, L_i], where X_i represents the attributes and L_i the class labels of TR_i, containing a total of r + m columns, while each test instance TST_i is characterized by X_i alone, containing a total of r columns; this is because the test instances do not have class labels associated with them; class labels are only used for training the model. Here, L_i is a binary vector; however, X_i can be binary, numeric, categorical, or a combination of these data types. Let TR and TST be the training and test datasets for the MLC model, with g and h instances, respectively; then, TR and TST are given as:

TR = {TR_1, TR_2, TR_3, ..., TR_g} = {(X_1^TR, L_1^TR), (X_2^TR, L_2^TR), (X_3^TR, L_3^TR), ..., (X_g^TR, L_g^TR)}

TST = {TST_1, TST_2, TST_3, ..., TST_h} = {X_1^TST, X_2^TST, X_3^TST, ..., X_h^TST}
Here, g > h; i.e., the number of instances in the training dataset is larger than in the test dataset. Generally, the train and test datasets are in the ratio 70:30, 80:20, or 75:25. If an instance is associated with a class, then the value of L_i for that instance is 1; otherwise, it is 0 [8].

Liu et al. suggested a technique for music emotion classification using a convolutional neural network (CNN). Music emotion classification (MER) is one of the most widely known MLC problems. The softmax function forms the basis of the proposed CNN-based MLC classifier for MER. Experiments carried out on the CAL500 dataset show that the proposed CNN-based classifier improves the performance of predicting MLC instances [9].

Read, Pfahringer, Holmes, and Frank in 2009 proposed the classifier chain (CC) for MLC. CC is a PT technique that predicts the class labels for multi-label instances in a chained manner: all previously predicted class labels in the chain act as features for predicting the subsequent class labels; i.e., the first class label acts as a feature for predicting the second class label, the first and second class labels act as features for predicting the third class label, and so on. It is an MLC technique that assumes partial label dependence and, during training, uses the true class labels of previously predicted classes as features for predicting subsequent class labels. CC is one of the oldest and most prevalent techniques for solving MLC problems. The order of class labels plays a significant role in the performance of classification; by default, a randomly generated order of class labels is used in traditional CC [10].

Liu, Tsang, and Muller in 2017 suggested the easy-to-hard learning paradigm for ordering class labels in a CC. They suggested that class labels which are easy to predict, or have high prediction accuracy, are placed toward the front end of the CC, while class labels which are difficult to predict, or have low prediction accuracy, are placed toward the back end of the CC [11]. Read, Pfahringer, Holmes, and Frank in 2021 suggested that an optimal order of class labels can be generated by computing a suitable loss function, based on which different orders are generated and the optimal one is identified for the execution of CC [12].

Read, Pfahringer, Holmes, and Frank in 2011 suggested the ensemble classifier chain (ECC) for MLC. ECC optimizes the performance of CC by incorporating the principles of ensemble models, thereby resulting in improved classification performance and prediction accuracy. Traditionally, ECC comprises ten CCs; each CC has a distinct random class label order, and each CC as a base classifier uses a distinct bootstrapped dataset sample. A voting classifier (bagging) is used as the meta-classifier for predicting the outcome class labels; boosting (a weighting classifier) can also be used as the meta-classifier in ECC [13]. Chandran and Panicker in 2017 proposed an efficient multi-label classification system using an ensemble of classifiers; clustering is used for generating the bootstrapped datasets used for the training of base classifiers in the ensemble model, and a support vector machine (SVM) is used as the base classifier [14].

Recently, deep learning, swarm intelligence, or their hybrids along with machine learning have been used for developing more precise and accurate classifier models. Shekhawat et al. developed a binary salp swarm algorithm with hybrid data transformation for feature selection [15] and a memetic spider monkey optimization for spam review detection [16]. The authors in [17] proposed a convolutional
neural network (CNN)-based technique, optimized by the firefly algorithm, for classifying brain tumor grade from magnetic resonance images, which is based on deep learning. In [18], the authors proposed a hybrid model for detecting COVID-19 patients using beetle antennae search; this model is based on swarm intelligence. Munjal et al. [19] proposed some strategies for emotion and sentiment analysis [20, 21]. However, the application of these hybrids and similar models is still limited to the areas of binary classification or multi-class classification. They have not been applied for solving MLC problems yet; in time, their application may lead to a more optimized MLC model.
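As an illustration of the CC and ECC baselines discussed above (not the authors' code), the following minimal sketch uses scikit-learn's ClassifierChain, which accepts an explicit label order; the data X and Y are hypothetical placeholders.

# Minimal sketch of a classifier chain (CC) and an ensemble of chains (ECC),
# assuming scikit-learn; X and Y are hypothetical synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

rng = np.random.default_rng(0)
X = rng.random((200, 10))                          # 200 instances, 10 attributes
Y = (rng.random((200, 4)) > 0.5).astype(int)       # 4 binary labels per instance

# A single CC with a random label order (the traditional default).
cc = ClassifierChain(LogisticRegression(), order="random", random_state=0).fit(X, Y)

# ECC: ten chains, each with a different random order; the per-chain
# probability outputs are averaged and thresholded (voting-style).
chains = [ClassifierChain(LogisticRegression(), order="random", random_state=i)
          for i in range(10)]
for chain in chains:
    chain.fit(X, Y)
Y_prob = np.mean([chain.predict_proba(X) for chain in chains], axis=0)
Y_pred = (Y_prob >= 0.5).astype(int)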
3 Proposed Ecc_Wt_Rase

This paper proposes Ecc_Wt_Rase, a multi-label classification model based on ECC with distinct class label orderings. Similar to ECC, the proposed model implements an ensemble of classifier chains; however, while the traditional ECC implements ten CCs, each acting as a base classifier that follows a randomly generated order of class labels, Ecc_Wt_Rase implements four CCs, each following a distinct class label ordering. The first CC follows a randomly generated order (R) of class labels, similar to the traditional CC. The second CC follows the accuracy-based ordering (A) of class labels. The third CC follows the sequential or given class label order (S), while the fourth CC follows the entropy-based ordering (E) of class labels. The block diagram of the proposed Ecc_Wt_Rase is given in Fig. 2 below.

For accuracy-based ordering, the Hamming loss for each class label is computed, based on predictions obtained using the binary relevance (BR) technique. The labels are then ordered by increasing Hamming loss: the lower the Hamming loss, the more accurate the prediction of the class label. For entropy-based ordering, the entropy of each class label is computed, and the class labels are ordered by increasing entropy: the lower the entropy, the more accurate the prediction of the class label, as entropy is a measure of the impurity present in the data. A sketch of these two orderings is given after Fig. 2.
Fig. 2 Block diagram of proposed Ecc_Wt_Rase
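The following is a minimal sketch of the accuracy-based (A) and entropy-based (E) orderings; this is one reasonable reading of the description above, not the authors' code, and it reuses the hypothetical X and Y placeholders from the earlier sketch.

# Minimal sketch of the A and E label orderings described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss

def accuracy_based_order(X, Y):
    # Order labels by increasing Hamming loss of a binary-relevance model.
    losses = []
    for j in range(Y.shape[1]):
        clf = LogisticRegression().fit(X, Y[:, j])   # one BR model per label
        losses.append(hamming_loss(Y[:, j], clf.predict(X)))
    return np.argsort(losses)        # lower loss -> earlier in the chain

def entropy_based_order(Y):
    # Order labels by increasing entropy of each label column.
    entropies = []
    for j in range(Y.shape[1]):
        prob = np.clip(Y[:, j].mean(), 1e-12, 1 - 1e-12)   # avoid log(0)
        entropies.append(-prob * np.log2(prob) - (1 - prob) * np.log2(1 - prob))
    return np.argsort(entropies)     # lower entropy -> earlier in the chain

order_A = accuracy_based_order(X, Y)
order_E = entropy_based_order(Y)
order_S = np.arange(Y.shape[1])      # sequential order, as given in the dataset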
Ecc_Wt_Rase works on the principle of boosting; i.e., it is a weighted ensemble model (WT). Here, the classifiers having higher prediction accuracy are given more weightage than classifiers having lower prediction accuracy. After the execution of each classifier, the weights associated with the misclassified instances are increased, so that the next classifier pays more attention to these instances. If the accuracy of any base classifier falls below 50%, that base classifier is rejected and the process is repeated for its development. At last, the weights associated with all base classifiers are computed. For an unknown instance, the weighted predictions computed by the meta-classifier are the final predictions generated by the proposed model. Let L = {L_j}, j = 1, ..., m, be the set of m distinct labels, D = {D_i}, i = 1, ..., n, be the n instances, and X = {X_k}, k = 1, ..., r, be the set of r attributes. The operation of the proposed Ecc_Wt_Rase model is given below.

Steps for implementing CC with Random Order (R).
• Generate a random sequence of unique numbers in the range [0, m − 1].
• Arrange L according to this random sequence, with the numbers referring to indices of L.
• The resultant L will be the first class label order (R).
• Generate a bootstrapped dataset from D (D1).
• Execute CC with R as the order of class labels and compute the predictions.
• Compute the error.
• If error ≥ 0.5, then reject the model and repeat the process for implementing this CC.
• Compute the weight of the model.

Steps for implementing CC with Accuracy-based Order (A).
• FOR each L_j in L, DO:
  – Apply binary relevance on X and compute the predictions.
  – Compute the Hamming loss for L_j.
• Arrange L according to Hamming loss in increasing order.
• The resultant L will be the second class label order (A).
• Generate a bootstrapped dataset from D (D2).
• Execute CC with A as the order of class labels and compute the predictions.
• Compute the error.
• If error ≥ 0.5, then reject the model and repeat the process for implementing this CC.
• Compute the weight of the model.

Steps for implementing CC with Sequential Order (S).
• The given class label ordering of L will be the third order itself (S).
• Generate a bootstrapped dataset from D (D3).
• Execute CC with S as the order of class labels and compute the predictions.
• Compute the error.
• If error ≥ 0.5, then reject the model and repeat the process for implementing this CC.
• Compute the weight of the model.

Steps for implementing CC with Entropy-based Order (E).
• FOR each L_j in L, DO:
  – Compute the entropy of L_j.
• Arrange L according to increasing order of entropy.
• The resultant L will be the fourth class label order (E).
• Generate a bootstrapped dataset from D (D4).
• Execute CC with E as the order of class labels and compute the predictions.
• Compute the error.
• If error ≥ 0.5, then reject the model and repeat the process for implementing this CC.
• Compute the weight of the model.

Steps for generating Final Predictions using Ecc_Wt_Rase.
• Normalize the weight WT of each model and compute the weighted predictions WTP using the actual predictions Y.
• Compute the final predictions from the weighted predictions.
• Compute the performance metrics for judging the proposed classifier model.
A sketch of this weighted aggregation is given below.
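The equations for this weighted aggregation are not legible in the source; the following hedged sketch shows one standard weighted-ensemble formulation consistent with the steps above (normalized weights, weighted average of chain outputs, 0.5 threshold), not equations quoted from the paper.

# Hedged sketch of the final weighted aggregation; the normalization and the
# 0.5 decision threshold are standard assumptions. "chains" are the four
# fitted CCs (orders R, A, S, E) and "weights" their computed model weights.
import numpy as np

def ecc_wt_rase_predict(chains, weights, X):
    wt = np.asarray(weights, dtype=float)
    wt = wt / wt.sum()                     # normalize the model weights WT
    # Weighted predictions WTP: weighted average of each chain's output.
    wtp = sum(w * chain.predict_proba(X) for w, chain in zip(wt, chains))
    return (wtp >= 0.5).astype(int)        # final 0/1 prediction per label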
4 Performance Evaluation

Standard multi-label datasets are used for training and performance evaluation of the proposed model and for its comparison with the traditional CC and ECC models. Three
datasets, namely emotions [22], scenes [23], and flags [24], are used for implementation and performance analysis. The evaluation of a multi-label classifier is somewhat more involved than that of a binary classifier. The performance metrics for MLC are classified into various categories, like rank-based metrics, example-based metrics, and label-based metrics. Rank-based metrics are evaluated by computing the probabilities of the outcomes, rather than the outcomes themselves. Example-based metrics are evaluated for each instance to predict. Label-based metrics are evaluated for each class label across all instances and have two types of evaluations, namely micro and macro [3]. The performance of the proposed Ecc_Wt_Rase model is evaluated using metrics like Hamming loss (H_Loss), subset accuracy (Subs_Acc), F1-score, and Jaccard accuracy. Hamming loss specifies the percentage of samples misclassified; the lower the Hamming loss, the better the predictions made by the classifier. Subset accuracy specifies the percentage of samples where all class labels are correctly predicted by the model; it is a very strict performance metric on which it is difficult to achieve high values, and higher values reflect better prediction performance by the classifier. F1-score is the harmonic mean of precision and recall and is a more informative metric than precision or recall independently. Jaccard accuracy is a variant of accuracy that is mainly computed for imbalanced classes and focuses on true positives only, rather than on both true positives and true negatives as in the case of accuracy [6].

Fig. 3 Comparison for emotions dataset

Figure 3 shows that Ecc_Wt_Rase attains better values for Hamming loss, subset accuracy, F1-score, and Jaccard score over the emotions dataset. Compared to traditional ECC, the improvement in Hamming loss is approximately 6%, while the improvements in F1-score and Jaccard accuracy are approximately 2%. The results also show a significant improvement in subset accuracy in the proposed model, compared to ECC. Compared to CC as well, the proposed model attains better results. Figure 4 shows the comparative analysis of ECC and CC with the proposed model over the scenes dataset.
Fig. 4 Comparison for scenes dataset

Fig. 5 Comparison for flags dataset
The values for Hamming loss, F1-score, and Jaccard accuracy in the proposed model are better than with CC, but inferior to ECC. However, the subset accuracy of the proposed Ecc_Wt_Rase model is the best among the three. Figure 5 reflects that the proposed Ecc_Wt_Rase attains the best Hamming loss, compared to ECC and CC, over the flags dataset; this indicates that the number of misclassified instances in the proposed model is lower than with ECC and CC. The subset accuracy, F1-score, and Jaccard accuracy are also improved by the proposed model, indicating better classification accuracy and prediction performance than ECC and CC. From the results above, it is clear that Ecc_Wt_Rase has the best values of Hamming loss for the emotions and flags datasets, while ECC has the best Hamming loss for the scenes dataset. In terms of subset accuracy, Ecc_Wt_Rase outperforms both ECC and CC across all three datasets under consideration. Ecc_Wt_Rase has the best F1-score and Jaccard accuracy for both the emotions and flags datasets, while ECC has the best values of F1-score and Jaccard accuracy for the scenes dataset. Table 1 represents the ranking of the three models, namely CC, ECC, and Ecc_Wt_Rase, across the four performance metrics under consideration. The proposed Ecc_Wt_Rase outperforms both ECC and CC by achieving the top rank among the three, followed by ECC in second position and CC in third.
Table 1 Ranking of models based on performance metrics

Metric           | CC | ECC | Ecc_Wt_Rase
Hamming loss     | 3  | 2   | 1
Subset accuracy  | 2  | 3   | 1
F1-score         | 3  | 2   | 1
Jaccard accuracy | 3  | 2   | 1
Avg. rank        | 3  | 2   | 1
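For reference, the four metrics used above can be computed as in the following minimal sketch, assuming scikit-learn; Y and Y_pred are the hypothetical binary label matrices from the earlier sketches.

# Minimal sketch of the four evaluation metrics used in this section.
from sklearn.metrics import accuracy_score, f1_score, hamming_loss, jaccard_score

h_loss = hamming_loss(Y, Y_pred)                      # fraction of wrong labels
subs_acc = accuracy_score(Y, Y_pred)                  # exact-match (subset) accuracy
f1 = f1_score(Y, Y_pred, average="micro")             # micro-averaged F1-score
jaccard = jaccard_score(Y, Y_pred, average="micro")   # micro-averaged Jaccard accuracy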
5 Conclusion

The problem of multi-label classification has existed for a long time, yet solutions are scarce. This is because the domain is untouched by most researchers, compared to other domains like artificial intelligence, Internet of things, cloud computing, etc. A lot of research work has been done and is still going on in the area of classification and machine learning; still, there is a lot of scope in the domain of MLC. This paper proposes Ecc_Wt_Rase, an ensemble model for CC based on the weighting principle. The processing time and computational cost of the proposed model are lower compared to ECC, because traditional ECC implements ten CCs, while Ecc_Wt_Rase implements four. ECC follows the bagging (voting) principle, while the proposed model follows the boosting principle. The time required for generating the four different orders for CC is small, because the processing is carried out over class labels only, rather than over the entire dataset with attributes and class labels. Overall, the proposed model requires approximately 40% less processing time than ECC. Results show that Ecc_Wt_Rase achieves better performance in terms of the performance metrics evaluated across three standard multi-label datasets. It is also clear from the rankings that the proposed model attains the best rank among the three models under comparison. However, the improvements obtained by the proposed model are not large, except for the emotions dataset. Overall, the improvement in Hamming loss is approximately 6%, which is significant for MLC. At the same time, the improvement in the other performance parameters is approximately 2%, which is marginal but considerable for MLC problems. This is because all multi-label datasets are different; each has a different label cardinality, label density, number of attributes, types of attributes, correlation coefficients among class labels, etc. This makes the task of multi-label classification more complex compared to traditional binary classification. However, slight improvements in the values of performance metrics for multi-label classifiers are considered good, due to the complex nature of MLC problems.
References 1. Ghodratnama S, Moghaddam HA (2020) Content-based image retrieval using feature weighting and C-means clustering in a multi-label classification framework. Patt Anal Appl 24:1–10 2. Carrillo D, Lopez VF, Moreno MN (2013) Multi-label classification for recommender systems. AISC 221:181–188 3. Zhang ML, Zhou ZH (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837 4. Tsoumakas G, Katakis I (2009) Multi-Label Classification: An Overview. Int J Data Warehouse Min 3(3):1–13 5. Almeida AMG, Cerri R, Paraiso EC, Mantovani RG, Barbon S (2018) Applying multi-label techniques in emotion identification of short texts. Neurocomputing 320:35–46 6. Yapp EKY, Li X, Lu WF, Tan PS (2020) Comparison of base classifiers for multi-label learning. Neurocomputing 394:51–60 7. Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques (3rd edn). Elsevier, Morgan Kauffmann Publications. 8. Senge R, Coz JJD, Hullermeier E (2013) Rectifying classifier chains for multi-label classification [arXiv preprint arXiv:1906.02915] 9. Liu X, Chen Q, Wu X, Liu Y, Liu Y (2017) CNN based music emotion classification. [arXiv preprint arXiv:1704.05665] 10. Read J, Pfahringer B, Holmes G, Frank E (2009) Classifier chains for multi-label classification. In: Proceedings of ECML PKDD, Springer, LNCS, vol 5782, pp 254–269 11. Liu W, Tsang IW, Muller KR (2017) An easy-to-hard learning paradigm for multiple classes and multiple labels. J Mach Learn Res 18:1–18 12. Read J, Pfahringer B, Holmes G, Frank E (2021) Classifier chains: a review and perspectives. J Artif Intell Res 70:683–718 13. Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85:333–359 14. Chandran SA, Panicker JR (2017) An efficient multi-label classification system using ensemble of classifiers. In: Proceedings of ICICICT (IEEE), pp 1133–1136 15. Shekhawat SS, Sharma H, Kumar S, Nayyar A, Qureshi B (2021) bSSA: binary Salp Swarm algorithm with hybrid data transformation for feature selection. IEEE Access 9:14867–14882 16. Shekhawat SS, Sharma H, Kumar S (2021) Memetic spider monkey optimization for spam review detection problem. Big Data. https://doi.org/10.1089/big.2020.0188 17. Bacanin N, Bezdan T, Venkatachalam K, Al-Turjman F (2021) Optimized convolutional neural network by firefly algorithm for magnetic resonance image classification of Glioma brain tumor grade. J Real-Time Process, Special Issue Paper, pp 1–14 18. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669 19. Munjal P, Narula M, Kumar S, Banati H (2018) Twitter sentiments based suggestive framework to predict trends. J Stat Manag Syst 21(4):685–693 20. Munjal P, Kumar L, Kumar S, Banati H (2019) Evidence of Ostwald Ripening in opinion driven dynamics of mutually competitive social networks. Physica A 522:182–194 21. Munjal P, Kumar S, Kumar L, Banati A (2017) Opinion dynamics through natural phenomenon of grain growth and population migration. In: Hybrid intelligence for social networks, Springer, Cham, pp 161–175 22. Trohidis K, Tsoumakas G, Kallaris G, Vlahavas I (2008) Multi-label classification of music into emotions. In: Proceedings of ISMIR, pp 325–332 23. Boutell MR, Luo J, Shen X Brown CM (2004) Learning multi-label scene classification. Patt Recog 37:1757–1771 24. 
Goncalves EC, Plastino A, Freitas A (2013) A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In: Proceedings of 25th international conference on tools with artificial intelligence (IEEE), pp 469–476
A Novel Deep Learning SFR Model for FR-SSPP at Varied Capturing Conditions and Illumination Invariant R. Bhuvaneshwari, P. Geetha, M. S. Karthika Devi, S. Karthik, G. A. Shravan, and J. Surenthernath
Abstract Face recognition systems attempt to identify individuals of interest as they appear through a network of cameras. Applications like immigration management, fugitive tracing, and video surveillance are dominated by face recognition with a single sample per person (FR-SSPP), which has become an important research topic. The face recognition problem can be divided into two groups. The first is recognition of faces with multiple samples per person, also known as conventional face recognition. The second is recognition of faces using only a single sample per person (SSPP). Predicting variations like illumination and disguise in facial images is limited under SSPP. Pose, illumination, low resolution, and blurriness are considered the challenges that a face recognition system encounters. All these problems related to face recognition with a single sample per person are dealt with by the proposed synthesized face recognition (SFR) model. The SFR model initially pre-processes the input facial image, followed by techniques like the 4X generative adversarial network (4XGAN) to enhance the resolution and the sharp generative adversarial network (SharpGAN) to sharpen the images. In image formation, 3D virtual synthetic images are generated consisting of various poses, and the position map regression network (PRN) technique provides the dense alignment of the generated face images. Finally, with face detection and deep feature extraction using a convolutional neural network, the proposed SFR model provides a better solution to the problems involved with recognition of faces with a single sample per person. The triplet loss function helps to recognize aged faces, which is important for a well-functioning face recognition system that overcomes facial feature changes. The model achieves greater accuracy and enables a detailed evaluation that takes into account the environmental conditions and the requirements of the application. Keywords Deep learning · 4XGAN · SharpGAN · Triplet loss function · PRN
1 Introduction

Face recognition using a single sample per person (SSPP) [1, 3, 18] is a fascinating idea that has driven considerable progress over the last few years in real-time applications like surveillance and border crossing. Human–computer interaction [25] is also an emerging research domain, and photograph-based ID validation is widely used in many fields to verify that an applicant is who they claim to be. Face recognition is a technique for recognizing or confirming an individual's identity from their face, and face recognition software can recognize individuals in photographs, in videos, or in real time. Recognizing people from multiple samples per person can be done easily, but the task becomes challenging with a single sample per person: since there is only one sample, it has to be synthesized into varied poses of that person to identify the face. Such systems are used in airports, railways, bus bookings, and many other settings. Consider a driving licence photograph taken at a young age. A driver who violates traffic law may be captured by traffic-control CCTV, and the photograph extracted from the CCTV video is compared with an existing database of licence holders in that area. Since it is a single sample taken at some angle, under varied capturing and illumination conditions, and the face has changed as the driver grew older, it is difficult to recognize who committed the traffic violation.

The 4X generative adversarial network (4XGAN) [1, 7] enhances a low-resolution image to a high-resolution image. The generator network and the discriminator network are the two main networks of a GAN. GANs focus on generating data from scratch by emulating the training data distribution. The discriminator decides whether its input comes from the true training dataset or is fake generated data, and thereby guides the generator to produce realistic data. The discriminator and the generator learn simultaneously, and once the generator has been trained, it has adequate knowledge of the distribution of the training samples. The sharp generative adversarial network (SharpGAN) [1, 4] produces an enhanced, sharpened image. The SharpGAN architecture consists of a generator network that takes an unclear image as input and produces a sharp image, and a discriminating network that determines whether an image was generated artificially. After training, SharpGAN restores blurred images.

Feng et al. [10] created 3D images by processing meshed-up data. Paysan et al. [21] generated 3D faces from a deformable model and reconstructed three-dimensional faces from one image with key points as the input. Blanz and Vetter [23] performed face recognition under different lighting conditions. Mokhayeri et al. [11] provided a framework for generating 3D images from several sets of synthesized images. As CNN models [13, 14], two networks, namely VGGFace [1] and FaceNet [12, 16], are used in the proposed synthesized face recognition (SFR) model. A deep CNN model [9, 19, 20, 24] with the triplet loss function is used in the SFR model in order to eliminate the disadvantage of facial feature changes due to ageing. As a contribution, this work builds a novel solution for face recognition with SSPP that produces a high super-resolution image using the parameterized ReLU function instead of the LeakyReLU function, which is faster and minimizes the use of
computational resources. It also addresses the problem of ageing in face recognition, which is solved using the triplet loss function. The problem of varied capturing conditions is solved by producing many different angles of the 2D face from a 3D image. Rather than comparing dimensional vectors in order to recognize a face, a prediction method is used, which is more efficient and reduces time. The recommended approach provides a faster process than the other available methods, as it does not involve tedious architectures. The paper is structured as follows: Sect. 2 contains the works related to the proposed idea, Sect. 3 contains our proposed SFR method, implementation details are explained in Sect. 4, Sect. 5 depicts the experimental results, and finally, the conclusion is in Sect. 6.
2 Related Works

2.1 Domain-specific Face Synthesis

The domain-specific face synthesis (DSFS) method [3] can provide more accurately predicted results than state-of-the-art methods, with minimal computational complexity. The findings showed that face synthesis alone is ineffective in resolving the SSPP and visual domain shift issues. In DSFS, generic learning and face synthesis operate complementarily. The proposed DSFS technique could be improved to produce synthetic faces with variations for a more robust FR.
2.2 Super-resolution GAN

Work on face recognition drew attention to some of the shortcomings of PSNR-based image super-resolution and suggested SRGAN [7], which uses a GAN to complement the content loss function with an adversarial loss. SRGAN reconstructions for large upscaling factors (4x), as confirmed by extensive MOS testing, are substantially more realistic than reconstructions obtained with state-of-the-art reference methods.
2.3 DeblurGAN

DeblurGAN, proposed by Orest Kupyn et al. [4], is an adversarial network that is conditionally optimized using a multi-component loss function. It implements a new approach for simulating various blur sources in a practical synthetic motion
blur. DeblurGAN greatly aids identification on blurred images, according to a recent benchmark and validation protocol focused on object detection performance.
2.4 Pose and Illumination Normalization

Face recognition through pose and illumination normalization [8] was tested with a limited collection of training samples and a single sample per gallery subject. The capability of deep neural networks to learn nonlinear transformations makes them very suitable for pose and illumination normalization, as does the power of 3D models in generating multi-view samples under various illuminations. This augmentation concept can be used, but training samples are difficult to acquire. The issue is that 3D GEM accuracy is not high, resulting in degraded normalization results when the input identity's actual shape differs significantly from the generic shape model. Reconstructing a 3D face from multiple images, or a single picture, in an effective and accurate manner remains a difficult task.
2.5 Neural Network Method

The neural network method [1] addresses the issue of limited references and overcomes the issues of pose, illumination, blurriness, and low resolution. The super-resolution generative adversarial network (SRGAN) helps address the problem of low resolution, and the deblur generative adversarial network (DeblurGAN) is used to sort out the problem of blurriness. Compared to face recognition techniques that use SSPP (generic learning and face synthesizing approaches), the proposed CNN system achieved high accuracy. The remaining issue is that as people age, their facial features change.
2.6 Varied Capturing Conditions

The idea was not just to provide a fair comparison between the FaceNet and OpenFace models [2] but to see how a gradual reduction in model size affects performance. The models were tested for accuracy and size in order to determine their applicability to situations with varying environmental conditions and specifications. To handle very intricate scenarios involving variations between the images used to enrol users into the system and subsequent accesses, a large and well-trained model is needed. Smaller models, on the other hand, are feasible in situations where saving resources is a priority, when trained with a sufficiently large dataset of adequate representativeness.
2.7 Cross-correlation Matching

A novel triplet-based loss optimization approach [5] learns complex, nonlinear facial representations that are robust across a variety of real-world capture conditions. The CCM-CNN is a fine-tuned model that uses face images generated synthetically from still ROIs of non-target individuals to improve the robustness of the facial models. According to the findings, the accuracy of CCM-CNN is comparable to state-of-the-art FR systems, but with a considerable reduction in complexity.
2.8 Deep Convolutional Neural Network

Zeng et al. [6] suggested an approach for SSPP FR in which a thoroughly trained DCNN is fine-tuned using expanded samples. Compared with state-of-the-art methods, good recognition rates under illumination and expression variation were achieved. The losses of the proposed approach and of the fine-tuned model are collected using the one-sample images. As opposed to using a single sample to fine-tune the DCNN model, the proposed approach performs better on SSPP-FR.
3 Dataset

The proposed SFR model can be evaluated on the most commonly used benchmarking dataset: the Labelled Faces in the Wild (LFW) dataset. The LFW dataset is well known for the problem of face recognition with unconstrained images. It contains more than 13,000 images of faces collected from the web [22]. Each face has been labelled with the name of the person pictured. The standard LFW evaluation protocol reports verification accuracy over about 6000 face pairs. About 1680 of the people pictured have two or more distinct photographs in the dataset. LFW is available as the original images and three types of aligned images: "funnelled" images, LFW-a (aligned by an unpublished method), and "deep funnelled" images. Among these, LFW-a and the deep-funnelled images produce better results for most face verification algorithms than the original or funnelled images.
4 Proposed Work: Synergized Face Recognition System for FR-SSPP

The architecture diagram for the proposed method is shown in Fig. 1, which depicts the various steps listed and explained as follows.
Fig. 1 Architecture diagram
4.1 Image Pre-processing

Data is pre-processed in order to prepare it for primary processing or further analysis. Pre-processing steps can include extracting data from a larger collection, filtering it for different purposes, and combining sets of data. Pre-processing converts raw data into a usable format: raw (real-world) data is often incomplete and cannot be fed directly to a model without producing errors, which is why data must be pre-processed first.

Image Resize. The images are converted to a fixed size for training the neural networks; here, images are resized to 96 * 96 * 3.

BGR Conversion. BGR represents a 24-bit pixel with blue in the lower-addressed 8 bits, green in the next-addressed 8 bits, and red in the higher-addressed 8 bits. RGB values are usually written as RGB(r,g,b), with r/g/b varying from 0 to 255 inclusive, or as rrggbb, with rr/gg/bb being 8-bit hex values. OpenCV expects input images in BGR format, and BGR images can be processed more effectively than RGB images in OpenCV.

Image Scaling. The image is divided by 255 in order to scale the inputs to the range 0 to 1. Each channel (red, green, and blue) is 8 bits, so each is limited to 256 values (0–255); that is why the inputs are divided by 255.
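A minimal sketch of these three pre-processing steps, assuming OpenCV and NumPy ("face.jpg" is a hypothetical placeholder):

# Minimal sketch of the resize / BGR / scaling steps described above.
import cv2
import numpy as np

img = cv2.imread("face.jpg")             # OpenCV loads images in BGR order
# If the image came from an RGB pipeline instead, convert it first:
# img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
img = cv2.resize(img, (96, 96))          # fixed 96 x 96 x 3 input size
img = img.astype(np.float32) / 255.0     # scale 8-bit channels to [0, 1]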
4.2 Generative Adversarial Network (GAN)

Generative adversarial networks (GANs) are algorithmic architectures that pit two neural networks against each other (hence, "adversarial") to produce new, synthetic instances of data that can pass for real data. They are widely used in image, video, and voice generation. Generative modelling is an unsupervised machine learning task that entails the automated discovery and learning of regularities or patterns in input data, such that the model can generate new examples that could have feasibly been drawn from the original dataset. The generalized algorithm for a GAN is depicted in Fig. 2; here, 4XGAN and SharpGAN generate a better image with higher resolution and sharpness.

4XGAN. The 4XGAN converts the original image to four times its resolution; with more pixels, more information about the image is obtained. The 4XGAN, like other GAN architectures, has two parts: the generator and the discriminator. The generator builds data based on a probability distribution, while the discriminator attempts to guess whether data comes from the input dataset or from the generator; the generator then attempts to optimize the generated data in order to deceive the discriminator. Since residual networks are simple to train and allow networks to be significantly deeper, producing better performance, the generator architecture uses them instead of plain deep convolutional networks. This is due to the residual network's use of a connection form known as skip connections. In order to
Fig. 2 GAN algorithm
4XGAN is used to transform a low-resolution image into a high-resolution image. Its mapping network is a convolutional neural network consisting of only three convolution stages: patch extraction and representation, nonlinear mapping, and reconstruction. SharpGAN. The SharpGAN architecture consists of a generator network that takes a blurred image as input and produces a sharp image, and a discriminator network that determines whether an input image was generated artificially. SharpGAN is trained on blurred face pictures so that it can transform a blurred image into a sharpened face image. After training on the operational domain, we apply SharpGAN: if an image from the operational domain is fuzzy, SharpGAN produces a sharp version of it, which solves the problem of blurred face images. The generator comprises two blocks of sequenced convolution, nine residual blocks, and two blocks of transposed convolution. Each ResBlock unit includes a convolution layer, a normalization layer, and a ReLU activation layer; after the first convolution layer in every ResBlock, dropout regularization with a probability of 0.5 is applied. The model is compiled with Wasserstein loss and perceptual loss. The input to this network is a 256 × 256 × 3 blurred image. It is a 28-layer network consisting of ten convolutions, ten batch normalizations, seven ReLU activations, and finally a tanh activation; as a result, a 256 × 256 × 3 sharp image and a 384 × 384 × 3 vector are produced. Perceptual loss is obtained by summing the squared errors between all the pixels and taking their mean:

$\mathrm{Perceptual\ loss} = \mathrm{mean}\Big(\sum (y_{true} - y_{pred})^2\Big)$    (1)

Wasserstein loss is obtained by taking the mean of the product of the true pixel values and the predicted values:

$\mathrm{Wasserstein\ loss} = \mathrm{mean}(y_{true} \cdot y_{pred})$    (2)
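A direct transcription of Eqs. (1) and (2) into Keras-compatible loss functions might look as follows. This is a sketch under the assumption of 4D batch tensors (batch, height, width, channels); note that DeblurGAN-style perceptual losses are often computed on VGG feature maps rather than raw pixels, but here we follow the formulas as stated:

```python
import tensorflow as tf

def perceptual_loss(y_true, y_pred):
    # Eq. (1): sum the squared pixel errors per image, then average over the batch
    return tf.reduce_mean(tf.reduce_sum(tf.square(y_true - y_pred), axis=[1, 2, 3]))

def wasserstein_loss(y_true, y_pred):
    # Eq. (2): mean of the element-wise product of true and predicted values
    return tf.reduce_mean(y_true * y_pred)
```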
4.3 Image Formulation Stereo Conversion (PRN Technique). The faces in the image are detected using the DLib library [17]. Feng et al. [10] introduced the position map regression network (PRN) technique, which predicts dense face alignment and reconstructs the three-dimensional face shape. PRN reconstitutes the 3D structure of the face and provides the alignment in a dense manner. To do so, a 2D representation called a UV position map is designed, which records the 3D shape of a complete face in UV space; a simple convolutional neural network is then trained to regress it from a single 2D image. The steps, sketched in code below, are:
• The image is first read.
• A 3D position map is generated for the image.
• The face landmarks are obtained from the generated 3D position map.
• The vertices are obtained from the 3D position map.
• The colours are obtained from the vertices.
• The 3D image is generated.
• Finally, the 3D image is saved in a directory.
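The face-detection step with DLib can be sketched as follows. The `prn.process` call is a hypothetical placeholder for the PRN reconstruction model, and the file path is illustrative; neither appears in the paper:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

image = cv2.imread("face.jpg")                # illustrative path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = detector(gray, 1)                     # upsample once to find smaller faces

for rect in faces:
    # Crop the detected face region before 3D reconstruction
    crop = image[rect.top():rect.bottom(), rect.left():rect.right()]
    # pos_map = prn.process(crop)             # hypothetical PRN call returning the UV position map
```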
Synthetic Faces Generation. The obtained 3D images are synthesized into faces at seven different angles. The input is a MATLAB data (.mat) file containing the vertices, colours, and landmarks of the two-dimensional image in its three-dimensional representation. The output of the synthesis technique contains a set of seven images for each image in the dataset: facing straight, facing up, facing down, facing right, facing left, right-tilted face, and left-tilted face. These seven synthesized images are obtained from the .mat file produced by the position map regression network (3D image generation) (Fig. 3).
Fig. 3 Image Formulation Algorithm
4.4 Deep Feature Extraction The major aim of deep feature extraction is to limit the number of features in a dataset by producing new features from the available ones. These new, reduced feature sets should summarize most of the information contained in the original set of features. Optimized feature extraction can be the key element in building an effective model. Triplet Loss Function. Training a neural network to recognize faces is challenging: any time a new individual is added to the face database, a classifier trained to identify instances would have to be retrained. Instead of presenting the problem as a classification problem, it can be presented as a similarity learning problem. The network is trained to produce a distance that is small if the image belongs to a known person and large if it belongs to an unknown person (using a contrastive loss). If we want to retrieve the images most similar to a given image, we need to learn a ranking rather than just a similarity; in this case, a triplet loss is used.
• The triplet loss function considers three types of images: an anchor image (the original image), a positive image that is similar to the anchor image, and a negative image that contrasts with the anchor image.
• It takes the face encodings of the three images (anchor, positive, and negative). This is a distance-based approach in which the distance between the anchor image and the positive image is minimized while the distance to the negative image is maximized.
VGGFace. Simonyan and Zisserman [26] introduced the VGG network architecture in "Very deep convolutional networks for large-scale image recognition". The simplicity of the network comes from the use of only 3 × 3 convolutional layers stacked on top of each other in increasing depth. VGGFace refers to a collection of face recognition models created by members of the Visual Geometry Group (VGG) at the University of Oxford and demonstrated on benchmark computer vision datasets. VGG is an object-recognition model of up to 19 layers; built as a deep CNN, it outperforms baselines on a variety of tasks and datasets beyond ImageNet and remains one of the most commonly used image recognition architectures today. In this algorithm, the input is a 224 × 224 × 3 vector. The VGGFace network used here is a 16-layer network consisting of 11 convolution layers and 5 max pooling layers, as shown in Fig. 4; after performing the 11 convolutions and 5 max pooling operations, we obtain a 25,088-dimensional vector as the output. FaceNet. FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved state-of-the-art results on a range of face recognition benchmark datasets; open-source implementations and pre-trained models are available. FaceNet is a one-shot model that learns a mapping from face images to a compact Euclidean space where distances are directly proportional to face similarity. A triplet is a series of three images: an anchor image, a matching image, and a non-matching image. In this algorithm, the input is a 250 × 250 × 3 vector.
Fig. 4 VGGFace layers
Fig. 5 Facenet layers
The FaceNet network is a 22-layer network. It consists of 11 convolution layers, 4 max pooling layers, 3 batch normalization layers, and 3 fully connected layers; finally, L2 normalization is performed on the output. As a result, we get a 128-dimensional embedding vector (Fig. 5).
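The triplet loss described in Sect. 4.4 can be sketched as below; the margin value of 0.2 is an assumption, as the paper does not state it:

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Squared Euclidean distances between the face embeddings
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)
    # Pull the positive closer than the negative by at least `margin`
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))
```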
4.5 Face Embedding Face embedding is the process of extracting a feature vector from a face. This vector is compared with the embeddings of the images present in the database and is used for face recognition.
4.6 Recognized Face A softmax activation function is used to recognize the faces:

$\sigma(z_i) = \dfrac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$    (3)
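Equation (3) in NumPy, with the usual max-subtraction added for numerical stability (our addition; the formula itself does not require it):

```python
import numpy as np

def softmax(z):
    # Subtracting the max leaves the result unchanged but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))  # e.g. scores for K = 3 identities
```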
5 Results and Discussion 5.1 Experimental Results VGGFace produced a training accuracy of 99.91% but a test accuracy of only 72%, which shows that the VGGFace algorithm overfits. FaceNet produced a training accuracy of 99.75% and a test accuracy of 99% (Fig. 6). Compared with other approaches, methods based on deep convolutional networks achieve an accuracy of 74%, and the deep face recognition method achieves an accuracy of 98.95%. The proposed SFR model produces a better result; we therefore conclude that the FaceNet algorithm within the proposed SFR method is the best among the compared algorithms, being fast and capable of a high accuracy of 99%, as shown in Table 1.
Fig. 6 Graph for the produced accuracy of SFR method
Table 1 Comparing produced accuracy of other methods with SFR method

Method                         Algorithm used   Accuracy (%)
Deep CNN                       TDL              74
Deep face recognition          VGGFace          98.95
Synthesized face recognition   VGGFace          72
                               FaceNet          99
5.2 Implementation Details 4XGAN. In 4XGAN, a 96 × 96 image is converted into a 384 × 384 image; i.e. 4XGAN upsamples the images by a factor of 4, producing the high-resolution images shown in Fig. 7. SharpGAN. In SharpGAN, a blurred image is given as input and a sharp image is produced as output, as shown in Fig. 8; it produced a perceptual loss of 0.49 and a Wasserstein loss of 491. 3D Faces Generation. The vertices of the 3D image are acquired with their respective colours from a single image, and the resulting mesh data (.obj) is saved; it can be viewed with Microsoft 3D Builder, as shown in Fig. 9. Image Synthesis. The 3D images thus obtained are synthesized into faces at different angles, as shown in Fig. 10. Fig. 7 High resolution image
Fig. 8 Sharp image
Fig. 9 Constructed 3D face from 2D face
Fig. 10 Synthesis of a set of 7 2D faces from a 3D face
VGGFace. The VGGFace algorithm is used for the classification of faces and is implemented in Python 3.7. The model was trained for 5 epochs and produced a training accuracy of 99.91% (Fig. 11). FaceNet. The FaceNet algorithm is likewise used for the classification of faces and is implemented in Python 3.7. The model was trained for 150 epochs and produced a training accuracy of 99.75% (Fig. 12). Requirements. The following libraries are used to build the synthesized face recognition (SFR) model:
Fig. 11 Recognized face using VGGFace
Fig. 12 Recognized face using FaceNet
Keras, Keras-VGGFace, TensorFlow, NumPy, face_recognition, skimage, SciPy, and OpenCV.
6 Conclusion A synthesized face recognition system has been developed that can recognize people by name more accurately. The main contribution of this work is recognizing faces trained with a single sample per person; the dataset used in the SFR model is the LFW dataset. During training, the SFR model faced several key problems, namely low resolution, blurriness, and illumination and pose changes, which were solved using modern GAN techniques such as 4XGAN and SharpGAN. Facial features also change as people grow old, and with only a single training sample per person such faces are hard to detect. The SFR model therefore addresses the issue of pose limitations using the position map regression network (PRN) technique to remodel a 3D face from the source image, constructing 3D faces for seven different poses: facing straight, facing up, facing down, facing right, facing left, right-tilted, and left-tilted. When people look at an object, the viewing angle up, down, left, or right is most probably around 20° or 30°. By training in this manner, the SFR model obtained a training accuracy of 99.75% using the FaceNet algorithm and produced a test accuracy of 99%, so the SFR model is able to recognize faces more accurately.
References 1. Abdelmaksoud M, Nabil E, Farag I, Hameed HA (2020) A novel neural network method for face recognition with a single sample per person. IEEE Access 8, 1 June 2020 2. Sanchez BR, Costa da Silva D, Yuste NM, Sanchez C (2019) Deep learning for facial recognition on single sample per person scenarios with varied capturing conditions. Applied Sciences, 13 Dec 2019 3. Mokhayeri F, Granger E, Bilodeau GA (2019) Domain-specific face synthesis for video face recognition from a single sample per person. IEEE Trans Inf Forens Secur, 14 Mar 2019 4. Kupyn O, Budzan V, Mykhailych M, Mishkin D, Matas J (2018) DeblurGAN: blind motion deblurring using conditional adversarial networks. In: IEEE/CVF conference on computer vision and pattern recognition 5. Parchami M, Bashbaghi S, Granger E (2017) CNNs with cross-correlation matching for face recognition in video surveillance using a single training sample per person. In: 14th IEEE international conference on advanced video and signal based surveillance (AVSS) 6. Zeng J, Zhao X, Qin C, Lin Z (2017) Single sample per person face recognition based on deep convolutional neural network. In: 3rd IEEE international conference on computer and communications 7. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, Shi W (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: IEEE conference on computer vision and pattern recognition
8. Wu Z, Deng W (2016) One-shot deep neural network for pose and illumination normalization face recognition. In: IEEE international conference on multimedia and expo (ICME) 9. Zeng J, Zhao X, Gan J, Mai C, Zhai Y, Wang F (2018) Deep convolutional neural network used in single sample per person face recognition. Comput Intell Neurosci 2018, Aug 2018 10. Feng Y, Wu F, Shao X, Wang Y, Zhou X (2018) Joint 3D face reconstruction and dense alignment with position map regression network. Proc Eur Conf Comput Vis (ECCV) 11. Mokhayeri F, Granger E, Bilodeau GA (2015) Synthetic face generation under various operational conditions in video surveillance. In: Proceedings of the IEEE international conference on image process (ICIP), Sep 2015 12. Schroff F, Kalenichenko D, Philbin J (2015) FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Jun 2015 13. Parkhi O, Vedaldi A, Zisserman A (2015) Deep face recognition. In: Proceedings of the BMVC 1(3) 14. Sun Y, Chen Y, Wang X, Tang X (2014) Deep learning face representation by joint identification-verification. In: Proceedings of the advances in neural information processing systems 15. Zhu P, Yang M, Zhang L, Lee IY (2014) Local generic representation for face recognition with single sample per person. In: Proceedings Asian conference on computer vision, Cham. Springer, Switzerland 16. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the advances in neural information processing systems 17. Beel J, Gipp B, Langer S, Genzmehr M, Wilde E, Nurnberger A, Pitman J (2011) Introducing Mr. DLib: a machine-readable digital library. In: Proceedings of the 11th annual international ACM/IEEE joint conference on digital libraries (JCDL), Jun 2011 18. Su Y, Shan S, Chen X, Gao W (2010) Adaptive generic learning for face recognition from a single sample per person. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, Jun 2010 19. Devi MK, Fathima S, Baskaran R (2020) CBCS-Comic Book Cover Synopsis: generating synopsis of a comic book with unsupervised abstractive dialogue. Procedia Comput Sci 172:701–708 20. Devi K, Fathima S, Baskaran R (2020) SYNC - Short, Yet Novel Concise natural language description: generating a short story sequence of album images using multimodal network. In: ICT analysis and applications. Springer, Singapore, pp 235–245 21. Paysan P, Knothe R, Amberg B, Romdhani S, Vetter T (2009) A 3D face model for pose and illumination invariant face recognition. In: Proceedings of the 6th IEEE international conference on advanced video and signal based surveillance, Sep 2009 22. Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Learned-Miller E, Ferencz A, Jurie F (eds) Proceedings of workshop on faces in 'real-life' images: detection, alignment, recognition. Marseille, France, Oct 2008 23. Blanz V, Vetter T (2003) Face recognition based on fitting a 3D morphable model. IEEE Trans Patt Anal Mach Intell 25(9) 24. Smitha E, Sendhil Kumar S, Hepsibah SC, Mahalakshmi GS, Bhuvaneshwari R (2020) Effective emotion recognition from partially occluded facial images using deep learning. In: International conference on computational intelligence in data science. Springer 25. Bhuvaneshwari R, Kavin Pragadeesh K, Kailash JP (2020) Video vision based browser action control using hand gestures. J Critic Rev 7(04) 26. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. ICLR, pp 1–14
Attention-Based Ensemble Deep Learning Technique for Prediction of Sea Surface Temperature Ashapurna Marndi and G. K. Patra
Abstract The blue economy is slowly emerging as an integral part of the overall economic projection of a country, and a significant portion of the world's population relies on marine resources for their livelihood. Prediction of sea surface temperature (SST) has many applications, including forecasting ocean weather and climate, identifying fishing zones, monitoring over-exploitation of the ocean environment, and strategic sectors such as defence. Over the years, many approaches based on dynamic models and statistical models have been attempted for predicting sea surface temperature. Generally, dynamic models are compute-intensive and time-consuming; statistical approaches, on the other hand, are lightweight but may fail to model complex problems. The recent and considerable success of artificial intelligence, especially deep learning (DL) techniques, motivates us to apply the same to the prediction of sea surface temperature. We have built an attention-based ensemble model over a set of basic models based on different DL techniques that consume uniquely prepared variant datasets to produce better predictions. The outcomes of this experiment and the comparative results with existing techniques justify the efficiency of the proposed methodology. Keywords Ensemble forecasting · Attention network · Convolutional neural network · Long short-term memory · Sea surface temperature prediction · Deep learning
A. Marndi (B) · G. K. Patra Academy of Scientific and Innovative Research, Uttar Pradesh, Ghaziabad 201002, India e-mail: [email protected] Council of Scientific and Industrial Research-Fourth Paradigm Institute, Bengaluru, Karnataka 560037, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_30
1 Introduction Oceanography is a well-established research domain with significant impact on science, economy, and politics. With the advancement of tools and technology, several countries are actively expanding their blue economy, using marine resources for business and livelihood. For a country like India, surrounded by sea on three sides, it is essential to understand the potential of these water resources and provide scientific projections for economic planning. Among the various processes in the sea, air–sea interactions play an important role in the ecological balance that stimulates productivity in the sea. The sea water temperature, referred to as sea surface temperature (SST), is significantly influenced by the process of air–sea interaction. Apart from daily changes in wind speed, SST changes diurnally due to surface air, although with lesser impact below the surface [1]; in fact, it has been observed that the temperature can vary by 6 °C within a single day. There are also multi-decadal variations in SST due to ocean currents [2]. Due to Ekman transport [3], coastal winds move warm water away from the shore, replacing it with cooler water from below, which in turn increases the nutrients that feed marine life. River deltas also influence SST: the limited vertical mixing of freshwater flowing over the denser sea water causes the surface to heat faster. Tropical cyclones also influence sea surface temperature; in general, SST decreases after the passing of a tropical cyclone [4]. There are short-term SST fluctuations due to rapid influx of glacier freshwater [5], concentrated phytoplankton blooms [6], seasonal cycles, agricultural run-off [7], etc., in addition to regional SST variations. SST affects many processes in the sea, such as El Niño, which is defined by prolonged higher sea surface temperature compared with the average value in the Pacific Ocean [8]. The occurrence of El Niño severely affects rainfall as well as fishing activities, not only in the Pacific region but also in the tropical Indian region [9]. SST also affects the behaviour of the earth's atmosphere above it, and it helps in identifying potential fishing zones [10]. Considering these societal and atmospheric benefits, it is important to design an efficient model to predict SST reasonably far in advance. Efforts to predict SST can be broadly divided into two categories. One is based on physics, used in numerical models where dynamical equations define the evolution; the other is based mainly on data and is referred to as data-driven modelling. Data-driven models are further sub-divided into statistical models and artificial intelligence (AI) models: statistical models are based on assumed hypotheses, while AI models learn purely from data. Physics-based dynamic models are typically complex and need huge computational infrastructure and long processing times. On the other hand, statistical approaches are lightweight but may sometimes fail to produce satisfactory outcomes when modelling complex problems. Deep learning-based networks have automatic feature selection capabilities without requiring any domain knowledge. In this paper, we propose a novel design of an ensemble-based prediction model by combining the
capabilities of two different deep learning models: long short-term memory (LSTM) and the one-dimensional convolutional neural network (1D CNN).
2 Related Work Hitherto, several research works have been carried out using both model-driven and data-driven approaches. As part of the conventional, model-driven approach, many researchers have attempted to use coupled general circulation models (GCMs) to forecast Atlantic SST anomalies [11, 12]. For SST prediction at smaller scales, statistical approaches are more appropriate, as they are lightweight and avoid complex physical equations. Researchers have used statistical methods such as empirical canonical correlation [13], regression [14], Markov models [15], and empirical orthogonal functions [16] to predict SST at different time scales. An ANN was used to predict five important components of SST in the tropical Pacific by considering SST anomalies and sea-level pressure as inputs [17]. Later, monthly averages of various meteorological parameters such as mean sea level pressure, wind, temperature, and cloud cover were used as input to evaluate seasonal and inter-annual SST in the Western Mediterranean Sea from 1960 to 2005 using neural networks [18]. A multi-model system consisting of four models (one dynamic, two statistical, and one persistence model), combined with a simple composite ensemble, was proposed to predict SST six months in advance [19]. A nonlinear autoregressive neural network was used to predict the monthly mean SST at six different locations in the North Indian Ocean [20]. SST anomalies over six different locations were predicted at three different time scales (daily, weekly, and monthly) using a hybrid network that combines numerical and neural techniques [21]. A wavelet neural network was proposed to predict SST at six selected locations five days in advance [22]. SST and SST fronts were predicted using an artificial neural network by considering the spatial and temporal variability of SST [23]. Along with artificial neural network approaches, some attempts have been made to solve SST prediction using deep learning techniques: a spatiotemporal model with convolutional long short-term memory (ConvLSTM) has been used as the building block to predict SST [24]. As discussed above, several approaches have been attempted for SST prediction, almost all with basic models. However, due to the complexity and randomness of the data, it may be necessary to solve this problem using multiple models and then pick the best outcomes from them using another model. In this study, we propose a new ensemble method based on deep learning techniques to enhance the prediction capability for SST.
3 Methodology An attention-based ensemble algorithm for SST prediction is proposed in this work. The ensemble model is designed by utilizing the capabilities of two sets of DL models that process data in multiple ways. While designing this model, we extract temporal features using two different models, a 1D convolutional neural network (CNN) and an LSTM. As CNNs have been successful in extracting temporal information from data [25], a CNN is used here to assess the effect of temporal information on predicting future SST. We also use an LSTM due to its effectiveness in dealing with temporal information: its ability to remember significant patterns from the long past leads to better outcomes [26]. The proposed methodology thus extracts temporal features using these two DL techniques, followed by an attention network that pays special attention to important features. To provide the necessary background, these three building blocks are explained briefly below.
3.1 Long Short-term Memory (LSTM) LSTM is efficient in handling time series with the help of its different components. Two states, the cell state (C_n) and the hidden state (h_n), are the main components of an LSTM [26]. Both states are updated at every timestamp, and the updated states are propagated to the next timestamp. Three gates, the forget gate (f_n), the input gate (i_n), and the output gate (o_n), shown in Fig. 1, are responsible for updating these two states. The forget gate helps in removing unwanted information from the cell state, whereas the input gate aids in adding new information to it; the output gate is responsible for updating the hidden state.

$i_n = \sigma(W_i I_n + U_i h_{n-1} + b_i)$    (1)

Fig. 1 Architecture of a LSTM network
$f_n = \sigma(W_f I_n + U_f h_{n-1} + b_f)$    (2)

$o_n = \sigma(W_o I_n + U_o h_{n-1} + b_o)$    (3)

$C_n = f_n * C_{n-1} + i_n * \tanh(W_c I_n + U_c h_{n-1} + b_c)$    (4)

$h_n = \tanh(C_n) * o_n$    (5)
W_f, W_i, W_o, W_c and U_f, U_i, U_o, U_c represent the weight matrices applied to the current input and to the previous hidden state, respectively. The bias vectors of the gates f_n, i_n, o_n and of the cell update are denoted by b_f, b_i, b_o, b_c, respectively, and the current cell state is expressed by C_n. Also, h_{n−1} represents the hidden state of the previous timestamp, σ and tanh are the sigmoid and hyperbolic tangent activation functions, respectively, and I_n represents the input to the LSTM network.
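Equations (1)–(5) describe a single LSTM time step; a literal NumPy transcription (the dictionary layout for the weights is our own convention) could read:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(I_n, h_prev, C_prev, W, U, b):
    # W, U, b are dicts keyed by gate name: 'i', 'f', 'o', 'c'
    i_n = sigmoid(W['i'] @ I_n + U['i'] @ h_prev + b['i'])   # Eq. (1)
    f_n = sigmoid(W['f'] @ I_n + U['f'] @ h_prev + b['f'])   # Eq. (2)
    o_n = sigmoid(W['o'] @ I_n + U['o'] @ h_prev + b['o'])   # Eq. (3)
    C_n = f_n * C_prev + i_n * np.tanh(W['c'] @ I_n + U['c'] @ h_prev + b['c'])  # Eq. (4)
    h_n = np.tanh(C_n) * o_n                                 # Eq. (5)
    return h_n, C_n
```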
3.2 Convolutional Neural Network (CNN) A CNN has a few functional layers, such as the convolutional layer, pooling layer, and fully connected layer, apart from the input and output layers [27], as shown in Fig. 2a. The functionality of each layer is described subsequently. The convolutional layer is responsible for extracting high-level features from the data by performing a convolution operation between the kernel (a matrix of weights) and the input data matrix. The kernel moves across the input data matrix according to the stride (the specified step size) in order to extract relevant features. Padding is used to keep the input and output sizes the same, since the output of the convolution operation otherwise shrinks.
Fig. 2 a Architecture of a CNN model and b architecture of an attention model
Pooling Layer: the CNN reduces the dimension of a feature map using two types of pooling, max pooling and average pooling; the maximum value of the convolved matrix is taken in max pooling and, similarly, the average value in average pooling. The pooling layer is followed by a flattening layer, which converts the data into a one-dimensional array for connection to the next layer. Fully Connected Layer: the output of the flattening layer is connected to a fully connected layer, in which the overall resulting volume is reduced compared to the original data.
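A minimal 1D CNN of this kind, sketched in Keras; the layer counts and the input length of 42 days (six weeks) are illustrative, with only the two 64-filter convolutional layers taken from Sect. 3.6:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution extracts local temporal features from the SST sequence
    layers.Conv1D(64, kernel_size=3, activation='relu', input_shape=(42, 1)),
    layers.MaxPooling1D(pool_size=2),      # max pooling reduces the feature map
    layers.Conv1D(64, kernel_size=3, activation='relu'),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),                      # flatten into a one-dimensional array
    layers.Dense(32, activation='relu'),   # fully connected layer
    layers.Dense(1)                        # one-step-ahead SST prediction
])
model.compile(optimizer='adam', loss='mse')
```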
3.3 Attention Model An attention model consists of an encoder, a context vector, and a decoder, with attention weights generally represented by "alpha" [28]. The encoder encodes the input sequence into a context vector; the decoder receives contextual information through the attention mechanism, which enables it to function more efficiently. The relationship between the context vector, the target, and the entire input sequence is represented by the attention network, as shown in Fig. 2b. The context vector captures three types of information: the encoder hidden states, the decoder hidden states, and the alignment between source and target. As in the attention mechanism defined for neural machine translation, the input sequence with n components and the target sequence with m components are represented by x = [x_1, x_2, x_3, …, x_n] and y = [y_1, y_2, y_3, …, y_m]. The encoder is a bidirectional RNN with forward and backward hidden states $\overrightarrow{h}_i$ and $\overleftarrow{h}_i$, respectively, where i = 1, 2, …, n. The hidden state of the decoder for the output at position t is $s_t = f(s_{t-1}, y_{t-1}, c_t)$, where t = 1, 2, …, m. The context vector c_t is computed as the weighted sum of the hidden states of the input sequence:

$c_t = \sum_{i=1}^{n} \alpha_{t,i} h_i$    (6)

$\alpha_{t,i} = \dfrac{\exp(e_{t,i})}{\sum_{k=1}^{n} \exp(e_{t,k})}$    (7)

$e_{t,i} = a(s_{t-1}, h_i)$    (8)
The alignment model assigns a score α_{t,i} based on how well the input at position i and the output at position t, (y_t, x_i), match. The alignment model a is parameterized as a feedforward network, as described in [28].
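Equations (6)–(8) in NumPy form; the alignment model a is shown as a tiny feedforward scorer whose weights W_a and v_a are illustrative:

```python
import numpy as np

def attention(h, s_prev, W_a, v_a):
    # Eq. (8): alignment scores from a small feedforward network
    e = np.array([v_a @ np.tanh(W_a @ np.concatenate([s_prev, h_i])) for h_i in h])
    # Eq. (7): softmax-normalized attention weights alpha
    alpha = np.exp(e) / np.exp(e).sum()
    # Eq. (6): context vector as the weighted sum of encoder hidden states
    c = (alpha[:, None] * h).sum(axis=0)
    return c, alpha
```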
3.4 Proposed Method The principles of the new method are based on an ensemble model network containing heterogeneous models at multiple levels. Three aspects shape the final solution:
• variation of the input datasets fed into the models of the ensemble network;
• an attention network that helps prioritize the features used by subsequent models in the network;
• processing outcomes from heterogeneous models, which leads to better results.
Variant Input Dataset: it has been observed that when variant datasets are fed into multiple models and their outputs are then passed to an ensemble model, the outcome improves [29]. Here, we have prepared the input datasets by changing the length of the input sequence. The overall performance usually differs with the variation in input datasets, and a balanced variation (neither too high nor too low) leads to better outcomes of the ensemble model. In this experiment, six weeks of SST data are considered as input to predict a value one week ahead. To ensure adequate variability, a one-week difference between inputs is used; e.g. if the first model takes an input of six weeks, the second takes five weeks and the third four weeks. Attention Network: the use of an attention network improves the overall efficiency of a model processing sequential input patterns [30]; it is introduced alongside the models in the first level. Ensemble of Heterogeneous Models: when the same data is fed into heterogeneous models, they sometimes produce different output sets. While one type of model may recognize and prioritize one set of patterns, another model type may prioritize a different set, or weight the same patterns differently. This observation motivated us to ensemble the outputs of heterogeneous models. With the above aspects in mind, the new approach was devised as a model network comprising various models with variant input datasets; its major aspects are elaborated below. Variant Inputs: there are two sets of models, 1D CNN and LSTM, that take inputs of different lengths. The first model takes six weeks of data with one value per day; the second model takes five weeks of data, one week less than the previous one; and the third model takes four weeks of data, again one week less than the second. To generalize, an input sub-sequence (I_i) is defined as a series of input data (x_{ij}) whose length equals the interval between the datasets of two subsequent models. This can be expressed mathematically as:

$I_i = x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iq}$    (9)
where q is the length of the sub-sequence and i varies from 1 to p, with p being the number of sub-sequences in the input sequence. All such input sub-sequences are joined together to form the complete input sequence, denoted as I = I_1, I_2, I_3, …, I_p.
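Building the three variant input sequences from one daily SST series can be sketched as follows, with q = 7 days and p = 6 weeks as used in the experiment (the placeholder series is random data):

```python
import numpy as np

def variant_inputs(sst, q=7, p=6):
    # Full sequence I = I1 ... Ip of length p*q, ending at the common endpoint
    full = sst[-p * q:]
    # Drop one leading sub-sequence of length q per model: 6, 5, and 4 weeks
    return [full[k * q:] for k in range(3)]

daily_sst = np.random.rand(400)          # placeholder SST series
six_w, five_w, four_w = variant_inputs(daily_sst)
```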
Fig. 3 Block diagram of proposed model
In this experiment, the value of q is 7 (days) and the value of p is 6 (weeks), the number of sub-sequences contained in the biggest input sequence, which is taken as 6 weeks. For the first model, the input sequence is I = I_1, I_2, I_3, …, I_p; for the second model it is I = I_2, I_3, …, I_p; and for the third model it is I = I_3, …, I_p. In other words, for the first model the length of the input data sequence is p × q, in this case 6 × 7, and in each subsequent model the input sequence is shorter by 7 data points. Note that all these input sequences have the same endpoint, as the intention is that all models predict at a fixed advance time from this endpoint. Model Network: in this model network, shown in Fig. 3, there are two levels of models, with the outputs of the first-level models feeding the model in the second level. The first level has one set of CNN models and one set of LSTM models; in this case, three CNN models and three LSTM models take data from I_1 … I_p, I_2 … I_p, and I_3 … I_p, i.e. input sequences of different lengths. The first-level models are clubbed with an attention network layer to improve overall performance. Finally, the outputs of all the first-level models, combined with the attention network, are fed into one ensemble model in the second level to predict 7 days in advance, referred to as adv in Fig. 3. For validation of the proposed model, the outputs of the three CNN models are also fed into a separate ensemble model, and a similar ensemble model is used for the three LSTM models; this is represented as the "temporary validation" layer in Fig. 3.
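The two-level network of Fig. 3 could be wired up roughly as follows in the Keras functional API. This is a structural sketch only: the attention step is abbreviated to Keras's built-in Attention layer rather than the Bahdanau-style network of Sect. 3.3, and the second-level ensemble is reduced to dense layers, whereas the paper uses an LSTM-based ensemble:

```python
from tensorflow.keras import layers, models

def branch(length, kind):
    inp = layers.Input(shape=(length, 1))
    if kind == 'cnn':
        x = layers.Conv1D(64, 3, activation='relu')(inp)
    else:
        x = layers.LSTM(100, return_sequences=True)(inp)
    x = layers.Attention()([x, x])          # simplified self-attention layer
    x = layers.GlobalAveragePooling1D()(x)
    return inp, layers.Dense(1)(x)          # one first-level prediction

# Three CNN and three LSTM branches over 42-, 35-, and 28-day inputs
pairs = [branch(n, k) for k in ('cnn', 'lstm') for n in (42, 35, 28)]
inputs = [p[0] for p in pairs]
merged = layers.Concatenate()([p[1] for p in pairs])
# Second level: ensemble model combining the six first-level outputs
out = layers.Dense(1)(layers.Dense(16, activation='relu')(merged))
model = models.Model(inputs, out)
model.compile(optimizer='adam', loss='mse')
```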
3.5 Data and Study Area The sea surface temperature (SST) data used in this experiment were collected from the mid-IR band of the Moderate Resolution Imaging Spectroradiometer (MODIS) on the Aqua satellite platform, at a wavelength of 11 µm and a spatial resolution of 4 km. The data were obtained from NASA's Earth Observing System Data and Information System Physical Oceanography Distributed Active Archive Center at the Jet Propulsion Laboratory. The SST data were acquired as a Level 3 gridded data product with three dimensions (latitude, longitude, and time) at a temporal resolution of one day in Network Common Data Form (NetCDF) format. The NetCDF data were converted to ASCII format (.dat extension) using the Ferret software and then to CSV format for use in the proposed algorithm. The data considered for this experiment were collected at 71°E, 19°N in the Arabian Sea.
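Extracting the single-point SST series from the Level-3 NetCDF granules might look as follows; the file name and the variable/coordinate names `sst`, `lat`, `lon` are assumptions, since the actual MODIS product names may differ, and the paper instead used Ferret for the conversion:

```python
import numpy as np
from netCDF4 import Dataset

ds = Dataset("modis_sst_l3.nc")                 # illustrative file name
lats = ds.variables["lat"][:]
lons = ds.variables["lon"][:]
# Nearest grid cell to 19N, 71E in the Arabian Sea
i = np.abs(lats - 19.0).argmin()
j = np.abs(lons - 71.0).argmin()
sst_series = ds.variables["sst"][:, i, j]       # (time, lat, lon) layout assumed
np.savetxt("sst_point.csv", sst_series, delimiter=",")
```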
3.6 Experimental Setup We used daily SST data for the period 2004–2016 from the above location in the Arabian Sea; the training data cover 2004–2012 and the testing data cover 2013–2016. In level one there are two sets of models of different types, each set containing three models of the same type that consume the three different input sequences. Each of these models is combined with an attention layer. In the second level, there is an LSTM-based ensemble model that takes the outputs of all six first-level models. For all the LSTM models, we started training the network with 10 neurons in the first layer and continued to increase the number of neurons in that layer until the network was optimized; similarly, we continued to add layers, fixing the number of neurons as in the first layer, until the whole network was optimized. We observed that the LSTM models are optimized with 8 hidden layers and 100 neurons in each layer. We determined the epoch count as 100 by plotting the training-validation graph. The CNN was configured with two convolutional layers, two max pooling layers, and two dense layers; each convolution layer was built with a filter size of 64. We used Adam as the optimization algorithm for all models. The experiment was conducted on an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40 GHz with 80 cores and 376 GB RAM, and the models were implemented in Python using the Keras module of the TensorFlow platform.
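The reported LSTM configuration (8 hidden layers of 100 neurons, Adam optimizer, 100 epochs) translates to roughly the following Keras model; the input length of 42 corresponds to the six-week branch, and the batch size is an assumption:

```python
from tensorflow.keras import layers, models

model = models.Sequential()
# Eight stacked LSTM layers of 100 neurons each, per the tuning described above
model.add(layers.LSTM(100, return_sequences=True, input_shape=(42, 1)))
for _ in range(6):
    model.add(layers.LSTM(100, return_sequences=True))
model.add(layers.LSTM(100))                # final LSTM layer returns a vector
model.add(layers.Dense(1))                 # 7-day-ahead SST value
model.compile(optimizer='adam', loss='mse')
# model.fit(x_train, y_train, epochs=100, batch_size=32)  # batch size is an assumption
```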
4 Results and Analysis For this experiment, the SST data captured once per day (the peak value of the day) may not be sufficient to predict the future pattern with high accuracy. More granular data, e.g. at a few hours' interval, would probably have been a better data source for such a prediction problem; likewise, spatial information such as values from surrounding locations also has a significant influence. Without these aspects, the daily peak data remain highly random, and it is difficult to derive the inherent patterns in them. Moreover, as the data are collected from a coastal area (71°E, 19°N in the Arabian Sea), several uncertain factors, including rapid change in nearby land temperature, lead to even more uncertain changes in SST. The experiment was performed step by step to illustrate the logical steps taken to formulate the complete solution. For temporal sequence data, LSTM and 1D CNN usually work better than other models, and attention networks are known to work well for long sequences. Thus, the experiment started with one of the two base models (the CNN), to which an attention network layer was then added. The benefit of the attention layer was verified by comparing results with and without it. The results are shown in Fig. 4a, which compares the prediction results of the CNN with those of the CNN combined with an attention layer, labelled Atn-CNN in the graph. Towards both peaks and troughs, the Atn-CNN curve stretches to match the observed data and thus gives better results: the RMSE reduced from 1.21 to 1.04, and the CC value improved from 0.68 to 0.74. From this, it is found that execution through the attention network improves performance. The next step in the experiment was to vary the inputs, execute them through multiple models, and then combine their outputs through an ensemble model. As discussed earlier, the input arrays were prepared to differ by around one week of data out of the original six weeks, to ensure sufficient difference between any two
Fig. 4 a Graphs comparing results after CNN without attention network and CNN with attention network. b Graphs comparing results after CNN with attention network and ensemble model consuming outputs from three CNN models
input datasets; three such datasets were prepared and executed with the CNN. As shown in Fig. 3, the ensemble model was introduced at the second level, and the experiment was conducted to validate our hypothesis; it confirmed the improvement in results after combining all outputs through an ensemble model. This is one of the recently adopted techniques, but it needs a balanced approach to devising the input data and participating models to produce better results. After executing the ensemble model, its outputs were compared with the best output obtained in the previous layer. The result is shown in Fig. 4b, where the green curve of the ensemble CNN, marked En-CNN, stretches beyond the brown curve of the CNN with attention layer to catch up with the peaks and troughs of the observed data. The RMSE decreased from 1.04 to 0.80 and the CC value increased from 0.74 to 0.76, indicating better performance by the ensemble CNN model. Seeking further improvement, we continued improvising the method by adding a different set of model outputs to the ensemble model in Level 2. We repeated the Level 1 steps with LSTM models and prepared a new set of inputs for the Level 2 ensemble model. The new ensemble model, denoted En-CNN + LSTM, takes six inputs: three from the previously used CNNs with attention layers and three from the newly introduced LSTMs with attention layers. With more heterogeneous sources, i.e. model outputs with varying characteristics, the output improves, as shown in Fig. 5a, b. The improvement is probably because deficiencies of the earlier CNN models are overcome by the LSTM models, the two working complementarily. In both graphs, the orange curve of the ensemble model marked En-CNN + LSTM, fed with the outputs of CNNs with attention and LSTMs with attention, stretches towards the peaks and troughs to match the observed data. The CC value of the combined ensemble model improved to 0.81, from 0.76 for En-CNN and from 0.75 for En-LSTM; similarly, the RMSE decreased to 0.76, from 0.80 in the case of En-CNN and from 0.86 in the case of En-LSTM. A granular view of the comparison over 75 days is shown in Fig. 6a, where the better matching capability of En-CNN + LSTM over all other models is clearly visible.
Fig. 5 a Graphs comparing results after ensemble model with CNN and LSTM models versus ensemble model of CNN models. b Graphs comparing results after ensemble model with CNN and LSTM models versus ensemble model of LSTM models
Fig. 6 a Graphs comparing results after ensemble model with CNN and LSTM models versus ensemble model of CNN models as well as ensemble model of LSTM models for shorter period. b Graphs comparing results after ensemble model with CNN and LSTM models versus basic models of CNN, LSTM, ANN and SVR
The proposed methodology was also compared with multiple existing AI-based models, namely support vector regression (SVR), artificial neural network (ANN), standard LSTM, and CNN, with the same configuration as the proposed methodology in terms of hyper-parameters, input sequence, and lead time. The comparison is shown in Fig. 6b, from which it is evident that our network model, consisting of multiple levels and different combinations of models and inputs, leads to significantly better outcomes than the other models. The efficiency of the proposed algorithm was also quantified using different performance metrics: mean absolute error (MAE), root mean square error (RMSE), correlation coefficient (CC), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE). The comparison results are presented in Table 1, which depicts the gradual improvement in our solution with each additional step, outperforming the usual standard models. The following steps might be responsible for producing such better performance:

Table 1 Summary of RMSE, MAE, MAPE, SMAPE, CC between predicted and observed sea surface temperatures obtained from different models

Model            RMSE   MAE    MAPE   SMAPE   CC
SVR              1.75   1.57   5.85   5.87    0.57
ANN              1.47   1.29   4.61   4.63    0.65
LSTM             1.31   1.13   3.72   3.86    0.67
CNN              1.21   1.03   3.66   3.83    0.68
Atn-LSTM         1.12   1.02   3.59   3.66    0.72
Atn-CNN          1.04   0.86   3.02   3.07    0.74
En-LSTM          0.86   0.68   2.27   2.30    0.75
En-CNN           0.80   0.62   2.21   2.24    0.76
En-CNN + LSTM    0.76   0.58   1.73   1.74    0.81
• The proper variation of the input dataset might have helped prepare well-varied datasets for the multiple models that are later combined in the ensemble process.
• The use of an attention network layer for long sequences might be helpful in detecting important trends, which are preserved and used effectively.
• Instead of a single base model (CNN), we used another model equally suitable for temporal data, providing varied model inputs for the ensemble model.
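The five metrics reported in Table 1 can be computed from paired observed/predicted series as follows; these are the standard definitions, since the paper does not spell out its exact formulas for MAPE and SMAPE:

```python
import numpy as np

def metrics(obs, pred):
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / obs))
    smape = 100.0 * np.mean(np.abs(err) / ((np.abs(obs) + np.abs(pred)) / 2))
    cc = np.corrcoef(obs, pred)[0, 1]       # Pearson correlation coefficient
    return rmse, mae, mape, smape, cc
```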
5 Conclusion SST plays a crucial role in air–sea interactions, and its changes can adversely affect applications such as marine ecosystems and climate, and may sometimes cause extreme weather events such as tropical storms, floods, and droughts. Our proposed model can perform mid-term SST prediction by considering temporal information. The detailed experimental results reveal that the proposed model network outperforms other existing methods in predicting SST, which may be due to its capability to capture varied patterns with different priorities among the models of the network and the intelligent use of appropriate models at multiple levels, combining LSTM, CNN, and attention models. The fine-tuning of the data and models, along with the hyper-parameters, might have helped in producing significantly better results. The approach might work even better with the addition of spatial information and more granular data, such as data at a few hours' interval. In the coming days, this new technique can be deployed to devise better solutions with more precise prediction capability for problems in different domains and disciplines.
References 1. Barale V (2010) Oceanography from space: revisited. Springer, p 263 2. McCarthy GD, Haigh ID, Hirschi JJ-M, Grist JP, Smeed DA (2015) Ocean impact on decadal Atlantic climate variability revealed by sea-level observations. Nature 521(7553):508–510 3. Knauss J (2005) Introduction to physical oceanography. Waveland Press 4. Earth Observatory "Passing of hurricanes cools entire gulf". National Aeronautics and Space Administration. Archived from the original. 5. Boyle EA (1987) North Atlantic thermohaline circulation during the past 20,000 years linked to high-latitude surface temperature. Nature 330(6143):35–40 6. Beaugrand G, Brander KM, Lindley JA, Souissi S, Reid PC (2003) Plankton effect on cod recruitment in the North Sea. Nature 426(6967):661–664 7. Beman JM, Arrigo KR, Matson PA (2005) Agricultural runoff fuels large phytoplankton blooms in vulnerable areas of the ocean. Nature 434(7030):211–214 8. El Niño/Southern Oscillation (ENSO) June 2009 (2009) National Climatic Data Center, National Oceanic and Atmospheric Administration 9. WW2010 (1998-04-28) "El Niño". University of Illinois at Urbana-Champaign 10. Solanki HU (2005) Evaluation of remote-sensing-based potential fishing zones (PFZs) forecast methodology. Cont Shelf Res 25:2163–2173
11. Stockdale T, Balmaseda N, Magdalena A, Vidard A (2006) Tropical Atlantic SST prediction with coupled ocean–atmosphere GCMs. J Clim 19(19):6047–6061 12. Hu ZZ, Kumar A et al (2013) Prediction skill of monthly SST in the North Atlantic Ocean in NCEP climate forecast system version 2. Clim Dyn 40(11–12):2745–2759 13. Collins DC (2004) Predictability of Indian Ocean sea surface temperature using canonical correlation analysis. Clim Dyn 22(5):481–497 14. Kug JS, Kang I-S, Lee J-Y, Jhun JG (2004) A statistical approach to Indian Ocean sea surface temperature prediction using a dynamical ENSO prediction. Geophys Res Lett 31(9) 15. Xue YA (2000) Forecasts of tropical Pacific SST and sea level using a Markov model. Geophys Res Lett 27(17):2701–2704 16. Neetu RS (2011) Data-adaptive prediction of sea-surface temperature in the Arabian Sea. IEEE Geosci Remote Sens Lett 8(1):9–13 17. Wu AW (2006) Neural network forecasts of the tropical Pacific sea surface temperatures. Neural Netw 19(2):145–154 18. Garcia-Gorriz EAS (2007) Prediction of sea surface temperatures in the western Mediterranean Sea by neural networks using satellite observations. Geophys Res Lett 34(11) 19. Kug JS, Lee JY, Kang IS (2007) Global sea surface temperature prediction using a multimodel ensemble. Mon Weather Rev 135(9):3239–3247 20. Patil KM (2013) Predicting sea surface temperatures in the North Indian Ocean with nonlinear autoregressive neural networks. Int J Oceanogr 21. Patil KM (2016) Prediction of sea surface temperature by combining numerical and neural techniques. J Atmos Oceanic Tech 33(8):1715–1726 22. Patil KA (2017) Prediction of daily sea surface temperature using efficient neural networks. Ocean Dyn 67:357–368 23. Gandhi A, Saieesh SA (2018) Prediction of daily sea surface temperature using artificial neural networks. Int J Rem Sens 39(12) 24. Xiao C, Chen N (2019) A spatiotemporal deep learning model for sea surface temperature field prediction using time-series satellite data. Environ Modell Softw 120, Oct 2019. https://doi.org/10.1016/j.envsof 25. Tsantekidis A, Passalis T, Tefas A, Kanniainen J, Gabbouj M, Iosifidis A (2017) Forecasting stock prices from the limit order book using convolutional neural networks. In: 2017 IEEE 19th conference on business informatics (CBI), pp 7–12. https://doi.org/10.1109/CBI.2017.23 26. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735 27. LeCun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10) 28. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473 29. Marndi A, Patra GK, Gouda KC (2020) Short-term forecasting of wind speed using time division ensemble of hierarchical deep neural networks. Bull Atmos Sci Tech 1:91–108. https://doi.org/10.1007/s42865-020-00009-2 30. Yao Q et al (2017) A dual-stage attention-based recurrent neural network for time series prediction. In: International joint conference on artificial intelligence (IJCAI)
Political Optimizer-Based Optimal Integration of Soft Open Points and Renewable Sources for Improving Resilience in Radial Distribution System D. Sreenivasulu Reddy
and Varaprasad Janamala
Abstract A novel and simple meta-heuristic optimization technique, the political optimizer (PO), is proposed in this paper to identify the size and optimal location of a solar photovoltaic (SPV) system. The main objective is to minimize the distribution loss, and it is solved using the proposed PO. The computational efficiency of PO is compared with the literature, and its superiority in reaching a global solution at an early stage is highlighted. The physical integration of the SPV system via soft open points (SOPs) among multiple laterals is solved in a second stage, considering radiality constraints. The proposed concept of the interoperable-photovoltaic (I-PV) system has been applied to the standard IEEE 69-bus system and has shown its effectiveness in enhancing the performance of the system. Keywords Interoperable-photovoltaic system · Political optimizer · Resilience · Soft open points · Loss minimization · Radial distribution system
1 Introduction Given the depletion of traditional power production fuels and their contribution to rising greenhouse gas emissions, the integration of renewable energy-based (RE) distributed generation (DG) sources has emerged as a viable option for creating long-term distribution networks. DGs have the potential to significantly improve distribution performance in terms of loss reduction and grid dependency [1]. Appropriate placement and sizing of DGs can significantly improve network performance, and various investigations have been conducted and have contributed effectively to this research area [2]. Automatic network reconfiguration (ANR), on the other hand, is preferred for smoothing feeder loadings at different time intervals by changing
D. Sreenivasulu Reddy (B) · V. Janamala Department of Electrical and Electronics Engineering, School of Engineering and Technology, Christ (Deemed to be University), Bangalore, KA 560074, India e-mail: [email protected] V. Janamala e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_31
network configuration. Several researchers have proposed various approaches for solving this problem, including simultaneous allocation of DGs for the base case, faulty conditions, and different loading levels [3]. At this point, it is critical to recognize that ANR requires all network branches to be equipped with remotely controllable switches, which may not be economically or practically feasible in some areas due to the need for hardware, software, and associated infrastructure maintenance. In order to address this, the concept of soft open points (SOPs) was introduced in 2010 [4, 5]. SOPs are typically power-electronic AC-DC converters at the starting point and DC-AC inverters at the ending point, preferably connected at the distribution system's normally open points. The main advantages of SOPs are not only that they are fast and dynamic, but also that they allow the exchange of real and reactive power among multiple feeders, which contributes to improved resilience and performance by improving the voltage profile and lowering losses. Several devices fall under the category of flexible AC transmission systems (FACTS), such as unified power flow controllers (UPFC) [6] and interline power flow controllers (IPFC) [7], which are primarily used in transmission systems. Similarly, the interline-photovoltaic (I-PV) system is another type of distribution infrastructure, proposed for exporting power to multiple feeders from a single solar photovoltaic system via inverters [8]. SOPs operate similarly to back-to-back converters between two feeders in terms of power balance at both ends. Cao et al. [9] examine the benefits of SOPs in distribution system operation by proposing a steady-state model for them, in which the sum of the power extracted at the starting point (AC-DC conversion), the power injected at the ending point (DC-AC conversion), and the internal losses of the power flow exchange must equal zero. The optimal power ratings of SOPs, their operational states, and the optimal reconfiguration are determined using a combined approach incorporating Powell's direct set (PDS) and genetic algorithm (GA) methods while minimizing loss and balancing feeder load. Long et al. [10] disregard the internal losses of SOPs in favour of converter ratings and derive the optimal ratings of SOPs via non-linear optimization over the operating region obtained from Jacobian-matrix-based sensitivity analysis for variable DG power and feeder voltages. Wang et al. [11] propose mixed-integer second-order cone programming (MISOCP) for optimizing SOP sizing and reconfiguration while taking the technical and economic aspects of distribution system operation into account. In [12], ant colony optimization (ACO) is used to determine the optimal network reconfiguration; a multi-objective function is then formulated and solved for loss minimization, load balancing, and maximization of the DG penetration level. In [13, 14], the authors propose active and reactive power compensation via DGs and capacitors, as well as effective redistribution of power flows within the network, by determining optimal ratings for DGs, capacitors, and SOPs via a bi-level programming technique. Yao et al. [15] propose energy storage system-integrated (ESS) SOPs for maximizing the benefits of SOPs when WT and SPV systems are present in the distribution network.
Using a primal–dual interior-point algorithm, a sequential optimization model is proposed for minimizing distribution losses under various load profiles.
According to the aforementioned studies, the simultaneous optimal allocation of renewable DGs, ANR, and SOPs can result in increased performance and resilience in the operation and control of distribution systems. The problem is, however, complex and optimization-intensive due to the presence of multiple objective functions, multiple location choices and their ratings, and operational and planning constraints. This paper proposes a new meta-heuristic, the political optimizer (PO) [16], for solving the simultaneous optimal allocation of renewable DGs, ANR, and SOPs while taking into account the distribution system's radiality constraint. The proposed PO is used to solve an objective function formulated for loss minimization and voltage profile improvement on the IEEE 69-bus system.
2 Problem Formulation

Loss minimization and voltage profile improvement are the two most critical operational requirements for enhancing RDS resilience through SOPs, and the proposed multi-objective function is as follows:

\[
\min f = \sum_{mn} I_{mn}^{2}\, r_{mn} + \frac{1}{n_b}\sum_{i=1}^{n_b}\left(1 - |V_i|\right)^{2} \tag{1}
\]
The proposed objective function is solved taking into account the following voltage, current, DG power, and radiality constraints:

\[
|V_i|_{\min} \le |V_i| \le |V_i|_{\max}, \quad i = 1, 2, \ldots, n_b \tag{2}
\]
\[
|I_i| \le |I_i|_{\max}, \quad i = 1, 2, \ldots, n_l \tag{3}
\]
\[
P_{dg(i)} \le P_{load(T)} \tag{4}
\]
\[
P_{SOP(i)}^{2} + Q_{SOP(i)}^{2} \le S_{SOP(i)}^{2} \tag{5}
\]
\[
P_{SOP(i)} + P_{SOP(j)} \le P_{dg(i)} \tag{6}
\]
\[
n_{br} = n_b - 1 \tag{7}
\]
where Ploss is the total active power loss of the distribution feeder; mn, nb, and nl are the branch index, the number of buses, and the number of branches, respectively; Pdg(i), Pload(T), and |Vi| are the DG active power, the total active power demand, and the voltage magnitude of the ith bus, respectively; and PSOP(i), QSOP(i), and SSOP(i) are the real, reactive, and apparent powers of the SOP at bus i, respectively.
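To make the objective concrete, the following is a minimal NumPy sketch of how Eq. (1) could be evaluated from load-flow results; the function and array names are illustrative assumptions, not part of the paper's implementation:

```python
import numpy as np

def objective(branch_currents, branch_resistances, bus_voltages):
    """Evaluate Eq. (1): total line loss plus mean squared voltage deviation.

    branch_currents    -- |I_mn| for every branch (consistent units)
    branch_resistances -- r_mn for every branch
    bus_voltages       -- |V_i| for every bus, in p.u.
    """
    line_loss = np.sum(branch_currents ** 2 * branch_resistances)
    voltage_dev = np.mean((1.0 - np.abs(bus_voltages)) ** 2)
    return line_loss + voltage_dev

# Hypothetical load-flow results for a small 4-bus feeder:
f = objective(np.array([0.80, 0.50, 0.30]),
              np.array([0.02, 0.03, 0.05]),
              np.array([1.00, 0.98, 0.96, 0.95]))
print(f)
```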
3 Political Optimizer

Strategic political gaming during the election phase in western countries served as the primary inspiration for this newly developed meta-heuristic political optimizer (PO) [16]. By decomposing the entire election process into major phases, the algorithm represents the following: (i) party formation and constituency allocation represent population initialization, (ii) the election campaign represents the exploration and exploitation phases, (iii) party switching represents the balance between exploration and exploitation, which includes population updating, (iv) the election represents fitness evaluation, and (v) parliamentary affairs represent exploitation and convergence. PO is a highly competitive optimization technique for solving non-linear complex optimization problems in real-time engineering. In the first phase, as with any heuristic algorithm, PO initializes the population: n political parties are considered, each nominating one candidate in each of n constituencies, with each candidate position being a d-dimensional solution vector:

\[
P = \{P_1, P_2, \ldots, P_i, \ldots, P_n\} \tag{8}
\]
\[
P_i = \{p_i^1, p_i^2, \ldots, p_i^j, \ldots, p_i^n\} \tag{9}
\]
\[
p_i^j = \left[p_{i,1}^j, p_{i,2}^j, \ldots, p_{i,k}^j, \ldots, p_{i,d}^j\right]^{T} \tag{10}
\]
Now, for each constituency, there are n election candidates representing the various political parties, and the set of contestants of the jth constituency may be written as:

\[
C = \{C_1, C_2, \ldots, C_j, \ldots, C_n\} \tag{11}
\]
\[
C_j = \{p_1^j, p_2^j, \ldots, p_i^j, \ldots, p_n^j\} \tag{12}
\]
The fittest candidate of a particular party serves as the party's leader and is evaluated by:

\[
p_i^{*} = p_i^{q}, \quad q = \arg\min_{1 \le j \le n} f\!\left(p_i^{j}\right), \quad \forall i \in \{1, \ldots, n\} \tag{13}
\]
Following the election phase, all of the party leaders can now be grouped into a solution vector, as specified by:

\[
P^{*} = \{p_1^{*}, p_2^{*}, \ldots, p_i^{*}, \ldots, p_n^{*}\} \tag{14}
\]

Members of parliament (the constituency winners) can be chosen from the various constituencies and grouped as:

\[
C^{*} = \{c_1^{*}, c_2^{*}, \ldots, c_j^{*}, \ldots, c_n^{*}\} \tag{15}
\]
In the second phase, the exploration and exploitation behaviors are developed using the strategies employed by each candidate during the election campaign. Equation (16) is used to update the position variables when a candidate's fitness improves, while Equation (17) is used when its fitness deteriorates.
Both methods of updating the position make reference to the party leader p_i^* and the constituency winner c_j^*. Here, r is a random number with a uniform distribution in the range [0, 1], and m^* addresses the party leader position first and then the constituency winner position:

\[
p_{i,k}^{j}(t+1) =
\begin{cases}
m^{*} + r\left(m^{*} - p_{i,k}^{j}(t)\right), & \text{if } p_{i,k}^{j}(t-1) \le p_{i,k}^{j}(t) \le m^{*} \text{ or } p_{i,k}^{j}(t-1) \ge p_{i,k}^{j}(t) \ge m^{*} \\[4pt]
m^{*} + (2r-1)\left|m^{*} - p_{i,k}^{j}(t)\right|, & \text{if } p_{i,k}^{j}(t-1) \le m^{*} \le p_{i,k}^{j}(t) \text{ or } p_{i,k}^{j}(t-1) \ge m^{*} \ge p_{i,k}^{j}(t) \\[4pt]
m^{*} + (2r-1)\left|m^{*} - p_{i,k}^{j}(t-1)\right|, & \text{if } m^{*} \le p_{i,k}^{j}(t-1) \le p_{i,k}^{j}(t) \text{ or } m^{*} \ge p_{i,k}^{j}(t-1) \ge p_{i,k}^{j}(t)
\end{cases} \tag{16}
\]

\[
p_{i,k}^{j}(t+1) =
\begin{cases}
m^{*} + (2r-1)\left|m^{*} - p_{i,k}^{j}(t)\right|, & \text{if } p_{i,k}^{j}(t-1) \le p_{i,k}^{j}(t) \le m^{*} \text{ or } p_{i,k}^{j}(t-1) \ge p_{i,k}^{j}(t) \ge m^{*} \\[4pt]
p_{i,k}^{j}(t-1) + r\left(p_{i,k}^{j}(t) - p_{i,k}^{j}(t-1)\right), & \text{if } p_{i,k}^{j}(t-1) \le m^{*} \le p_{i,k}^{j}(t) \text{ or } p_{i,k}^{j}(t-1) \ge m^{*} \ge p_{i,k}^{j}(t) \\[4pt]
m^{*} + (2r-1)\left|m^{*} - p_{i,k}^{j}(t-1)\right|, & \text{if } m^{*} \le p_{i,k}^{j}(t-1) \le p_{i,k}^{j}(t) \text{ or } m^{*} \ge p_{i,k}^{j}(t-1) \ge p_{i,k}^{j}(t)
\end{cases} \tag{17}
\]

In the third phase, a proper balance between exploration and exploitation is achieved using the scenario of an electoral candidate switching parties. While party switching may occur concurrently with the election campaign in reality, it is implemented in PO after the election campaign phase via an adaptive parameter called the party switching rate (λ), which decreases linearly from one to zero over the course of the iterations.
During party switching, a randomly selected candidate p_r^j is exchanged with the least-fit member of a randomly chosen party, identified by:

\[
q = \arg\max_{1 \le j \le n} f\!\left(p_r^{j}\right) \tag{18}
\]

The fourth stage maps the election phase for the purpose of determining the fitness value using the winners from all constituencies, as specified by:

\[
q = \arg\min_{1 \le i \le n} f\!\left(p_i^{j}\right), \quad \text{where } c_j^{*} = p_q^{j} \tag{19}
\]
In the fifth phase, the exploitation and convergence stages are modeled using Eqs. (13), (14), and (19) to create a synergistic scenario of parliamentary affairs following the elections. When the winner of the jth constituency (c_j^*) improves its fitness, it updates its position and fitness by taking a random parliamentarian (c_r^*) as a reference. Additionally, the position of that winner (c_j^*) in the vector of winning party members (P_i) is updated appropriately. By emulating this simple strategic gaming for election victory, PO stands out and has been developed in a way that makes it suitable for determining a global solution.
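To illustrate the core of the algorithm, the following is a minimal Python sketch of the one-dimensional position-update rule of Eqs. (16) and (17); it is a simplified reading of [16], not the authors' implementation:

```python
import random

def po_update(prev, curr, m_star, improved):
    """PO position update for one decision variable (Eqs. 16 and 17).

    prev, curr -- candidate position at iterations t-1 and t
    m_star     -- reference position: the party leader first, then the
                  constituency winner (the rule is applied twice)
    improved   -- True if fitness improved between t-1 and t (Eq. 16),
                  False if it deteriorated (Eq. 17)
    """
    r = random.random()  # uniform random number in [0, 1]
    if prev <= curr <= m_star or prev >= curr >= m_star:
        # Case 1: the candidate is moving monotonically toward m*.
        return m_star + (r * (m_star - curr) if improved
                         else (2 * r - 1) * abs(m_star - curr))
    if prev <= m_star <= curr or prev >= m_star >= curr:
        # Case 2: m* lies between the previous and current positions.
        return (m_star + (2 * r - 1) * abs(m_star - curr) if improved
                else prev + r * (curr - prev))
    # Case 3: the candidate has moved past m*.
    return m_star + (2 * r - 1) * abs(m_star - prev)
```

In the full algorithm, this update is applied to every dimension of every candidate, once with the party leader and once with the constituency winner as the reference, followed by the party switching, election, and parliamentary affairs phases.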
4 Results and Discussion

The data for the IEEE 69-bus system are taken from [17]. The system has 68 sectionalizing switches. The principal feeder comprises buses 1–27, with the lateral feeders being buses 28–35, 36–46, 47–50, 51–52, 53–65, 66–67, and 68–69. The overall load on the system is (3802.1 kW + j 2694.7 kVAr), while the total loss is (225.0007 kW + j 102.1648 kVAr). With a magnitude of 0.9092 p.u., the 65th bus has the lowest voltage. The proposed PO is used to optimize the location and size of the PV-type DG. The DG size search space is [0, 3802 kW], and the location search space is [bus-2, bus-69]. According to the proposed PO, the photovoltaic system at bus-61 should be 1872.7 kW in size, and the corresponding minimum loss is (83.224 kW + j 40.536 kVAr). The lowest voltage, 0.9683 p.u., is found at the 27th bus. In comparison with the uncompensated (base) case, the total losses are reduced by 63.01%. Table 1 compares the PO solution with a number of other methods. The PO solution is superior to WIPSO [18] and SADE [18], and substantially identical to WOA [19, 23], ALO [20], DA [21], ALGA [22], HGWO [24], and EA-OPF [25]. According to this comparison, PO is a powerful rival to various HSAs and fine-tunes the decision variables toward the global optimum regardless of the size of the search space. At least one location on each lateral feeder must be identified to link the PV system to the multiple laterals. In this example, the PO must determine the best PV position and size across the whole search space (i.e., [bus-2 to bus-69]), as well as the best I-PV buses on the lateral feeders. The new branch parameters of the I-PV system are chosen to correspond to the 68th branch of the standard IEEE 69-bus system. A large number of case studies are carried out.
Table 1 Comparison of PO results for single PV allocation

Method        Location  Size (kW)  Ploss (kW)  Qloss (kVAr)  Vmin (p.u.)
Base case     –         –          225.0007    102.1648      0.9092
WIPSO [18]    61        1890       83.2345     40.5144       0.9684
SADE [18]     61        1890       83.2345     40.5144       0.9684
WOA [19]      61        1872.82    83.2279     40.5381       0.9683
ALO [20]      61        1872.82    83.2279     40.5381       0.9683
DA [21]       61        1872.7     83.224      40.5361       0.9685
ALGA [22]     61        1872       83.2        40.5371       0.9683
WOA [23]      61        1856.1     83.2336     40.5651       0.9682
HGWO [24]     61        1872       83.2        40.5371       0.9683
EA-OPF [25]   61        1870       83.22       40.57         0.9685
Proposed PO   61        1872.7     83.224      40.5361       0.9683
Case 3 corresponds to one optimal lateral feeder, Case 4 to two optimal lateral feeders, Case 5 to three optimal lateral feeders, and Case 6 to four optimal lateral feeders, as indicated in Table 2. The optimal PV system size at the 61st bus, the optimal integration sites on the lateral feeders, the optimal open branches for radiality, as well as the accompanying losses and voltage profile, are included for each case. Case 6's findings are summarized here. The photovoltaic array, which has a capacity of 3340 kW, is best located at the 61st bus. The I-PV integration locations on the laterals are buses 2, 17, 11, and 50, and the open branches are 52 (9–53), 11 (11–12), 8 (8–9), and 48 (48–49). Under these conditions, the losses are (3.747 kW + j 1.875 kVAr), and the minimum voltage, at the 69th bus, is 0.9934 p.u. The losses are decreased by 98.33% compared with the base case, effectively separating the network from the grid. Figure 1 shows a single-line schematic of the 69-bus system with four lateral feeders and SOPs. Tables 2 and 3 present the optimal PV size and the optimal ratings of the SOPs. The optimal PO findings are as follows: PV installation at the 61st bus, a PV capacity of 3314 kW, and I-PV integration locations on the laterals equal to those in Case 6. Under these conditions, the losses are (2.2159 kW + j 1.3208 kVAr), and the minimum voltage, at the 69th bus, is 0.9971 p.u. The voltage profile for each scenario is depicted in Fig. 2. The voltage profile is essentially flat across the feeder in Cases 5 to 7. It can be noted at this point that the more SOPs connected to the lateral networks, the better the system's performance in terms of loss reduction and voltage profile enhancement. The convergence features of PO are shown in Fig. 3. According to the boxplots, PO is an effective and efficient algorithm that locates the global optima consistently.
Fig. 1 Optimal PV location and SOPs with four lateral feeders in 69-bus system
Table 2 Optimal I-PV configuration results using PO

Case #  I-PV buses          Open branches   PV (kW)  Ploss (kW)  Vmin (p.u.)
1       –                   –               –        225         0.9092 (65)
2       61                  –               1872.7   83.224      0.9683 (27)
3       61, 2               52              1924     23.814      0.9716 (27)
4       61, 2, 17           52, 11          2429     7.406       0.9934 (69)
5       61, 2, 17, 11       52, 11, 8       2645     5.914       0.9934 (69)
6       61, 2, 17, 11, 50   52, 11, 8, 48   3340     3.747       0.9934 (69)
7       61, 2, 17, 11, 50   –               3314     2.2159      0.9971 (69)
5 Conclusion

Due to the huge growth of power demand in the system, the generation sources required to serve it need to be interoperable. This article proposes the optimal size of the SPV system and the proper location for its installation together with SOPs, while maintaining the operational constraints. Considering loss minimization as the objective function, a novel and efficient optimization technique, the political optimizer (PO), has been applied to evaluate the global solution. According to the comparison study, the proposed PO outperformed several other techniques available in the literature. Additionally, numerous case studies on the IEEE 69-bus system have demonstrated the ability of the interline-photovoltaic (I-PV) system concept to enhance the resilience and performance of the RDS through optimally allocated SOPs, as well as its suitability for real-time implementation.
Table 3 Optimal ratings of SOPs using PO under different cases

Case #  Branch (i)–(j)  PSOP(i) (kW)  QSOP(i) (kVAr)  PSOP(j) (kW)  QSOP(j) (kVAr)  SSOP (kVA)  Pl(i–j) (kW)
3       61–2            154.92        −1227.22        −154.87       1227.23         1236.97     0.04
4       61–2            150.46        −1607.97        −150.38       1607.99         1615.01     0.08
        61–17           560.76        380.75          −560.74       −380.74         677.79      0.01
5       61–2            127.44        −1778.97        −127.35       1779.00         1783.56     0.09
        61–17           560.76        380.75          −560.74       −380.74         677.79      0.01
        61–11           239.02        171.01          −239.02       −171.01         293.89      0.00
6       61–2            52.90         −2328.26        −52.74        2328.32         2328.91     0.16
        61–17           560.76        380.75          −560.74       −380.74         677.79      0.01
        61–11           239.02        171.01          −239.02       −171.01         293.89      0.00
        61–50           769.54        549.29          −769.51       −549.28         945.44      0.03
7       61–2            32.87         −2287.91        −32.72        2287.96         2288.19     0.15
        61–17           379.02        251.61          −379.01       −251.61         454.93      0.01
        61–11           503.18        340.19          −503.17       −340.19         607.38      0.01
        61–50           724.11        502.59          −724.09       −502.59         881.42      0.02

Fig. 2 Voltage profile for different case studies (voltage magnitude in p.u. versus bus number, Cases 1–7)
Fig. 3 Box plot of PO results for 25 trials of each case (fitness function values for Cases 2–7)
References

1. Ismael SM et al (2019) State-of-the-art of hosting capacity in modern power systems with distributed generation. Renew Energy 130:1002–1020
2. Prakash P, Khatod DK (2016) Optimal sizing and siting techniques for distributed generation in distribution systems: a review. Renew Sustain Energy Rev 57:111–130
3. Badran O et al (2017) Optimal reconfiguration of distribution system connected with distributed generations: a review of different methodologies. Renew Sustain Energy Rev 73:854–867
4. Bloemink JM, Green TC (2010) Increasing distributed generation penetration using soft normally-open points. In: IEEE power and energy society general meeting, pp 1–8
5. Bloemink JM, Green TC (2013) Benefits of distribution-level power electronics for supporting distributed generation growth. IEEE Trans Power Deliv 28:911–919
6. Gyugyi L et al (1995) The unified power flow controller: a new approach to power transmission control. IEEE Trans Power Del 10(2):1085–1097
7. Gyugyi L, Sen KK, Schauder CD (1999) The interline power flow controller concept: a new approach to power flow management in transmission systems. IEEE Trans Power Del 14(3):1115–1123
8. Khadkikar V, Kirtley JL (2011) Interline photovoltaic (I-PV) power system—a novel concept of power flow control and management. In: 2011 IEEE power and energy society general meeting. IEEE
9. Cao W, Wu J, Jenkins N, Wang C, Green T (2016) Benefits analysis of soft open points for electrical distribution network operation. Appl Energy 165:36–47. https://doi.org/10.1016/j.apenergy.2015.12.022
10. Long C, Wu J, Thomas L, Jenkins N (2016) Optimal operation of soft open points in medium voltage electrical distribution networks with distributed generation. Appl Energy 184:427–437. https://doi.org/10.1016/j.apenergy.2016.10.031
11. Wang C, Song G, Li P, Ji H, Zhao J (2016) Optimal configuration of soft open point for active distribution network based on mixed-integer second-order cone programming. Energy Procedia 103:70–75
12. Qi Q, Wu J, Zhang L, Cheng M (2016) Multi-objective optimization of electrical distribution network operation considering reconfiguration and soft open points. Energy Procedia 103:141–146
13. Zhang L, Shen C, Chen Y, Huang S, Tang W (2017) Coordinated optimal allocation of DGs, capacitor banks and SOPs in active distribution network considering dispatching results through bi-level programming. Energy Procedia 142:2065–2071
14. Zhang L, Shen C, Chen Y, Huang S, Tang W (2018) Coordinated optimal allocation of DGs, capacitor banks and SOPs in active distribution network considering dispatching results through bi-level programming. Appl Energy 231:1122–1131. https://doi.org/10.1016/j.apenergy.2018.09.095
15. Yao C, Zhou C, Yu J, Xu K, Li P, Song G (2018) A sequential optimization method for soft open point integrated with energy storage in active distribution networks. Energy Procedia, pp 528–533
16. Askari Q, Younas I, Saeed M (2020) Political optimizer: a novel socio-inspired meta-heuristic for global optimization. Knowledge-Based Syst 195:105709
17. Duong MQ et al (2019) Determination of optimal location and sizing of solar photovoltaic distribution generation units in radial distribution systems. Energies 12(1):174
18. Rajeswaran S, Nagappan K (2016) Optimum simultaneous allocation of renewable energy DG and capacitor banks in radial distribution network. Circ Syst 7:3556–3564
19. Dinakara Prasad Reddy P, Veera Reddy VC, Gowri Manohar T (2018) Optimal renewable resources placement in distribution networks by combined power loss index and whale optimization algorithms. J Electr Syst Inf Technol 5:175–191
20. Dinakara Prasad Reddy P, Veera Reddy VC, Gowri Manohar T (2018) Ant lion optimization algorithm for optimal sizing of renewable energy resources for loss reduction in distribution systems. J Electr Syst Inf Technol 5:663–680
21. Suresh MCV, Belwin EJ (2018) Optimal DG placement for benefit maximization in distribution networks by using Dragonfly algorithm. Renew: Wind, Water, Solar 5:4
22. Hassan AA, Fahmy FH, Abd El-Shafy AN, Abuelmagd MA (2015) Genetic single objective optimisation for sizing and allocation of renewable DG systems. Int J Sustain Energy
23. Dinakara Prasad Reddy P, Veera Reddy VC, Gowri Manohar T (2017) Whale optimization algorithm for optimal sizing of renewable resources for loss reduction in distribution systems. Renew: Wind, Water, Solar 4:3
24. Dixit M, Kundu P, Jariwala HR (2017) Incorporation of distributed generation and shunt capacitor in radial distribution system for techno-economic benefits. Eng Sci Technol Int J 20:482–493
25. Mahmoud K, Yorino N, Ahmed A (2016) Optimal distributed generation allocation in distribution systems for loss minimization. IEEE Trans Power Syst 31(2):960–969
Face and Emotion Recognition from Real-Time Facial Expressions Using Deep Learning Algorithms

Shrinitha Monica and R. Roseline Mary

Abstract Emotions are faster than words in the field of human–computer interaction. Identifying human emotions can be performed by a multimodal approach that includes body language, gestures, speech, and facial expressions. This paper throws light on emotion recognition via facial expressions, as the face is the basic index for expressing our emotions. Though emotions are universal, they vary slightly from one person to another. Hence, the proposed model first detects the face using histogram of gradients (HOG) features and a linear support vector machine (LSVM), and then the emotion of that person is detected through deep learning techniques to increase the accuracy percentage. The paper also highlights the data collection and preprocessing techniques: images were collected using a simple HAAR classifier program, resized, and preprocessed by removing noise using a mean filter. The model achieved accuracies of 97% for face recognition and 92% for emotion recognition.

Keywords Face recognition · Emotion recognition · Neural networks · Deep learning
1 Introduction

An emotion is a sudden change in the behavior of an individual that arises in various situations and involves two states, a physiological state and a mental state, and it is both subjective and personal. Emotions involve a good deal of feelings, thoughts, actions, and behaviors toward oneself and others. Humans possess emotional intelligence, which helps to build stronger relationships in both personal and professional life. Similarly, an artificially intelligent machine such as a
robot must learn emotions to be efficient and to gain a better understanding while interacting with humans. The process of giving emotional intelligence to artificially intelligent machines is known as affective computing. The basic emotions are neutral, happiness, anger, surprise, fear, sadness, and disgust. These emotions are further classified into a range of positive and negative emotional expressions such as guilt, amusement, excitement, shame, satisfaction, and so on. Emotions can be detected using a multimodal approach which involves a combination of facial expressions, body language, gestures, body movements, and tone of voice. Of these, the face is considered the primary channel of emotion, and this paper uses facial expressions to predict emotions. The capability of a machine to recognize emotions can be applied in various fields: in e-learning applications to adjust the presentation style based on the user's mood, in the medical field, especially for people having autism, and in robots for improving human–computer interaction (HCI); it can also be used to identify the impact of an advertisement based on the user's emotion while viewing it. Typical drawbacks in emotion recognition are that it depends on external factors, such as lighting and camera quality, as well as internal factors, such as image dimensions and color. Hence, it is necessary to preprocess the images to predict the emotion better. This research work ensures that the collected data are preprocessed, by re-dimensioning and noise removal, to an extent where they can produce efficient results. Deep learning methods such as deep convolutional neural networks are used for face and emotion recognition; this is the best-known and most widely used approach among researchers because of its high efficiency and robustness.
2 Literature Review

Julina [1] developed a model that detects emotions from faces in a video. Facial emotions are identified and recognized with convolutional neural networks (CNN). The various steps involved in this process are as follows: accepting a video as input, separating frames from the video, preprocessing the frames, landmark detection, extracting the various features of the face such as the eyes, mouth, and nose, and finally classifying the emotions using the JAFFE dataset. The model only classifies three emotions, i.e., angry, happy, and sad. Lee et al. [2] implemented an application that recognizes emotions such as happy, sad, angry, fear, disgust, surprise, and neutral, along with classifying the emotions into neutral, positive, and negative. The model, developed using convolutional neural networks (CNN), captures the user's image and the emotions portrayed in the captured image. The authors of [3] proposed a CNN-based deep learning architecture for detecting emotions from images. Features such as the eyes, eyebrows, mouth, and lips are extracted, and classification of emotions is efficiently achieved with the help of Keras using the JAFFE and facial emotion recognition (FERC-2013) datasets. Pranav et al. [4] created an emotion recognition model using a deep convolutional neural network (DCNN) capable of classifying emotions from a manually collected dataset with an accuracy of 78.04%. In [5], a spatial CNN for processing static images and a temporal CNN for optical flow images have
been developed using deep convolutional neural networks, which are then integrated into a deep belief network model with a deep fusion network, and the final results are sent to a linear support vector machine for classifying emotions. The above papers mainly use deep neural networks for classifying emotions with various datasets. A drowsiness detection model was developed in [6] to detect face, emotion, and drowsiness, using k-nearest neighbor to classify the emotions. Verma et al. [7] use the Karolinska Directed Emotional Faces (KDEF) dataset to categorize seven different emotions, namely afraid, angry, disgusted, happy, neutral, sad, and surprised, using a CNN framework named the Venturi design architecture. Kim et al. [8] proposed a new method for facial emotion recognition that combines two features in a hierarchical manner, namely, an appearance feature-based network and a geometric feature-based network. In the appearance feature-based network, overall features are extracted from the face using a preprocessed LBP image, while the geometric feature-based network learns the coordinate changes (i.e., face muscle movements) from action unit (AU) landmarks of facial expressions. Hasani [9] has proposed a 3D convolutional neural network method for facial emotion recognition in videos and used facial landmark points with the help of deep neural networks (DNNs) for feature extraction using various datasets, including the FER dataset. Singh [10] has analyzed the limitations of current emotion recognition systems and concludes that the face reflects brain activities. The emotions produced by brain activities can be captured to a great extent through facial expression recognition. Identifying emotions directly from brain activity can give better results, but it is complex, time-consuming, and expensive as well, while facial emotion recognition overcomes these drawbacks. That paper aims at extracting facial emotions such as happiness, sadness, anger, surprise, and fear using neural networks; the emotion expressed is captured using a simulator, and the accuracy percentage of each emotion is detected. The papers discussed in this paragraph try to achieve better accuracy by combining two or more frameworks with deep learning algorithms to identify the maximum number of emotions. Jeong et al. [11] have proposed a novel facial landmark detection (FLD) method for detecting face landmarks while driving. The proposed framework consists of a weighted random forest regressor (WRFR) to detect the driver's drowsiness. The model fails when there is an intense change in face movement and posture. Salih and Kulkarni [12] have carried out a comparative study on video-based facial and emotion recognition using multiple strategies. The outcome points out the challenges and research gaps that are yet to be researched. Shojaeilangari et al. [13] have proposed a model that identifies facial expressions from video frames and recognizes emotions in real-world situations using the extreme sparse learning (ESL) technique. The model performs with better accuracy on acted and unconstrained expressions. Varghese et al. [14] have implemented a real-time emotion recognition model with the advanced techniques and methodologies employed for recognizing emotional expressions.
A comparative study of research models, such as complete automated recognition of temporal phases from facial actions, facial expression recognition using AAM, emotion recognition using facial expressions, speech and multimodal knowledge, and expressive face, body gestures, and speech recognition for multimodal emotion recognition, was thoroughly carried out. Dinculescu et al. [15] implemented a facial expression analysis algorithm based on a novel methodology that identifies the valence and intensity of emotion using two kinds of facial textures and eight facial characteristic points, improving both accuracy and time complexity as an alternative to pattern recognition using statistical methods. The reviews in this paragraph mainly focus on the drawbacks of the existing techniques and also on the alternatives that can be used to outperform the current algorithms.
3 Proposed Framework

The proposed model is designed specifically for organization- or enterprise-level use. The data are first collected, preprocessed, and stored; the stored data are then used for analysis. The first step in the proposed framework is data collection, which is done in two phases: collecting images with faces and collecting images with emotions. The face of an individual is captured, resized, preprocessed, and saved along with the person's name in a separate folder for easy classification (see Fig. 1). Each emotion of that particular individual is stored as .png images in separate folders along with the name (see Fig. 2), which ensures better accuracy and also reduces the time complexity of the algorithm. The images are preprocessed by resizing them to a 48 × 48 size and removing noise using Gaussian filters. The histogram of gradients method is employed for face detection, and the emotion is recognized using a deep learning algorithm. The model is capable of detecting faces in real time and identifies the emotions dynamically (see Fig. 3).
Fig. 1 File organization structure to show how the detected faces are stored
Fig. 2 Folder structure for storing emotions
Fig. 3 Proposed framework (facial and emotion data collection, preprocessing, identifying faces using HOG, and recognizing emotions using deep learning)
4 Implementation Details 4.1 Phase I: Data Collection and Preprocessing Since the proposed method uses a face for recognizing emotions, facial images are collected by using a simple HAAR classifier program. The program is capable of detecting the region of interest (i.e., face) and storing it as an image. Since the
Fig. 4 Sample of the facial data collected based on the region of interest
Fig. 5 Snapshot of various emotions used in the model
region of interest varies from person to person, it was necessary to resize the images uniformly, followed by removing noise using a Gaussian filter. The program is capable of capturing 24 frames per second. Thirty images of each individual were captured for better face recognition, and 150 images were captured for each emotion. The images were resized and normalized (see Fig. 4). Figure 5 consists of images expressing the emotions of anger, disgust, fear, happiness, neutrality, sadness, and surprise, respectively. The collected data were then preprocessed further by removing noise using a simple mean filter. Removing noise can improve the process of recognizing faces and emotions (see Fig. 6).
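The following is a minimal sketch of the kind of HAAR-classifier collection program described above, assuming OpenCV's bundled frontal-face cascade; the folder name, image count, and filter sizes are illustrative:

```python
import os
import cv2

# Pre-trained frontal-face Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

person = "shrini"                       # illustrative folder/person name
os.makedirs(person, exist_ok=True)

cap = cv2.VideoCapture(0)               # default webcam
count = 0
while count < 30:                       # 30 face images per individual
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        roi = cv2.resize(frame[y:y + h, x:x + w], (48, 48))  # uniform size
        roi = cv2.GaussianBlur(roi, (3, 3), 0)               # noise removal
        cv2.imwrite(os.path.join(person, f"{count}.png"), roi)
        count += 1
cap.release()
```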
4.2 Phase II: Implementation Using Deep Neural Networks Face Detection using Histogram of Gradients. Face detection is carried out using histogram of gradients (HOG) in which every pixel of the image is replaced by an arrow pointing to a darker pixel. The arrows as a collection are known as gradients. The entire process is tedious and time-consuming. Hence, the image is split into windows of 16 × 16 pixels. The highest count of gradients in each window is
Fig. 6 Sample figure depicting the noise removal process
considered, to reduce the arrows in the resulting gradient pattern. Finally, the region of interest is the gradient pattern that looks most like a face.

Face Recognition using Neural Networks and Linear Support Vector Machine. Each face differs in posture and position; hence, the face is first straightened before recognition. Secondly, the facial features that help differentiate one person from another are found. These steps are performed using the OpenFace library, which is capable of extracting 128 different features of a face, allowing a neural network model to distinguish two different individuals. Finally, the linear support vector machine (SVM) classifier takes input from the Webcam in real time and compares it with the stored facial features of the dataset. The measurement that is most closely related is classified and recognized as the person by returning the folder name (name of the person), as depicted in Fig. 7.
Fig. 7 Face recognition
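A minimal sketch of this recognition pipeline is given below, assuming the face_recognition package (which wraps HOG-based detection and a 128-dimensional face embedding network comparable to OpenFace) together with scikit-learn; the file paths and labels are illustrative, and this is a sketch of the approach rather than the authors' code:

```python
import face_recognition
from sklearn.svm import SVC

# Training: one 128-d embedding per stored face image, labeled by folder name.
encodings, labels = [], []
for path, name in [("faces/shrini/0.png", "shrini"),
                   ("faces/rose/0.png", "rose")]:      # illustrative dataset
    image = face_recognition.load_image_file(path)
    boxes = face_recognition.face_locations(image, model="hog")  # HOG detector
    encodings += face_recognition.face_encodings(image, boxes)
    labels += [name] * len(boxes)

clf = SVC(kernel="linear").fit(encodings, labels)      # linear SVM classifier

# Recognition: embed a new frame and return the closest stored identity.
frame = face_recognition.load_image_file("webcam_frame.png")
for enc in face_recognition.face_encodings(frame):
    print(clf.predict([enc])[0])                       # person's folder name
```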
Fig. 8 Proposed model with face and emotion recognition
Fig. 9 Proposed model with face and emotion recognition
Emotion Recognition using Deep Learning. Emotions are hard to analyze, as they vary from person to person, yet deep learning methods overcome this problem to some extent. The dataset containing emotions is split into a 70–30 ratio for training and validation, respectively. The images are then passed to a convolutional network. Max pooling is used for feature extraction, and the activation functions used are ReLU and softmax. The model was trained for 100 epochs. The result is shown in Fig. 8.
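A minimal Keras sketch consistent with this description (48 × 48 inputs, convolution with max pooling, ReLU and softmax activations, a 70–30 split, and 100 epochs) is shown below; the number of filters and layers is an assumption, since the paper does not list the exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(48, 48, 1)),          # 48x48 grayscale face images
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),                   # max pooling for feature extraction
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),   # 7 emotion classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# x, y: preprocessed images and integer labels (0-6);
# 70-30 train/validation split, trained for 100 epochs:
# model.fit(x, y, validation_split=0.3, epochs=100)
```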
5 Result and Analysis

The accuracy for face detection is about 97%, as the HOG model is capable of accurately identifying faces by itself. In Fig. 9, it is evident that at the end of 100 epochs, the accuracy of emotion recognition is about 92% for the training set and 62.50% for the validation set. The model works well on the training data but is not very accurate on the validation set. The performance of the model is better between 40 and 60 epochs, as the validation and training accuracies are closely related at 50 epochs. This indicates that the model is not generalized enough for different types of emotions and should be trained with more data to perform well on real-life datasets. The model accuracy and loss for the training and validation datasets can be seen in Fig. 10. Collecting more data, preprocessing it, and using a multimodal approach, such as tone of voice and gestures along with facial and emotion recognition, would lead to better accuracy.
Fig. 10 Model accuracy and loss for training and validation set
6 Conclusion

The proposed framework can recognize the faces and emotions of an individual with accuracies of approximately 97% and 92%, respectively. Since emotions and expressions differ among humans, the framework can be used in customizable applications, such as playing videos or songs based on the user's emotion or mood. The proposed model collects images from an individual, tunes itself to identify the person, and then identifies the emotion related to them; in this case, the accuracy level of identifying the emotions is high. The model can be used by educational institutions to analyze the emotions of students during class and moderate the teaching content accordingly. It can also be used in companies to monitor employees' stress levels and thereby take the necessary actions to reduce their stress. The model can be further improved by using state-of-the-art methods such as multimodal emotion detection for better accuracy.
References

1. Julina J, Sharmila JT (2019) Facial emotion recognition in videos using HOG and LBP. In: 2019 4th international conference on recent trends on electronics, information, communication and technology (RTEICT)
2. Lee H, Hong K (2017) A study on emotion recognition method and its application using face image. In: 2017 international conference on information and communication technology convergence (ICTC)
3. Jaiswal A, Krishnama Raju A, Deb S (2020) Facial emotion detection using deep learning. In: 2020 international conference for emerging technology (INCET)
4. Pranav E, Kamal S, Satheesh Chandran C, Supriya M (2020) Facial emotion recognition using deep convolutional neural network. In: 2020 6th international conference on advanced computing and communication systems (ICACCS)
5. Zhang S, Pan X, Cui Y, Zhao X, Liu L (2019) Learning affective video features for facial expression recognition via hybrid deep learning. IEEE Access 7:32297–32304
6. Uppal A, Tyagi S, Kumar R, Sharma S (2019) Emotion recognition and drowsiness detection using Python. In: 2019 9th international conference on cloud computing, data science and engineering (Confluence)
7. Verma A, Singh P, Rani Alex J (2019) Modified convolutional neural network architecture analysis for facial emotion recognition. In: 2019 international conference on systems, signals and image processing (IWSSIP)
8. Kim J, Kim B, Roy P, Jeong D (2019) Efficient facial expression recognition algorithm based on hierarchical deep neural network structure. IEEE Access 7:41273–41285
9. Hasani B, Mahoor M (2017) Facial expression recognition using enhanced deep 3D convolutional neural networks. In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW)
10. Singh D (2012) Human emotion recognition system. Int J Image Graph Sign Process 4:50–56
11. Jeong M, Ko B, Kwak S, Nam J (2018) Driver facial landmark detection in real driving situations. IEEE Trans Circuits Syst Video Technol 28:2753–2767
12. Salih H, Kulkarni L (2017) Study of video based facial expression and emotions recognition methods. In: 2017 international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC)
13. Shojaeilangari S, Yau W, Nandakumar K, Li J, Teoh E (2015) Robust representation and recognition of facial emotions using extreme sparse learning. IEEE Trans Image Process 24:2140–2152
14. Varghese A, Cherian J, Kizhakkethottam J (2015) Overview on emotion recognition system. In: 2015 international conference on soft-computing and networks security (ICSNS)
15. Dinculescu A, Vizitiu C, Nistorescu A, Marin M, Vizitiu A (2015) Novel approach to face expression analysis in determining emotional valence and intensity with benefit for human space flight studies. In: 2015 E-health and bioengineering conference (EHB)
A Real-Time Traffic Jam Detection and Notification System Using Deep Learning Convolutional Networks

Sedish Seegolam and Sameerchand Pudaruth
Abstract Mauritius faces traffic jams regularly, which is counterproductive for the country. With an increase in the number of vehicles in recent years, the country faces heavy congestion at peak hours, which leads to wasted fuel and time as well as accidents and environmental issues. To tackle this problem, we have proposed a system which detects and tracks vehicles. The system also informs users once a traffic jam has been detected, using popular communication services such as SMS, WhatsApp, phone calls, and emails. For traffic jam detection, the time a vehicle is in the camera view is used: when several vehicles are present at a specified location for more than a specified number of seconds, a traffic jam is deemed to have occurred. The system has an average recognition accuracy of 93.3% and operates at an average of 14 frames per second. Experimental results show that the proposed system can accurately detect a traffic jam in real time. Once a traffic jam is detected, the system dispatches notifications immediately, and all the notifications are delivered within 15 s. Compared with more traditional methods of reporting traffic jams in Mauritius, our proposed system offers a more economical solution and can be scaled to the whole island.

Keywords Vehicle detection · Traffic jam detection · Traffic notification
1 Introduction

Traffic congestion has become a major problem that affects the transport network in many countries of the world on a daily basis. Traffic congestion is regarded as a physical condition where the traffic system is paralyzed due to limited movements, prolonged delays, and lower speeds [1]. Traffic congestion is one of the major urban
concerns due to its impact on citizens, the environment, and the economy [2]. Being stuck in traffic brings stress to drivers and makes the country lose a lot of money. In the 2019–2020 financial year, the road development authority (RDA) in Mauritius spent a total of Rs 1962 million on the implementation of different projects to alleviate traffic congestion [3]. Currently, it is not possible to reliably estimate the amount of traffic on the different roads of Mauritius. Some attempts have been made to try to provide people with an accurate estimation; these were mostly manual methods, which brought forward issues such as accuracy and safety. There are many junctions in Mauritius, and placing someone at those junctions to manually feed data to the system is not feasible or sustainable for long periods of time. To alleviate the congestion problem, an automated road traffic analysis system is being proposed. The system applies computer vision and artificial intelligence techniques to traffic videos and provides drivers and other road users with updates on the traffic situation. This allows drivers coming to that area to take another path and, at the same time, prevents the congestion from increasing, thus saving both time and money for the country as well as reducing stress for drivers. The notifications are provided in real time, within reasonable delays, by making use of services like WhatsApp, short message service (SMS), calls, and emails. This paper proceeds as follows. Related works on traffic congestion systems and vehicle counting systems are provided in Sect. 2. Section 3 describes state-of-the-art object-detection algorithms. The design of the proposed system is described in Sect. 4. Implementation details are provided in Sect. 5, and experimental results to validate the proposed system are provided in Sect. 6. We conclude the paper in Sect. 7.
2 Literature Review

Related works on traffic jam detection systems as well as vehicle counting systems are reviewed in this section. Over time, several approaches have been proposed to detect traffic jams, the oldest and most dependable being to employ someone on important roads to report the issues. With the advancement of technology and the increase in the number of vehicles, several new automated approaches have been proposed. Wei and Hong-ying [4] made use of texture differences between a congested image and an unobstructed image for vehicle density estimation. Although the system had good accuracy, it did not relay the information to users. Gholve and Chougle [5] proposed an embedded wireless sensor network (WSN) system to detect congestion. The system made use of a magnetic sensor to detect vehicles passing over nodes on the road. Each WSN node consists of a microcontroller and a transceiver module. The system detects congestion by comparing the counts from each node: when the counts from the sensors are different, there is congestion, and if they are constant for each node, then there is no congestion. However, this type of system can produce unreliable results if vehicles are parked on the sensors.
Roopashree et al. [6] implemented a model that used ultrasonic sensors to detect vehicles on the road. If a vehicle is detected, the system proceeds to inform ambulance drivers before they reach that point, via a mobile application, so that they can take another route. Somayajula [7] proposed a system that captures images from live traffic feeds at different locations and then runs a deep learning algorithm to detect whether the image shows a traffic jam. The system also identifies the start and end time of the congestion. However, the system had a very low accuracy of 45%. Lam et al. [8] came up with a real-time traffic congestion detection system using online images provided by the local government. Haar-like features are used for vehicle detection. For traffic congestion estimation, a threshold on the image correlation coefficient of successive images is used together with a threshold on the number of detected vehicles. There are two levels of congestion: normal and congested. The system correctly estimates congestion but lacks a module to send data to users. Nidhal et al. [9] developed a system that counts vehicles by detecting and pairing the vehicles' backlights from images captured in real time. The density is then estimated and compared to a threshold; if the threshold is exceeded for ten consecutive images, then a traffic jam is deemed to have occurred. The system is fast enough to be used on highways and has an average detection accuracy of 96%. Zhang et al. [10] have used mask region-based convolutional neural networks (R-CNN) to identify motorbikes, pedestrians, and vehicles from traffic videos in relatively crowded spaces. However, it was found that it is very challenging to detect motorcycles and pedestrians in crowded metropolitan areas. Khan et al. [11] made use of a drone to collect aerial data. The video was processed using a variety of methods like stabilization and calibration. The vehicles in motion were then detected and tracked by making use of a series of algorithms implemented in C++ with the OpenCV library. The information was then used to create an origin-destination (OD) matrix for the different legs of a roundabout by placing virtual counters at each exit and entry point. The framework was tested using data filmed by a drone over a roundabout in Belgium. The number of vehicles was counted automatically as well as manually, and the system was found to be very accurate. However, the drone had a limited flight time due to its battery, and the law prevents the flying of drones over specific areas. Zulfikar and Suharjito [12] made use of Twitter as a source of information for real-time traffic detection in Indonesia. Support vector machines (SVM) with a sigmoid kernel were used for classification, and an accuracy of 96% was obtained. The system was compared to the Google Maps traffic jam detection feature and was able to detect 17 occurrences of traffic jams out of 25. Robert [13] detected and tracked vehicles from videos using a framework based on a hierarchical approach. The first layer extracts image features, and the second fuses the image features to detect vehicle features like headlights and windshields. The final layer fuses the vehicle features to gain more confidence for vehicle detection. This framework is independent of illumination, allowing detection both during the day and at night. It had a detection rate of 80% during the day and 97% at night.
Despite all the existing systems, which detect traffic jams to a decent accuracy, handling traffic jams remains a challenging problem. Hence, innovative and improved
methods should be considered for traffic detection. Adding a functionality that notifies users via popular communication platforms about any traffic jam that occurs would be a crucial improvement. Different approaches to relay the information to the users are also considered in this study.
3 Object Detection Algorithms Object detection is the most important component of real-time traffic detection systems as every vehicle passing on the road needs to be detected. Object detection is very easy for humans; however, for machines, it is a very complex task. To cater for this, several algorithms have been formulated in the literature. Some of the most recent ones are described in this section.
3.1 YOLO

You Only Look Once (YOLO) unifies all the different components of object detection into a single convolutional neural network that predicts bounding boxes and their respective class probabilities. The system makes use of the entire image to predict all bounding boxes at the same time. YOLO takes an image as input and separates it into a k-by-k grid. If the center of an object lies in a grid cell, that cell is responsible for detecting the object. Each cell then predicts N bounding boxes and, for each box, makes five different predictions: the (x, y) coordinates of the center of the bounding box, its width (w) and height (h), and a confidence score. Finally, the object is located using a threshold value: any bounding boxes with confidence scores lower than the threshold are eliminated. YOLO [14] is open source, and it is very fast, as it can process around 45 frames per second; thus, YOLO can process videos in real time. However, it has difficulty in detecting smaller objects. Redmon and Farhadi [15] came up with YOLOv2, an improved version of YOLO which can process 67 frames per second. YOLOv2 can also detect smaller objects. YOLO9000 is another version of YOLO which can detect over 9000 object categories [15]. YOLOv3 is yet another improved version with even better recognition accuracy [16]. Recently, Bochkovskiy et al. [17] came up with YOLOv4, which is a significant upgrade from YOLOv3 with an increase of 10% in the mean average precision (mAP). YOLOv4 employs CSPDarknet53 for object detection and makes use of spatial pyramid pooling and a path aggregation network (PAN). This fusion generates a stabilized network that delivers an ideal balance between speed and accuracy.
3.2 Single Shot MultiBox Detector (SSD)

The single shot multibox detector (SSD) is an algorithm that also offers a good balance between speed and accuracy [18]. SSD runs a convolutional network once on an image while calculating a feature map. The feature map is then used by a 3 × 3 convolutional kernel to predict bounding boxes and classification probabilities. It also makes use of anchor boxes of different aspect ratios and learns the offset rather than the absolute location of the boxes. SSD predicts many bounding boxes after several convolutional layers, and having different convolutional layers operating at different scales allows it to detect objects at different scales. SSD is very performant for low-resolution images. For a 300 × 300 input image, SSD has a mAP of 77.2% at 46 fps, which is better than faster R-CNN, which has a mAP of 73.2% [18]. SSD can be used for real-time video processing.
3.3 MobileNet-SSD

MobileNet was created as a lightweight deep neural network architecture designed principally for mobile and embedded vision applications, to allow recognition tasks to be performed very fast [19]. Liu et al. [18] made use of VGG-16 as the base network in the original SSD. To tackle the problem of running high-end neural networks on low-powered laptops and smartphones, MobileNet was integrated into the SSD framework as a replacement for VGG-16.
3.4 R-CNN Algorithm

The region-based convolutional neural network (R-CNN) passes an image through a region proposal algorithm to obtain the smaller parts of the original image where the object being looked for is thought to be [20]. This creates about 2000 regions. A CNN is used to create a 4096-dimensional feature vector for each region, and these features are then fed into a machine learning algorithm to classify each object within that candidate region. The algorithm also predicts four offset values, which increase the precision of the bounding boxes. For each class, a score is given, and a greedy non-maximum suppression algorithm is applied to all the regions to locate an object. R-CNN achieves a mean average precision (mAP) of 53.3%. However, it is not fit for real-time video processing systems, as it takes about 53 s to process an image [20]. It also takes a large amount of time to train the model, as the convolutional network is applied 2000 times.
3.5 Fast R-CNN Algorithm

Girshick [21] improved R-CNN by feeding the original image directly into the CNN to create a convolutional feature map. Regions of proposal are then identified from this map and warped into squares through a pooling layer. They are then reshaped into a fixed size and fed into a fully connected layer. A softmax layer is used to predict the class of the proposed region and the offset values for the bounding boxes. Fast R-CNN takes about 2.3 s to process an image and has a mAP of 63%, which is a significant improvement over R-CNN [21].
3.6 Faster R-CNN Algorithm

Both R-CNN and fast R-CNN make use of selective search to locate region proposals, but this is a very slow process that affects the overall object detection network [22]. In faster R-CNN, the selective search algorithm is eliminated. As in fast R-CNN, the image is fed to a convolutional network which generates a convolutional feature map, but a separate network is used to predict the region proposals. An ROI pooling layer is used to reshape the predicted region proposals, which are then used to classify the image in the proposed region, and the offset values for the bounding boxes are predicted. Faster R-CNN takes about 0.3 s to process an image. However, this performance is still not sufficient for real-time video processing systems.
3.7 Mask R-CNN Algorithm

Mask R-CNN extends faster R-CNN via a new approach for instance segmentation [23]. Faster R-CNN has two outputs for each object, the class label and the offset values; to these, another branch is added which outputs the object mask. The first stage is the region proposal network (RPN). In the second stage, an ROI align layer warps the region proposals into a specific dimension, softmax is used for classification, and the bounding boxes are predicted. The warped features are simultaneously fed into a mask classifier consisting of two CNNs, which outputs a binary mask for each ROI. The mask classifier allows the network to generate masks independently for each class. Mask R-CNN only adds a small overhead to faster R-CNN while providing instance segmentation, which makes it better for object detection. However, since it is based on faster R-CNN, it is still not suitable for detecting fast-moving objects.
4 Methodology This section describes the functionalities of the different components of the proposed system. OpenCV is used to read the video frames for processing. First, a video capture object is created. Frames are then read and resized to a more appropriate size. The frames are then processed using the MobileNet-SSD algorithm [17, 19, 24]. If objects with a confidence score above a given threshold are detected, the object is considered as a vehicle, and the count for vehicles is incremented. The vehicles that are accepted are cars, buses, and trucks. All other detected objects are ignored. The time spent by a vehicle in the video is also recorded. If more than a given number of vehicles exceed a given amount of time at a specified location, then a traffic jam is deemed to have occurred, and notifications are dispatched. These threshold values can be adjusted as required.
4.1 Object Tracker All detected vehicles are tracked. The tracker is updated in each frame, and bounding box coordinates are used to derive the centroid of each vehicle. If a vehicle is seen for the first time, it is registered, i.e., it is assigned a new unique id and tracking starts. The vehicles are classified in two different classes: cars and heavy vehicles. Buses and trucks are considered as heavy vehicles. If a vehicle is no longer seen for a certain number of consecutive frames, it is deregistered.
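A minimal sketch of the registration/deregistration logic described above is given below; the matching distance and the disappearance limit are illustrative, and this is a simplified stand-in for the tracker actually used:

```python
import math

class CentroidTracker:
    def __init__(self, max_disappeared=40):
        self.next_id = 0
        self.objects = {}        # object id -> (cx, cy) centroid
        self.disappeared = {}    # object id -> consecutive frames unseen
        self.max_disappeared = max_disappeared

    def update(self, boxes):
        """boxes: list of (x1, y1, x2, y2) bounding boxes for this frame."""
        centroids = [((x1 + x2) // 2, (y1 + y2) // 2)
                     for (x1, y1, x2, y2) in boxes]
        for c in centroids:
            # Match to the nearest existing object, else register a new one.
            match = min(self.objects, default=None,
                        key=lambda i: math.dist(self.objects[i], c))
            if match is not None and math.dist(self.objects[match], c) < 50:
                self.objects[match] = c
                self.disappeared[match] = 0
            else:
                self.objects[self.next_id] = c
                self.disappeared[self.next_id] = 0
                self.next_id += 1
        # Deregister objects unseen for too many consecutive frames.
        for i in list(self.objects):
            if self.objects[i] not in centroids:
                self.disappeared[i] += 1
                if self.disappeared[i] > self.max_disappeared:
                    del self.objects[i], self.disappeared[i]
        return self.objects
```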
4.2 Traffic Jam Detection The time that a vehicle is in a video is recorded. This duration is used for traffic jam detection. The number of vehicles that are present in the frames for more than a certain amount of time is also determined. A traffic jam is detected when more than ten vehicles remain in the frames for more than a certain amount of time. In this case, we are using a threshold value of 300 s. These threshold values can be adjusted depending on which road we are using the system, as the requirements would be different for different locations.
4.3 User Notification When a traffic jam is detected, the system notifies users via emails, short message service (SMS), phone calls, and WhatsApp. To implement this functionality, a Twilio account must be created [25]. Users who want to receive notifications must provide
their personal phone number to the system to receive SMS and calls. To receive messages from WhatsApp, the user needs to join a Twilio Sandbox. Moreover, the system can initiate calls to users, and play voice messages to inform them about traffic jams.
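A minimal sketch of this notification step is shown below, assuming a configured Twilio account and an SMTP email account; all phone numbers, credentials, and URLs are placeholders:

```python
import smtplib
from email.message import EmailMessage
from twilio.rest import Client

client = Client("ACCOUNT_SID", "AUTH_TOKEN")   # placeholder Twilio credentials

def notify(user_phone, user_email, text="Traffic jam detected at Caudan."):
    # SMS via Twilio.
    client.messages.create(body=text, from_="+15551234567", to=user_phone)
    # WhatsApp via the Twilio Sandbox (the user must have joined the sandbox).
    client.messages.create(body=text, from_="whatsapp:+14155238886",
                           to="whatsapp:" + user_phone)
    # Voice call playing a hosted TwiML voice message (placeholder URL).
    client.calls.create(url="https://example.com/traffic.xml",
                        from_="+15551234567", to=user_phone)
    # Email via SMTP.
    msg = EmailMessage()
    msg["Subject"] = "Traffic alert"
    msg["From"] = "[email protected]"
    msg["To"] = user_email
    msg.set_content(text)
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login("[email protected]", "APP_PASSWORD")
        server.send_message(msg)
```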
5 Implementation

The system is implemented using an Intel® Core™ i5-1035G1 CPU @ 1.00 GHz with 8 GB memory and a 10th Gen Intel® Processor graphics card. The programming language used is Python, and the main libraries used to develop the system are OpenCV, Imutils, and NumPy. The datetime library is used to calculate the frame rate.
5.1 Object Detection

OpenCV is used to read the video frames for processing. First, a video capture object is created using cv2.VideoCapture with the video path specified. Frames are then read using cap.read() and resized using imutils.resize(). The MobileNet-SSD network is then loaded using OpenCV's deep neural network module: the module takes as input the path to the prototxt file, which defines the model's architecture, as well as the path to the Caffe model file, which contains the weights of the actual layers. The frames are converted to the blob format by making use of cv2.dnn.blobFromImage(). The blobs are fed to the detector, which returns a NumPy array with confidence scores, class indices, and box coordinates. Only blobs with a confidence score of 0.9 or higher are processed; the others are eliminated. A relatively high threshold value is used to maintain a good balance between recall and precision. The indices are extracted to categorize the objects: if an object is a car, it is appended to the car list; otherwise, it is appended to the heavy vehicles list.
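Putting these steps together, the following is a minimal sketch of the detection loop; the model file names follow the commonly distributed 20-class Caffe release of MobileNet-SSD, which has no separate truck class, so heavy vehicles are approximated here by the bus class (these mappings are assumptions, not the paper's exact code):

```python
import cv2
import imutils

CLASSES = {7: "car", 6: "bus"}   # VOC class ids used by this model release

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
cap = cv2.VideoCapture("traffic.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = imutils.resize(frame, width=500)
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()                 # shape: (1, 1, N, 7)
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        idx = int(detections[0, 0, i, 1])
        if confidence >= 0.9 and idx in CLASSES:   # high threshold, as above
            box = detections[0, 0, i, 3:7] * [w, h, w, h]
            print(CLASSES[idx], box.astype(int))
cap.release()
```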
5.2 Traffic Jam Detection and Notification Dictionaries are used to keep track of the amount of time a vehicle has been in the camera view. If the amount of time a vehicle has been in the camera field exceeds 300 s, it is appended to a time-exceeded list. This value was chosen because traffic lights stay red for an average of 120 s; hence, a value greater than that was chosen to properly confirm that a traffic jam has occurred. The value can be adjusted depending on the road on which the system is used.
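A minimal sketch of this bookkeeping is given below, with hypothetical names and wall-clock timing standing in for the authors' exact frame-based accounting.

import time

TIME_LIMIT = 300     # seconds a vehicle may stay before it is flagged
JAM_THRESHOLD = 10   # flagged vehicles needed to declare a jam

first_seen = {}      # object id -> timestamp of first appearance
time_exceeded = set()

def update_dwell_times(active_ids):
    """active_ids: ids currently tracked; deregistered ids drop out."""
    now = time.time()
    for oid in active_ids:
        first_seen.setdefault(oid, now)
        if now - first_seen[oid] > TIME_LIMIT:
            time_exceeded.add(oid)
    # Keep only vehicles that are still in the frame
    time_exceeded.intersection_update(active_ids)
    for oid in list(first_seen):
        if oid not in active_ids:
            del first_seen[oid]
    return len(time_exceeded) > JAM_THRESHOLD  # True -> traffic jam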
Table 1 Counting vehicles (video 1)

Type of vehicle    Manual    MobileNet-SSD    Accuracy (%)
Cars               71        95               74.7
Heavy vehicles     20        8                40.0
Total              91        103              88.3
A list is then constructed by removing all object IDs that have been deregistered from the time-exceeded list. By doing so, we ensure that vehicles previously added to the time-exceeded list are no longer taken into consideration once they leave the video sequence. A traffic jam is detected when more than ten vehicles currently in the frames have remained there for more than 300 s. This value of ten vehicles can also be altered depending on the place where the system is being used; it largely depends on the width of the highway where the system is deployed and on the average number of vehicles that transit there per minute. The simple mail transfer protocol (SMTP) is used to send emails in Python. A Twilio account must also be created using a valid phone number. An account ID and an authentication token are generated, and a client object is created. The Twilio account is then used to send SMS and WhatsApp messages and to initiate calls [25]. The Twilio Sandbox is used to allow devices to receive WhatsApp messages.
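A condensed sketch of the notification step follows; all credentials, phone numbers, and server addresses below are placeholders, not values from the paper.

import smtplib
from email.message import EmailMessage
from twilio.rest import Client

ACCOUNT_SID, AUTH_TOKEN = "ACxxxxxxxx", "your_auth_token"  # placeholders
client = Client(ACCOUNT_SID, AUTH_TOKEN)
ALERT = "Traffic jam detected at the monitored junction."

# SMS
client.messages.create(body=ALERT, from_="+15017122661", to="+230XXXXXXX")
# WhatsApp (recipient must have joined the Twilio Sandbox)
client.messages.create(body=ALERT, from_="whatsapp:+14155238886",
                       to="whatsapp:+230XXXXXXX")
# Voice call that plays a message via inline TwiML
client.calls.create(twiml=f"<Response><Say>{ALERT}</Say></Response>",
                    from_="+15017122661", to="+230XXXXXXX")

# Email over SMTP
msg = EmailMessage()
msg["Subject"] = "Traffic alert"
msg["From"] = "bot@example.com"
msg["To"] = "user@example.com"
msg.set_content(ALERT)
with smtplib.SMTP_SSL("smtp.example.com", 465) as smtp:
    smtp.login("bot@example.com", "password")
    smtp.send_message(msg)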
6 Experiments and Results 6.1 Scenario 1 A video of two minutes and thirty seconds was shot at the Caudan junction in Port Louis during the day. The number of vehicles was counted manually, and the results were compared with the values from our proposed system. As shown in Table 1, the manual count was 91, while our system gave a count of 103 vehicles. For this scenario, our system had an accuracy of 88.3% at 15 fps. Figure 1 shows the vehicles which have been detected and are being tracked.
6.2 Scenario 2 Another video of four minutes was shot from the same location. As shown in Table 2, the manual count was 141, while our system gave a count of 148 vehicles. For this scenario, the system had an accuracy of 95.3% at 13 fps.
Fig. 1 Tracked vehicles
Table 2 Counting vehicles (video 2)

Type of vehicle    Manual    MobileNet-SSD    Accuracy (%)
Cars               114       134              85.1
Heavy vehicles     27        14               51.9
Total              141       148              95.3
6.3 Scenario 3 A third video of 2 min and 30 s duration was shot from the same location. As shown in Table 3, the manual count was 103 vehicles, while our system gave a count of 107 vehicles. For this scenario, our system had an accuracy of 96.3% at 13 fps. The mean accuracy of the system over all three scenarios is 93.3%. The mean accuracy for cars is 83.7%, while the mean accuracy for heavy vehicles is 56.6%. This is probably because there is more variation in the sizes and shapes of heavy vehicles than in cars, and thus it is more difficult to recognize them. The mean frame rate is 14 fps. Some of the counting errors also occur when the tracker loses a vehicle and detects it again as a new vehicle.

Table 3 Counting vehicles (video 3)
Type of vehicle    Manual    MobileNet-SSD    Accuracy (%)
Cars               85        93               91.4
Heavy vehicles     18        14               77.8
Total              103       107              96.3
6.4 Traffic Jam Detection and Notification When a vehicle remains in the video for more than 300 s, it is added to the traffic jam list, and when this list exceeds a predefined number of vehicles, a traffic jam is deemed to have occurred. For testing purposes, ten vehicles are required to declare a traffic jam. As shown in Figs. 2 and 3, users are notified when a traffic jam occurs via phone calls, emails, SMS, and WhatsApp messages. All notifications are sent as soon as congestion occurs. Table 4 shows the mean time delay to receive each type of notification; these values may vary depending on network conditions.

Fig. 2 WhatsApp message
Fig. 3 SMS received at 22:24
Table 4 Notification delays

Notification    Mean time delay (s)
Email           11.5
Phone call      10.3
SMS             8.3
WhatsApp        4.0
6.5 Comparison with Existing Works Our proposed system has an average accuracy of 93.3% when tested on the three videos. Our system also informs drivers of traffic conditions via different channels in real time. Shamrat et al. [26] have developed a traffic detection system for Bangladesh
and achieved an accuracy of 69%. Chowdhury et al. [27] have developed a system for counting the number of vehicles at road junctions. The system had an accuracy of 95%. Qi et al. [28] have proposed a vehicle detection system using a deep learning model based on SSD. The system had a precision of 76% for cars and a precision of 71% for heavy vehicles. The recall was 70% for cars and 66% for heavy vehicles. However, a direct comparison with existing works is not feasible because road conditions differ from place to place.
7 Conclusion The rise in the number of cars in Mauritius and the amount of construction work going on in the country create traffic jams daily on the highways, especially at peak hours. Thus, in this work, we have proposed a traffic detection system which counts and categorizes vehicles into two classes with an accuracy of 93.3%. It was also found that it is easier to recognize cars than heavy vehicles in traffic videos. Furthermore, the system informs users of traffic jams in real time. Notifications are sent via four different channels at the same time, and all notifications, including calls, are received within 15 s. This system can be combined with existing traffic monitoring systems to provide more useful information to all road users. The system is currently a localized one, but it can be adapted for other locations too. Processing cannot be centralized for such systems, as this would entail delays in the transfer of information from one location to another. However, a distributed system may be used for sending the notifications.
References
1. Jain S, Jain S, Jain G (2017) Traffic congestion modelling based on origin and destination. Proc Eng 187:442–450
2. Mondschein A, Taylor B (2017) Is traffic congestion overrated? Examining the highly variable effects of congestion on travel and accessibility. J Transp Geogr 64:65–76. https://doi.org/10.1016/j.jtrangeo.2017.08.007
3. Road Development Authority: Annual Report Financial Year 2019–2020. http://rda.govmu.org/English/Publications/Documents/ANNUAL_REPORT%202019-2020.pdf. Last accessed 2021/12/08
4. Wei L, Hong-ying D (2016) Real-time road congestion detection based on image texture analysis. Proc Eng 137:196–201
5. Gholve MH, Chougule S (2013) Traffic congestion detection for highways using wireless sensors. Int J Electron Commun Eng 693:259–265
6. Roopashree V, Nikitha Bai E, Shashikala Malavika DN, Suman A (2020) Traffic congestion detection and alerting ambulance using IoT. Int J Eng Res Tech 9(7):1339–1343
7. Somayajula RA (2018) Real time traffic congestion detection using images. MSc Thesis, Iowa State University, Iowa, USA
8. Lam C, Gao H, Ng B (2017) A real-time traffic congestion detection system using on-line images. In: 17th international conference on communication technology. IEEE, Chengdu, China, pp 1548–1552
9. Nidhal A, Ngah U, Ismail W (2014) Real time traffic congestion detection system. In: 5th international conference on intelligent and advanced systems. IEEE, Kuala Lumpur, Malaysia, pp 1–5
10. Zhang H, Liptrott M, Bessis N, Cheng J (2019) Real-time traffic analysis using deep learning techniques and UAV based video. In: 16th international conference on advanced video and signal based surveillance. IEEE, Taipei, Taiwan, pp 1–5
11. Khan M, Ectors W, Bellemans T, Ruichek Y, Yasar A, Janssens D, Wets G (2018) Unmanned aerial vehicle-based traffic analysis: a case study to analyze traffic streams at urban roundabouts. Proc Comput Sci 130:636–643
12. Zulfikar MT, Suharjito (2019) Detection traffic congestion based on Twitter data using machine learning. Proced Comput Sci 157:118–124
13. Robert K (2009) Video-based traffic monitoring at day and night vehicle features detection tracking. In: 12th international conference on intelligent transportation systems. IEEE, St Louis, MO, USA, pp 1–6
14. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Conference on computer vision and pattern recognition. IEEE, Las Vegas, NV, USA, pp 779–788
15. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Conference on computer vision and pattern recognition. IEEE, Honolulu, HI, USA, pp 6517–6525
16. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. https://arxiv.org/pdf/1804.02767.pdf
17. Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. https://arxiv.org/pdf/2004.10934.pdf
18. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot MultiBox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision. ECCV 2016. Lecture notes in computer science. Springer, pp 21–37. https://doi.org/10.1007/978-3-319-46448-0_2
19. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. https://arxiv.org/pdf/1704.04861.pdf
20. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Conference on computer vision and pattern recognition. IEEE, Columbus, OH, USA, pp 580–587
21. Girshick R (2015) Fast R-CNN. In: International conference on computer vision. IEEE, Santiago, Chile, pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169
22. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: 28th international conference on neural information processing systems. ACM, Montreal, Canada, pp 91–99
23. He K, Gkioxari G, Dollar P, Girshick R (2018) Mask R-CNN. https://arxiv.org/pdf/1703.06870v3.pdf
24. Chuanqi: MobileNet-SSD. https://github.com/chuanqi305/MobileNet-SSD. Last accessed 2021/12/08
25. Twilio: https://www.twilio.com/. Last accessed 2021/12/08
26. Shamrat F, Mahmud I, Rahman A, Majumder A, Tasnim Z, Nobel N (2020) A smart automated system model for vehicles detection to maintain traffic by image processing. Int J Sci Technol Res 9(2):2921–2928
27. Chowdhury M, Biplob M, Uddin J (2018) Real time traffic density measurement using computer vision and dynamic traffic control. In: Joint 7th international conference on informatics, electronics and vision (ICIEV) and 2nd international conference on imaging, vision and pattern recognition, pp 353–356, IEEE, Kitakyushu, Japan 28. Qi B, Zhao W, Zhang H, Jin Z, Wang X, Runge T (2019) Automated traffic volume analytics at road intersections using computer vision techniques. In: 5th international conference on transportation information and safety, pp 161–169, IEEE, Liverpool, UK
Design of a Robotic Flexible Actuator Based on Layer Jamming Kristian Kowalski and Emanuele Lindo Secco
Abstract This research paper provides an insight into one of the most promising fields of robotics, which brings together two main elements: traditional or rigid robotics and soft robotics. A branch of soft–rigid robots can assume and modulate soft and rigid configurations by means of an approach called jamming. Here we explore how to use layer jamming, namely a set of layers within a flexible membrane, in order to design soft robotics. The paper introduces a quick overview of the history of soft robotics, then presents the design of a functional prototype of a soft–rigid robotic arm, with the results of preliminary trials and a discussion of future advances, where we show the capability of the system to lift loads. Keywords Layer jamming · Robotic actuators · Flexible actuators
1 Introduction In recent years, we can observe the dawn of a new sub-branch of soft robotics. This new field tries to combine the precision of traditional, rigid robotics with the flexibility of soft robotics in the same robotic device. An intriguing and potentially expanding technique to this aim is jamming. The most advanced research in this area—i.e., on the possibility of changing the state of robots—has looked at different designs such as granular jamming or layer jamming. Robert D. Howe—from the Harvard John A. Paulson School of Engineering and Applied Sciences—said that such technology will eventually lead to soft–rigid robots which will combine the benefits of soft and rigid robotics [1, 2]. K. Kowalski · E. L. Secco (B) Robotics Lab, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK e-mail: [email protected] K. Kowalski e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_34
Nowadays, the automation of enterprises is increasing year by year, and robots are replacing the traditional human workforce. Most of these machines—which are replacing the workers—are stiff, and they are designed to work on a particular task within a protected environment or work cell without human beings around. Other companies are trying to minimize the impact of robotization on people's employability by introducing collaborative robots or cobots, which are designed to physically interact with humans in the workplace [3]: This has a clear and strong implication for the safety of the employees as well. In recent years, some studies have pointed out that soft robots could be introduced into factories where they can work alongside humans [4]. However, soft robots have a big drawback, which is stopping them from being as widely used in industry as cobots are: they lack the precision of traditional robotics. This is where the new technology comes to aid. The idea behind soft–rigid robots is quite simple and relies on devices which combine the benefits of the two branches of robotics, namely standard robotics and soft robotics, in order to get a versatile machine. They can switch from a soft-bodied state, which allows them to comply with the safety constraints of the environment, to a rigid state in which they can hold a specific position and orientation with higher accuracy and precision [5]. In this context, the main purpose of this work is to design and manufacture a prototype which will allow testing the capabilities of soft–rigid robots and, in particular, checking the advantages of using the granular or the layer jamming approach [6]. In this paper, in particular, we take a closer look at layer jamming, and we manufacture a functional prototype of a soft–rigid robotic arm. This will allow testing its capabilities and foreseeing possible areas of application for a human-friendly robotics approach. In order to achieve this objective, the following steps are needed: research on the field of soft robotics and on the granular and layer jamming techniques, the design of a prototype of the robotic arm, the manufacturing and integration of the arm structure, and a preliminary validation with experiments. Accordingly, the paper is organized as follows: Sect. 2 presents a brief overview of soft robotics, Sect. 3 focuses on the case study and on examples of soft–rigid robotics, Sect. 4 refers to the materials and methods, where details of the design are reported, Sect. 5 reports the results with a description of the experiments, and finally, Sect. 6 reports conclusions and future directions.
2 Soft Robotics Modern classification of robotics divides robots according to the compliance of the materials used to build their structure. Therefore, we can define two main types of robots: the soft and the hard—or traditional—robots.
Looking at the history of robotics, we can observe that the first robots were mainly made of rigid and kinematically non-redundant materials such as copper, magnets, or steel [7]. We can call this type of design the 'traditional' approach. Most of the time, these robots are used in well-defined environments where they safely perform pre-programmed repetitive tasks with a high precision of movement and pose. In fact, these machines are designed to be inherently stiff in order to preserve the accuracy of their movements, which could otherwise be degraded by vibrations of the structure. Thanks to their performance, hard robots are highly exploited in manufacturing. On the other hand, the rigid design introduces several issues, especially within undefined or changing work cells such as manufacturing lines where human workers may be involved. Here, bio-inspired soft devices—which are designed in a way that the tip of the robotic arm can reach any point in three-dimensional space—can be used in order to generate little resistance versus compression forces [8]. In other words, soft robotics can take inspiration from biological systems and, for example, define continuous and deformable structures which are similar to physiological structures, such as the octopus' arms or the elephant's trunk. The properties of these solutions are very different with respect to the mechanical characteristics of the traditional chains of rigid links and sliding or rotational joints [9, 10]. Soft mechanisms are widely used in the medical field to assist in surgeries, as their shape-shifting abilities help to navigate inside the human body. They can also be used in rehabilitation, as, for example, the soft exoskeleton suit (exosuit) developed by the Wyss Institute at Harvard University, which can assist human walking [11]. Moreover, while traditional industrial robots are mostly isolated from workers due to safety concerns, soft-bodied robots would minimize the risk of injuries in case of collision between the human and the machine. Thanks to their compliant nature, soft robots are a wonderful opportunity to develop human-friendly manufacturing robots.
3 Granular Jamming In soft–rigid robotics, two design techniques can be used: granular jamming and layer jamming. In this section, both approaches are discussed through particular case studies. Jamming is a physical process in which a large number of particles is used to increase the viscosity of some mesoscopic materials [12, 13]. In simple words, the elements of the structure act like a fluid or semifluid in normal conditions, but—when an external condition is applied—they lock into a solid-like state. Such an external condition can be, for example, the removal of the air from the container of the particles (Fig. 1).
Fig. 1 Rationale of the granular jamming approach
3.1 Granular Jamming The case study of Fig. 1 shows a physical process where granular jamming is performed—for example—in a vacuum-packed coffee bag: when an external negative pressure is applied, the coffee granulate gets locked in a solid-like state. According to this approach, a stiffness-controllable octopus-like robot arm can be designed for applications such as robotic minimally invasive surgery (MIS). Moreover, these designs can be inspired by biological systems, such as, for example, the behavior of octopus arms, which can naturally alter their body from a soft to a rigid configuration [13, 14].
3.2 Layer Jamming Layer jamming is another strategy in recent research which uses the jamming process to develop soft robotics. In 2018, researchers from the Wyss Institute for Biologically Inspired Engineering at Harvard University and the Harvard John A. Paulson School of Engineering and Applied Sciences (SEAS) built a simple device made of multiple stacks of layers which are wrapped in a closed plastic envelope and connected to a vacuum pump. Similar to what happens in granular jamming, when the vacuum is applied to such a structure, the envelope becomes rigid, and it can be shaped into different forms (Fig. 2). On the other hand, when the negative pressure is removed, the arm recovers its properties: It can bend and twist with a behavior similar to that observed in octopus' tentacles. In some applications, the device has shock-absorbing capabilities, and therefore, it can serve as a landing assistance system for drones [15].
Fig. 2 Rationale of the layer jamming approach
These examples or case studies show that soft–rigid robots are a good alternative to soft robots. Most of these soft–rigid robots may be used in medical applications and in other contexts where precision and flexibility are simultaneously required.
4 Materials and Methods The main objective of the proposed design is to provide a prototype of a soft–rigid robotic arm which is able to change its configuration from a soft state to a rigid state. The device should also enhance precision at the cost of flexibility. In order to design the device, some decisions had to be made during the process. This section focuses on the main aspects of these decisions and presents the functional diagram and the components which have been adopted.
4.1 Design
A. Vacuum-controllable stiffness
The design has to ensure that the machine is able to change its state. The prototype is designed with a layer jamming approach, as discussed in the previous section. To get a proper functioning of the structure, several layers of a compliant material are used: Long and thin paper sheets are enclosed within an airtight envelope made of a transparent and flexible film. In this way, the arm has the flexibility of a soft robot. On the other hand, thanks to the increased friction between the paper elements of the layer jamming—when the vacuum is applied—it is possible to stiffen the structure. In order to test and monitor the system behavior, a clear and transparent plastic film has been chosen as the external material of the envelope, as it helps to observe the layers during the testing phase. Such a material also ensures the airtightness of the structure, as it is not permeable to air.
Fig. 3 Overall design of the system: an electrical vacuum pump (1) feeds the connecting pipe (2) into the (3) laminar-jamming robotic actuator
To build the overall system, it is necessary to use a vacuum pump. A double-stage vacuum pump provides the required vacuum pressure. A plastic film sealer is used to seal the sides of the aforementioned envelope as well as any possible gaps that could cause leakage. A4 paper sheets serve as layers of the compliant material. To provide sufficient stiffness of the structure when vacuum is applied, at least 10 strips of paper are embedded within the envelope. As other studies on layer jamming have shown, the higher the number of layers, the higher the achievable stiffness; precisely, the stiffness increases with the square of the number of layers, as sketched below. It is also important to mention that the structure sustains this greater rigidity only for small loads [16]. Figure 3 shows the functional diagram of the proposed design, where a vacuum pump, connected through a pipe, feeds an envelope with the layers.
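A plausible justification of this quadratic trend—our reading of classical beam theory, stated here as an assumption rather than the authors' derivation—is the following:

% n layers of thickness t and width b, unjammed: the layers slide and
% bend independently, so their second moments of area simply add up:
%   I_{free} = n \cdot \frac{b t^3}{12}
% Jammed: friction locks the stack into one beam of thickness n t:
%   I_{jam} = \frac{b (n t)^3}{12}
% Relative stiffening factor:
\frac{I_{jam}}{I_{free}} = \frac{b (n t)^3 / 12}{n \, b t^3 / 12} = n^2
% e.g., the 10-layer prototype is roughly 100 times stiffer when jammed.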
B. Pros and Cons
The proposed design of Fig. 3 inherently has some advantages and disadvantages. The main advantages are: • The structure can be switched between a soft and a rigid state. • When in the soft state, the structure can be shaped according to the operator's will. • The system is simple to implement, thanks to the low number of components needed for the design. • The material used is inexpensive. • The device is lightweight. At the same time, some of the drawbacks are: • The proposed design needs to be manually operated by switching a vacuum pump on and off. This is one of the major weaknesses, as the robot arm is hard to operate and switch between states flawlessly.
• The design does not integrate—at this stage—any form of input device which would allow modulating the pressure and—as a consequence—the stiffness of the arm. Therefore, it is only possible to switch between a maximum rigidity state and a maximum flexibility state, without any condition in between. • The system requires being connected through the pipe to the vacuum pump at all times, which limits the movement of the operator.
C. Integration
One of the possible improvements over the design of Fig. 3 is to add a controllable valve and thus make the structure more flexible and easier to operate. In order to be able to control the valve from, for example, a personal computer, the system would require a sensor which measures the effective value of the vacuum pressure. A data acquisition card (DAQ) would be responsible for collecting the sensor readings and for sending the parameters to the computer in digitized form. Then the operator, by using a customized software interface, would be able to control the valve accordingly. Such an architecture has not been developed in this project; however, it is worth mentioning as it could be organized for future development of the proposed design. An overview of the system is shown in Fig. 6 (right panel, scenario B). An electronic vacuum or pressure regulator is a device to control the pressure in the system, whose value is proportional to an electric input signal provided by, for example, a personal computer. For example, the ITV pressure regulators by SMC which have been explored in this project are lightweight and small devices. Their monitor output is available as either an analog or a switch output; additionally, they have a very good response time and deliver high stability. The National Instruments NI-DAQ™mx is another device explored in this project: It is a data acquisition card. The process of data acquisition consists of sampling the signals that measure the physical conditions and converting the results into numeric values which allow the computer to interpret the incoming signals. The NI-DAQ™mx model is easy to use and has improved performance compared to other traditional NI-DAQ drivers. This device acts as an interface between the vacuum regulator and the computer: It digitizes the received analog signals and codes them for the computer interface. After the digitized signals are sent to the computer, the user can control the operation of the DAQ device. The overall system would also allow processing, storing, and visualizing all the data. The application software would play the role of a communicator between the user and the computer, allowing to acquire, analyze, and present measurement data: Laboratory Virtual Instrument Engineering Workbench (LabView)—developed by National Instruments—is a system design and development environment for a visual programming language, which is commonly used for instrument control and data acquisition.
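The control loop the authors envisage runs in LabView; purely as an illustrative alternative, the same read-and-command cycle could be sketched in Python with the nidaqmx package. The channel names, the regulator's voltage scaling, and the 0–5 V command mapping below are assumptions for illustration, not values from the paper or the regulator's datasheet.

import nidaqmx

PRESSURE_CH = "Dev1/ai0"   # assumed: analog monitor output of the regulator
COMMAND_CH = "Dev1/ao0"    # assumed: analog set-point input of the regulator

def read_vacuum_pressure():
    # Read one voltage sample from the regulator's monitor output
    with nidaqmx.Task() as task:
        task.ai_channels.add_ai_voltage_chan(PRESSURE_CH)
        volts = task.read()
    # Assumed linear mapping: 0 V -> 0 kPa, 5 V -> -100 kPa (full vacuum)
    return -100.0 * volts / 5.0

def set_vacuum_level(fraction):
    """fraction in [0, 1]: 0 = fully soft arm, 1 = maximum stiffness."""
    with nidaqmx.Task() as task:
        task.ao_channels.add_ao_voltage_chan(COMMAND_CH)
        task.write(5.0 * max(0.0, min(1.0, fraction)))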
The programming language used in LabView is called G, a high-level graphical programming language. It allows programming with the benefit of visual expressions, and it is designed to develop applications which are interactive, multicore, and executable in parallel. The data input and results can be manipulated and displayed directly in the graphical user interface (GUI) window.
4.2 Implementation This section shows step by step how the proposed solution has been manufactured and implemented. In order to manufacture the basic construction, it is important to gather all the necessary elements. The components used in the structure are the following ones:
• VPUMP® model VPB-1D 2CFM double-stage vacuum pump
• PFS plastic film sealer
• Flexible connection pipe
• FF2440 clear film, APAC Packaging Limited
• 1 mm thick sheets of paper
To construct the airtight envelope, the clear plastic film has been cut into a rectangle with dimensions of 9 cm × 19 cm. Next, the film has been folded in half, and, by using the plastic film sealer, two sides have been sealed in the form of an envelope, with a gap on one of the 4.5 cm wide sides for the paper layers. The paper sheets are used to cut 10 rectangular layers of 3 cm by 15 cm. Following this step, the sheets are placed in the envelope, which is sealed leaving only some room on one side for the connection of the pipe. Figures 4 and 5 show the main steps of this process.
Fig. 4 Manufacturing of the actuator flexible pocket
Fig. 5 Preparation of the layers: (1) 3 × 15 cm paper strips, (2) layers housing, (3) layers and flexible pocket, (4) sealed jacket connected to the vacuum pump
Fig. 6 On the left and right panels, setups (a) and (b), respectively: (a) vacuum pump directly connected to the actuator and (b) vacuum pump connected to the pressure regulator—which is controlled through the National Instruments DAQ card—and then to the actuator
Finally, the elements are connected with the pipe and the vacuum pump according to Fig. 5. The following picture (Fig. 6) shows the overall system and the improved version of the design with the pressure regulator and the DAQ card. In order to prepare such a system, additional equipment is needed, namely:
• DAQ™mx 16.0 driver
• ITV0090-2BN SMC electronic vacuum regulator
To communicate with the DAQ device, the DAQ™mx 16.0 software also has to run on the personal computer.
5 Results and Validation Testing is one of the most important steps in the development process. It allows checking the capabilities of the device and proving the utility of vacuum-controlled soft–rigid robots. In the presented design, the layer jamming construction is supposed to be able to change its state from soft to rigid. Before the other experiments were conducted, the assembly was tested several times to ensure that the materials used to assemble the envelope were not leaking any air, which would otherwise make the arm unable to stiffen when the vacuum is applied. After these trials gave a positive result, different capabilities of the structure were tested. Before the pressure is applied, the arm displays a low bending stiffness (Fig. 7, left panel). However, when the vacuum pump is switched on and the air is sucked from the envelope, the friction between the sheets of paper increases: This phenomenon makes the arm rigid, as expected (Fig. 7, central panel). To additionally ensure that the system is working properly, simple tests were conducted in order to check the bending properties. In its base state, the arm is prone to deformation (Fig. 8, left panel) and bends without forces applied from outside. However, when the vacuum is applied, the structure becomes rigid thanks to the increased friction between the elements. This allows forming the arm in the preferred way and later stiffening it in this position (Fig. 8, right panel).
Fig. 7 On the left, central, and right panels, respectively: the baseline or rest configuration, the actuator under vacuum pressure, and a 'frozen' postural configuration of the actuator
Fig. 8 On the left and right panels, respectively: imposition of the shape and preservation of the shape as soon as the vacuum pressure is applied
Moreover, the bending properties of the structure were additionally examined using a small calibration weight applied to the middle of the arm, which was constrained on two blocks at its extremities. Thanks to the increased friction between the layers—after the vacuum is switched on—the structure is able to withhold the weight and shows lower bending tendencies (Fig. 9). The biggest difference between the vacuum-off and vacuum-on configurations is that, while in the former the layers bent independently, after the pressure was applied the elements of the structure flexed all together as a cohesive unit. Finally, it was noted during the bending experiments that when the force applied to the structure exceeds its critical point, the layers lose their cohesiveness. The value of the critical point can change depending on the type of material and on the number of layers. Each one of the experiments reported in the previous section has been performed 5 times to ensure that all the components were working properly, and all the trials gave the same results. Tests showed that the robotic arm is working as expected, and there is no need for major changes to its structure. Fig. 9 Preliminary trial of the actuator while holding a weight
The structure can withstand relatively high external forces. It is easy to modify the arm into different shapes and to freeze such shapes in a particular configuration desired by the user. This property makes the device effortless to adjust to different tasks. The robot arm is made of light materials: Even with strong vacuum pressure, it cannot carry heavy pieces of equipment. This fact may limit the applications of the robot. Nevertheless, with a proper number of layers, the structure can be modulated in order to withstand bigger weights. Moreover, the bending test showed that the structure itself could be used as quite a useful gripper. An improved version of the design—such as the proposed scenario B in Fig. 6—would be very helpful in this field, as the electronic vacuum regulator would allow stiffness regulation and an overall better control over the arm.
6 Discussion and Conclusion This paper presents the design of a flexible layer-jammed actuator for soft robotics, in an attempt to combine the advantages of traditional robotics and soft robotics. The proposed design is simple and can be further improved by adding the possibility to modulate the stiffness of the device. Using the LabView software, a DAQ driver, and a pressure regulator, it may be possible to achieve a more reliable system which will allow real-time control of the vacuum pressure applied to the arm. This design clearly shows the potential lying behind soft–rigid robotics and, in particular, the layer jamming method. The system is able to bend when it is configured in its 'floppy' state, and then it can freeze and preserve the desired configuration when it is set in the 'rigid' state. Soft–rigid robots can become a great tool in different areas thanks to their precision and flexibility, with clear benefits toward the safety of humans in the working environment. Acknowledgements This work was presented in dissertation form in fulfillment of the requirements for the BSSH in Computer Science for the student Kristian Kowalski under the supervision of E. L. Secco from the Robotics Lab, School of Mathematics, Computer Science and Engineering, Liverpool Hope University.
References 1. Burrows L (2018) Robot transitions from soft to rigid. Harvard JA Paulson School of Engineering and Applied Sciences 2. Secco EL, Maereg AT (2019) A two-fingers exoskeleton for robot-assisted minimally invasive surgery. In: 8th EAI international conference on wireless mobile communication and healthcare (MobiHealth 2019), 2019
3. Colgate EJ, Wannasuphoprasit W, Peshkin M (1996) Proceedings of the international mechanical engineering congress and exhibition, Atlanta, pp 433–439
4. Secco EL, Maereg AT (2019) A wearable exoskeleton for hand kinesthetic feedback in virtual reality. In: 8th EAI international conference on wireless mobile communication and healthcare (MobiHealth), 2019
5. Secco EL, McHugh D, Reid D, Nagar AK (2019) Development of an intuitive EMG interface for multi-dexterous robotic hand. In: 8th EAI international conference on wireless mobile communication and healthcare (MobiHealth), 2019
6. Chiramal AJJ, Secco EL (2021) Design of a granular jamming universal gripper. In: Intelligent systems conference (IntelliSys), Advances in intelligent systems and computing, Springer
7. Kim S, Laschi C, Trimmer B (2013) Soft robotics: a bioinspired evolution in robotics
8. Trivedi D, Rahn CD, Kier WM, Walker ID (2008) Soft robotics: biological inspiration, state of the art, and future research. Appl Bionics Biomech 5(3):99–117
9. Iida F, Laschi C (2011) Soft robotics: challenges and perspectives. Procedia Comp Sci 7:99–102
10. Baxendale M, Nibouche M, Secco EL, Pipe AG, Pearson M (2019) Feed-forward selection of cerebellar models for calibration of robot sound source localization. In: The living machines conference, Nara, July 2019
11. Lee S, Karavas N, Quinlivan BT et al (2018) Autonomous multi-joint soft exosuit for assistance with walking overground
12. Biroli G (2007) Jamming: a new kind of phase transition? Nat Phys
13. Jiang A, Secco EL, Wurdemann HA, Nanayakkara T et al (2013) Stiffness-controllable octopus-like robot arm for minimally invasive surgery. In: 3rd joint workshop on new technologies for computer/robot assisted surgery (CRAS 2013), Verona, Italy
14. Jiang A, Xynogalas G, Dasgupta P, Althoefer K, Nanayakkara T (2012) Design of a variable stiffness flexible manipulator with composite granular jamming and membrane coupling. In: IEEE/RSJ international conference on intelligent robots and systems, Vilamoura, pp 2922–2927
15. Narang YS, Degirmenci A, Vlassak JJ, Howe RD (2018) Transforming the dynamic response of robotic structures and systems through laminar jamming, Harvard University
16. Kawamura S, Kawamura T, Yamamoto D et al (2002) Development of passive elements with variable mechanical impedance for wearable robotics. In: 2002 IEEE international conference on robotics and automation (ICRA), pp 248–253
Sentiment Analysis on Diabetes Diagnosis Health Care Using Machine Learning Technique P. Nagaraj, P. Deepalakshmi, V. Muneeswaran, and K. Muthamil Sudar
Abstract Sentiment analysis is a natural language processing technique that extracts information from text to identify the positive and negative polarity of the information. This work aims at analyzing sentiments in healthcare information on diabetes. This automatic analysis assists in a better understanding of the patient's health condition. A machine learning-based sentiment analysis is proposed in this work, which uses an SVM classifier for classifying the sentiments based on the medical opinion for diagnosis. The probability of getting diabetes is estimated using a Gaussian distribution based on the health condition of patients. Experimental evaluation shows that the SVM classifier achieves high performance with good accuracy. Keywords Natural language processing · Sentiment analysis · Polarity · Probability · Gaussian distribution
1 Introduction Sentiment analysis is the process of recognizing and classifying opinions about an entity into emotions such as positive, negative, and neutral. The entity can be an event, a topic, or an individual. Sentiment analysis is applied to voice-of-customer materials like reviews and survey responses, online social media, and healthcare materials. P. Nagaraj (B) · P. Deepalakshmi · K. Muthamil Sudar Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India e-mail: [email protected] P. Deepalakshmi e-mail: [email protected] K. Muthamil Sudar e-mail: [email protected] V. Muneeswaran Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_35
The objective of sentiment analysis is to capture opinions, detect the sentiment the opinions express, and finally classify the polarity of the opinions [1]. In [2], sentiment analysis is applied to a sample of patients concerning their medications, diseases, and medical issues, and SentiWordNet lexicon-based sentiment analysis is used to calculate the divergence of aspects in diabetes tweets. Traditionally, researchers used surveys and questionnaires for the purpose of sentiment analysis, which consumes more time and money. Analysis of data enables various applications and services in EEG and EOG [3, 4], medical image processing [5–16], image compression techniques [17–19], healthcare systems [20–22], Big Data [23], IoT [24, 25], data mining [26], cyber-attack analysis [27–32], security threats [33–36], artificial intelligence, and deep learning [37–41]. Sentiment analysis uses two techniques, namely the machine learning approach and the lexicon-based approach. SentiStrength, Emoticons, Linguistic Inquiry and Word Count, SentiWordNet, and SenticNet are tools used for lexicon-driven sentiment analysis [42]. Maximum entropy, stochastic gradient descent, random forest, SailAil sentiment analyzer, multilayer perceptron, Naïve Bayes, and support vector machine are tools used for machine learning-based sentiment analysis [43]. In health care, sentiment analysis is the process of classifying medical opinions into binary categories, namely positive or negative. The content from the text is extracted using various natural language processing techniques [44, 45].
2 Related Work In this section, the traditional methodologies and approaches followed to analyze sentiments are discussed. Medhat et al. [1] analyzed the algorithms used in sentiment analysis and surveyed the fields where sentiment analysis is applied. Sentiment classification techniques are of two types, namely the machine learning-based approach and the lexicon-based approach. The machine learning approach is composed of supervised and unsupervised learning. Supervised learning includes decision tree classifiers, rule-based classifiers, linear classifiers, and probabilistic classifiers. The lexicon-based approach is composed of dictionary-based and corpus-based approaches. Statistical and semantic are two types of corpus-based approaches. Naïve Bayes and support vector machine (SVM) suit well for sentiment classification problems. Georgiou et al. [46] examined the tools that suit healthcare data. In this paper, a number of tools are examined for their importance to health care. There are commercial and non-commercial tools such as Semantria, TheySay, WEKA, and Google Prediction API. Each tool gives a different response, such as positive, negative, or neutral. Also, single-sentence reactions are tested in isolation with the aim of finding a single polarity of the data generated for health care. Ahmad et al. [43] noted that a large amount of data is generated by users on micro-blogging Web sites and social media platforms. Taking a very large amount of data and analyzing it would be a difficult task, so an intelligent and effective system
for determining the polarity of the textual data is needed. Analyzing this user-generated data requires efficient techniques using multiple tools; the approach combines lexicon-based and machine learning techniques for automatic sentiment analysis. Salas-Zárate et al. [47] proposed a characteristic-based sentiment analysis which considers the words using the N-gram technique. This approach used an ontology to detect aspects related to diabetes tweets. Gabarron et al. [48] analyzed the sentiments of diabetes tweets for types 1 and 2. They found that type 2 tweets carried negative sentiment, while type 1 tweets carried positive sentiment. The tweets were analyzed automatically, taking privacy into account.
3 Proposed Methodology This section illustrates a machine learning-based sentiment analysis methodology to analyze the sentiments from medical opinions. In addition, machine learning classifiers can classify the opinions into a binary classification, namely positive and negative sentiments. The progression of the proposed approach is outlined in Fig. 1. Fig. 1 Block diagram of proposed methodology
Table 1 Dataset description

S. no.    Dataset    No. of instances    No. of classes    Positive    Negative
1         PIDD       768                 2                 500         268
3.1 Dataset The Pima Indians Diabetes dataset (PIDD), collected from the National Institute of Diabetes and Digestive and Kidney Diseases, is used to analyze the sentiments using machine learning. The size of the dataset is 23.31 KB. The description of the dataset is given in Table 1.
3.2 Feature Extraction The dataset consists of the features pregnancies, glucose, blood pressure, skin thickness, insulin, body mass index (BMI), diabetes pedigree function, age, and outcome. The important features extracted for analyzing the sentiment from the diabetes dataset are BMI, glucose, and blood pressure.
3.3 Sentiment Analysis Using the three parameters, namely BMI, blood pressure, and blood glucose level, the sentiment of the medical opinion is analyzed as positive or negative. BMI is a measure of body fat, computed as the ratio of the person's weight measured in kilograms to the square of the height measured in meters. As per the report [49], a BMI between 18.5 and 24.9 is treated as normal weight; below 18.5 is treated as underweight, a BMI between 25 and 29.9 is treated as overweight, and a BMI above 30 is treated as obesity. If the glucose level is less than 140 mg/dL (normal value), then there is no chance of diabetes; otherwise, if the glucose level is greater than 140 mg/dL, then there is a chance of diabetes. If the blood pressure level is less than 120/80 mmHg (normal value), then there is no chance of diabetes; otherwise, if the blood pressure level is greater than 120/80 mmHg, then there is a chance of diabetes. If BMI is identified as "underweight" or "normal weight" and both the blood glucose and blood pressure levels are normal (i.e., below 140 mg/dL and 120/80 mmHg), then there is no chance of diabetes, and the opinion is considered neutral. If BMI is identified as "overweight" or "obesity" and both blood glucose and blood pressure levels are above normal, then there is a high chance of diabetes, and the opinion is considered negative. If BMI is "underweight" or "normal weight" and the blood pressure and blood glucose levels are above normal, then there is a chance of
diabetes. If BMI is "overweight" or "obesity" and the blood pressure and blood glucose levels are normal, then there is a chance of diabetes. Both of these opinions are considered positive. The algorithm for analyzing the sentiment of medical opinions present in the diabetes dataset is explained as follows (a runnable sketch is given after the listing):
Input: BMI, Blood Pressure, Blood Glucose
Output: Sentiment; 0—Positive; 0.5—Neutral; 1—Negative
Step 1: Estimation of the Body Mass Index (BMI):
if BMI ≤ 18.5 then set range as Underweight
if BMI = 18.5 to 24.9 then set range as Normal weight
if BMI = 25 to 29.9 then set range as Overweight
if BMI ≥ 30 then set range as Obesity
Step 2: Estimation of the Blood Glucose level:
if glucose level < 140 mg/dL then there is no chance of diabetes
if glucose level ≥ 140 mg/dL then there is a chance of diabetes
Step 3: Estimation of the Blood Pressure level:
if Blood Pressure level < 120/80 mmHg then there is no chance of diabetes
if Blood Pressure level > 120/80 mmHg then there is a chance of diabetes
Step 4: Sentiment Analysis:
if BMI is "Underweight" or "Normal weight" and Blood Pressure & Blood Glucose levels are normal then there is no chance of diabetes; return "neutral"
if BMI is "Overweight" or "Obesity" and Blood Pressure & Blood Glucose levels are above normal then there is a high chance of diabetes; return "negative"
if BMI is "Underweight" or "Normal weight" and Blood Pressure & Blood Glucose levels are above normal then there is a chance of diabetes; return "positive"
if BMI is "Overweight" or "Obesity" and Blood Pressure & Blood Glucose levels are normal then there is a chance of diabetes; return "positive"
Step 5: End
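A compact Python transcription of this listing is given below. How the dataset's single blood pressure column maps onto the 120/80 mmHg pair is our assumption; we compare against the 120 mmHg value from the text.

def analyze_sentiment(bmi, glucose, blood_pressure):
    """Rule-based polarity from the listing above.
    Returns 0 (positive), 0.5 (neutral) or 1 (negative)."""
    healthy_weight = bmi < 25                  # underweight or normal weight
    normal_vitals = glucose < 140 and blood_pressure < 120
    if healthy_weight and normal_vitals:
        return 0.5   # neutral: no chance of diabetes
    if not healthy_weight and not normal_vitals:
        return 1.0   # negative: high chance of diabetes
    return 0.0       # positive: some chance of diabetes

print(analyze_sentiment(bmi=31.2, glucose=150, blood_pressure=130))  # 1.0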
4 Classification Model After extracting the features and analyzing the sentiments from the dataset, the data is provided to the SVM classifier for categorizing opinions as positive, negative, or neutral. The performance of the SVM classifier is appraised using evaluation metrics such as precision and accuracy.
4.1 SVM Classifier SVM classification is the process of determining linear separators which can separate the distinct classes in a better manner. SVM suits well for binomial classification and can be used as a sentiment polarity classifier. SVM is a supervised machine learning approach that can handle both classification and regression functions. SVM is predicated on the discovery of a hyperplane that divides a dataset into two classes; the hyperplane clearly separates the two classes. A dataset's key elements are the data points closest to the hyperplane. The line that geometrically separates and classifies a group of data is called a hyperplane. Two attributes are used to classify the dataset in simple examples, and the hyperplanes are the lines that accurately differentiate items and classify them into groups. When the distance between the hyperplane and the data points is large, the hyperplane divides the data appropriately. The margin is the distance between the hyperplane and the closest data point from either set. The main goal is to find the hyperplane among the scattered data points with the largest possible margin over the training set, allowing new data to be classified optimally. In the ideal case, the dataset is linearly separable and the hyperplane can be chosen perfectly; a dataset that is not linearly separable cannot be split this way. Hyperplanes are utilized when the dimensions of the data are large, and the hyperplane separates and classifies the data points. SVM uses the hyperplane to classify data and takes advantage of the difference between two separate classes. The support vectors are the vectors (cases) that define the hyperplane, as seen in Fig. 2.
The hyperplanes are ideal for classifying high-dimensional data in the following ways: optimize the profit margin The hyperplanes are unique to nonlinearity separate databases; otherwise, the data points would be misclassified. When high-dimensional data is properly mapped in free space, linear decision surfaces are easier to classify.
Fig. 2 Model of SVM
5 Experimental Study 5.1 Performance Evaluation The efficiency of the classifier is evaluated based on the following evaluation metrics: Accuracy is the ratio of correctly classified instances to the total number of classifications and is given in Eq. (1),
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
Precision is the proportion of relevant instances among the retrieved instances and is given in Eq. (2),
Precision = TP / (TP + FP)    (2)
In Eqs. (1) and (2), TP, TN, FP, and FN indicate true positive, true negative, false positive, and false negative. True positive denotes the total number of correct classifications of positive instances. True negative denotes the total number of correct classifications of negative instances. False positive denotes the total number of negative instances misclassified as positive. False negative denotes the total number of positive instances misclassified as negative.
Table 2 Performance evaluation of classifiers in terms of precision and accuracy

Classifiers             Precision    Accuracy (%)
Naïve Bayes             0.76         79.22
Logistic regression     0.80         81.81
SVM                     0.81         82.5
5.2 Experimental Results In this section, the experiment is evaluated on the dataset using the classifiers Naïve Bayes, logistic regression, and SVM. Among these, SVM achieves the highest precision and accuracy, of about 0.81 and 82.5%, respectively. The experimental results are given in Table 2.
5.3 Polarity of Outcome The polarity of the outcome indicates the chance of the patient being affected by diabetes or not. It is estimated by calculating the probability using the Gaussian distribution. This distribution is a continuous function whose probability ranges from 0 to 1. The value 0.5 indicates an equiprobable situation, where the BMI, blood pressure, and blood glucose levels are normal. This indicates the sentiment to be neutral; the patient is in good health. A value above 0.5 up to 1 indicates high blood pressure and blood glucose levels. This indicates the sentiment to be negative; the patient has a high chance of being affected by diabetes. A value between 0 and 0.5 indicates one of two cases (see the sketch after this list).
• Low blood pressure and blood glucose levels along with a BMI of "overweight" or "obesity"
• High blood pressure and blood glucose levels along with a BMI of "underweight" or "normal weight"
This indicates the sentiment to be positive; the patient has a chance of being affected by diabetes. The probability distribution function of BP, glucose, and BMI for diabetes is illustrated in Fig. 3.
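The exact construction of this probability is not detailed above; one plausible reading, offered here purely as an assumption, is to compare a patient's indicator against the cohort mean and standard deviation through the Gaussian cumulative distribution function, so that average values land at 0.5.

from scipy.stats import norm

def polarity(value, mean, std):
    # Gaussian CDF: 0.5 at the mean, approaching 1 for high (risky) values
    return norm.cdf(value, loc=mean, scale=std)

# e.g., glucose of 140 mg/dL against an assumed cohort mean of 120 (std 30)
print(polarity(140, 120, 30))   # ~0.75 -> leaning toward negative (risk)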
6 Conclusion The main intention of this work is to analyze the sentiments in healthcare information to identify the health status of patients. This work uses an SVM classifier for classifying the sentiments with binomial polarity, namely positive and negative sentiments. A Gaussian distribution is used to estimate the probability of the diagnosed health condition of the patient. SVM achieves high performance with an accuracy of about 82.5%.
Fig. 3 Probability distribution function of BP, glucose, and BMI for diabetes
As a future research direction, the sentiment analysis methodology can be extended to large datasets. Hybrid machine learning classifiers can be used for the sentiment analysis.
Predicting the Health of the System Based on the Sounds Manisha Pai and Annapurna P. Patil
Abstract A fundamental challenge in artificial intelligence is to predict the system's state by detecting anomalies generated due to faults in the system. Sound data that deviates significantly from the default sounds generated by the system is referred to as anomalous sound. Predicting anomalous sounds has gained importance in various applications, as it helps in maintaining and monitoring machine conditions. The goal of anomaly detection is to train the system to distinguish default sounds from abnormal sounds. As self-supervised learning helps in improving representations when labeled data are used, it is employed here, where only the normal sounds are collected and used. The support vector machine is a linear classifier defined by the largest margin in the feature space. We propose a self-supervised support vector machine (SVM) to develop a health prediction model that helps understand the current status of machinery activities and their maintenance, enhancing the system's health accuracy and efficiency. This work uses a subset of the MIMII and ToyADMOS datasets. The implemented system is tested for its performance by obtaining the training accuracy, validation accuracy, testing accuracy, and overall mean accuracy. The proposed work benefits from faster prediction and better accuracy.

Keywords Self-supervised learning · Support vector machine · Anomalous sounds · Machine conditioning and monitoring · The health of the system
M. Pai (B) · A. P. Patil
M. S. Ramaiah Institute of Technology, Bengaluru, India

1 Introduction

The automatic anomalous sound detection (ASD) system detects the unusual sounds produced by systems. These systems are used to monitor a machine's condition and detect any unusual sounds. Anomalies of many kinds do exist, but they are uncommon. An extensive and time-consuming data collection method is required to extract a variety of abnormalities from a system [1]. It is critical to understand the
system’s health for it to function properly. Monitoring a system’s health allows us to receive all the information needed. Predicting the system’s health is critical for improving the current status of any system and its accuracy and efficiency. Following the evaluation of the health status, appropriate actions can be taken depending on the input. The status of the machine may not always be what we expect. Then, we need to keep an eye on things. First, the data under normal conditions are collected, which is referred to as training data. The system is then trained to recognize the strange sounds made by the devices. The audio data, which will be necessary for training, must be collected for this reason. Weak/auxiliary labels are assigned to all audio data, which may be needed to interpret normal sound representations and distinguish them from pathological ones [1]. By experimenting with examples from non-anomalous or default sound events, an excellent ASD system should be able to spot the abnormality [2]. Due to the rarity and diversity of these situations, obtaining anomalous data to train automated systems is difficult. As a result, such systems are necessary [3]. A fault in part is recognized with the help of strange sounds made in predictive maintenance. As a result, the sounds produced to aid in the prevention of harm [4]. For the fourth industrial revolution, automatic mechanical failure detection is a vital technology. It involves AI-based factory automation and the early detection of machine irregularities by listening to their sounds, which could help monitor machine conditions [4]. There are two sorts of ADS activities: supervised-ADS and unsupervised-ADS. The difference between the two groups is the way anomalies are defined. Supervised ADS is a rare sound event detection technique that involves identifying unusual noises like gunshots or cries (SED). Since the anomalies have been discovered, we could compile a list of the target strange noises, even though the aberrations are uncommon than regular noises. As a result, the ADS system can be trained using a supervised method similar to the “Recognition and Classification of Acoustic Scenes and Events challenge” (DCASE), which includes audio scene classification, sound event detection, and audio tagging [4]. On the contrary, unsupervised ADS is tasked with recognizing inexplicable unusual sounds that have never been heard before. In real-world industries, deliberately harming the expensive target machine is impracticable due to manufacturing expenses. Anomaly natural noises are also unusual and vary considerably. As a result, it is impractical to collect a comprehensive collection of aberrant sounds, and there is a need to recognize abnormal sounds for which no training data exists [4]. Because of its wide range of applications, such as audio surveillance, animal husbandry, product inspection, predictive maintenance, and machine condition monitoring, abnormal sound detection has grown in popularity. In this work, a self-supervised support vector machine (SVM) is employed to predict the system’s health on the systems’ sounds. Self-supervision is connected with favorable elements of the data and a system to forecast it in the context of unsupervised learning [5]. The goal is to locate anomalies in audio recordings. In addition, the goal is to identify sounds significantly different from natural sounds from a training set of audio recordings [5]. 
A supervised approach that can be used to solve classification and regression problems is the support vector machine. First,
we use the SVM algorithm to represent each data item as a point in n-dimensional space, with the value of each feature being the value of a specific coordinate. Then, we locate the hyperplane that separates the two classes to complete categorization. Finally, individual observation coordinates are used to create support vectors. The SVM classifier is a boundary that separates the two classes most effectively.

Humans experience the outside world through a variety of sensory channels. As regular business expands, data grows and accumulates day by day. Assessing the machine's health status is also necessary for better advancement, and as the demand for computing resources increases, so does the supply. In our daily lives, we use a variety of devices [6], for example, a washing machine or a refrigerator. After a long service period, we may notice that the motor begins to make strange noises and eventually stops working. When there are such malfunctions, system maintenance is essential. This problem also occurs in commercial devices such as industrial equipment and air-conditioning systems in buildings. Instead of relying on humans to execute these jobs, such concerns have recently been handled by employing various sensors to monitor the instruments' operation and detect anomalies [7].

The objective of this work includes providing a detailed understanding of anomalous sound detection. In addition, we have identified the challenges involved in predicting the health of systems. The work is beneficial for researchers, academic members, and industrial people to monitor machines depending on the sounds they produce. The rest of the work is structured as follows: the current section provides a brief overview of anomalous sound detection systems, Sect. 2 describes the related works, Sect. 3 describes the datasets used, Sect. 4 describes the proposed work, Sect. 5 describes the results obtained from this work, and Sect. 6 concludes the proposed work.
2 Related Works

Anomaly detection has received considerable attention over the years. Because anomalous sounds can signal malicious behavior or blunders, this prediction can help prevent such events from occurring. Audio anomaly detection (ADS) has been used in audio surveillance, product inspection, and predictive maintenance [8]. A variety of work has been put forward previously that proposes different architectures for detecting the anomalies generated by a system. Alam et al. [9] offer a copy detection algorithm for unsupervised anomalous sound detection (UASD) tasks. Three distinct algorithms are used to show how anomalous scores are generated. The first approach (NN-sigma-norm) aggregates all score differences using 40-dimensional MFCC (Mel-frequency cepstral coefficient) features after normalization with the standard deviation. The second approach (NN-sigma-norm-1) adds up the score variations without normalizing them with the standard deviation. Like the first, the third approach (NN-sigma-norm-fbank) employs filter-bank features
with normalization. Jalali et al. [1] construct an anomaly detection method using a common recurrent neural network (RNN) architecture, namely sequential long short-term memory (LSTM). Perez-Castanos et al. [2] attempt to explore malfunctioning conditions of a specific range of industrial machinery through sound analysis, making use of unsupervised and semi-supervised auto-encoders and a gammatone audio representation. Reference [3] is based on the look, listen, and learn (L3-Net) embedding, which is shown to perform better in identifying the machine type or machine ID of the recorded sounds; it can also recognize audio-visual correspondence between a video frame and a one-second audio clip. In [4], an outlier-detection-based ADS considers a statistical hypothesis test to define optimality as an objective function based on the Neyman–Pearson lemma. In [4], the author uses the linear combination and spectral warping augmentation methods, employing two distinct architectures for categorization: MobileNetV2 and ResNet-50. The proposed models outperform the baseline auto-encoder-based strategy by a large margin, according to the author; compared to the auto-encoder model, MobileNetV2 improves the average AUC by over 12.8%. In [10], spectrograms are employed for distinguishing between normal and abnormal sounds instead of log-mel energies as characteristics. While defining the AE model, the proposed work aims to minimize the reconstruction error (RE) between the input and the reconstructed log-Mel spectrogram feature vector. Reference [11] is based on dictionary learning, whose goal is to reduce the reconstruction error and produce a more concise representation of the signal; to lower the model's complexity, the coefficient matrix is made sparse. After the dictionary is constructed, the sparse representation coefficient matrix of the input standard sample in the training set is utilized for training a one-class SVM classifier. In the first approach of [12], a classifier is trained using the typical noises from all machine types and IDs. The second approach uses data augmentation to build pseudo-classes from normal sounds; the classifier then uses the pseudo-classes to forecast the data-augmentation method utilized for each sound sample. In [13], three approaches were employed to determine the degree of similarity between audio pairings: cosine similarities between embedding vectors, KL divergence between feature distributions, and dynamic time warping (DTW) between feature sequences. Kaltampanidis et al. [14] used ProtoPNet for detecting unsupervised anomalous sounds. To measure the anomaly scores for the anomalous sounds generated by the system, [15] uses masked autoregressive flows (MAFs). Giri et al. [5] define two methods: one makes use of a group masked auto-encoder model based on a neural density estimator, and the other uses self-supervised classification to derive data representations. In [16], the local outlier factor (LOF) and Gaussian mixture model (GMM) are proposed for detecting anomalous sounds. Ribeiro et al. [17] used two deep learning models based on Mel-spectrograms with dense and convolutional architectures; the goal is to detect whether a machine is malfunctioning using only its sound.
3 Dataset

For properly training and fairly evaluating any system, a large-scale dataset is required. As a result, the availability of freely available large-scale datasets speeds up related research in this field. The Wall Street Journal (WSJ0) voice corpus for automatic voice recognition and the VCTK corpus for text-to-speech synthesis are examples of large-scale datasets that have led to recent advances in speech and acoustic signal processing [18]. The industrial Internet of Things (IoT) and data-driven methodologies have revolutionized the manufacturing industry over the last decade, and several systems for monitoring the state of machinery have been implemented [19]. As a result, rather than gathering anomalous sounds by indiscriminately breaking expensive hardware, an ADS system was assessed using a synthetic anomalous sound dataset. A freely available dataset is required to appropriately evaluate solutions for anomaly detection in machine-operating sounds (ADMOS) [18]. Also, Purohit et al. [19] collected different types of machine sounds from a real factory environment. Two datasets have been used in this paper: ToyADMOS [18] and MIMII [19]. Both are discussed below.
3.1 ToyADMOS (Toy Anomalous Detection in Machine-Operating Sounds)

The ToyADMOS dataset was developed to train and test ADMOS (anomaly detection in machine-operating sounds) systems. This dataset was created by intentionally damaging the components of miniature machines to collect both normal and anomalous operating sounds. The recording circumstances can be regulated because small equipment can be installed in an acoustic laboratory. The characteristics of the dataset are as follows. It is made for three ADMOS tasks: product inspection (ToyCar), fixed machine fault diagnosis (ToyConveyor), and moving machine fault diagnosis (ToyTrain). Individually recorded machine-operating sounds and external noise are used to simulate varying noise levels. To test noise reduction and data-augmentation techniques such as mix-up, all sounds are recorded with four microphones. Multiple machines from the same class are employed in each activity; each machine belongs to the same toy class but has a different detailed structure. Each anomalous sound was captured multiple times to evaluate few-shot-learning-based ADMOS for obtaining anomalous sound characteristics from only a few samples. Over 180 hours of regular machine-operating sounds and over 4000 samples of anomalous noises were captured with four microphones at a sampling rate of 48 kHz for each task in the released dataset [18].
3.2 Malfunctioning Industrial Machine Investigation and Inspection (MIMII)

The MIMII dataset consists of machine sounds collected from real factory environments under normal and anomalous operating situations. Four machine types were considered: valves, pumps, fans, and slide rails. Seven individual product models were evaluated for each type of machine. This dataset contains 26,092 sound files for the four machine types under normal settings. It also includes real-life anomalous sound samples for each machine category. The noises produced by the devices are stationary and non-stationary, have various properties, and have varying degrees of difficulty. To thoroughly train the models, large datasets incorporating real-life complexity are required; therefore, a total of 26,092 regular sound segments for all individual devices were recorded [19].

The dataset used in this work is the combination of ToyADMOS [18] and MIMII [19]. First, a pickled form of the audio dataset is built; pickling converts a Python object into a byte stream so it can be stored in a file or database. Then, the dataset is separated into testing, training, and validation sets. The data is stored in a two-dimensional labeled data structure, and the pickled dataset is stored in a worksheet for easy access. Features for each dataset are extracted, along with the y-data from each sheet in the worksheet. Then, the number of extracted features is measured, and names are extracted accordingly. The extracted features are merged with the y-data, the data within each dataset is split, and the splits are finally combined. To standardize the features, the size of each dataset's training, testing, and validation data is measured.
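A minimal sketch of this preparation pipeline is given below; the file names, sheet layout, label column, and split ratios are assumptions for illustration, not details specified in this work.

```python
# Sketch: pickle the feature worksheet, split it, and standardize it.
import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

sheets = pd.read_excel("features.xlsx", sheet_name=None)  # dict of DataFrames
data = pd.concat(sheets.values(), ignore_index=True)      # 2-D labeled structure

with open("dataset.pkl", "wb") as f:                      # pickled form for fast access
    pickle.dump(data, f)

X, y = data.drop(columns=["label"]), data["label"]        # features and y-data
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

scaler = StandardScaler().fit(X_train)                    # standardize with training stats
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))
print(len(X_train), len(X_val), len(X_test))              # size of each split
```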
4 Proposed Work

This work focuses on combining the advantages of self-supervised learning and the SVM design to predict the system's health. The work first explains the basics of the support vector machine (SVM) and self-supervised learning, and then combines the advantages of both methods.
4.1 SVM Algorithm

The support vector machine, or SVM, is a widely used supervised learning methodology that may address both classification and regression problems. It is, however, primarily used in machine learning for classification problems. The goal of the SVM method is to determine the best line or decision boundary for categorizing n-dimensional space into classes, so that subsequent data points can be easily placed in the correct category. This best decision boundary is referred to as a hyperplane.
SVM is used to select the extreme points/vectors that help build the hyperplane. These extreme examples are called support vectors, which gives the algorithm its name [6].
4.2 Self-supervised Learning

Self-supervision creates effective representations for downstream tasks without the usage of labels. Self-supervision can help robustness in various ways, including against adversarial examples, label corruption, and common input corruptions. Self-supervision also beats fully supervised algorithms in detecting difficult, near-distribution out-of-distribution outliers [20].
4.3 Self-supervised Support Vector Machine (S3VM)

Support vector machines (SVMs) are a two-class model. The purpose is to find a hyperplane that allows the samples to be segmented; the goal of segmentation is to maximize the interval, before converting the problem to convex quadratic programming. For association analysis, support vector machines map vectors to a high-dimensional space. On both sides of the hyperplane that separates the data, two parallel hyperplanes are generated, split as far apart as possible. The greater the distance or gap between the parallel hyperplanes, the lower the classifier's total error and the higher the prediction accuracy [6]. SVMs are a form of supervised machine learning technique used to solve binary classification problems. An SVM uses the input data of both classes to create a decision boundary that optimally separates them. This boundary, referred to as the hyperplane, maximizes the margins from both classes; that is, the boundary is as far away from both classes as practicable, which makes the classifier more robust. Kernels are also commonly employed with SVMs for nonlinear and difficult datasets; they effectively transfer the input data into a higher-dimensional space before determining a decision boundary [21].

This research suggests using self-supervised categorization based on metadata associated with audio files to develop compact representations of "normal" data. Only normal audio data are considered as part of self-supervised learning, because it is easier to record normal sounds than to collect anomalous sounds.

Now we explain how the features are selected from the dataset. First, the data is prepared for testing, training, and validation. Feature selection is done using L1 logistic regression; logistic regression is a machine learning classification approach used to predict the likelihood of a categorical dependent variable, and it is used here as it handles dense input. Next, fit and transform functions are used: the fit() method calculates the values of these features, and the transform function normalizes the values by applying the feature values to the real data. Next,
principal component analysis (PCA) is done. PCA, also known as a data reduction technique, is a beneficial feature selection technique that uses linear algebra to compress the information. PCA is a statistical methodology for converting high-dimensional data to low-dimensional data by identifying the essential characteristics that capture the most information about the dataset. The features are chosen based on how much variability they introduce into the result: the first principal component is the direction that causes the most variability, the second principal component is responsible for the second most significant variation, and so on. It is vital to note that the principal components have no correlation with one another. Applying the PCA transform thus reduces the dimensionality.

In this work, the SVM takes the pickled form of the dataset as input. In the first phase, a class is defined to load the dataset and build the graph. Next, we describe the SVM model, which takes the training, testing, and validation data as input. The testing and validation dataset results are obtained by converting a tensor-like object to a constant tensor, and the training data is processed by building a computational graph. Along with this, to exploit the advantage of self-supervised learning, only default machinery sounds are considered, which yields high accuracy for anomalous machine sounds, as done in [22]. As the first step, an SVM model is defined in this work.
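The feature-selection steps described above could look as follows in scikit-learn, continuing from the split produced earlier; the penalty strength and the retained variance ratio are assumptions for illustration.

```python
# Sketch: L1 logistic regression for feature selection, then PCA.
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.decomposition import PCA

l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
selector = SelectFromModel(l1_model)
X_train_sel = selector.fit_transform(X_train, y_train)  # fit() learns, transform() applies
X_val_sel, X_test_sel = selector.transform(X_val), selector.transform(X_test)

pca = PCA(n_components=0.95)                  # keep components explaining 95% of variance
X_train_pca = pca.fit_transform(X_train_sel)
X_val_pca, X_test_pca = pca.transform(X_val_sel), pca.transform(X_test_sel)
print(pca.explained_variance_ratio_)          # first component carries the most variance
```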
4.3.1 Proposed SVM Model
The SVM model first defines a distance term between the normal data and the anomalous data, computed as the sum of squares of the training data:

Dist = Sum(Train_Data^2)  (1)
where Dist is the distance term and Train_Data is the training data of the dataset. Next, the squared distance is calculated: the transpose of the training data is taken, the training data is matrix-multiplied with its transpose, the result is multiplied by two and subtracted from Dist, and finally the transpose of Dist is added:

Sq_Dist = Add(Sub(Dist, Mul(2, MatMul(Train_Data, (Train_Data)^T))), (Dist)^T)  (2)

where Sq_Dist is the squared distance, Train_Data is the training data of the dataset, Add is addition, Sub is subtraction, Mul is multiplication, MatMul is matrix multiplication, and T is the transpose. The SVM kernel, which takes data as input and transforms it into the required form, must now be defined. We define the kernel as the exponential of the absolute value of the
squared distance multiplied by gamma:

Kernel = e^(γ · abs(Sq_Dist))  (3)
where γ is the gamma function, abs is the absolute function, and Sq_Dist is the squared distance. Using the kernel obtained in Eq. (3), the model output is determined. For this first, we define, b = specified shape(batch_size)
(4)
where batch_size is the elements in the dataset taken at once, we now define the model’s output as the matrix multiplication of b in Eq. (4) and kernel in Eq. (3) and finally taking the transpose of the result. Model_Output = MatMul(b, Kernel)T
(5)
where MatMul is matrix multiplication, b specifies the shape of the batch size, and the SVM kernel transforms the data into the required form. Now, the loss that occurs at every step needs to be calculated. First, the element-wise minimum of (C − b) and 0 is taken, where C is 1, and the result is multiplied by 10000. Separately, the second term is subtracted from the first term: the first term is the reduced sum of b, and the second term is the reduced sum of the kernel multiplied by the product of b_vec_cross and y_target_cross, where b_vec_cross is obtained by matrix-multiplying the transpose of b with b, and y_target_cross is obtained by matrix-multiplying the training set labels with their transpose. The loss is finally obtained by negating the resulting value:

Loss = −((first_term − second_term) − 10000 · min((C − b), 0))  (6)

first_term = red_sum(b)  (7)

second_term = red_sum(Kernel · (b_vec_cross · y_target_cross))  (8)

b_vec_cross = MatMul((b)^T, b)  (9)

y_target_cross = MatMul(Train_Labels, (Train_Labels)^T)  (10)
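Read this way, Eqs. (1)–(10) form the batch computation of a Gaussian-kernel SVM dual objective. The following NumPy sketch implements them under that reading; the gamma value (negative, so the kernel decays with distance) and the summation of the penalty term over the batch are assumptions, not values stated in this work.

```python
import numpy as np

def rbf_kernel(X, gamma=-50.0):
    dist = np.sum(np.square(X), axis=1).reshape(-1, 1)   # Eq. (1), per sample
    sq_dist = dist - 2.0 * (X @ X.T) + dist.T            # Eq. (2)
    return np.exp(gamma * np.abs(sq_dist))               # Eq. (3)

def svm_loss(b, kernel, y, C=1.0):
    b_vec_cross = np.outer(b, b)                         # Eq. (9)
    y_target_cross = np.outer(y, y)                      # Eq. (10)
    first_term = np.sum(b)                               # Eq. (7)
    second_term = np.sum(kernel * b_vec_cross * y_target_cross)  # Eq. (8)
    penalty = 10000.0 * np.sum(np.minimum(C - b, 0.0))   # Eq. (6), summed over b
    return -((first_term - second_term) - penalty)

# Toy usage with random data and labels in {-1, +1}
X = np.random.randn(8, 5)
y = np.where(np.random.rand(8) > 0.5, 1.0, -1.0)
print(svm_loss(np.zeros(8), rbf_kernel(X), y))
```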
The second step is to predict the output. For this purpose, a Gaussian (RBF) prediction kernel is used in this work.

4.3.2 Gaussian (RBF) Prediction Kernel
It is challenging to figure out the best feature transform or kernel to use when dealing with nonlinear datasets. For this purpose, we define the radial basis function (RBF) kernel. Due to its resemblance to the Gaussian distribution, the RBF kernel is the most generic form of kernelization and one of the most extensively used kernels. The RBF kernel function computes the similarity of any two points, i.e., how near they are to each other. To speed up SVM classifications, Gaussian RBF kernels are approximated. Theoretically, kernel support vector machines (SVMs) can be recast into linear SVMs, and SVM classifications can be significantly accelerated using this reformulation, especially if the number of support vectors is considerable. However, because the reproducing kernel Hilbert space (RKHS) of the frequently used Gaussian radial basis function (RBF) kernel has infinite dimensionality, this theoretical fact is impracticable [23].

In this paper, we use the Gaussian (RBF) prediction kernel to predict the output. The predicted output is defined as the product of b and the transpose of the training set labels, matrix-multiplied with the prediction kernel; the final output is obtained by taking its transpose. The prediction kernel is the exponential of the absolute value of the predicted squared distance multiplied by gamma. The predicted squared distance is determined by matrix-multiplying the training set with the transpose of the data (training, testing, or validation data, obtained by casting), multiplying the result by two, subtracting it from rA, and adding the transpose of rB. Here, rA is the reshaped reduced sum of the squared training data, and rB is the reshaped reduced sum of the squared data, cast to the required type:

Predicted_output = (Predicted_output)^T  (11)

Predicted_output = MatMul(Mul((Train_Labels)^T, b), Predicted_Kernel)  (12)

Predicted_Kernel = e^(γ · abs(Predicted_sqdist))  (13)

Predicted_sqdist = rA − 2 · MatMul(Train_Data, (Data)^T) + (rB)^T  (14)

Data = cast(Data)  (15)
rA = reshape(red_sum((Train_Data)^2))  (16)

rB = reshape(red_sum((Data)^2))  (17)

rB = cast(rB)  (18)
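Continuing the NumPy sketch above, Eqs. (11)–(18) compute the cross-kernel between the training data and a new split and turn it into class predictions; taking the sign of the output as the predicted label is an assumption based on standard SVM practice, not a step spelled out in the text.

```python
import numpy as np

def predict(X_train, y_train, b, X_new, gamma=-50.0):
    rA = np.sum(np.square(X_train), axis=1).reshape(-1, 1)   # Eq. (16)
    rB = np.sum(np.square(X_new), axis=1).reshape(-1, 1)     # Eqs. (17)-(18)
    sq_dist = rA - 2.0 * (X_train @ X_new.T) + rB.T          # Eq. (14)
    pred_kernel = np.exp(gamma * np.abs(sq_dist))            # Eq. (13)
    output = ((y_train * b) @ pred_kernel).T                 # Eqs. (11)-(12)
    return np.sign(output)                                   # class in {-1, +1}
```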
Thus, the final output obtained here is the predicted output. Now, the designed model needs to be trained. For this, an optimizer is required to adjust the attributes so as to reduce the loss; optimizers are employed to address optimization problems by minimizing the function. In this work, we employ a gradient descent optimizer, the most popular optimization strategy used in machine learning algorithms. It is used to train data models, can be used with almost any algorithm, and is easy to learn and implement. Gradient descent is an optimization approach for locating a differentiable function's local minimum; it determines the parameters (coefficients) of a function that minimize the cost function as much as possible. Training in this way yields the model's training accuracy, validation accuracy, full training accuracy, testing accuracy, sensitivity (the capacity of a model to predict the true positives of each accessible category), specificity (the percentage of actual negatives that were predicted as negative, or true negatives), and the number of training steps, all determined from the prediction results.
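As a sketch, a plain gradient-descent step for the loss above can be written directly from its analytic gradient (the penalty term is omitted for brevity; the learning rate and step count are assumptions for illustration).

```python
import numpy as np

def train_b(kernel, y, lr=0.01, steps=300):
    b = np.zeros(len(y))
    yy = np.outer(y, y)
    for _ in range(steps):
        # d/db of -(sum(b) - b^T (K * yy) b) = -(1 - 2 (K * yy) @ b)
        grad = -(1.0 - 2.0 * (kernel * yy) @ b)
        b -= lr * grad                       # gradient-descent update
    return b
```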
5 Results

In this section, we discuss the proposed model's performance. We report the training accuracy, validation accuracy, testing accuracy, sensitivity, specificity, and mean accuracy, which together show the benefit of faster prediction and better accuracy. Performance is obtained for every machine with the proposed method. Seven machines are used: three from the ToyADMOS dataset [18], namely ToyCar, ToyConveyor, and ToyTrain, and four from the MIMII dataset [19], namely fan, pump, slider, and valve. Five iterations of the experiment are carried out for every machine under each dataset. The accuracy for training, validation, and testing is determined by comparing the predictions (train_preds, val_preds, test_preds) against the corresponding labels (train_y, val_y, test_y), where each set of predictions is produced by the GaussRBF prediction function applied to the respective dataset along with its x-data, batch, and labels, as shown in the equations below.
5.1 Training Accuracy

Training_Accuracy = Accuracy(Train_preds, Train_y)  (19)

Train_preds = (Train_Prediction and Train_Dataset, Train_x, Train_batch, Train_labels)  (20)

Train_Prediction = GaussRBF(Train_Dataset)  (21)
5.2 Validation Accuracy

Validation_Accuracy = Accuracy(Val_preds, Val_y)  (22)

Val_preds = (Val_Prediction and Val_Dataset, Val_x, Val_batch, Val_labels)  (23)

Val_Prediction = GaussRBF(Val_Dataset)  (24)
5.3 Testing Accuracy

Testing_Accuracy = Accuracy(Test_preds, Test_y)  (25)

Test_preds = (Test_Prediction and Test_Dataset, Test_x, Test_batch, Test_labels)  (26)

Test_Prediction = GaussRBF(Test_Dataset)  (27)
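In code, Eqs. (19)–(27) reduce to comparing the GaussRBF-based predictions of each split against its labels; a sketch reusing the predict function above follows (the trained dual variables b are assumed available).

```python
import numpy as np

def accuracy(X_train, y_train, b, X_split, y_split):
    preds = predict(X_train, y_train, b, X_split)  # GaussRBF prediction, Eqs. (21)/(24)/(27)
    return np.mean(preds == y_split)               # fraction of correct labels

# train_acc = accuracy(X_train, y_train, b, X_train, y_train)
# val_acc   = accuracy(X_train, y_train, b, X_val,   y_val)
# test_acc  = accuracy(X_train, y_train, b, X_test,  y_test)
```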
The values for ToyCar, ToyConveyor, and ToyTrain are recorded in Table 1, and the values for fan, pump, slider, and valve in Table 2. Every iteration is taken from a different set of recordings under default machinery conditions. The average performance for the ToyADMOS dataset is given in Table 3 and for the MIMII dataset in Table 4; the values in bold mark the highest accuracies (%) obtained for training, validation, testing, sensitivity, specificity, and mean accuracy. As given in Table 3, ToyCar shows the highest average training accuracy at 75.41% and the highest sensitivity at 76.91%, ToyConveyor shows the highest average validation accuracy at 74.64%, the highest specificity at 79.86%, and the highest mean accuracy at 78.12%, and ToyTrain shows the highest average testing accuracy at 79.32%.
Table 1 Training accuracy, validation accuracy, testing accuracy, sensitivity, specificity, and mean accuracy (%) obtained from the proposed model using the ToyADMOS dataset

| Sl. no. | Machine type | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%) | Sensitivity (%) | Specificity (%) | Mean accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | ToyCar | 76.31 | 75.72 | 79.37 | 82.13 | 75.12 | 78.62 |
| 1 | ToyConveyor | 74.34 | 72.95 | 76.59 | 74.88 | 77.78 | 76.33 |
| 1 | ToyTrain | 72.65 | 69.69 | 77.58 | 75.60 | 71.26 | 73.43 |
| 2 | ToyCar | 77.35 | 74.88 | 82.34 | 73.19 | 82.61 | 77.90 |
| 2 | ToyConveyor | 72.54 | 73.67 | 80.16 | 77.05 | 78.99 | 78.02 |
| 2 | ToyTrain | 75.52 | 73.91 | 80.75 | 70.53 | 81.16 | 75.85 |
| 3 | ToyCar | 71.89 | 70.65 | 76.79 | 74.15 | 75.12 | 74.64 |
| 3 | ToyConveyor | 73.33 | 75.60 | 78.37 | 78.74 | 79.23 | 78.99 |
| 3 | ToyTrain | 73.63 | 74.64 | 80.75 | 75.36 | 81.64 | 78.50 |
| 4 | ToyCar | 76.17 | 75.24 | 80.36 | 78.74 | 78.02 | 78.38 |
| 4 | ToyConveyor | 74.45 | 73.79 | 78.37 | 73.91 | 79.71 | 76.81 |
| 4 | ToyTrain | 75.84 | 71.98 | 82.74 | 73.43 | 76.09 | 74.76 |
| 5 | ToyCar | 75.34 | 73.55 | 76.59 | 76.33 | 77.29 | 76.81 |
| 5 | ToyConveyor | 76.17 | 77.17 | 82.54 | 77.29 | 83.57 | 80.43 |
| 5 | ToyTrain | 73.10 | 71.74 | 74.80 | 66.67 | 82.61 | 74.64 |
For the MIMII dataset, the slider shows the highest average training accuracy at 75.83%, the fan shows the highest average validation accuracy at 75.72%, the slider shows the highest average testing accuracy at 82.06% and the highest sensitivity at 80.34%, and the fan shows the highest specificity at 78.74%. Finally, the slider shows the highest average mean accuracy at 79.01%, as given in Table 4.
6 Conclusion

In this paper, a self-supervised support vector machine approach is presented for predicting the health of systems. It aims at segmenting the samples by finding a hyperplane, and an SVM model is defined to determine the accuracy on each dataset. Detection of anomalous sound in this fashion may be helpful for machine condition monitoring. The training, testing, and validation accuracies show that some machines provide higher accuracy in detecting anomalous sounds. Also, the average quantitative measurements for each machine type are calculated, which tell how the proposed model works with different kinds of datasets. This work aids in understanding the present status of machinery activities and maintenance, hence improving the system's health accuracy and efficiency, with the benefit of better accuracy and faster prediction. The proposed model can predict the accuracy but fails to determine whether the predicted accuracy is correct.
Table 2 Training accuracy, validation accuracy, testing accuracy, sensitivity, specificity, and mean accuracy (%) obtained from the proposed model using the MIMII dataset

| Sl. no. | Machine type | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%) | Sensitivity (%) | Specificity (%) | Mean accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | Fan | 75.16 | 75.85 | 79.56 | 80.19 | 78.26 | 79.23 |
| 1 | Pump | 75.16 | 75.97 | 82.54 | 86.71 | 72.46 | 79.59 |
| 1 | Slider | 76.17 | 75.36 | 81.75 | 78.99 | 78.74 | 78.86 |
| 1 | Valve | 75.37 | 72.58 | 73.02 | 77.78 | 74.40 | 76.09 |
| 2 | Fan | 76.43 | 77.29 | 80.36 | 78.74 | 81.16 | 79.95 |
| 2 | Pump | 77.14 | 75.48 | 81.94 | 75.60 | 81.64 | 78.62 |
| 2 | Slider | 76.17 | 75.85 | 78.77 | 80.68 | 77.29 | 78.99 |
| 2 | Valve | 74.87 | 72.71 | 78.97 | 78.02 | 74.88 | 76.45 |
| 3 | Fan | 74.34 | 73.31 | 75.60 | 76.57 | 77.05 | 76.81 |
| 3 | Pump | 73.60 | 71.74 | 73.21 | 71.74 | 77.78 | 74.76 |
| 3 | Slider | 74.13 | 74.64 | 81.75 | 75.60 | 80.92 | 78.26 |
| 3 | Valve | 76.93 | 74.15 | 82.54 | 76.33 | 76.57 | 76.45 |
| 4 | Fan | 73.57 | 76.93 | 78.17 | 80.92 | 79.71 | 80.31 |
| 4 | Pump | 71.53 | 71.26 | 74.80 | 71.74 | 78.99 | 75.36 |
| 4 | Slider | 76.78 | 74.40 | 83.53 | 81.64 | 74.88 | 78.26 |
| 4 | Valve | 74.99 | 71.98 | 77.58 | 74.15 | 76.33 | 75.24 |
| 5 | Fan | 74.96 | 75.24 | 77.18 | 78.26 | 77.54 | 77.90 |
| 5 | Pump | 75.13 | 73.55 | 77.78 | 78.74 | 75.60 | 77.17 |
| 5 | Slider | 75.90 | 76.93 | 84.52 | 84.78 | 76.57 | 80.68 |
| 5 | Valve | 74.60 | 73.31 | 80.36 | 74.15 | 78.50 | 76.33 |
Table 3 Average training accuracy, validation accuracy, testing accuracy, sensitivity, specificity, and mean accuracy (%) obtained from the proposed model using the ToyADMOS dataset

| Sl. no. | Machine type | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%) | Sensitivity (%) | Specificity (%) | Mean accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | ToyCar | **75.41** | 74.01 | 79.09 | **76.91** | 77.63 | 77.27 |
| 2 | ToyConveyor | 74.17 | **74.64** | 79.21 | 76.37 | **79.86** | **78.12** |
| 3 | ToyTrain | 74.15 | 72.39 | **79.32** | 72.32 | 78.55 | 75.44 |
As future work, the proposed model can be further extended with deep learning algorithms for predicting the health of the system, as deep learning algorithms help determine whether a prediction is accurate or not. This benefit of deep learning would overcome the disadvantage of the proposed support vector machine. Research could be conducted on a wide range of deep learning algorithms to find the best-fitting model for the proposed problem.
Table 4 Average training accuracy, validation accuracy, testing accuracy, sensitivity, specificity, and mean accuracy (%) obtained from the proposed model using the MIMII dataset

| Sl. no. | Machine type | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%) | Sensitivity (%) | Specificity (%) | Mean accuracy (%) |
|---|---|---|---|---|---|---|---|
| 1 | Fan | 74.89 | **75.72** | 78.17 | 78.94 | **78.74** | 78.84 |
| 2 | Pump | 74.51 | 73.60 | 78.05 | 76.91 | 77.29 | 77.10 |
| 3 | Slider | **75.83** | 75.44 | **82.06** | **80.34** | 77.68 | **79.01** |
| 4 | Valve | 75.35 | 72.92 | 78.49 | 76.09 | 76.14 | 76.11 |
References

1. Jalali A, Schindler A, Haslhofer B (2020) DCASE challenge 2020: unsupervised anomalous sound detection of machinery with deep autoencoders
2. Perez-Castanos S, Naranjo-Alcazar J, Zuccarello P, Cobos M (2020) Anomalous sound detection using unsupervised and semi-supervised autoencoders and gammatone audio representation. arXiv preprint arXiv:2006.15321
3. Wilkinghoff K (2020) Anomalous sound detection with look, listen and learn embeddings, vol 2. Tech. report in DCASE2020 Challenge Task
4. Giri R, Tenneti SV, Cheng F, Helwani K, Isik U, Krishnaswamy A (2020) Self-supervised classification for detecting anomalous sounds. In: Detection and classification of acoustic scenes and events workshop (DCASE), pp 46–50
5. Giri R, Tenneti SV, Cheng F, Helwani K, Isik U, Krishnaswamy A (2020) Unsupervised anomalous sound detection using self-supervised classification and group masked autoencoder for density estimation, vol 2. Tech. report in DCASE2020 Challenge Task
6. Jin D, Li C, Wang Q, Chen Y (2020) Research on server health state prediction model based on support vector machine. IOP Conf Ser: Mater Sci Eng 790(1):012029
7. Uematsu H, Koizumi Y, Saito S, Nakagawa A, Harada N (2017) Anomaly detection technique in sound to detect faulty equipment. NTT Tech Rev: NTT Group's Artif Intell Technol 15(8):7
8. Koizumi Y, Saito S, Uematsu H, Kawachi Y, Harada N (2018) Unsupervised detection of anomalous sound based on deep learning and the Neyman–Pearson lemma. IEEE/ACM Trans Audio Speech Lang Process 27(1):212–224
9. Alam J, Boulianne G, Gupta V, Fathan A (2020) An ensemble approach to unsupervised anomalous sound detection. Technical report, DCASE2020 Challenge (July 2020)
10. Park J, Yoo S. DCASE 2020 task2: anomalous sound detection using relevant spectral feature and focusing techniques in the unsupervised learning scenario
11. Zhang C, Yao Y, Zhou Y, Fu G, Li S, Tang G, Shao X. Unsupervised detection of anomalous sounds based on dictionary learning and autoencoder
12. Inoue T, Vinayavekhin P, Morikuni S, Wang S, Trong TH, Wood D, Tatsubori M, Tachibana R (2020) Detection of anomalous sounds for machine condition monitoring using classification confidence, vol 2. Tech. report in DCASE2020 Challenge Task
13. Zhao S. Acoustic anomaly detection based on similarity analysis
14. Kaltampanidis Y, Thoidis I, Vrysis L, Dimoulas C. Unsupervised detection of anomalous sounds via ProtoPNet
15. Haunschmid V, Praher P (2020) Anomalous sound detection with masked autoregressive flows and machine type dependent postprocessing. DCASE2020 Challenge, Tech. Rep
16. Morita K, Yano T, Tran KQ. Anomalous sound detection by using local outlier factor and Gaussian mixture model
17. Ribeiro A, Matos LM, Pereira PJ, Nunes EC, Ferreira AL, Cortez P, Pilastri A (2020) Deep dense and convolutional autoencoders for unsupervised anomaly detection in machine condition sounds. arXiv preprint arXiv:2006.10417
18. Koizumi Y, Saito S, Uematsu H, Harada N, Imoto K (2019) ToyADMOS: a dataset of miniature-machine operating sounds for anomalous sound detection. In: 2019 IEEE workshop on applications of signal processing to audio and acoustics (WASPAA). IEEE, pp 313–317
19. Purohit H, Tanabe R, Ichige K, Endo T, Nikaido Y, Suefusa K, Kawaguchi Y (2019) MIMII dataset: sound dataset for malfunctioning industrial machine investigation and inspection. arXiv preprint arXiv:1909.09347
20. Hendrycks D, Mazeika M, Kadavath S, Song D (2019) Using self-supervised learning can improve model robustness and uncertainty. arXiv preprint arXiv:1906.12340
21. Tanveer M (2020) Classification of anomalous machine sounds using i-vectors. PhD dissertation, Georgia Institute of Technology
22. Xiao Y (2019) Unsupervised detection of anomalous sounds technical report. In: Workshop (DCASE2019), vol 209, p 213
23. Ring M, Eskofier BM (2016) An approximation of the Gaussian RBF kernel for efficient classification with SVMs. Pattern Recogn Lett 84:107–111
A Model Based on Convolutional Neural Network (CNN) for Vehicle Classification F. M. Javed Mehedi Shamrat, Sovon Chakraborty, Saima Afrin, Md. Shakil Moharram, Mahdia Amina, and Tonmoy Roy
Abstract The convolutional neural network (CNN) is a form of artificial neural network that has become very popular in computer vision. In this paper, we propose a convolutional neural network for classifying common types of vehicles in our country. Vehicle classification is essential in many applications, including surveillance protection systems and traffic control systems. We raised these concerns and set a goal to find a way to eliminate traffic-related road accidents. The most challenging aspect of computer vision is achieving effective outcomes when deploying a system, due to variations in data shapes and colors. We used three learning methods to identify vehicles, MobileNetV2, DenseNet, and VGG 19, and demonstrated each method's detection accuracy. Convolutional neural networks handle all three approaches gracefully. The system performs impressively on a real-time standard dataset, the vehicles-Nepal dataset, which contains 4800 photographs of vehicles. DenseNet has a training accuracy of 94.32% and a validation accuracy of 95.37%. Furthermore, the VGG 19 has a training accuracy of 91.94% and a validation accuracy of 92.68%.
F. M. Javed Mehedi Shamrat (B)
Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh

S. Chakraborty
Department of Computer Science and Engineering, European University of Bangladesh, Dhaka, Bangladesh

S. Afrin · Md. S. Moharram
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
e-mail: [email protected]

Md. S. Moharram
e-mail: [email protected]

M. Amina
Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh

T. Roy
Department of Computer Science and Engineering, North South University, Dhaka, Bangladesh
The MobileNetV2 architecture has the best accuracy, with a training accuracy of 97.01% and a validation accuracy of 98.10%.

Keywords Vehicle detection · Transfer learning · CNN · MobileNetV2 · DenseNet · VGG 19
1 Introduction

Vehicle traffic collisions now account for around 12 million deaths and 1–3% of global GDP losses in terms of social property. The subjective factors of drivers are the primary causes of traffic crashes. As a result, it is critical to increase road protection and assist vehicles in anticipating and avoiding traffic collisions. In recent years, an increasing number of academics have begun to focus on automotive research and the advancement of driving-assistance technologies. In the fields of computer vision and safe-driving aids, vehicle recognition based on machine vision is a research hotspot. Many researchers have already applied pattern analysis, image processing, and machine learning to the area of vehicle identification, with promising findings that have aided fundamental science and engineering applications [1–5].

Handcrafted feature-based vehicle recognition approaches were proposed for intelligent transportation networks in the early days of computer vision. Ng et al. [6] suggested a HOG-SVM-based handcrafted-feature approach for training SVM classifiers with HOG features and Gaussian kernel functions. The suggested classifier was tested on a 2800-image surveillance camera dataset, and it correctly categorized motorcycles, cars, and lorries 92.3% of the time. Chen et al. [7], in a separate study, proposed a classification system that extracts texture and HOG characteristics and classifies vehicles using a fuzzy-influenced SVM classifier. The proposed classifier was tested on a dataset of 2000 photographs, correctly classifying vehicles, trucks, and buses 92.6% of the time. Matos et al. [8] suggested a combined approach of two neural networks for embedding the characteristics of the cars, such as height, distance, and bounding boundaries; the suggested classifier scored 69% on the 100-image dataset. Cui [9] used SVM to characterize a dataset consisting of 340 photographs of vehicles, minibuses, and trucks, and suggested a combined model of two scale-invariant feature transform (SIFT) descriptors and a Bag of Words (BoW) for feature extraction. The proposed classifier achieved 90.2% accuracy on the given dataset, according to the results. To separate data into vehicle and non-vehicle types, Wen et al. [10] suggested an AdaBoost-based fast-learning vehicle classifier. Furthermore, the researchers developed a method for obtaining Haar-like features in order to train classifiers quickly. The accuracy of the classifier was evaluated on the public Caltech dataset and found to be 92.89%. In 2013, R. Sindoori et al. proposed a technique for vehicle detection from crawled images, in which a pixel-wise grouping technique is used to complete the detection. Alpatov and colleagues proposed a method to track and count real-time vehicles in
order to ensure road protection in 2018. They processed and implemented the algorithm using photographs from a stationary camera to prevent traffic accidents [11]. Vishwanath P. Baligar and Mallikarjun Anandhalli suggested a color-based algorithm for detecting and tracking vehicles. In 2017, Seda Kul presented a review paper on vehicle identification and classification [12]. Logistic regression, neural networks, SVM, and CNN are just a few of the algorithms listed there for detecting and classifying vehicles. In 2019, Watcharin Maungmai introduced a method for categorizing vehicles based on two factors: vehicle model and color. They used CNN as the deep learning algorithm and achieved 81.62% and 70.09% accuracy for those two cases, respectively, using 763 s of video data to create the photographs in this analysis [13]. In 2018, Bensedik Hicham and colleagues presented a CNN-based method for vehicle classification. They used a total of 2400 photographs to build their dataset, with an overall precision of 89% across four different styles of vehicles.

The rest of the paper is organized as follows: Sect. 2 outlines the research strategy and methods for designing and developing the whole system, Sect. 3 examines the outcomes of the implemented system, and Sect. 4 concludes with findings, limitations, and recommendations for future research.
2 Research Methodology

CNNs are a form of deep neural network [14–20] that is often used in deep learning to analyze visual imagery. A CNN is a deep learning algorithm that takes an image as input and assigns importance to various image sections when distinguishing between them. Because of their high accuracy, CNNs are utilized for image processing and mapping. A CNN uses a hierarchical model to build a funnel-shaped network, which then produces a fully connected layer that connects all neurons and processes the input. AI has made considerable progress in closing the gap between human and machine capabilities. The DenseNet, VGG 19, and MobileNetV2 architectures were utilized in this work for vehicle detection. The complete proposed system diagram is shown in Fig. 1.
2.1 Data Collection

For this research, we have used the Vehicles-Nepal dataset, which consists of 4800 images. We have classified the images [21–26] into four categories: bus, motorbike, truck, and car pictures. The images are of different sizes. Sample images from the dataset are shown in Fig. 2.
Fig. 1 Proposed model diagram
2.2 Preprocessing

We have converted all the images into gray-level images using the OpenCV library [27–29] for this research. The labels are derived from the file paths using the os.path.sep separator. As the images are not all of the same size, we resized them so that all images [30–32] are of 128 × 128 shape.
Fig. 2 Dataset images samples
As overfitting is a big issue in the case of deep learning models, we have normalized all the images by dividing the pixel values by 255.0. The images are converted into NumPy arrays for faster calculations.
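A minimal sketch of this preprocessing pipeline is given below, assuming the images are read from a directory tree whose folder names encode the class labels; the variable names and directory layout are our assumptions, not the authors' script.

import os
import cv2
import numpy as np

data, labels = [], []
for image_path in image_paths:                   # assumed list of dataset file paths
    label = image_path.split(os.path.sep)[-2]    # class name taken from the folder
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to gray level
    img = cv2.resize(img, (128, 128))            # uniform 128 x 128 shape
    data.append(img)
    labels.append(label)

data = np.array(data, dtype="float32") / 255.0   # normalize to [0, 1]
labels = np.array(labels)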
2.3 Proposed Convolutional Neural Network (CNN) Architecture

2.3.1 DenseNet
DenseNet is a robust transfer learning architecture that is designed based on the CNN architecture. Its most powerful feature is that it requires fewer parameters to achieve high accuracy. As individual layers can access the gradient provided by the loss function, it also requires less data for training. Our DenseNet-121 architecture has five convolution layers with pooling layers, three transition layers, one classification layer, and two dense blocks. DenseNet is initialized with the ImageNet weights. Figure 3 shows the overall architecture of our DenseNet121 model.
Fig. 3 DenseNet121 model architecture
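As a concrete illustration, the backbone can be loaded with its ImageNet weights and frozen before a custom classification head is attached. The sketch below shows this for DenseNet-121 in Keras; it is our reading of the setup described here, not the authors' exact code, and the input size follows Sect. 2.2.

from tensorflow.keras.applications import DenseNet121

base = DenseNet121(weights="imagenet",      # reuse the ImageNet weights
                   include_top=False,       # drop the original classifier
                   input_shape=(128, 128, 3))
base.trainable = False                      # freeze the pretrained layers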
2.3.2 VGG 19
VGGNet is a convolutional neural network architecture proposed by Karen Simonyan and Andrew Zisserman of the University of Oxford in 2014, and VGG 19 is one of its variants. It has been trained on millions of image samples and is mainly known as a convolutional neural network (CNN) architecture for image classification using deep learning. VGG 19 consists of 19 weight layers, 16 convolutional and 3 fully connected, along with 5 MaxPool layers and 1 SoftMax layer. The VGG model has many variants, such as VGG 11, VGG 16, and VGG 19. The classifier head of VGG 19 is primarily made up of two 4096-channel fully connected layers followed by a 1000-channel fully connected layer that predicts the 1000 labels (see Fig. 4).
Fig. 4 VGG 19 model architecture
Fig. 5 MobileNetV2 model architecture
2.3.3 MobileNetV2 Architecture
A total of 28 deep convolutional neural network layers are used in MobileNetV2. It is based on an inverted residual structure in which the residual connections run between the bottleneck layers. Depthwise separable convolutions are used to construct lightweight deep convolutional neural networks in the streamlined MobileNet architecture, and MobileNet has proved to be an efficient model for mobile and embedded vision applications. MobileNetV2's design includes an initial full convolution layer with 32 filters, followed by 19 residual bottleneck layers. MobileNets are compact, low-latency, low-power models suited to resource-constrained settings. Figure 5 shows the architecture of MobileNetV2.
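A depthwise separable convolution factorizes a standard convolution into a per-channel spatial filter followed by a 1 × 1 pointwise convolution, which is what makes these networks lightweight. The Keras sketch below illustrates one such block; the filter count and activations are illustrative assumptions rather than MobileNetV2's exact configuration.

from tensorflow.keras import layers

def separable_block(x, filters):
    # spatial filtering applied to each input channel independently
    x = layers.DepthwiseConv2D((3, 3), padding="same", activation="relu")(x)
    # 1 x 1 pointwise convolution mixes information across channels
    return layers.Conv2D(filters, (1, 1), activation="relu")(x)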
2.4 Evaluating Performance Using Performance Metrics

After the training and testing phases were completed, we evaluated the performance of the models based on precision, recall, F1-score, and accuracy. The following are the formulae that we utilized:

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

Accuracy = (TP + TN) / (TP + FP + TN + FN)    (3)

F1-score = 2 × (Recall × Precision) / (Recall + Precision)    (4)
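For concreteness, Eqs. (1)–(4) can be computed directly from the confusion-matrix counts; the helper below is a small illustrative sketch (the function name is ours, not from the paper).

def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)                            # Eq. (1)
    recall = tp / (tp + fn)                               # Eq. (2)
    accuracy = (tp + tn) / (tp + fp + tn + fn)            # Eq. (3)
    f1 = 2 * recall * precision / (recall + precision)    # Eq. (4)
    return precision, recall, accuracy, f1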
3 Experiment Result Analysis

All of our images are stored as NumPy arrays. We used 80% of our data for training, with random state 42 for the train–test split. Since we have more than two output classes, we used the to_categorical function to binarize the labels. We used the ImageDataGenerator feature from Keras to augment the images for better training, with the following settings:

Rescale: 1./255
Shear_range: 0.2
Zoom_range: 0.3
Horizontal_flip: True
Rotation_range: 0.3
We have frozen the base layers of all transfer learning architectures in order to add our own trainable layers; all base layers have been set to non-trainable (trainable = False). We used average pooling in our trainable head, with a pool size of (2, 2). The ReLU activation function is included in the first trainable layer, which has a total of 128 neurons. The output layer has four neurons and a SoftMax activation function, corresponding to the four output categories. We set the learning rate to 0.001, ran our code for 10 epochs, and selected 64 as the batch size. We used the stochastic gradient descent algorithm for backpropagation and learning, and the loss was measured using categorical cross-entropy. The training performance of our three transfer learning algorithms is shown in Fig. 6 (Table 1).

Fig. 6 Comparison of validation accuracy of the proposed algorithms
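The sketch below ties these training settings together in Keras under stated assumptions: the generator settings, head layers, optimizer, and hyperparameters follow the text, while the backbone shown (any of the three architectures can be substituted), the variable names, and the commented fit call are ours.

from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255, shear_range=0.2,
                             zoom_range=0.3, horizontal_flip=True,
                             rotation_range=0.3)

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(128, 128, 3))
base.trainable = False                          # freeze every base layer

model = models.Sequential([
    base,
    layers.AveragePooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),       # first trainable layer
    layers.Dense(4, activation="softmax"),      # bus, motorbike, truck, car
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=10)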
Table 1 Accuracy shown by transfer learning models on training set

Epoch   DenseNet 121   MobileNetV2   VGG 19
1       93.33          95.51         89.84
2       93.72          95.89         90.12
3       92.90          96.24         90.85
4       93.67          96.93         89.99
5       93.83          96.64         91.32
6       92.49          96.87         90.53
7       94.21          97.15         90.64
8       93.01          96.78         91.73
9       94.55          97.10         91.59
10      94.32          97.01         91.94
Fig. 7 Comparison of validation accuracy of the proposed algorithms
Table 2 Accuracy of transfer learning models on validation dataset

Epoch   DenseNet 121   MobileNetV2   VGG 19
1       94.12          96.03         90.94
2       94.35          96.57         92.12
3       93.80          97.23         92.05
4       94.25          97.19         91.78
5       94.73          97.75         92.43
6       95.39          96.33         93.09
7       94.73          97.20         92.73
8       94.93          98.06         92.66
9       95.03          98.02         92.79
10      95.37          98.10         92.68
Fig. 8 Detection of different types of vehicles
Figure 7 depicts the validation accuracy graphs, and Table 2 shows the corresponding results: the maximum accuracy is 98.10% with the MobileNetV2 architecture, while DenseNet achieves 95.37% and VGG 19 reaches 92.68%. The precision, accuracy, F1-score, and recall of the three transfer learning models, calculated using Eqs. (1)–(4) over the four categories of data, are given below:

Report      DenseNet 121   MobileNetV2   VGG 19
Precision   93.32          96.39         92.39
Accuracy    94.18          96.46         91.38
F1-score    95.74          95.23         91.36
Recall      93.26          96.63         93.20
MobileNetV2 outperformed the other models in this analysis. The model is capable of identifying the vehicle categories in a picture. The detection output of MobileNetV2, a single prediction made by the model, is depicted in Fig. 8.
4 Conclusion

This study proposes a CNN-based vehicle classification and detection system to improve the intelligent transport system. We used DenseNet 121, MobileNetV2, and
VGG 19 in our proposed model to detect vehicles. MobileNetV2 performed best, achieving the highest accuracy of 98.10%, while DenseNet achieved 95.37% and VGG 19 achieved 92.68%. In the future, we will work with a broader dataset and integrate an IoT cloud with this model for centralized monitoring.
References

1. Tsai LW, Hsieh JW, Fan KC (2007) Vehicle detection using normalized color and edge map. IEEE Trans Image Process 16(3):850–864 2. Martin DR, Fowlkes CC, Malik J (2004) Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans Pattern Anal Mach Intell 26(5):530–549 3. Zhang Z, Xu Y, Yang J, Li X, Zhang D (2015) A survey of sparse representation: algorithms and applications. IEEE Access 3:490–530 4. Wen X, Shao L, Fang W, Xue Y (2015) Efficient feature selection and classification for vehicle detection. IEEE Trans Circuits Syst Video Technol 25(3):508–517 5. Cai Y, Liu Z, Wang H, Sun X (2017) Saliency-based pedestrian detection in far infrared images. IEEE Access 5:5013–5019 6. Ng LT, Suandi SA, Teoh SS (2014) Vehicle classification using visual background extractor and multi-class support vector machines. In: Proceedings of the 8th international conference on robotic, vision, signal processing and power applications, pp 221–227, Springer, Singapore 7. Chen Y, Qin GF (2014) Video-based vehicle detection and classification in challenging scenarios. Int J Smart Sens Intell Syst 7(3) 8. Matos FMS, de Souza RMCR (2013) Hierarchical classification of vehicle images using NN with conditional adaptive distance. In: Proceedings of the international conference on neural information processing, pp 745–752, Springer, Berlin, Germany 9. Cui YY (2013) Research on vehicle recognition in intelligent transportation. University of Electronic Science and Technology of China, Chengdu, China 10. Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295(1):395–406 11. Alpatov BA et al (2018) Vehicle detection and counting system for real-time traffic surveillance. In: 2018 7th mediterranean conference on embedded computing (MECO), Budva, pp 1–4 12. Kul S, Eken S, Sayar A (2017) A concise review on vehicle detection and classification. In: 2017 international conference on engineering and technology (ICET), Antalya, pp 1–4 13. Maungmai W, Nuthong C (2019) Vehicle classification with deep learning. In: 2019 IEEE 4th international conference on computer and communication systems (ICCCS), pp 294–298 14. Javed Mehedi Shamrat FM, Chakraborty S, Billah MM, Jubair MA, Islam MS, Ranjan R (2021) Face mask detection using convolutional neural network (CNN) to reduce the spread of Covid-19. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 1231–1237. https://doi.org/10.1109/ICOEI51242.2021.9452836 15. Karim A, Ghosh P, Anjum AA, Junayed MS, Md ZH, Hasib, Khan M, Bin Emran AN (2020) A comparative study of different deep learning model for recognition of handwriting digits. ICICNIS 2020, Available at SSRN: https://ssrn.com/abstract=3769231. https://doi.org/10.2139/ssrn.3769231 16. Javed Mehedi Shamrat FM, Tasnim Z, Ghosh P, Majumder A, Hasan MZ (2020) Personalization of job circular announcement to applicants using decision tree classification algorithm. In: 2020 IEEE international conference for innovation in technology (INOCON), Bangluru, India, pp 1–5. https://doi.org/10.1109/INOCON50539.2020.9298253 17. Ghosh P, Azam S, Hasib KM, Karim A, Jonkman M, Anwar A (2021) A performance based study on deep learning algorithms in the effective prediction of breast cancer. IJCNN 2021: international joint conference on neural networks
18. Javed Mehedi Shamrat FM, Chakraborty S, Imran MM, Muna JN, Billah MM, Das P, Rahman MO (2021) Sentiment analysis on Twitter tweets about COVID-19 vaccines using NLP and supervised KNN classification algorithm. Indonesian J Electr Eng Comput Sci 23(1):463–470, ISSN: 2502-4752. https://doi.org/10.11591/ijeecs.v23.i1.pp463-470 19. Chakraborty S, Javed Mehedi Shamrat FM, Billah MM, Jubair MA, Alauddin M, Ranjan R (2021) Implementation of deep learning methods to identify rotten fruits. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 1207–1212. https://doi.org/10.1109/ICOEI51242.2021.9453004 20. Javed Mehedi Shamrat FM, Jubair MA, Billah MM, Chakraborty S, Alauddin M, Ranjan R (2021) A deep learning approach for face detection using max pooling. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 760–764. https://doi.org/10.1109/ICOEI51242.2021.9452896 21. Karim A, Ghosh P, Anjum AA, Junayed MS, Md ZH, Hasib KM, Karmokar AN, Bairagi P, Mondal S, Nur A, Moon FN, Karim NN, Yeo KC (2020) A novel IoT based accident detection and rescue system. In: 2020 third international conference on smart systems and inventive technology (ICSSIT), IEEE 22. Islam MA, Akter S, Hossen MS, Keya SA, Tisha SA, Hossain S (2020) Risk factor prediction of chronic kidney disease based on machine learning algorithms. In: 2020 3rd international conference on intelligent sustainable systems (ICISS), pp 952–957. https://doi.org/10.1109/ICISS49785.2020.9315878 23. Ghosh P, Javed Mehedi Shamrat FM, Shultana S, Afrin S, Anjum AA, Khan AA (2020) Optimization of prediction method of chronic kidney disease using machine learning algorithm. In: 2020 15th international joint symposium on artificial intelligence and natural language processing (iSAI-NLP), Bangkok, Thailand, pp 1–6. https://doi.org/10.1109/iSAI-NLP51646.2020.9376787 24. Javed Mehedi Shamrat FM, Imran M, Sazzadur Rahman AKM, Anup M, Zarrin T, Naimul IN (2020) A smart automated system model for vehicles detection to maintain traffic by image processing. Int J Sci Technol Res 9(02):2921–2928, ISSN: 2277-8616 25. Ghosh P et al (2021) Human face recognition using eigenface, SURF methods. In: International conference on pervasive computing and social networking (ICPCSN 2021). [Springer_LNCS] 26. Ghosh P et al (2021) Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques. IEEE Access 9:19304–19326. https://doi.org/10.1109/ACCESS.2021.3053759 27. Javed Mehedi Shamrat FM, Chakraborty S, Billah MM, Kabir M, Shadin NS, Sanjana S (2021) Bangla numerical sign language recognition using convolutional neural networks (CNNs). Indonesian J Electr Eng Comput Sci 23(1):405–413, ISSN: 2502-4752. https://doi.org/10.11591/ijeecs.v23.i1.pp405-413 28. Karim A, Azam S, Shanmugam B, Kannoorpatti K (2020) Efficient clustering of emails into spam and ham: the foundational study of a comprehensive unsupervised framework. IEEE Access 8:154759–154788 29. Mahmud K, Azam S, Karim A, Zobaed S, Shanmugam B, Mathur D (2021) Machine learning based PV power generation forecasting in Alice Springs. IEEE Access, 1–1 30. Islam MR, Azam S, Shanmugam B, Karim A, El-Den J, Boer FD, Jonkman M, Yadav A (2020) Smart parking management system to reduce congestion in urban area. In: 2020 2nd international conference on electrical, control and instrumentation engineering (ICECIE), pp 1–6. https://doi.org/10.1109/ICECIE50279.2020.9309546 31. Chowdhury AI, Ashraf M, Islam A, Ahmed E, Jaman MS, Rahman MM (2020) hActNET: an improved neural network based method in recognizing human activities. In: 2020 4th international symposium on multidisciplinary studies and innovative technologies (ISMSIT), pp 1–6. https://doi.org/10.1109/ISMSIT50672.2020.9254992 32. Song Q, Lee J, Akter S, Rogers M, Grene R, Li S (2020) Prediction of condition-specific regulatory genes using machine learning. Nucleic Acids Res 48(11):e62. https://doi.org/10.1093/nar/gkaa264
A Transfer Learning Approach for Face Recognition Using Average Pooling and MobileNetV2 F. M. Javed Mehedi Shamrat, Sovon Chakraborty, Md. Shakil Moharram, Tonmoy Roy, Masudur Rahman, and Biraj Saha Aronya
Abstract Facial recognition is a fundamental method in facial-related science such as face detection, authentication, and monitoring, and a crucial phase in computer vision and pattern recognition. Face recognition technology aids in crime prevention by storing the captured image in a database, which can then be used in various ways, including identifying a person. Most facial recognition systems function sufficiently when there are only a few faces in the frame and the techniques are tested under artificial illumination, with accurate facial poses and non-blurry images. In this work, a face recognition system is proposed using average pooling and MobileNetV2. The classifiers are implemented after a set of preprocessing steps on the retrieved image data. To determine which model is more effective, a performance test on the results is performed. It is observed from the study that MobileNetV2 triumphs over average pooling with an accuracy rate of 98.89% and 99.01% on training and test data, respectively.

Keywords Face recognition · CNN · Average pooling · MobileNetV2 · Accuracy · Performance comparison

F. M. Javed Mehedi Shamrat (B) · B. S. Aronya Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh B. S. Aronya e-mail: [email protected] S. Chakraborty Department of Computer Science and Engineering, European University of Bangladesh, Dhaka, Bangladesh Md. S. Moharram · M. Rahman Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh e-mail: [email protected] M. Rahman e-mail: [email protected] T. Roy Department of Computer Science and Engineering, North South University, Dhaka, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_38
1 Introduction

The human face has a significant impact on our day-to-day social interactions, such as how we project someone's personality when we see their face. Face recognition is a biometric technique that uses mathematics to evaluate facial characteristics and then saves them as a faceprint to identify individuals. Biometric face recognition technology has recently generated a lot of attention due to its many uses in law enforcement as well as various civilian businesses, institutions, and organizations [1–4]. Compared with other biometric technologies such as fingerprint, palm print, and iris recognition, it has the benefit of being frictionless. Machine learning techniques and neural network models are often utilized for human face identification, and conventional image feature extraction methods for face recognition are accurate and fast. Ma et al. [5] introduced an AdaBoost-based training method to obtain cascade classifiers built from different feature forms, Haar-like and HOG, for better discrimination. Since many weak classifiers are involved, this requires a lot of computing power. To address severe facial occlusion, a Bayesian framework-based method [6] utilized the Omega shape formed by a person's head and shoulders. It detects faces under intense occlusion well in automatic teller machines, but the range of applicable scenes is narrow. In addition to AdaBoost-based methods, Mathias et al. [7] proposed face recognition utilizing deformable part models (DPM) and obtained positive results; however, the computational cost of this method is usually very high. Another technique based on DPM has been proposed for identifying occluded faces [8–12]. Although only facial recognition representations are used in the tests, they have low uniqueness and reduce false-negative face recognition and identity error rates [13–17]. In the proposed system, face recognition is done using two neural network approaches, i.e., average pooling and MobileNetV2. After a series of preprocessing steps on the retrieved image data, the classifiers are implemented. A performance evaluation of the outcome is done to compare which model is more efficient.
2 Research Methodology

The proposed study is based on two models for human face recognition. The average pooling and MobileNetV2 models are implemented on the image data. The whole suggested system diagram is shown in Fig. 1.
2.1 Data Collection

This study makes use of the LFW (face recognition) dataset (https://www.kaggle.com/atulanandjha/lfwpeople). The collection contains over 13,000 pictures, of which 13,000 were utilized for our study.
Fig. 1 System diagram of the process
Each image is appropriately labeled with the person's name. A total of 104 photographs were used to build the dataset. Figure 2 contains sample data from the retrieved dataset.
Fig. 2 Sample image data
2.2 Preprocessing and Augmentation of Data

CNNs perform better with a large volume of data. We have enabled zooming, shearing, and rescaling in the ImageDataGenerator function [18–21]. The photographs were first converted to 256 × 256 pixels, and the ImageDataGenerator tool was used to expand the size of our current dataset. NumPy arrays contain all the image data [22–26]. In our study, we used 80% of our data for training, with random state 42 for the train–test split, and utilized the to_categorical function to binarize the labels since we have more than two outputs. The ImageDataGenerator feature from Keras is used to augment the images for better training, with the zoom, shear, and rescaling criteria noted above.
2.3 Proposed Classifier

(1) Average Pooling
To initiate, images are converted to 256 × 256 pixels and passed to the proposed model's first convolutional layer, which has a total of 128 hidden units. After the average pooling operation, all of the feature maps are reduced to 128 × 128 pixels. The second convolutional layer then extracts features and employs average pooling once more [27–31]. The feature maps are reduced to 32 × 32 pixels in the final layer. At this stage, the images are converted into NumPy arrays to make computations easier. Applying a fully connected layer is the final stage. We used the ReLU activation function in the convolution layers and the softmax activation function in the output layer [32–36]. To find the best results, the Adam stochastic gradient descent optimizer is used. The suggested procedure is given in Algorithm 1:
Algorithm 1: Proposed Average Pooling
Step 1: Load Dataset
Step 2: Function Conv2D (matrix = 256 x 256) {
Step 3: Activate RELU }
Step 4: Function AveragePooling2D (data, pool size)
Step 5: Function Conv2D (matrix = 128 x 128, padding) {
Step 6: Activate RELU }
Step 7: Function AveragePooling2D (data, pool size)
Step 8: Function Conv2D (matrix = 64 x 64) {
Step 9: Activate RELU }
Step 10: Function AveragePooling2D (data, pool size)
Step 11: Function Conv2D (matrix = 32 x 32) {
Step 12: Activate RELU }
Step 13: Function AveragePooling2D (data, pool size)
Step 14: Reshape image, set list -> Flatten
Step 15: Activate Softmax
Step 16: Output Data Classification
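Algorithm 1 translates naturally into a Keras Sequential model. The sketch below is one such translation under stated assumptions: the image sizes and pooling steps follow the algorithm, while the per-layer filter counts and the four identity classes (per Table 3) are our assumptions.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(128, (3, 3), activation="relu", padding="same",
                  input_shape=(256, 256, 3)),
    layers.AveragePooling2D(pool_size=(2, 2)),   # 256 -> 128
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.AveragePooling2D(pool_size=(2, 2)),   # 128 -> 64
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.AveragePooling2D(pool_size=(2, 2)),   # 64 -> 32
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.AveragePooling2D(pool_size=(2, 2)),   # 32 -> 16
    layers.Flatten(),
    layers.Dense(4, activation="softmax"),       # four identity classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])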
(2) MobileNetV2
MobileNetV2 employs 28 deep convolutional neural network layers. It is built on an inverted residual structure, with residual connections between the bottleneck layers [37–42]. For the streamlined MobileNet architecture, lightweight deep convolutional neural networks are built using depthwise separable convolutions. MobileNet is a cost-effective model for mobile and embedded vision applications [43–48]. In the MobileNetV2 design, an initial full convolution layer with 32 filters is followed by 19 residual bottleneck layers. MobileNets are low-latency, low-power models that have been parameterized to satisfy the resource constraints of various use cases.
2.4 Evaluating Performance Using Performance Metrics

After the training and testing process, we evaluated the performance using precision, recall, F1-score, and accuracy. Equations (1)–(4) are the formulas that we used:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Sensitivity or Recall = TP / (TP + FN)    (2)

Precision = TP / (TP + FP)    (3)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (4)
3 Results and Discussion

3.1 Performance of Average Pooling

The model recognizes a specific person's face and identifies it by the person's name. The dataset contains 13,000 photos, covering images from 1680 individuals. The data have been divided into training and testing sets in an 80/20 split. The model can detect images with 93.13% training accuracy, and validation accuracy peaks at 93.65% at the 10th epoch. The minimum validation loss is 6.89%. Table 1 shows the results of our training and validation sets. The model attained 91.78% training accuracy and 92.17% validation accuracy on the overall classification.
Table 1 Average pooling classification result of training and validation dataset

Epoch   Training loss (%)   Training accuracy (%)   Validation loss (%)   Validation accuracy (%)
1       47.34               81.93                   19.74                 86.87
2       15.13               85.25                   15.57                 87.18
3       10.25               88.02                   11.73                 87.87
4       10.02               88.89                   9.28                  88.49
5       9.47                88.37                   9.54                  88.58
6       9.29                89.74                   8.90                  89.43
7       9.21                89.37                   8.58                  90.00
8       7.78                90.37                   7.36                  91.57
9       7.25                92.54                   7.28                  92.48
10      6.94                93.13                   6.89                  93.65
Table 2 Performance evaluation on average pooling

Class          Precision   Recall   F1-score   Accuracy
Training set   89.54       90.57    90.10      91.78
Testing set    90.89       91.35    90.85      92.17
Table 3 Performance evaluation of classes on average pooling

Class      Precision   Recall   F1-score   Accuracy
Shamrat    92.74       90.59    91.78      93.53
Shongkho   92.05       91.56    90.11      92.89
Masum      93.35       90.29    91.84      89.47
Jubair     91.73       90.39    90.09      90.23
After the evaluation of the dataset, the performance measurements are stated in Table 2. The model's classification has also been evaluated for each class; the class-wise performance is described in Table 3.
3.2 Performance of MobileNetV2

The model can detect images with 98.92% accuracy on the training data, and validation accuracy reaches 99.54% at the 10th epoch. The minimum validation loss is 3.59%. The outcomes of our training and validation sets are shown in Table 4.
Table 4 Accuracy of MobileNetV2 on dataset

Epoch   Training loss   Training accuracy   Validation loss   Validation accuracy
1       6.47            96.46               5.84              97.05
2       5.83            96.84               5.25              97.67
3       5.73            96.92               5.94              97.85
4       5.49            97.02               5.67              97.58
5       3.87            98.59               5.37              97.04
6       4.49            97.99               4.39              98.49
7       3.78            98.73               3.93              98.64
8       3.57            98.79               3.68              98.95
9       3.98            98.62               3.73              99.10
10      3.76            98.92               3.59              99.54
Table 5 Performance of MobileNetV2

Class          Precision   Recall   F1-score   Accuracy
Training set   98.75       98.73    98.28      98.89
Testing set    99.20       98.69    98.74      99.01
Table 6 Performance of each class in MobileNetV2

Class      Precision   Recall   F1-score   Accuracy
Shamrat    98.47       97.75    98.26      98.48
Shongkho   98.55       97.76    98.46      98.86
Masum      98.18       97.56    98.45      97.99
Jubair     98.54       97.82    97.28      98.43
The overall performance calculation of the MobileNetV2 model on the dataset is illustrated in Table 5, and the performance based on class is shown in Table 6.
3.3 Comparison of Models' Classifications

Figure 3 shows the accuracy comparison graph of both the training and test sets of the MobileNetV2 model and the CNN average pooling model. From the graph, it can be observed that MobileNetV2 has a considerably higher accuracy rate than the other model. In Fig. 4, the final output of the implemented MobileNetV2 model is given.
Fig. 3 Accuracy comparison of the models
Fig. 4 Face recognition using MobileNetV2
4 Conclusion

Face recognition is a method of recognizing or verifying a person's identity by using their face. It has been utilized for various purposes, including automated attendance management systems and surveillance of restricted-access areas. From the study, it is understood that MobileNetV2 has a higher accuracy rate than CNN average pooling: the model has an accuracy rate of 98.89% on training data and 99.01% on validation data. Furthermore, in the case of face recognition based on class, the model shows accuracy of up to 98.86%. In the future, we intend to implement more CNN models for face recognition to arrive at the most accurate system.
References 1. Zhang T, Li J, Jia W, Sun J, Yang H (2018) Fast and robust occluded face detection in ATM surveillance. Pattern Recognit Lett 107:33–40 2. Mathias M, Benenson R, Pedersoli M, Van Gool L (2014) Face detection without bells and whistles. In: Proceedings of European conference on computer vison springer, pp 720–735 3. Marcetic D, Ribaric S (2016) Deformable part-based robust face detection under occlusion by using face decomposition into face components. In: Proceedings of 39th interenational convention on information and communication technology, electronics microelectronics (MIPRO), pp 1365–1370 4. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of IEEE conference on computer vison pattern recognition (CVPR), pp 1–9 5. Li X, Yang Z, Wu H (2020) Face detection based on receptive field enhanced multi-task cascaded convolutional neural networks. IEEE Access 8:174922–174930. https://doi.org/10. 1109/ACCESS.2020.3023782 6. Huang GB, Mattar M, Berg T, Learned–Miller E (2008) Labeled faces in the wild: a database forstudying face recognition in unconstrained environments. In: Proceedings of workshop faces ‘real-life’ images, detection, alignment, recognition, pp 1–11 7. Sun Y, Liang D, Wang X, Tang X (2015) DeepID3: face recognition with very deep neural networks. 2015, arXiv:1502.0087. Available: https://arxiv.org/abs/1502.00873 8. Javed Mehedi Shamrat FM, Tasnim Z, Ghosh P, Majumder A, Hasan MZ (2020) Personalization of Job circular announcement to applicants using decision tree classification algorithm. In: 2020 IEEE international conference for innovation in technology (INOCON), Bangluru, India, pp 1–5, https://doi.org/10.1109/INOCON50539.2020.9298253 9. Manlangit S (2019) Novel machine learning approach for analyzing anonymous credit card fraud patterns. Int J Electron Commerce Stud 10(2) 10. Javed Mehedi Shamrat FM, Ghosh P, Sadek MH, Kazi MA, Shultana S (2020) Implementation of machine learning algorithms to detect the prognosis rate of kidney disease. In: 2020 IEEE international conference for innovation in technology (INOCON), Bangluru, India, pp 1–7. https://doi.org/10.1109/INOCON50539.2020.9298026 11. Ghosh P, Javed Mehedi Shamrat FM, Shultana S, Afrin S, Anjum AA, Khan AA (2020) Optimization of prediction method of chronic kidney disease using machine learning algorithm. In: 2020 15th international joint symposium on artificial intelligence and natural language processing (iSAI-NLP), Bangkok, Thailand, pp 1–6. https://doi.org/10.1109/iSAI-NLP51646. 2020.9376787 12. Mahmud K, Azam S, Karim A, Zobaed S, Shanmugam B, Mathur D (2021) Machine Learning based PV power generation forecasting in alice springs. IEEE Access, pp 1–1 13. Javed Mehedi Shamrat FM, Asaduzzaman M, Sazzadur Rahman AKM, Tusher RTH, Tasnim Z (2019) A comparative analysis of parkinson disease prediction using machine learning approaches. Int J Sci Technol Res 8(11):2576–2580, ISSN: 2277–8616 14. Taigman Y, Yang M, Ranzato M, Wolf L (2014) DeepFace: closing the gap to human-level performance in face verification. In: Proceedings of IEEE Conference on Computer Vision Pattern Recognition, pp 1701–1708 15. Sun Y, Wang X, Tang X (2014) Deep learning face representation from predicting 10,000 classes. In: Proceeding of IEEE Conference on Computer Vision Pattern Recognition, pp 1891– 1898 16. 
Javed Mehedi Shamrat FM, Abu Raihan M, Sazzadur Rahman AKM, Imran M, Rozina A (2020) An analysis on breast disease prediction using machine learning approaches. Int J Sci Technol Res 9(02):2450–2455, ISSN: 2277–8616 17. Ghosh P et al (2021) Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques. IEEE Access 9:19304–19326. https://doi.org/10.1109/ACCESS.2021.3053759
18. Sazzadur Rahman AKM, Javed Mehedi Shamrat FM, Tasnim Z, Roy J, Hossain SA (2019) A comparative study on liver disease prediction using supervised machine learning algorithms. Int J Sci Technol Res 8(11):419–422, ISSN: 2277–8616 19. Javed Mehedi Shamrat FM, Tasnim Z, Imran M, Jahan MN, Nobel NI (2020) Application of K-means clustering algorithm to determine the density of demand of different kinds of jobs. Int J Sci Technol Res 9(02):2550–2557, ISSN: 2277–8616 20. Karim A, Azam S, Shanmugam B, Kannoorpatti K (2020) Efficient clustering of emails into spam and ham: the foundational study of a comprehensive unsupervised framework. IEEE Access 8:154759–154788 21. Ghosh P et al (2020) A comparative study of different deep learning model for recognition of handwriting digits. Int Conf IoT Based Control Netw Intell Syst (ICICNIS) 857–866 22. Foysal MF, Islam MS, Karim A, Neehal N (2019) Shot-Net: a convolutional neural network for classifying different cricket shots. Commun Comput Inf Sci 111–120 23. Junayed MS, Jeny AA, Neehal N, Atik ST, Hossain SA (2019) A comparative study of different CNN models in city detection using landmark images. In: Santosh K, Hegadi R (eds) Recent trends in image processing and pattern recognition. RTIP2R 2018. Communications in computer and information science, vol 1035. Springer, Singapore. https://doi.org/10. 1007/978-981-13-9181-1_48 24. Biswas A, Chakraborty S, Rifat ANMY, Chowdhury NF, Uddin J (2020) Comparative analysis of dimension reduction techniques over classification algorithms for speech emotion recognition. In: Miraz MH, Excell PS, Ware A, Soomro S, Ali M (eds) Emerging technologies in computing. iCETiC 2020. Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering, vol 332. Springer, Cham. https://doi.org/10.1007/ 978-3-030-60036-5_12 25. Javed Mehedi Shamrat FM, Allayear SM, Alam MF, Jabiullah MI, Ahmed R (2019) A smart embedded system model for the AC automation with temperature prediction. In: Singh M, Gupta P, Tyagi V, Flusser J, Ören T, Kashyap R (eds) Advances in computing and data sciences. ICACDS 2019. Communications in computer and information science, vol 1046. Springer, Singapore. https://doi.org/10.1007/978-981-13-9942-8_33 26. Javed Mehedi Shamrat FM, Tasnim Z, Nobel NI, Ahmed MR (2019) An automated embedded detection and alarm system for preventing accidents of passengers vessel due to overweight. In: Proceedings of the 4th international conference on big data and internet of things (BDIoT’19). Association for computing machinery, New York, NY, USA, Article 35, 1–5. https://doi.org/ 10.1145/3372938.3372973 27. Javed Mehedi Shamrat FM, Nobel NI, Tasnim Z, Ahmed R (2020) Implementation of a smart embedded system for passenger vessel safety. In: Saha A, Kar N, Deb S (eds) Advances in computational intelligence, security and internet of things. ICCISIoT 2019. Communications in computer and information science, vol 1192. Springer, Singapore. https://doi.org/10.1007/ 978-981-15-3666-3_29 28. Islam Chowdhury A, Munem Shahriar M, Islam A, Ahmed E, Karim A, Rezwanul Islam M (2020) An automated system in ATM booth using face encoding and emotion recognition process. In: 2020 2nd international conference on image processing and machine vision 29. Javed Mehedi Shamrat FM, Allayear SM, Jabiullah MI (2018) Implementation of a smart AC automation system with room temperature prediction. Je Bangladesh Electron Soc 18(1–2):23– 32, ISSN: 1816–1510 30. 
Deng J, Guo J, Xue N, Zafeiriou S (2019) ArcFace: additive angular margin loss for deep face recognition. In: Proceedings of IEEE/CVF conference computer vision pattern recognition (CVPR), pp 4690–4699 31. Chen D, Xu C, Yang J, Qian J, Zheng Y, Shen L (2018) Joint Bayesian guided metric learning for end-to-end face verification. Neurocomputing 275:560–567 32. Khan MH, McDonagh J, Tzimiropoulos G (2017) Synergy between face alignment and tracking via discriminative global consensus optimization. In: Proceedings IEEE international conference computer vision (ICCV), pp 3811–3819
33. Dro˙zd˙z M, Kryjak T (2016) FPGA implementation of multi-scale face detection using HOG features and SVM classifier. Image Process Commun 21(3):27–44 34. Ma C, Trung N, Uchiyama H, Nagahara H, Shimada A, Taniguchi R-I (2017) Adapting local features for face detection in thermal image. Sensors 17(12):2741 35. Song Q, Lee J, Akter S, Rogers M, Grene R, Li S (2020) Prediction of condition-specific regulatory genes using machine learning, Nucleic Acids Res 48(11):e62. https://doi.org/10. 1093/nar/gkaa264 36. Islam Chowdhury A, Munem Shahriar M, Islam A, Ahmed E, Karim A, Rezwanul Islam M (2020) An automated system in atm booth using face encoding and emotion recognition process. In: 2020 2nd international conference on image processing and machine vision, pp 57–62 37. Javed Mehedi Shamrat FM, Chakraborty S, Billah MM, Das P, Muna JN, Ranjan R (2021) A comprehensive study on pre-pruning and post-pruning methods of decision tree classification algorithm. In: 2021 5th International conference on trends in electronics and informatics (ICOEI), pp 1339–1345. https://doi.org/10.1109/ICOEI51242.2021.9452898 38. Islam MA, Akter S, Hossen MS, Keya SA, Tisha SA, Hossain S (2020) Risk factor prediction of chronic kidney disease based on machine learning algorithms. In: 2020 3rd international conference on intelligent sustainable systems (ICISS), pp 952–957. https://doi.org/10.1109/ ICISS49785.2020.9315878 39. Javed Mehedi Shamrat FM, Chakraborty S, Billah MM, Jubair MA, Islam MS, Ranjan R (2021) Face mask detection using convolutional neural network (CNN) to reduce the spread of Covid19. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 1231–1237. https://doi.org/10.1109/ICOEI51242.2021.9452836 40. Chowdhury AI, Ashraf M, Islam A, Ahmed E, Jaman MS, Rahman MM (2020) hActNET: an improved neural network based method in recognizing human activities. In: 2020 4th international symposium on multidisciplinary studies and innovative technologies (ISMSIT) pp 1–6. https://doi.org/10.1109/ISMSIT50672.2020.9254992 41. Javed Mehedi Shamrat FM, Jubair MA, Billah MM, Chakraborty S, Alauddin M, Ranjan R (2021) A deep learning approach for face detection using max pooling. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 760–764. https://doi.org/10. 1109/ICOEI51242.2021.9452896 42. Akter S, Shekhar H, Akhteruzzaman S (2021) Application of biochemical tests and machine learning techniques to diagnose and evaluate liver disease. Adv Biosci Biotechnol 12:154–172. https://doi.org/10.4236/abb.2021.126011 43. Javed Mehedi Shamrat FM, Chakraborty S, Billah MM, Kabir M, Shadin NS, Sanjana S (2021) Bangla numerical sign language recognition using convolutional neural networks (CNNs). Indonesian J Electr Eng Comput Sci 23(1):405–413, ISSN: 2502–4752. https://doi.org/10. 11591/ijeecs.v23.i1.pp405-413 44. Anowar F, Sadaoui S (2021) Incremental learning framework for real-world fraud detection environment. Comput Intell 37(1):635–656 45. Rahman Shuvo MN, Akter S, Islam MA, Hasan S, Shamsojjaman M, Khatun T (2021) Recognizing human emotions from eyes and surrounding features: a deep learning approach. Int J Adv Comput Sci Appl (IJACSA) 12(3). https://doi.org/10.14569/IJACSA.2021.0120346 46. Javed Mehedi Shamrat FM, Chakraborty S, Imran MM, Muna JN, Billah MM, Das P, Rahman MO (2021) Sentiment analysis on Twitter tweets about COVID-19 vaccines using NLP and supervised KNN classification algorithm. 
Indonesian J Electr Eng Comput Sci 23(1):463–470, ISSN: 2502–4752. https://doi.org/10.11591/ijeecs.v23.i1.pp463-470 47. Anowar F, Sadaoui S (2020) Incremental neural-network learning for big fraud data. In: 2020 IEEE international conference on systems, man, and cybernetics (SMC), pp 3551–3557. https:// doi.org/10.1109/SMC42975.2020.9283136 48. Chakraborty S, Javed Mehedi Shamrat FM, Billah MM, Jubair MA, Alauddin M, Ranjan R (2021) Implementation of deep learning methods to identify rotten fruits. In: 2021 5th international conference on trends in electronics and informatics (ICOEI), pp 1207–1212. https://doi.org/10.1109/ICOEI51242.2021.9453004
A Deep Learning Approach for Splicing Detection in Digital Audios Akanksha Chuchra , Mandeep Kaur , and Savita Gupta
Abstract The authenticity of digital audios has a crucial role when presented as evidence in the court of law or forensic investigations. Fake or doctored audios are commonly used for manipulation of facts and causing false implications. To facilitate passive-blind detection of forgery, the current paper presents a deep learning approach for detecting splicing in digital audios. It aims to eliminate the process of feature extraction from the digital audios by taking the deep learning route to expose forgery. A customized dataset of 4200 spliced audios is created for the purpose, using the publicly available Free Spoken Digit Dataset (FSDD). Unlike the other related approaches, the splicing is carried out at a random location in the audio clip that spans 1–3 s. Spectrograms corresponding to audios are used to train a deep convolutional neural network that classifies the audios as original or forged. Experimental results show that the model can classify the audios correctly with 93.05% classification accuracy. Moreover, the proposed deep learning approach also overcomes the drawbacks of feature engineering and reduces manual intervention significantly. Keywords Digital forensics · Audio splicing detection · Deep learning · Audio forensics · CNN
1 Introduction

The authenticity of audio data is highly significant when presented as evidence in courts during legal proceedings and crime investigations. Audio recordings also play a crucial role in other applications like aircraft accident investigations [1].

A. Chuchra (B) · M. Kaur · S. Gupta University Institute of Engineering and Technology, Panjab University, Chandigarh 160014, India M. Kaur e-mail: [email protected] S. Gupta e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_39
For example, recorded calls, cockpit conversations, etc., can be presented as evidence, and fake evidence might be used to misinterpret facts and mislead investigations. The advances in technology have led to the development of such cheap and easy-to-use media manipulation tools that they can be used even by a person with almost no technical knowledge, yet one can produce fake or doctored content which is quite believable to be authentic. An audio forgery refers to any kind of tampering or manipulation carried out on a given audio to produce a fake version. Deletion (deleting or removing a segment of the audio), splicing (cut and paste), and copy-move (copying a segment and pasting it over some other region in the same audio) are operations that can be used to alter the meaning of the audio content maliciously [2].

The domain of audio forensics aims to expose any alterations and forgeries that the audios under consideration might have gone through. To verify the integrity of given audios, various methods exploiting different properties of audios have been proposed by researchers to date. Active approaches such as watermarking [3] and digital signatures [4] have been explored in the past. However, it is not practical in real-life scenarios to have a watermark embedded in every audio. Hence, passive approaches prove to be more realistic, since they do not depend on any additional data accompanying the audio but on the contents of the audio itself. Researchers have identified several different cues and features to expose forgeries in audio data. Audio authentication based on electrical network frequency (ENF) [5–7] is one of the most popular methods, among others such as environmental acoustic feature analysis [8, 9], source identification [10–12], and double compression detection [13–15]. Researchers have also explored various audio authentication techniques based on time–frequency analysis in [16–22]. Other works based on graphical spectral analysis include [23–25].

Traditional feature-based methods require feature engineering, which refers to making use of domain knowledge to identify various features such as pixel values, shapes, and orientations associated with the data. These features are then used to train a classifier. Hence, the performance of most of these algorithms depends on the accuracy of the identified features. This requires a lot of human intervention, that too with a certain level of domain expertise, which makes it very time-consuming. In recent years, researchers have shown more interest in deep learning for applications like classification. Deep learning is a subset of machine learning in which the machine learns by itself through multiple hits and misses; the model works with neural networks and learns and improves on its own. Deep learning algorithms eliminate the requirement for domain expertise and explicit extraction of features by automatically identifying the features in the data that are important for the classification task, therefore reducing the time required as well.

Convolutional neural networks (CNNs), also known as ConvNets, are deep networks where each layer performs a complex operation and passes the data on to the next layer for further processing. In this work, a CNN has been used to perform forgery detection in audios. The audios are converted to spectrograms first, which are then fed to the CNN. Hence, the problem of splicing detection in audios is reduced to an image classification problem.
Fig. 1 Spectrograms corresponding to an original audio (left) and an audio having a spliced segment (right)
A spectrogram, besides being more easily interpretable than the raw audio due to its graphical nature, preserves the time, frequency, and amplitude information in a single graph. This makes the spectrogram a simple yet very informative visualization of an audio. Figure 1 represents the spectrograms corresponding to an original and a forged audio. The rest of the paper is organized as follows. Section 2 presents a review of the related work. Section 3 describes the details of the dataset used in this work. Section 4 explains the proposed method and network architecture in detail. Experimental results are discussed in Sect. 5. We conclude the work in Sect. 6 and discuss various possibilities for future research in this domain.
2 Related Work

With the fast evolution of technology, the development of sophisticated editing tools has allowed the creation of fake or forged audio content which is quite believable to be authentic. Hence, researchers in the domain of audio forensics have explored various features and cues to expose such forgeries. We discuss some of the works exploiting different feature sets in this section. Probably the most popular method, the electrical network frequency (ENF) analysis approach, was first discussed in [5]. Any recording device powered by mains power or even batteries has an ENF signal frequency associated with it due to electromagnetic fields; for reference, its value is 50 Hz for European countries and 60 Hz for the US [26]. Although the ENF keeps fluctuating, the fluctuation pattern over a network remains the same. Moreover, the fluctuation pattern is unique for a given time interval. Hence, it can be used to detect the time of creation of the audio [27]. In [6], the authors used an absolute-error-map (AEM) to visualize the extracted ENF signal and check for inconsistencies. In addition to the detection of tampering, they were also able to detect the type of forgery and localize it. In this way, they proposed a combined system for timestamp verification and tampering detection with high accuracy. A recent work presented in [7] was also based on ENF analysis combined with supervised learning. The method was robust to mp3 audios and worked for short as well as long recordings.
Environmental acoustic features such as reverberation have been used by researchers to detect splicing. Malik and Farid in [8] proposed a method for estimating the amount of reverberation present in a given audio. For the experiments, they first generated spliced audios such that half of the audio had a reverberation time of 0.3 s and the other half had a reverberation time of 0.6 s. Reverberation was estimated at different positions in the spliced audio, and it was found that each half had a different mean estimate of the decay parameter. Zhao et al. in [9] proposed an algorithm that focused on source identification and splicing detection and localization in audios, extracting the magnitude of the channel impulse response as well as the ambient noise using a spectral classification technique for the suspected frame. Correlation was then used to estimate the similarity between the suspected and reference frames.

Splicing or insertion in audios leads to the introduction of audio segments from different sources, which carry different traces of their respective sources. Microphone classification can be useful to detect the presence of more than one type of trace in a given audio. The study in [10] aimed to determine the microphone model used for a given audio recording by studying frequency-domain features of nearly-silent regions in the audio to classify the source of the audio out of seven microphones. Cuccovillo et al. in [11] proposed a blind audio tampering detection algorithm that checks for the presence of footprints from more than one microphone by calculating the magnitude of the frequency response. An SVM with a radial basis function (RBF) kernel was trained on the feature vectors, and high classification accuracy was observed. In another work presented in [12], the background noise present in audio recordings was used as a fingerprint to identify the recording device with the help of deep learning. After extraction of the noise signal, it was turned into a frequency-domain feature which was then fed to various classifiers including softmax, multilayer perceptron, and CNN. In addition, model averaging and a voting model were used to aggregate the results into the final classification result. The results showed high classification accuracy.
[15], an algorithm was proposed which could reveal the compression history of given audio. Statistics of scale factor
and Huffman table index, namely their differential statistics, probability distribution, and cross-correlation, were used as features. Many algorithms are based on time–frequency analysis, which can efficiently detect tampering while also preserving time information that can be used for localization of the tampering. In [16], the authors presented a method based on singularity analysis using wavelets to expose forgeries like insertion, deletion, and splicing in audios and to locate them as well. The observation that tampering generates a singularity point at the tampered location, which is isolated as its correlation with other points is disturbed, was used as the cue and hence formed the basis of their work. The authors in [17] presented a novel approach for tampering detection in uncompressed audio recordings based on analysis of the spectral phase across the short-term Fourier transform (STFT) sub-bands. The higher-order statistical analysis of the spectral phase residue and the correlation between phases of adjacent frames were used as new features for authentication purposes. In Meng et al. [18], another audio splicing detection algorithm was proposed, where the position of each syllable present in the audio was detected using spectral entropy. Similarity analysis was used to detect the presence of any heterogeneous syllables in the audio using the local noise estimated for each syllable. Another algorithm, presented by Rouniyar et al. [19], focused on splicing detection and localization based on a channel response multi-feature approach. Features obtained from the channel response and the log spectrum coefficients of the STFT were combined for each frame and compared with the reference features, and low correlation was used as an indication of a tampered region. The authors in [20, 21] and [22] used pitch-based analysis to expose copy-move forgery.
This paper aims to extend the work presented in [25] to overcome their limitations and explore the usability of deep learning in the domain of audio forensics,
audio splicing detection to be more specific. Inspired by the remarkable performance offered by CNNs over traditional approaches in the domain of computer vision, we use a deep CNN to train on spectrograms generated corresponding to the audios.
3 Dataset

The Free Spoken Digit Dataset (FSDD) [28] is a publicly available simple audio/speech dataset consisting of recordings of English pronunciations of the spoken digits 0 through 9 by six speakers, stored as .wav files at 8 kHz. FSDD contains a total of 3000 recordings: 50 of each digit per speaker. The duration of each recording ranges from 0.5 s to 1 s. Authors in [25] described a method for the generation of original and spliced audios using audio clips from the FSDD dataset. Using a similar method, we use audios from the FSDD to create a dataset containing a total of 8400 audios (4200 authentic and 4200 spliced), each of a duration of approximately 10 s. Each generated audio file is a .wav file with a bit rate of 128 kbps. For simplicity, we refer to the dataset as the audio dataset for splicing detection (ADSD) in this paper. For original audio generation, 10–12 random audios belonging to a single speaker are selected and combined in random order to form a single 10-s long audio. A total of 4200 original audios are generated such that there are 700 audios per speaker. Forged audios, i.e., audios with splicing, are generated by inserting an audio clip belonging to one speaker into an audio of a different speaker. Three different durations of inserted clips are used, i.e., 1 s, 2 s, and 3 s. The audios are generated automatically using a script (a sketch of such a script is given after Table 1), which reduces manual involvement and saves a huge amount of time compared to manual generation. The splicing in the forged audios is done at a random position within the audio, which provides a better simulation of a real-world setting compared to insertion at a fixed position as in [25]. Moreover, the randomized generation of the audios belonging to six different speakers introduces more diversity within the dataset. Table 1 describes the detailed distribution of the audios in ADSD.

Table 1 Detailed distribution of audios in the proposed dataset ADSD

Speaker    Original   Forged 1 s   Forged 2 s   Forged 3 s   Total
George     700        230          230          240          1400
Jackson    700        230          230          240          1400
Lucas      700        230          230          240          1400
Nicolas    700        230          230          240          1400
Theo       700        230          230          240          1400
Yweweler   700        230          230          240          1400
Total      4200       1380         1380         1440         8400
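The generation script itself is not included in the paper; the following is a minimal, hypothetical sketch of such a script, assuming the soundfile library and that the FSDD clips of each speaker are available as lists of .wav paths. All names here are illustrative, not the authors' actual code.

```python
import random
import numpy as np
import soundfile as sf

SR = 8000  # FSDD recordings are sampled at 8 kHz

def concat_clips(paths):
    """Concatenate short digit recordings into one waveform."""
    return np.concatenate([sf.read(p)[0] for p in paths])

def make_original(speaker_clips, out_path):
    # 10-12 random clips of a single speaker, joined in random order (~10 s)
    clips = random.sample(speaker_clips, random.randint(10, 12))
    sf.write(out_path, concat_clips(clips), SR)

def make_forged(host_clips, foreign_clips, out_path, n_insert=2):
    # Build a host audio, then splice clips from a different speaker
    # (each FSDD clip is roughly 0.5-1 s) at a random position within it.
    host = concat_clips(random.sample(host_clips, random.randint(10, 12)))
    insert = concat_clips(random.sample(foreign_clips, n_insert))
    pos = random.randint(0, len(host))
    sf.write(out_path, np.concatenate([host[:pos], insert, host[pos:]]), SR)
```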
Fig. 2 Conversion of an audio to spectrogram
4 Proposed Methodology

4.1 Spectrogram Generation and Preprocessing

A total of 8400 .wav audio files belonging to the two classes, i.e., original and spliced, from the ADSD are used in this study. The process of dataset generation is described in Sect. 3. The next step is the generation of spectrograms corresponding to every audio. For spectrogram generation, the audios are subjected to the short-time Fourier transform, which converts the signal from the time domain to the frequency domain; power-log spectrograms are used since they approximate the human perception of sound levels better than a linear scale. The generated spectrograms are saved as .png images and further subjected to preprocessing to clean and transform the data for uniformity so that it can be utilized efficiently. All images are resized to a fixed target size of 1000 × 500 pixels. The pixel values are normalized by dividing them by 255, i.e., the maximum pixel value, so that the new pixel values lie between 0 and 1 (Fig. 2). After preprocessing and normalization, the true labels for the training data are generated, after which the data is passed as input to the CNN model for training. Section 4.2 describes the network architecture in detail. Data augmentation, i.e., horizontal and vertical flips, is also used, which increases the size and diversity of the dataset and helps prevent overfitting as well. The values of the model hyperparameters are tuned to obtain optimal classification results, as discussed in Sect. 5. After the training process, the learned model is obtained, which is then used to make predictions on the test set. Evaluation metrics are then used to check the performance of the model. Figure 3 summarizes the flow of the proposed methodology.
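The paper does not list the exact STFT settings; a minimal sketch of the described pipeline, assuming librosa, matplotlib, and Pillow, might look as follows (the file names and STFT parameters are illustrative):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

y, sr = librosa.load("audio_0001.wav", sr=8000)         # hypothetical file name
S = librosa.stft(y)                                     # short-time Fourier transform
S_db = librosa.amplitude_to_db(np.abs(S), ref=np.max)   # log-power scale

plt.figure(figsize=(10, 5))
librosa.display.specshow(S_db, sr=sr)
plt.axis("off")
plt.savefig("audio_0001.png", bbox_inches="tight", pad_inches=0)
plt.close()

# Preprocessing: resize to 1000 x 500 pixels and scale pixel values into [0, 1]
img = Image.open("audio_0001.png").convert("RGB").resize((1000, 500))
x = np.asarray(img, dtype=np.float32) / 255.0
```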
4.2 Network Architecture

The proposed CNN model consists of 4 convolution layers, 4 max-pooling layers, 1 dropout layer, 1 flatten layer, and 2 dense layers. Figure 4 shows the arrangement of layers in the model. In a convolution layer, the input is taken and a kernel or filter,
Fig. 3 Proposed method for audio splicing detection
Fig. 4 Proposed CNN architecture
which is a small matrix, generally of dimensions 3 × 3 or 5 × 5, is slid over it to perform element-wise multiplication and summation, producing a feature map. Four convolution layers are used; each Conv2D is a 2-dimensional convolution layer with filter size 3 × 3, where the number of filters used is 16, 32, 64, and 32 in the first, second, third, and fourth convolution layers, respectively. The first layer accepts an input image of dimensions 1000 × 500 × 3, where 1000 and 500 correspond to the length and breadth, respectively, and 3 represents the depth, which corresponds to the three color channels, namely R, G, and B. The pooling layer is used to reduce the size of the convolved features by creating a down-sampled or "pooled" version of the feature map, which preserves the dominant and important structural elements but omits fine details that might not be useful for the task. Dropout is a regularization technique used to prevent overfitting and make the model more robust: some randomly selected neurons are dropped out or ignored during training, which forces the remaining neurons to learn to compensate for the dropped ones. The dropout layer used here has a dropout rate of 0.1. Two dense layers are used, where one dense layer has 32 neurons and the other is the output layer with 1 neuron for binary classification. As the activation function, the rectified linear unit (ReLU) has been used, which is the most widely used activation function in deep learning models, especially CNNs, due to its good performance and ease of training. ReLU has been used with the convolution
layers and the dense layers except for the output layer. Since the classification problem in this work is a binary classification problem, the sigmoid function has been used with the output layer.
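A minimal Keras sketch consistent with the layer arrangement described above is given below; the pooling sizes, the position of the dropout layer, and the input orientation (height vs. width) are assumptions, as the paper does not specify them.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(500, 1000, 3)),           # 1000 x 500 RGB spectrogram
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.1),                          # dropout rate reported in the paper
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),        # binary: original vs. spliced
])
```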
5 Experimental Results

The normalized images after preprocessing are split into two sets, a training set and a testing set, in a ratio of approximately 70:30, which means 70% of the total data is used to train the CNN model, while the remaining 30% is used to test and evaluate the model's performance on unseen data. The data is processed in batches of 64 samples. The model is trained for 50 epochs initially, but this leads to overfitting, so the number of epochs is reduced to 30, which gives good results. Various learning rates are tried: a learning rate of 1 × 10^-2 makes the model learn too fast, while a learning rate of 1 × 10^-4 makes the learning too slow in this case. Finally, a learning rate of 1 × 10^-3 is set, which gives optimal results. The Adam optimizer is used, with binary cross-entropy as the loss function. The proposed system has been implemented in Python using the TensorFlow [29] framework. All the experiments were run on Google Colaboratory, which provides a GPU with 12 GB of memory.
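Continuing the sketch above, the reported training configuration can be expressed as follows; x_train, y_train, x_val, and y_val are placeholders for the spectrogram arrays and their labels.

```python
from tensorflow.keras.optimizers import Adam

# Adam with learning rate 1e-3, binary cross-entropy, batch size 64, 30 epochs
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train,          # ~70% of the 8400 spectrograms
                    validation_data=(x_val, y_val),
                    batch_size=64,
                    epochs=30)
```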
5.1 Performance Evaluation Metrics

A confusion matrix is a table that describes the performance of a classification model on a set of test data for which the true labels are known, in terms of true positives (prediction and actual are both positive), true negatives (prediction and actual are both negative), false positives (prediction is positive but actually negative), and false negatives (prediction is negative but actually positive). With the help of TP, TN, FP, and FN, other performance metrics can be calculated. Classification accuracy is defined as the number of correct predictions out of the total predictions made. It is usually represented as a percentage and is given as

Accuracy = ((TP + TN)/(TP + TN + FP + FN)) × 100    (1)

Precision aims to find out, of all the samples predicted positive, what percentage is truly positive. Recall aims to find out, of the total actual positives, what percentage is predicted as positive. Precision and recall are given as

Precision = TP/(TP + FP)    (2)
Recall = TP/(TP + FN)    (3)
Accuracy is easy to understand but not robust when the data are unevenly distributed, or when there is a higher cost associated with a particular type of error (Type I or Type II). The F1 score is the harmonic mean of precision and recall, which is a more balanced metric to evaluate a model on, as it takes both false positives and false negatives into account. The values of precision, recall, and F1 score lie between 0 and 1.

F1 score = 2/((1/Precision) + (1/Recall)) = (2 × Precision × Recall)/(Precision + Recall)    (4)
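Equations (1)–(4) translate directly into code; the example call below uses the confusion-matrix counts reported in Sect. 5.2.

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy (%), precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100   # Eq. (1)
    precision = tp / (tp + fp)                         # Eq. (2)
    recall = tp / (tp + fn)                            # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall) # Eq. (4)
    return accuracy, precision, recall, f1

print(metrics(tp=1170, tn=1175, fp=85, fn=90))
# approximately (93.06, 0.9323, 0.9286, 0.9304), matching Sect. 5.2 up to rounding
```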
5.2 Results

The proposed model was empirically evaluated with different parameter values to obtain the optimal learning outcome. While experimenting with different parameter values, the model was initially trained for 50 epochs, but it was observed that the model suffered from overfitting. Hence, the number of epochs was reduced to 30. At the end of 30 epochs, a training accuracy of 0.9716 (or 97.16%) was observed. Figure 5 summarizes the training accuracy over the 30 epochs.

Fig. 5 Training accuracy w.r.t. 30 epochs

The trained model is evaluated on the test set containing a total of 2520 audios belonging to the two classes. Figure 6 shows the results obtained over the test data in the form of a confusion matrix. Considering the detection of forged audios as "positive," the following observations can be made from the obtained confusion matrix: TP = 1170, TN = 1175, FP = 85, FN = 90. The model performs well on the test data, yielding a high classification accuracy of 93.05%. The values of precision and recall obtained are 0.9322 and 0.9285, respectively. The model obtains an F1 score of 0.9302.
Fig. 6 Confusion matrix summarizing the classification results obtained

This indicates that the overall performance of the proposed model is very good for the detection of splicing in audios (Table 2). In order to understand the performance of the model over the three durations individually, the model evaluation is carried out on three individual datasets containing 1 s, 2 s, and 3 s forgeries, respectively. The numbers of forged audios in these datasets are 400, 420, and 420, respectively. The accuracies obtained are summarized in Table 3. It can be observed that the proposed model performs best for 3 s forgeries; however, the accuracies obtained for 1 s and 2 s forgeries are also quite good.

Table 2 Summary of the results obtained

Performance metric   Results
Accuracy (%)         93.05
Precision            0.9322
Recall               0.9285
F1 score             0.9302

Table 3 Test accuracies over different forgery durations

Forgery type   Accuracy (%)
1 s            91.77
2 s            92.86
3 s            94.99
5.3 Discussion

In the past decade, researchers in the domain of audio forensics have proposed many different algorithms based on different properties of audios to check audios for any forgeries they might have undergone. For example, the work presented in [7], based on ENF analysis using supervised learning, was able to achieve good performance with an error rate of 8%. The study in [10], which aimed to determine the microphone model used for a given audio recording based on frequency-domain features, achieved an accuracy of 93.5%. Authors in [11] were able to check for the presence of footprints from more than one microphone in an audio, with microphone classification accuracies ranging from 94 to 99%, depending on the encoding of the audio. However, only 64 tampered samples were used for evaluation, which makes it difficult to estimate the generalizability of their algorithm. Authors in [18] proposed an audio splicing detection algorithm using the local noise estimated for each syllable in the audio that was able to achieve a false positive rate as low as 0.21. Another algorithm, presented by Rouniyar et al. [19], focused on splicing detection and was able to achieve a precision of 0.8452. The algorithms discussed above require manual feature extraction, which is then followed by training a classifier. Algorithms based on deep learning include [12], who were able to identify the recording device with the help of deep learning with an accuracy of 90%. Due to the lack of standardized audio datasets containing forged audios, researchers need to create their own sets of forged audios for training and evaluation purposes. This makes it very difficult to compare the efficiency of different algorithms directly, due to the lack of common ground. Hence, we compare our work with that presented in [25] because of its similarity to our work, as the work presented in this paper can be considered an extension of their work. Jadhav et al. [25] first used a deep CNN for audio splicing detection and obtained good accuracies of 82.80%, 87.54%, and 96.67% for the detection of 1 s, 2 s, and 3 s splicing, respectively. However, the forged audios in the dataset created and used in their work contain the spliced segment only in the middle of the audios, i.e., after the 5th digit. In real life, the forgery is not always in the middle. The proposed model in this work overcomes this limitation, as it can detect splicing at any location within the audio. Moreover, the model achieves a high accuracy of 93.05% in detecting splicing of any duration. It can be observed from Fig. 7 that the proposed method offers improved overall classification performance for the three spliced durations compared to that in [25], keeping in mind that the proposed model in this work can detect forgery at any location within an audio. Principles of information fusion can further be applied to augment the robustness and reliability of tamper detection in a passive-blind manner, as described by Kaur and Gupta [30] for images. In the future, the model can be used in collaboration with a video tamper detection module via an ensemble architecture to improve trustworthiness in multimedia tamper detection.
Fig. 7 Comparison between the accuracies obtained by the proposed model versus [25]
6 Conclusion

With such a tremendous amount of audio-visual media being generated and shared across the Internet, it becomes really important to verify the authenticity and integrity of the data. Audio data can serve as evidence in many criminal investigations and legal cases. Hence, it becomes necessary to check the credibility of audio evidence to derive fair conclusions. In this work, a system for splicing detection in digital audios has been proposed which makes use of deep learning. Not only can the model achieve high accuracy, but the use of deep learning also eliminates the need for explicit feature identification and extraction, which is a very tedious and time-consuming task in itself, by automatically extracting and learning features from the input data. Spectrograms corresponding to the audios are used to train a deep convolutional neural network, thus reducing the problem to an image classification problem. A total of 8400 audios (4200 "original" and 4200 "forged") have been used to carry out the training and evaluation of the system. The proposed model achieves a decent accuracy of 93.05%, while the precision and recall are 0.9322 and 0.9285, respectively. The model successfully detects splicing of different durations, i.e., 1 s, 2 s, and 3 s, in the 10 s audios. The proposed model works well for audios having forgeries at any location within the audio (Table 4). The results obtained are convincing enough to motivate the usage of deep learning over conventional approaches in the domain of digital audio forensics. However, in certain applications that might demand detailed knowledge of the extracted features, deep learning approaches might not prove helpful, since the implicit feature extraction makes it difficult to identify the significance of the features used. In addition, detection and localization of other forgeries, like deletion and copy-move forgery, can be another challenge for future researchers.
Table 4 Comparison of our work with [25]

Parameter                                  Model in [25]                                             Proposed model
Dataset                                    4000 original, 4400 spliced audios created using FSDD,    4200 original, 4200 spliced audios created using FSDD,
                                           each of 10 s                                              each of 10 s approx.
Location of forgery within the audios      Middle of the audios                                      Anywhere within the audios
Duration of forgeries                      1, 2, 3 s                                                 1, 2, 3 s (approx.)
Classification accuracy % (1 s forgery)    82.80                                                     91.77
Classification accuracy % (2 s forgery)    87.54                                                     92.86
Classification accuracy % (3 s forgery)    96.67                                                     94.99
Accuracy over mixed dataset %              –                                                         93.05
References

1. Maher RC (2010) Overview of audio forensics. In: Sencar HT, Velastin S, Nikolaidis N, Lian S (eds) Intelligent multimedia analysis for security applications, vol 282. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 127–144. https://doi.org/10.1007/978-3-642-11756-5_6
2. Teerakanok S, Uehara T (2017) Digital media tampering detection techniques: an overview. In: 2017 IEEE 41st annual computer software and applications conference (COMPSAC), Turin, pp 170–174. https://doi.org/10.1109/COMPSAC.2017.109
3. Hua G, Huang J, Shi YQ, Goh J, Thing VLL (2016) Twenty years of digital audio watermarking—a comprehensive review. Signal Process 128:222–242. https://doi.org/10.1016/j.sigpro.2016.04.005
4. Yang X, Wu X, Zhang M (2009) Audio digital signature algorithm with tamper detection. In: 2009 fifth international conference on information assurance and security, Xi'an, China, pp 15–18. https://doi.org/10.1109/IAS.2009.258
5. Grigoras C (2005) Digital audio recording analysis—the electric network frequency criterion. Int J Speech Language Law 12(1):63–76
6. Hua G, Zhang Y, Goh J, Thing VLL (2016) Audio authentication by exploring the absolute-error-map of ENF signals. IEEE Trans Inform Forensic Secur 11(5):1003–1016. https://doi.org/10.1109/TIFS.2016.2516824
7. Lin X, Kang X (2017) Supervised audio tampering detection using an autoregressive model. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), New Orleans, LA, pp 2142–2146. https://doi.org/10.1109/ICASSP.2017.7952535
8. Malik H, Farid H (2010) Audio forensics from acoustic reverberation. In: 2010 IEEE international conference on acoustics, speech and signal processing, Dallas, TX, USA, pp 1710–1713. https://doi.org/10.1109/ICASSP.2010.5495479
9. Zhao H, Chen Y, Wang R, Malik H (2014) Audio source authentication and splicing detection using acoustic environmental signature. In: Proceedings of the 2nd ACM workshop on information hiding and multimedia security—IH&MMSec '14, Salzburg, Austria, pp 159–164. https://doi.org/10.1145/2600918.2600933
10. Buchholz R, Kraetzer C, Dittmann J (2009) Microphone classification using Fourier coefficients. In: Katzenbeisser S, Sadeghi AR (eds) Information hiding, vol 5806. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 235–246. https://doi.org/10.1007/978-3-642-04431-1_17
11. Cuccovillo L, Mann S, Tagliasacchi M, Aichroth P (2013) Audio tampering detection via microphone classification. In: 2013 IEEE 15th international workshop on multimedia signal processing (MMSP), Pula (CA), Italy, pp 177–182. https://doi.org/10.1109/MMSP.2013.6659284
12. Qi S, Huang Z, Li Y, Shi S (2016) Audio recording device identification based on deep learning. In: 2016 IEEE international conference on signal and image processing (ICSIP), Beijing, China, pp 426–431. https://doi.org/10.1109/SIPROCESS.2016.7888298
13. Bianchi T, Rosa AD, Fontani M, Rocciolo G, Piva A (2014) Detection and localization of double compression in MP3 audio tracks. EURASIP J Info Secur 2014(1):10. https://doi.org/10.1186/1687-417X-2014-10
14. Ren Y, Fan M, Ye D, Yang J, Wang L (2016) Detection of double MP3 compression based on difference of calibration histogram. Multimed Tools Appl 75(21):13855–13870. https://doi.org/10.1007/s11042-015-2758-3
15. Yan D, Wang R, Zhou J, Jin C, Wang Z (2018) Compression history detection for MP3 audio. KSII TIIS 12(2). https://doi.org/10.3837/tiis.2018.02.007
16. Chen J, Xiang S, Liu W, Huang H (2013) Exposing digital audio forgeries in time domain by using singularity analysis with wavelets. In: Proceedings of the first ACM workshop on information hiding and multimedia security, pp 149–158
17. Lin X, Kang X (2017) Exposing speech tampering via spectral phase analysis. Digital Signal Process 60:63–74. https://doi.org/10.1016/j.dsp.2016.07.015
18. Meng X, Li C, Tian L (2018) Detecting audio splicing forgery algorithm based on local noise level estimation. In: 2018 5th international conference on systems and informatics (ICSAI), Nanjing, pp 861–865. https://doi.org/10.1109/ICSAI.2018.8599318
19. Rouniyar SK, Yingjuan Y, Hu Y (2018) Channel response based multi-feature audio splicing forgery detection and localization. In: Proceedings of the 2018 international conference on E-business, information management and computer science—EBIMCS '18, Hong Kong, pp 46–53. https://doi.org/10.1145/3210506.3210515
20. Yan Q, Yang R, Huang J (2015) Copy-move detection of audio recording with pitch similarity. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), South Brisbane, Queensland, Australia, pp 1782–1786. https://doi.org/10.1109/ICASSP.2015.7178277
21. Yan Q, Yang R, Huang J (2019) Robust copy-move detection of speech recording using similarities of pitch and formant. IEEE Trans Inform Forensic Secur 14(9):2331–2341. https://doi.org/10.1109/TIFS.2019.2895965
22. Li C, Sun Y, Meng X, Tian L (2019) Homologous audio copy-move tampering detection method based on pitch. In: 2019 IEEE 19th international conference on communication technology (ICCT), Xi'an, China, pp 530–534. https://doi.org/10.1109/ICCT46805.2019.8947002
23. Korycki R (2010) Methods of time-frequency analysis in authentication of digital audio recordings. Int J Electron Telecommun 56(3):257–262. https://doi.org/10.2478/v10177-010-0033-0
24. Korycki R (2013) Time and spectral analysis methods with machine learning for the authentication of digital audio recordings. Forensic Sci Int 230(1–3):117–126. https://doi.org/10.1016/j.forsciint.2013.02.020
25. Jadhav S, Patole R, Rege P (2019) Audio splicing detection using convolutional neural network. In: 2019 10th international conference on computing, communication and networking technologies (ICCCNT), Kanpur, India, pp 1–5. https://doi.org/10.1109/ICCCNT45670.2019.8944345
26. Maher RC (2018) Principles of forensic audio analysis. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-99453-6
27. Huijbregtse M, Geradts Z (2009) Using the ENF criterion for determining the time of recording of short digital audio recordings. In: Geradts ZJMH, Franke KY, Veenman CJ (eds) Computational forensics, vol 5718. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 116–124. https://doi.org/10.1007/978-3-642-03521-0_11
28. Jackson Z (2020) Free spoken digit dataset (FSDD). https://github.com/Jakobovski/free-spoken-digit-dataset
29. TensorFlow. https://www.tensorflow.org/. Accessed 25 March 2021
30. Kaur M, Gupta S (2019) A fusion framework based on fuzzy integrals for passive-blind image tamper detection. Cluster Comput 22(S5):11363–11378. https://doi.org/10.1007/s10586-017-1393-3
Multi-criteria Decision Theory-Based Cyber Foraging Peer Selection for Content Streaming Parisa Tabassum, Abdullah Umar Nasib, and Md. Golam Rabiul Alam
Abstract COVID-19 has made it necessary for educational institutes to make their materials available online. Having access to these vast amounts of knowledge and learning materials can greatly benefit students outside of these institutes. With that in mind, this paper proposes a cyber foraging system. The proposed system is a peer-to-peer streaming system for educational institute content streaming that selects the best peers based on eight decision criteria. Judgments from experts are used as data to assign relative weights to these criteria using the Fuzzy Analytic Hierarchy Process method. Finally, the criteria are ranked based on the assigned relative weights to find out their importance in the peer selection decision making process.

Keywords Mobile P2P streaming · Cyber foraging · P2P · FAHP · Peer selection · MADM
1 Introduction

In the past few months, the world has been shaken by COVID-19 [1, 2]. To keep the system running, most institutes have moved online, and that includes educational institutes [3]. Most, if not all, educational institutes are still proceeding online to ensure the safety of students and staff. The unreliability of the Internet has inspired teachers to make their lectures available online. Study materials are also being posted online out of necessity and ease of work. All these materials being available has opened a new door to knowledge sharing. Motivated by this, this paper
proposes a cyber foraging system to stream content from different educational institutes. Cyber foraging is a technique for a device to offload computation to another device [4]. Using the cyber foraging technique, devices with low resources can share the load of some of their work with devices that have high resources. Streaming content is one of its many uses [5]. The proposed system will benefit not only an institute's own students but also students from all over the world [6]. According to this system, if an educational institute agrees to be a part of the system, its students will be allowed to keep the institute's online educational content in their device storage and make it available to be streamed by others. The system will follow the 'Neighbor Peer Selection Scheme Based on Effective Capacity for Mobile Peer-to-Peer Streaming' [7] to choose a suitable streaming peer. Furthermore, this paper proposes 8 criteria that the system will use while selecting a suitable peer. The presence of multiple criteria makes this a multiple-criteria decision making (MCDM) problem, which refers to techniques that assess multiple conflicting criteria to make decisions [8]. One of the techniques of MCDM is the Fuzzy Analytic Hierarchy Process (FAHP) method [9]. In this paper, the FAHP method is used to assign relative weights to the criteria using expert judgments and to rank them based on their importance. The FAHP method is used to overcome the uncertainty that comes with the classical Analytic Hierarchy Process (AHP) method. The paper starts with the introduction in the first section and continues with related works in Sect. 2. After that, Sect. 3 states the evaluation criteria for suitable peer selection. Following that, Sect. 4 explains the system model, while Sect. 5 introduces the methods and steps followed in this paper. Section 6 provides the result analysis, which contains the calculations and results. Finally, the last section concludes the paper with an overall summary accompanied by the scope for future contributions to improve the method. The system proposed in this paper will allow students to access and learn from educational materials provided by institutions they are not enrolled in, helping them make the best of online classes. Moreover, this paper successfully ranks the criteria that are used to select the best streaming peer for the proposed system.
2 Related Work

In recent years, there has been a notable amount of research in this field. One work deals with Multiple Attribute Decision Making (MADM) theory, which considers different factors of possible peers, including Signal to Interference and Noise Ratio (SINR), residency time, security, moving speed, power level, and effective capacity. This proposed model could enhance the stability of ECPS for wireless and mobile network environments. However, the efficiency of ECPS approaches still needs to be demonstrated in real experimental set-ups. Along with the advancement of communication technology, there is always the possibility of numbers of fresh mobile networks and environments that might cause serious
heterogeneity among the schemes. This proposed method does not provide a satisfactory answer to this scenario [7]. In another similar research work, the authors proposed a P2P application framework, which combines an abstraction model with an implementation in an experimental environment. This approach resolves the complexity that can arise in developing P2P applications by providing a set of generic, protocol-independent services. These services implement common P2P functions that developers would otherwise need to implement themselves. The authors believed that the introduction of an easy-to-use structure would enable implementers to focus more on developing applications rather than on peer-to-peer functions. In addition, the authors expected that the proposed method might motivate developers to build more and more applications. They hoped that it would also make the use of P2P technologies reliable and easy to use within business environments. Although this approach can achieve good performance in wired P2P networks, it may fail in mobile networks because of their complicated wireless environment characteristics. Moreover, differences in equipment capability in terms of mobile device power, security, and position management complicate the situation more than usual [10]. Thus, this proposed method does not provide a concrete solution to the problems this research addresses. In another paper, the researchers proposed a distributed algorithm for scheduling senders for multi-source multimedia transmission in wireless mobile peer-to-peer networks. The proposed model maximizes the data rate as well as minimizes the power consumption for wireless mobile P2P applications. In commonly used wireless networks, the channels of wireless mobile peer-to-peer networks vary significantly over time and across users. In this regard, choosing the optimal sender for multi-source multimedia transmission in wireless mobile P2P networks can be a suitable strategy for maximizing data throughput. Another important issue in wireless mobile devices is system resource constraints; by selecting the best sender, these resource limits can be dealt with in a convenient way. Lastly, in wireless mobile peer-to-peer networks, there is no centralized control point. As a result, the sender scheduling scheme needs to be evenly distributed. However, due to several evident differences, such as time-varying wireless channels and the specific properties of mobile terminals, it is difficult to directly extend the aforementioned systems to mobile P2P networks [11]. A good number of enhancements to existing peer-to-peer protocols are proposed to cut down on energy use in another approach. The authors review the most popular solutions in the existing works, focusing on three types of P2P systems and applications: file sharing and distribution, content streaming, and epidemics. In addition, the researchers discuss open concerns and potential research directions for each type of P2P system. However, they concentrate on peer-to-peer content streaming in general; thus, the constraints of streaming services have not been taken into enough consideration [12].
This work uses the suitable streaming peer selection method implemented in [7] to select a suitable peer for the proposed system. To improve the quality of the selection process, two new decision making criteria were added to the six used in the original peer selection method.
3 Decision Criteria

The proposed system takes 8 criteria into consideration while making decisions. Each of these criteria plays an important role in ensuring successful data transmission. Figure 1 shows the relationships between the criteria and the amount of influence they have on each other and on successful data transmission. The criteria can be explained as follows:

SINR. The signal to interference plus noise ratio represents the quality of the different wireless channels the peer can use while providing the service.

Residency Time. This factor stands for the stability of P2P peers, as each peer can join or leave the system at any point in time.

Power Level. The power level of the peer device is an important factor because a peer with a low power level can shut down at any moment.

Moving Speed. Link breaks and terminal unavailability may occur if a peer moves too fast.

Security. Low security of peers might cause polluted files or loss of data, so a high security level is important to ensure effective data transmission.
Fig. 1 Relations between different peer factors
Effective Capacity. Effective capacity denotes the maximum amount of work a system is capable of doing in a given time, given its limitations.

Contributor. Contributors allow data from their devices to be streamed. The more streamable data a user provides, the more bandwidth he/she gets allocated while streaming.

Data Flagged. Users can flag inappropriate data or data from outside the listed organizations. If at least 15% of viewers flag any content, the content will be deleted from the streamable content list.
4 The Proposed System Model

The proposed system will use the following devices for P2P data transmission:

Mobile P2P Streaming Service Server (SSS). It performs the selection procedure. This device is installed in every service area.

Peers with P2P service. Peers are assigned to or removed from the same service area at random. Each peer in the network can be either a client or a server.

Streaming Cache Server (SCS). If there is no peer that is suitable to be a neighbor peer, the SCS will provide the data directly to the data-requesting peer.

Figure 2 illustrates the system model. The steps the system will use for peer selection and content streaming are presented as follows:

• The user will log in using an ID and password.
• Dashboard: Add files to list: Users can add data they will allow to be streamed from their device to the list. Select files for streaming: The website/app will give users a list of files available for streaming.
• The user will select the data he/she wants to stream and start a data-requesting session.
• The user will send the request message to its residency network. The SSS will begin the process of selecting a suitable peer. When a peer joins the mobile cell, the SSS will collect its necessary parameters (effective capacity, SINR, residency time, power level, security, moving speed, contributor, data flagged). The ideal and negative-ideal peers are used to calculate each candidate's relative proximity to the ideal peer. Finally, the appropriate peers are selected as members of the set of suitable neighbor peers.
• If no P2P peer owns the demanded data, or the peers owning the data do not qualify as suitable peers, the data will be delivered to the demanding peer directly via the SCS. If any suitable peers enter the network during data delivery by the SCS, the SSS will develop a connection between the user and the new neighbor peer set, taking over the service offered by the SCS.
Fig. 2 System model of the proposed cyber foraging peer selection
• Only the data of member organizations will be listed. Users can flag data from outside the member organizations. The flagged data will be checked and deleted manually if it is from an organization that is not a member of the service. Otherwise, the data will stay up unless the organization asks to take it down.
• Streaming speed: Check whether the user demanding the data is a contributor to the service or not. If the user is not a contributor, the bandwidth selected by the algorithm will be given to the user. If the user is a contributor, then increased bandwidth, in accordance with the amount of contribution, will be provided.
• Users can flag inappropriate content. If at least 15% of viewers flag the data for inappropriate content, it will be deleted automatically from the list.
5 Methodology

An experimental setup would be necessary to assign values to the 8 criteria (effective capacity, SINR, residency time, power level, security, moving speed, contributor, data flagged) used in this paper for alternative devices. Such an experiment would be expensive to set up. Thus, as an alternative, decision makers' opinions are used in this paper, as the FAHP method uses judgments from experts to perform the calculation [13]. Ten computer science post-graduate students performed the role of decision makers for this paper. The methodology is divided into 3 sub-sections, which are described below.
5.1 Analytic Hierarchy Process

Multiple-criteria decision making (MCDM) deals with problems where a choice needs to be made among multiple criteria based on their respective importance. The Analytic Hierarchy Process is an effective method to solve these types of problems. It derives priority scales through pairwise comparison of the given criteria by assigning weights to each decision criterion of a hierarchy depending on judgments from experts. The procedure for assigning weights to different criteria using the AHP method was explained in [14, 15]. The nine-point preference scale used to quantify judgments in pairwise comparisons, also known as the Saaty scale, is listed in Table 1. AHP depends on the judgment of decision makers and does not consider the uncertainty that comes with it [16]. For decision making, the method provides crisp values, and the resulting ranking is imprecise. Therefore, the Fuzzy AHP model is proposed, as it is concerned with quantifying and reasoning about the imprecise and vague terms that appear in human language [17].

Table 1 Linguistic terms and AHP preference scale for pair-wise comparison

Saaty scale   Definition (importance of 2 criteria being compared)
1             Equally important (Eq. Imp.)
3             Weakly important (W. Imp.)
5             Fairly important (F. Imp.)
7             Strongly important (S. Imp.)
9             Absolutely important (A. Imp.)
2, 4, 6, 8    Intermediate values between two adjacent scales (I. Val.)
Table 2 Linguistic terms and the corresponding triangular fuzzy numbers

Saaty scale   Fuzzy triangular scale                       Definition (importance of 2 criteria being compared)
1             (1, 1, 1)                                    Equally important (Eq. Imp.)
3             (2, 3, 4)                                    Weakly important (W. Imp.)
5             (4, 5, 6)                                    Fairly important (F. Imp.)
7             (6, 7, 8)                                    Strongly important (S. Imp.)
9             (9, 9, 9)                                    Absolutely important (A. Imp.)
2, 4, 6, 8    (1, 2, 3), (3, 4, 5), (5, 6, 7), (7, 8, 9)   Intermediate values between two adjacent scales (I. Val.)
5.2 Fuzzy Analytic Hierarchy Process

The Fuzzy AHP method is an extension of the AHP method [18]. To cope with the uncertainties that occur with the AHP method, the Fuzzy AHP method uses fuzzy comparison ratios that are defined by triangular membership functions. Triangular Fuzzy Numbers (TFNs) are used to express the weights of the AHP preference scale to portray the relative significance of the criteria. The values are represented as (l, m, u), where 'u' denotes the upper point, 'm' the middle point, and 'l' the lower point. Table 2 shows the linguistic terms and the corresponding triangular fuzzy numbers.
5.3 Procedure

One application of the Fuzzy AHP method is assigning relative weights to the different criteria of any MCDM problem. The weights can then be used to rank the criteria. The steps for weight calculation and ranking of the criteria are as follows:

Step 1. Describe the hierarchy of the problem, with the goal at the upper level and the criteria at the lower level.

Step 2. Construct a pairwise comparison matrix by comparing the decision criteria and using the Saaty scale from Table 1 to establish priorities among them. The dimension of the matrix is (number of criteria) × (number of criteria). The values that fill up the matrix depend on the judgments of experts.

Step 3. Construct the fuzzified pairwise comparison matrix. Replace the crisp numeric values in the pairwise comparison matrix with fuzzy numbers using Table 2. To convert reciprocal values to fuzzy numbers, use the following formula:

(l, m, u)^(-1) = (1/u, 1/m, 1/l)    (1)
Step 4. Calculate the respective weights of the decision making criteria using the geometric mean. The formula for calculating the fuzzy geometric mean is

r = ((l_1 * l_2 * ... * l_n)^(1/n), (m_1 * m_2 * ... * m_n)^(1/n), (u_1 * u_2 * ... * u_n)^(1/n))    (2)

Next, find the summation of the geometric means using

r_1 + r_2 + ... + r_n = (l_1 + l_2 + ... + l_n, m_1 + m_2 + ... + m_n, u_1 + u_2 + ... + u_n)    (3)

Find the reciprocal of the geometric mean summation using Eq. (1). Multiply each geometric mean by the reciprocal of the geometric mean summation to find the fuzzy weight of each criterion. The formula for multiplying fuzzy numbers is

W_n = A_1 × A_2 × ... × A_n = (l_1 * l_2 * ... * l_n, m_1 * m_2 * ... * m_n, u_1 * u_2 * ... * u_n)    (4)

De-fuzzify the fuzzy weights using the Center of Area (COA) method:

W_n = (l + m + u)/3    (5)
Add all the criteria weights to find the total criteria weight. If the total is not equal to 1, normalize the weights so that they sum to 1.

Step 5. Rank the criteria using the relative weights from Step 4. The higher the weight of a criterion, the higher its rank. The highest-weighted criterion is ranked 1, and the lowest-weighted criterion is ranked last.
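For illustration, Steps 3–5 can be sketched in Python, assuming the fuzzified pairwise comparison matrix is stored as an n × n × 3 NumPy array of (l, m, u) triples, as in Table 4. This is a sketch of the computation, not the authors' code.

```python
import numpy as np

def fahp_weights(F):
    """Return normalized crisp weights from a fuzzy pairwise matrix F (n x n x 3)."""
    n = F.shape[0]
    # Eq. (2): fuzzy geometric mean of each row (per l, m, u component)
    r = np.prod(F, axis=1) ** (1.0 / n)               # shape (n, 3)
    # Eq. (3): fuzzy summation of the geometric means
    L, M, U = r.sum(axis=0)
    # Eqs. (1) and (4): multiply each r_i by the reciprocal (1/U, 1/M, 1/L)
    w_fuzzy = r * np.array([1.0 / U, 1.0 / M, 1.0 / L])
    # Eq. (5): de-fuzzify with the Center of Area method, then normalize
    w = w_fuzzy.mean(axis=1)
    return w / w.sum()

# Step 5: the highest weight gets rank 1
# ranks = np.argsort(np.argsort(-weights)) + 1
```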
6 Result Analysis

Step 1. Figure 3 shows the hierarchical structure of the system proposed by the paper.

Step 2. Table 3 is the pairwise comparison matrix built using judgments from the decision makers and Table 1. Here, the number of criteria is 8, so the dimension of the pairwise comparison matrix is 8 × 8.
Fig. 3 The hierarchical structure
Table 3 Pairwise comparison matrix in AHP

Cr   C1    C2    C3    C4    C5    C6    C7    C8
C1   1     1     2     1/5   6     1/3   1/7   4
C2   1     1     2     6     6     1/2   9     7
C3   1/2   1/2   1     1/7   8     9     1/4   9
C4   5     1/6   7     1     1     1/2   3     8
C5   1/6   1/6   1/8   1     1     3     3     1/7
C6   3     2     1/9   2     1/3   1     5     5
C7   7     1/9   4     1/3   1/3   1/5   1     2
C8   1/4   1/7   1/9   1/8   7     1/5   1/2   1
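For illustration, the crisp matrix of Table 3 can be fuzzified with the scale of Table 2 and Eq. (1) and passed to the fahp_weights sketch from Sect. 5.3. Note that a few entries of the published Table 4 deviate from this mechanical rule, so the sketch below is illustrative only.

```python
import numpy as np

# Saaty value -> triangular fuzzy number, as in Table 2
tfn = {1: (1, 1, 1), 2: (1, 2, 3), 3: (2, 3, 4), 4: (3, 4, 5), 5: (4, 5, 6),
       6: (5, 6, 7), 7: (6, 7, 8), 8: (7, 8, 9), 9: (9, 9, 9)}

def fuzzify(A):
    """Turn a crisp pairwise matrix (nested lists of floats) into an n x n x 3 array."""
    n = len(A)
    F = np.zeros((n, n, 3))
    for i in range(n):
        for j in range(n):
            a = A[i][j]
            if a >= 1:
                F[i, j] = tfn[int(round(a))]
            else:                      # reciprocal entry, converted via Eq. (1)
                l, m, u = tfn[int(round(1 / a))]
                F[i, j] = (1 / u, 1 / m, 1 / l)
    return F

# weights = fahp_weights(fuzzify(A))   # A: the 8 x 8 matrix of Table 3
```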
Step 3. Table 4 is the fuzzified pairwise comparison matrix, constructed by replacing the crisp numeric values in Table 3 with fuzzy values using Table 2 and Eq. (1).

Step 4. Table 5 shows the fuzzy geometric means, the geometric mean summation values, the fuzzy weights, and the defuzzified weights calculated using Eqs. (2), (3), (4) and (5), respectively. It also contains the normalized weights.

Step 5. In Table 6, we assign ranks to the criteria based on their relative weights.

In the tables, the criteria are denoted as follows: C1: SINR, C2: Security, C3: Effective Capacity, C4: Contributor, C5: Residency Time, C6: Power Level, C7: Moving Speed, and C8: Data Flagged. Moreover, in Table 5, F1 represents the fuzzy geometric mean, F2 the fuzzy weight, F3 the weight, and F4 the normalized weight.

Table 4 Fuzzified pairwise comparison matrix

Cri  C1               C2               C3               C4               C5               C6               C7             C8
C1   (1, 1, 1)        (1, 1, 1)        (1, 2, 3)        (1/6, 1/5, 1/4)  (5, 6, 7)        (1/4, 1/3, 1/2)  (6, 7, 8)      (3, 4, 5)
C2   (1, 1, 1)        (1, 1, 1)        (1, 2, 3)        (5, 6, 7)        (5, 6, 7)        (1/3, 1/2, 1)    (9, 9, 9)      (6, 7, 8)
C3   (1/3, 1/2, 1)    (1/3, 1/2, 1)    (1, 1, 1)        (1/8, 1/7, 1/6)  (7, 8, 9)        (9, 9, 9)        (3, 4, 5)      (9, 9, 9)
C4   (4, 5, 6)        (1/7, 1/6, 1/5)  (6, 7, 8)        (1, 1, 1)        (1, 1, 1)        (1/3, 1/2, 1)    (2, 3, 4)      (7, 8, 9)
C5   (1/7, 1/6, 1/5)  (1/7, 1/6, 1/5)  (1/9, 1/8, 1/7)  (1, 1, 1)        (1, 1, 1)        (2, 3, 4)        (2, 3, 4)      (6, 7, 8)
C6   (2, 3, 4)        (1, 2, 3)        (1/9, 1/9, 1/9)  (1, 2, 3)        (1/4, 1/3, 1/2)  (1, 1, 1)        (4, 5, 6)      (4, 5, 6)
C7   (6, 7, 8)        (1/9, 1/9, 1/9)  (3, 4, 5)        (1/4, 1/3, 1/2)  (1/4, 1/3, 1/2)  (1/6, 1/5, 1/4)  (1, 1, 1)      (1, 2, 3)
C8   (1/5, 1/4, 1/3)  (1/8, 1/7, 1/6)  (1/9, 1/9, 1/9)  (1/9, 1/8, 1/7)  (6, 7, 8)        (1/6, 1/5, 1/4)  (1/3, 1/2, 1)  (1, 1, 1)
Table 5 Weight calculation

Cr      F1                     F2                   F3      F4
C1      (1.17, 1.47, 1.79)     (0.13, 0.13, 0.13)   0.130   0.130
C2      (2.14, 2.63, 3.18)     (0.24, 0.24, 0.24)   0.240   0.241
C3      (1.48, 1.76, 2.23)     (0.17, 0.16, 0.17)   0.167   0.168
C4      (1.41, 1.70, 2.08)     (0.16, 0.16, 0.16)   0.160   0.160
C5      (0.69, 0.83, 0.96)     (0.08, 0.08, 0.07)   0.077   0.077
C6      (0.99, 1.35, 1.71)     (0.11, 0.12, 0.13)   0.120   0.120
C7      (0.62, 0.78, 0.98)     (0.07, 0.07, 0.07)   0.070   0.070
C8      (0.32, 0.37, 0.45)     (0.04, 0.03, 0.03)   0.033   0.033
Total   (8.82, 10.89, 13.38)                        0.997   1.0
Table 6 Ranking of criteria

Criteria             Normalized weight   Rank
SINR                 0.130               4
Security             0.241               1
Effective capacity   0.168               2
Contributor          0.160               3
Residency time       0.077               6
Power level          0.120               5
Moving speed         0.070               7
Data flagged         0.033               8
Total                1.0
Table 3 shows the pairwise comparison matrix. Each cell of the matrix is a pairing of two criteria. The number in each cell denotes which of the two criteria, the one in the cell's row or the one in its column, is preferred over the other. The table has been filled out using the expert opinions. If two criteria have the same priority, then the number in their intersecting cell is 1. If the number is bigger than 1, then the criterion representing the row has higher priority than the criterion representing the column. If the number is smaller than 1, then the criterion representing the column has higher priority than the criterion representing the row. Table 6 shows the ranking of the criteria based on the normalized weights. The higher the normalized weight, the higher the rank of the criterion. From the table, it can be seen that Security is the highest-ranking and thus the most important criterion for decision making, followed by Effective Capacity, Contributor, SINR, Power Level, Residency Time, Moving Speed, and Data Flagged.
7 Conclusion

In this paper, a system for streaming educational institution content was proposed. First, eight criteria for this system were established. Second, the corresponding weights of the criteria were found using the FAHP technique. As the classical AHP technique cannot deal with the unpredictability that comes with the opinions of decision makers, the Fuzzy AHP technique was used to overcome this issue. Finally, the weights were used to rank the criteria. According to the ranking, 'Security' is the most important decision making criterion for the proposed system. In the future, different MCDM methods can be applied to the criteria values for multiple alternative devices, and the results can be compared to select the best MCDM technique for the problem described in this paper.
References

1. Nwokolo C, Ogbuagu M, Iwegbu O (2020) Impact of coronavirus pandemic on the global economy: demand and supply shock. SSRN Electron J. https://doi.org/10.2139/ssrn.3657067
2. Chaudhary S (2020) Coronavirus: impact on Chinese and global economy
3. Dong J (2020) Online learning and teaching experiences during the COVID-19 pandemic: a case study of Bangladeshi students receiving China's higher education. English Linguistics Res 9(2). https://doi.org/10.5430/elr.v9n2p37
4. Balan R, Flinn J, Satyanarayanan M, Sinnamohideen S, Yang H (2002) The case for cyber foraging. In: Proceedings of the 10th workshop on ACM SIGOPS European workshop
5. Lawton G (2012) Cloud streaming brings video to mobile devices. IEEE Comput 45:14–16. https://doi.org/10.1109/MC.2012.47
6. Brecht H (2012) Learning from online video lectures. J Inf Technol Educ Innovations Pract 11. https://doi.org/10.28945/1712
7. Hailun X, Ning W, Zhimin Z (2013) Neighbour peer selection scheme based on effective capacity for mobile peer-to-peer streaming. China Communications, pp 89–98
8. Triantaphyllou E (2000) Multi-criteria decision making methods: a comparative study. https://doi.org/10.1007/978-1-4757-3157-6
9. Singh A (2014) Major MCDM techniques and their application—a review. IOSR J Eng 4:15–25
10. Walkerdine J, Hughes D, Rayson P (2008) A framework for P2P application development. Comput Commun 31(2):387–401
11. Si P, Yu FR, Ji H (2009) Distributed sender scheduling for multimedia transmission in wireless mobile peer-to-peer networks. IEEE Trans Wireless Commun 8(9):4594–4603
12. Leung AK, Kwok Y (2005) On topology control of wireless peer-to-peer file sharing networks: energy efficiency, fairness and incentives. In: Proceedings of the 6th IEEE international symposium on world of wireless mobile and multimedia networks (WoWMoM). IEEE Press, Taormina, Italy
13. Liu Y, Eckert C, Earl C (2020) A review of fuzzy AHP methods for decision-making with subjective judgements. Expert Syst Appl 161. https://doi.org/10.1016/j.eswa.2020.113738
14. Saaty TL (1980) The analytic hierarchy process: planning, priority setting, resource allocation. McGraw-Hill, New York
15. Saaty TL (2008) Decision making with the analytic hierarchy process. Int J Serv Sci 1(1):83–98
16. Benitez J, Delgado-Galván X, Gutiérrez-Pérez JA, Izquierdo J (2011) Balancing consistency and expert judgment in AHP. Math Comput Model. https://doi.org/10.1016/j.mcm.2010.12.023
17. Asuquo ED, Onuodu EF (2016) A fuzzy AHP model for selection of university academic staff. Int J Comput Appl 141(1):19–26
18. Demirel T, Cetin DN, Kahraman C (2008) Fuzzy analytic hierarchy process and its application. https://doi.org/10.1007/978-0-387-76813-7_3
Visualizing Missing Data: COVID-2019 K. Lavanya, G. Raja Gopal, M. Bhargavi, and V. Akhil
Abstract In this paper, we provide data visualization of the missing data and the actual data of a COVID-2019 dataset of Andhra Pradesh. In the study, different types of imputation methods are applied to generated missing datasets. For each missing dataset, a new dataset is generated with predicted values in place of the missing values using the individual imputation methods. Later, the regression methods Linear and Multi-linear are applied to perform predictive analysis on the imputation results; the regression methods are used to narrow the margin between the predicted and actual values. The results of both regressions are promising in terms of the evaluation metrics Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). All the methods were compared over the COVID-2019 data, and among them the linear interpolation method produced promising results compared to various standard methods, including ad-hoc zero filling, mean, and hot-deck imputation. Finally, the dataset is processed with different data visualization techniques to represent the data in different forms, like bar charts, line charts, and scatter charts.

Keywords Missing data · Imputation methods · Regression analysis · Data visualization
1 Introduction

At present, the ongoing situation of concern is COVID-19 [1, 2]. Most researchers are focusing on this issue, for example on how to predict future trends regarding the cases and how to handle missing values. The virus first appeared in China, in the city of Wuhan, and the first case was registered in November 2019 in a laboratory in China [3]. Initial symptoms of this virus appear within 7 to 14 days, starting with a dry cough and slight fever, and then throat problems, breathing problems, etc. From March 2020, the effect of this virus increased tremendously, spreading through different corners of the world. Thereafter,
the government started collecting reports from hospitals. As per the 2020 World Health Organization (WHO) report, a total of 100,075 deaths out of 15,225,252 active cases occurred in the world [4]. This study is an analysis of three states in India, i.e., Andhra Pradesh, Telangana, and Tamil Nadu. The analysis was done after considering all the COVID-19 case datasets of these states. For the datasets, collected from different resources available at prsindia.org, we have included five different fields: number of days, confirmed cases, active cases, cured/discharged cases, and death cases. We use imputation methods to fill in the missing data in the predefined datasets. Regression is one of the kinds of machine learning methods for finding the correlation between variables in a provided dataset. Different types of regression methods are available, like linear regression, multi-linear regression, and quantile regression [5–7]. Recently, regression has been used in analyzing [8] and predicting cases of COVID-19 [9, 10]. Today, regression is one of the most widely used methods to predict a continuous output variable. The study in this paper is about the analysis of the data and, finally, the representation of the dataset using data visualization techniques.
2 Existing Methods

The existing work implements three imputation methods [11] on the missing values in the datasets: linear interpolation, ad-hoc zero filling, and marginal means. Linear interpolation is basically the estimation of an unknown value that falls within two known values. In ad-hoc zero filling, the missing values are replaced with either zeros or ones. Marginal means replaces missing values with the mean of the existing values in the dataset [12]. When these methods are implemented, the results obtained on the same dataset in the Andhra Pradesh case study are as shown in Fig. 1.
Fig. 1 Result of imputation methods: ad-hoc zero filling, linear interpolation and mean

3 Imputation Methods

For the existing datasets, we generated Missing at Random (MAR) data by randomly removing a subset of values in each graph (10%, 20%, and 30%). We then replaced these missing values with imputed values computed using one of the four imputation methods: ad-hoc zero filling, linear interpolation, marginal means, and hot-deck imputation. The dataset with 0% missingness is considered the reference dataset for measuring the values and changes of the datasets obtained with the different types of imputation. The imputed values in the dataset are then visualized using any one of the three visualization methods per plot type. In addition to the imputation methods (linear interpolation, ad-hoc zero filling, marginal means, and hot-deck imputation [13, 14]), we also perform regression analysis so that the results can be compared using the defined metrics, root mean square error and MAPE.
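As a sketch of the MAR generation described above, assuming the complete dataset of Table 1 is loaded into a pandas DataFrame df (the column names and the fixed seed are illustrative):

```python
import numpy as np
import pandas as pd

def add_missingness(df, frac, cols=("Confirmed", "Active", "Cured", "Death"),
                    seed=42):
    """Randomly blank out a fraction of the values in each numeric column."""
    out = df.copy()
    rng = np.random.default_rng(seed)
    for col in cols:
        idx = rng.choice(out.index, size=int(frac * len(out)), replace=False)
        out.loc[idx, col] = np.nan
    return out

missing_10 = add_missingness(df, 0.10)   # similarly 0.20 and 0.30
```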
3.1 Missing Data

Missing data [15–17] is generally classified into 3 categories according to its missing pattern:

• Missing Completely at Random (MCAR)
• Missing at Random (MAR)
• Missing Not at Random (MNAR)

Missing Completely at Random. The propensity for a data point to be missing is random. There is no relationship between whether a data point is missing and any values in the data set, missing or observed. The missing data are just a random subset of the data.

Missing at Random. A better name would actually be Missing Conditionally at Random, because the missingness is conditional on another variable. The missingness can be fully accounted for by variables for which there is complete information.

Missing Not at Random. Data that is neither MAR nor MCAR. The pattern of missingness is related to other variables in the dataset, and, in addition, the values of the missing data are not random.
Fig. 2 Standard python libraries for data visualization
3.2 Data Visualization

Data visualization is the graphical representation of information and data [18, 19], using visual elements like charts, graphs, maps, plots, histograms, and other visual forms. Data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data, and data visualization techniques [20] can be used for handling missing data. Moreover, a number of standard Python libraries are used to visualize the imputation results, as shown in Fig. 2.
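A minimal matplotlib sketch of the three chart types used in this study, assuming the dataset is loaded as a pandas DataFrame df with the columns of Table 1:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].plot(df["Days"], df["Confirmed"])        # line chart
axes[0].set(title="Confirmed (line)", xlabel="Days")
axes[1].bar(df["Days"], df["Death"], width=8)    # bar chart
axes[1].set(title="Deaths (bar)", xlabel="Days")
axes[2].scatter(df["Days"], df["Active"])        # scatter chart
axes[2].set(title="Active (scatter)", xlabel="Days")
plt.tight_layout()
plt.show()
```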
3.3 Imputation

Imputation [12, 21] is the process of replacing missing data with substituted values. It is a statistical technique to estimate missing values in a dataset based on the collected values in the dataset [6].

Ad-hoc Zero Filling. This could take the form of imputing zeros or ones for all the missing values of a discrete variable.

Local Linear Interpolation. This is the simplest method of estimating values at positions in between known data points. Each segment can be interpolated independently.

Marginal Means. Compute the mean (or median) of the non-missing values and use it to impute the missing values.
Hot-Deck Imputation. Missing values are filled with substitute values sampled from the current data; a missing value is imputed based on an observed value that is closest in terms of distance.
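The four imputation methods map directly onto pandas operations; a sketch, reusing missing_10 from the earlier snippet (the sequential forward-fill shown is one simple hot-deck variant, not necessarily the exact donor rule used in the study):

```python
cols = ["Confirmed", "Active", "Cured", "Death"]

ad_hoc_zero = missing_10.fillna(0)                           # ad-hoc zero filling

linear_interp = missing_10.copy()                            # linear interpolation
linear_interp[cols] = missing_10[cols].interpolate(method="linear")

marginal_mean = missing_10.fillna(missing_10[cols].mean())   # marginal means

hot_deck = missing_10.ffill()                                # previous record as donor
```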
4 Results and Discussion

The dataset for this study was collected from the source prsindia.org. The following features are considered for the study: Month, Days, Confirmed Cases, Active Cases, Cured Cases, and Death Cases. The study considered a local state analysis (i.e., Andhra Pradesh, Telangana, and Tamil Nadu) of COVID-2019, after studying various reports derived from media analysis from 12/03/2020 to the present date. The complete details are shown in Table 1. Table 2 provides the in-complete data, with 10% missingness applied to the COVID-2019 data. The study also considered 20% and 30% missing data, which are not included in the paper due to space limitations. Tables 3, 4, 5 and 6 show the results of all the imputation methods in all possible cases: Confirmed, Active, Cured, and Death.

Table 1 Original dataset: COVID-2019, Andhra Pradesh case study

Month    Days   Confirmed   Active   Cured     Death
21-Mar   10     3           3        0         0
2-Apr    22     86          84       1         1
11-Apr   31     363         350      7         6
10-May   60     1930        999      887       44
20-May   70     2532        859      1621      52
19-Jun   100    7518        3637     3789      92
11-Jul   122    25,422      11,936   13,194    292
8-Aug    150    206,960     84,654   120,464   1842
28-Aug   170    393,090     94,209   295,248   3633
19-Sep   192    609,558     84,423   519,891   5244
17-Oct   220    775,470     38,979   730,109   6382
6-Nov    240    835,953     21,878   807,318   6757
21-Nov   255    859,932     15,382   837,630   6920
17-Dec   280    876,814     4420     865,327   7067
31-Dec   300    881,948     3256     871,588   7104
29-Jan   320    887,466     1358     878,956   7152
18-Feb   340    889,010     607      881,238   7165
7-Mar    355    890,556     921      882,462   7173
31-Mar   376    900,805     6614     886,978   7213
Table 2 Missing dataset: COVID-2019, Andhra Pradesh case study

Month   Days  Confirmed  Active  Cured    Death
21-Mar  10    3          3       0        0
2-Apr   22    86         84      1        1
11-Apr  31    363        350     7        6
10-May  60    1930       999     887      –
20-May  70    –          859     –        52
19-Jun  100   7518       –       3789     92
11-Jul  122   –          11,936  13,194   292
8-Aug   150   206,960    84,654  –        1842
28-Aug  170   393,090    94,209  295,248  –
19-Sep  192   609,558    –       519,891  5244
17-Oct  220   –          38,979  730,109  6382
6-Nov   240   835,953    21,878  807,318  –
21-Nov  255   859,932    –       837,630  6920
17-Dec  280   876,814    4420    –        7067
31-Dec  300   881,948    3256    871,588  7104
29-Jan  320   –          1358    878,956  –
18-Feb  340   889,010    –       –        7165
7-Mar   355   890,556    921     882,462  7173
31-Mar  376   900,805    6614    886,978  7213
However, it is observed that, among all the imputation methods, linear interpolation produces values closest to the original ones. Later, the work also compared the regression-model results on the imputed datasets, shown in Table 7. Moreover, the study produced the relevant graphs for each imputation and regression method; the detailed data visualization of the linear interpolation imputation method is shown in Figs. 3, 4, 5 and 6.
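As a rough illustration of how such a comparison can be computed (our own sketch, not the paper's code, and not the scaled values behind Table 7), the masked Confirmed entries from Table 1 can be checked against the linear-interpolation values from Table 4:

```r
# Error measures used in the paper (standard definitions).
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))
mape <- function(y, yhat) 100 * mean(abs((y - yhat) / y))  # undefined if any y == 0

orig <- c(2532, 25422, 775470, 887466)    # true Confirmed values at masked rows (Table 1)
imp  <- c(4724, 17239, 722755, 8859010)   # linear-interpolation estimates (Table 4)

rmse(orig, imp)
mape(orig, imp)
```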
5 Conclusion

In this paper, we have implemented imputation methods of different types: linear interpolation, ad-hoc zero filling, marginal means, and hot-deck imputation. In addition, we have implemented three regression types: linear regression, multilinear regression, and quantile regression. After applying these methods to the dataset with missingness, we compared the newly obtained values against the original dataset, using the RMSE and MAPE measures to quantify their accuracy, and finally the data were visualized with line, bar, and scatter charts.
Table 3 Result of imputation method: ad-hoc zero filling

Month   Days  Confirmed  Active  Cured    Death
21-Mar  10    3          3       0        0
2-Apr   22    86         84      1        1
11-Apr  31    363        350     7        6
10-May  60    1930       999     887      0
20-May  70    0          859     0        52
19-Jun  100   7518       0       3789     92
11-Jul  122   0          11,936  13,194   292
8-Aug   150   206,960    84,654  0        1842
28-Aug  170   393,090    94,209  295,248  0
19-Sep  192   609,558    0       519,891  5244
17-Oct  220   0          38,979  730,109  6382
6-Nov   240   835,953    21,878  807,318  0
21-Nov  255   859,932    0       837,630  6920
17-Dec  280   876,814    4420    0        7067
31-Dec  300   881,948    3256    871,588  7104
29-Jan  320   0          1358    878,956  0
18-Feb  340   889,010    0       0        7165
7-Mar   355   890,556    921     882,462  7173
31-Mar  376   900,805    6614    886,978  7213
6 Future Scope

In this paper, we have concentrated mainly on the imputation methods, so that we can estimate the missing values and compare them to the original dataset. In addition to ad-hoc zero filling, marginal means, linear interpolation, and hot-deck imputation, this work can be extended to cold-deck imputation and other regression techniques, and even to time-series analysis.
Table 4 Result of imputation method: linear interpolation

Month   Days  Confirmed  Active  Cured    Death
21-Mar  10    3          3       0        0
2-Apr   22    86         84      1        1
11-Apr  31    363        350     7        6
10-May  60    1930       999     887      29
20-May  70    4724       859     2338     52
19-Jun  100   7518       6397    3789     92
11-Jul  122   17,239     11,936  13,194   292
8-Aug   150   206,960    84,654  154,221  1842
28-Aug  170   393,090    94,209  295,248  3543
19-Sep  192   609,558    66,594  519,891  5244
17-Oct  220   722,755    38,979  730,109  6382
6-Nov   240   835,953    21,878  807,318  6651
21-Nov  255   859,932    13,149  837,630  6920
17-Dec  280   876,814    4420    854,609  7067
31-Dec  300   881,948    3256    871,588  7104
29-Jan  320   8,859,010  1358    878,956  7134
18-Feb  340   889,010    1139    880,709  7165
7-Mar   355   890,556    921     882,462  7173
31-Mar  376   900,805    6614    886,978  7213
Table 5 Result of imputation method: marginal means

Month   Days  Confirmed  Active  Cured    Death
21-Mar  10    3          3       0        0
2-Apr   22    86         84      1        1
11-Apr  31    363        350     7        6
10-May  60    1930       999     887      3770
20-May  70    490,301    859     448,537  52
19-Jun  100   7518       18,034  3789     92
11-Jul  122   490,301    11,936  13,194   292
8-Aug   150   206,960    84,654  448,537  1842
28-Aug  170   393,090    94,209  295,248  3770
19-Sep  192   609,558    18,034  519,891  5244
17-Oct  220   490,301    38,979  730,109  6382
6-Nov   240   835,953    21,878  807,318  3770
21-Nov  255   859,932    18,034  837,630  6920
17-Dec  280   876,814    4420    448,537  7067
31-Dec  300   881,948    3256    871,588  7104
29-Jan  320   490,301    1358    878,956  3770
18-Feb  340   889,010    18,034  448,537  7165
7-Mar   355   890,556    921     882,462  7173
31-Mar  376   900,805    6614    886,978  7213
Table 6 Result of imputation method: hot-deck

Month   Days  Confirmed  Active   Cured  Death
21-Mar  10    3          0        0      3
2-Apr   22    84         1        1      84
11-Apr  31    350        7        6      350
10-May  60    999        887      44     999
20-May  70    859        1621     52     859
19-Jun  100   1729       5679     92     1729
11-Jul  122   11,936     13,194   292    11,936
8-Aug   150   76,613     128,505  1842   76,613
28-Aug  170   94,209     295,248  3633   94,209
19-Sep  192   84,423     519,891  5244   84,423
17-Oct  220   38,979     730,109  6382   38,979
6-Nov   240   21,878     807,318  6757   21,878
21-Nov  255   15,382     837,630  6920   15,382
17-Dec  280   4420       865,327  7067   4420
31-Dec  300   3256       871,588  7104   3256
29-Jan  320   1358       878,956  7152   1358
18-Feb  340   607        881,238  7165   607
7-Mar   355   921        882,462  7173   921
31-Mar  376   6614       886,978  7213   6614
Table 7 Result of prediction method: linear and multi-linear regression

Regression               Imputation methods    RMSE    MAPE
Linear regression        Linear regression     0.5435  0.4782
                         AD-hoc zero filling   0.8209  0.7316
                         Linear interpolation  0.4457  0.3737
                         Marginal means        0.4523  0.3808
                         Hot-deck imputation   0.4878  0.4576
Multi-Linear regression  Linear regression     0.5334  0.4358
                         AD-hoc zero filling   0.6436  0.4989
                         Linear interpolation  0.3967  0.3459
                         Marginal means        0.3609  0.3196
                         Hot-deck imputation   0.4849  0.4385
Fig. 3 Bar graph: the original and imputation data results from linear interpolation method
Fig. 4 Line graph: the original and imputation data results from linear interpolation method
Fig. 5 Scatter graph: the original and imputation data results from linear interpolation method
Fig. 6 Area graph: the original and imputation data results from linear interpolation method
References

1. Hossain MM (2020) Status of global research on novel Coronavirus disease (COVID-19): a bibliometric analysis and knowledge mapping. Available at SSRN 3547824
2. Sanjay K (2020) Monitoring novel corona virus (COVID-19) infections in India by cluster analysis. Ann Data Sci
3. Nimpattanavong C, Khamlae P, Choensawat W, Sookhanaphibarn K (2020) Flight traffic visual analytics during COVID-19. In: Proceedings of the 2020 IEEE 9th global conference on consumer electronics (GCCE), Kobe, Japan, pp 215–217
4. Jung JH, Shin JI (2020) Big data analysis of media reports related to COVID-19. Int J Environ Res Public Health 17:5688
5. Dsouza J, Velan S (2019) Preventive maintenance for fault detection in transfer nodes using machine learning. In: 2019 international conference on computational intelligence and knowledge economy (ICCIKE), Dubai, United Arab Emirates, pp 401–404
6. Lin WC, Tsai CF (2020) Missing value imputation: a review and analysis of the literature (2006–2017). Artif Intell Rev 53:1487–1509
7. Suresh GV, Lavanya K (2021) An additive sparse logistic regularization method for cancer classification in microarray data. Int Arab J Inf Technol 18(2). https://doi.org/10.34028/iajit/18/10
8. Kumari R, Kumar S, Poonia RC, Singh V, Raja L, Bhatnagar V, Agarwal P (2021) Analysis and predictions of spread, recovery, and death caused by COVID-19 in India. Big Data Min Analytics 4(2):65–75
9. Singh V, Poonia RC, Kumar S, Dass P, Agarwal P, Bhatnagar V, Raja L (2020) Prediction of COVID-19 corona virus pandemic based on time series data using support vector machine. J Discrete Math Sci Crypt 23(8):1583–1597
10. Bhatnagar V, Poonia RC, Nagar P, Kumar S, Singh V, Raja L, Dass P (2021) Descriptive analysis of COVID-19 patients in the context of India. J Interdisc Math 24(3):489–504
11. Somasundaram R, Nedunchezhian R (2011) Evaluation of three simple imputation methods for enhancing preprocessing of data with missing values. Int J Comput Appl 21:14–19
12. Lavanya K, Reddy LSS, Eswara B (2019) Distributed based serial regression multiple imputation for high dimensional multivariate data in multicore environment of cloud. Int J Ambient Comput Intell (IJACI) 10:63–79. https://doi.org/10.4018/IJACI.2019040105
13. Puri A, Gupta M (2017) Review on missing value imputation techniques in data mining. In: Proceedings of the international conference on machine learning and computational intelligence, Sydney, Australia, pp 35–40
14. Purwoningsih F, Santoso HB, Hasibuan ZA (2019) Online learners' behaviors detection using exploratory data analysis and machine learning approach. In: 2019 fourth international conference on informatics and computing (ICIC), Semarang, Indonesia, pp 1–8
15. Humphries M (2013) Missing data and how to deal: an overview of missing data. Population Research Center, University of Texas, Austin, TX, USA, pp 39–41
16. Little RJ, Rubin DB (2014) Statistical analysis with missing data. John Wiley & Sons
17. Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–548
18. Dybowski R, Weller P (2001) Prediction regions for the visualization of incomplete datasets. Comput Stat 16(1):25–41
19. Eaton C, Plaisant C, Drizd T (2003) The challenge of missing and uncertain data. Poster in the visualization 2003 conference compendium, IEEE, pp 40–41
20. Kirk A (2014) Visualizing zero: how to show something with nothing. http://blogs.hbr.org/2014/05/visualizing-zero-how-to-showsomething-with-nothing/, May
21. Hassani H, Kalantari M, Ghodsi Z (2019) Evaluating the performance of multiple imputation methods for handling missing values in time series data: a study focused on East Africa, soil-carbonate-stable isotope data. Stats 2:457–467
Study of Impact of COVID-19 on Students Education Deepali A. Mahajan and C. Namrata Mahender
Abstract The COVID-19 epidemic has had a negative impact on people all around the world in every sector: the economy, education, and sports. Among these sectors, the most affected is education. Not only in India but all over the world, the education system collapsed under the COVID-19 conditions. We have conducted a survey to study the effect of this situation on students' academics. Online questionnaires were prepared and distributed to students to collect their responses. We collected 181 responses for the study and found that more preference is given to classroom study than to online study. In higher education, online education may be beneficial, as the students are grown up; but at school level, it becomes quite difficult to understand the concepts and attend the lectures continuously.

Keywords SNS · Smartphone · Education · COVID-19 · Questionnaire · Social media
1 Introduction

More than 1.5 billion pupils from more than 200 nations have been affected by the COVID-19 pandemic. To resume education, the use of apps like Google Meet, Cisco WebEx Meetings, Zoom, and Microsoft Teams became mandatory for educational institutions like schools and colleges. The corona pandemic has forced the adoption of online teaching across the country. Teaching and evaluation are done using online platforms, and the submissions of various assignment works are done using WhatsApp, email, or other applications. The basic requirements for online lectures are a smartphone, laptop, or desktop and a good Internet connection. For proper communication between student and teacher, everybody must be digitally connected. To avoid educational loss, online education is the preferred medium in India.

D. A. Mahajan (B) · C. Namrata Mahender
Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_42

This requires an improvement in digital connectivity
between all villages and cities in India for effective online communication between teachers and students. Institutes in the higher education sector have the infrastructure to connect students and provide online education, but primary, middle, and higher schools do not have such advanced infrastructure, and their students also lack such facilities; especially for students in rural areas, it becomes very difficult. Students from less privileged backgrounds faced more difficulties during the lockdown period and suffered more unfavorable consequences of the COVID-19 pandemic. In this lockdown, family incomes were reduced while every person required a digital device, so the lack of available devices and expensive Internet connections disturbed students' academic lives. Over 1.5 billion students around the world were deprived of basic education. On the other hand, the students who were able to get online education faced different kinds of problems related to their physical and mental health: increased screen time, changes in lifestyle, stress on the eyes, absence of outdoor activity, social isolation, and changed sleeping habits have affected the mental health of students.
1.1 Challenges in Learning

The most common e-learning challenge is the lack of motivation. Children are restless and cannot stay settled in one place, with the result that they are not interested in the content or the subject taught in the online class and are easily distracted from the lectures. The wide range of available platforms and online educational technologies may make online learning look simple, yet students and teachers face many problems. Issues of accessibility, affordability, and flexibility have been identified and also have to be considered. Many countries face significant challenges in maintaining a stable Internet connection and gaining access to digital gadgets, and in numerous countries, underprivileged children are not able to fulfill the requirements of online learning. Parental guidance is essential for young learners; if both parents are working, it becomes difficult to look after younger students during online learning. Practical issues around physical workspaces are another challenge for such children.
2 Literature Review

The university education vision must be revised in order to assure student learning outcomes and educational quality requirements [1]. Though the world has been disturbed, online learning is the most effective way to resume education and expand students' knowledge, and it appears to be a potent tool and a sign of hope in the time
of COVID-19 [2]. The pandemic left its mark on the education sector: the lockdown imposed to stop the virus from spreading affected education, as the teaching–learning process had to be implemented through online platforms. Issues connected to the altered environment, Internet connectivity, and electricity supply are just a few of the myriad challenges faced during the lockdown period, so these sectors need to be developed [3]. There are a few advantages to be gained from certain circumstances that will assist both the student and the teacher in the long run [4]. COVID-19 has a significant impact on India's education industry; hence, Dr. Pravat Kumar Jena has suggested that the educational industry build up the information and infrastructure needed for information technology [5]. The COVID-19 epidemic has had an impact on countries' traditional schooling systems; hence, emerging countries should scale up their infrastructure for the online teaching and learning process [6]. Carmen Carrillo and Maria Assunção Flores state several implications for teacher education: it is important to reduce the factors responsible for exclusion and inequalities and to improve students' participation in the learning process [7]. The COVID-19 lockdown conditions affected the academic performance of many students to varying degrees. The online education system helps students with self-study; for better online education, there should be more interactive lectures, concise information, and 3D virtual tools to mimic real situations [8]. For students, confinement is a new scenario; they fear missing their academic year because of the COVID-19 pandemic, so they work harder to overcome any difficulty and to contribute to solving the problems the higher education system is facing [9]. Not only kids, but also their families and staff, face challenges as a result of the school shutdown. The only way to continue the educational system is to use distance learning, which is difficult in developing nations since many parents lack access to digital tools; the poor and digitally illiterate population suffer more in this situation, and this increases inequality [6]. According to one study, the COVID-19 outbreak has had a significant influence on kids' mental health and education and has thrown off their everyday routine; COVID-19 also presents an opportunity to identify alternative educational measures [10]. Some research offers suggestions for the future use of online distance learning: enhancing the technological configuration, providing responsive troubleshooting services, and establishing robust communication channels between management, learners, and professors are all crucial factors that higher education institutions must consider, and there are suggestions for university portal designers to improve interactivity between learners and professors and among peers [11]. The spread of the COVID-19 virus at schools has been low, while at colleges it has the potential to spread at a faster rate; COVID-19 leaves a negative effect on students' academic performance [12]. Educational institutions and governments had to be ready for this abrupt change to technology-based learning, which raised issues of inequality, lack of access, and lack of skills to facilitate online learning.
There are many limitations to implementing an online education system, because of which it cannot completely replace traditional education, mainly in areas where hands-on training is an absolute necessity to meet learning requirements [13]. In developing countries, a suitable pedagogy and platform for the different class levels of
higher secondary, middle, and primary education need to be explored further. Every young person should have the chance to succeed in school and develop the knowledge, skills, attitudes, and values that will enable them to contribute to society [14]. From the literature review, we can understand the difficulties in the education sector during the COVID-19 pandemic. After analysing the impact of the closure of schools and colleges on students' education, we propose solutions to these problems in the form of suggestions.
3 Proposed Method

3.1 Data Collection

We conducted an online survey for data collection, preparing Google Forms to collect responses from the participants. We generated two different questionnaires, one of them for students. The survey questionnaires were prepared and distributed online to different categories of students, from schools and colleges, for their responses. There were 181 student participants in total, from different schools and colleges of the Marathwada region in Maharashtra. These questionnaires help reflect the effect of COVID-19 on today's education system and the benefits and losses caused by all the changes during this pandemic situation. As schools were closed because of the lockdown, we were not able to communicate with a larger number of students and collected only 181 responses.
3.2 Analysis of Data

For statistical analysis, we used the SPSS software platform. SPSS is an abbreviation of "Statistical Package for the Social Sciences". This package provides statistical analysis for survey research, which helps the researcher discover powerful insights from the responses collected through the questionnaires. In this paper, we have computed frequencies and percentages. We have tried to discover the impact of the use of social networking sites during the spread of COVID-19 on students' education. Table 1 shows the questions considered in this study.
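The paper's tables come from SPSS; purely for illustration, the same kind of frequency/percent summary can be sketched in a few lines of R (the data below are our own toy responses, not the survey's):

```r
# Frequency and percent table for one survey question (illustrative data only).
responses <- c("Online", "Classroom", "Classroom", "Online", "Classroom")
freq <- table(responses)

data.frame(Frequency = as.vector(freq),
           Percent   = round(100 * as.vector(freq) / sum(freq), 1),
           row.names = names(freq))
```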
3.3 Significance of the Questions

Qh.1: We can understand the number of social media accounts the students hold, so as to study their engagement with social media.
Table 1 Questions for students

Sr. no.  Questions
1        With how many social media sites do you have your account?
2        How much time do you spend on social media for a day?
3        Which social network do you use the most?
4        Which device do you use the most?
5        Nowadays, which application is used for online lecture/education?
6        Do your parents allow you to use a mobile?
7        Excluding online lectures of school/college, how much time do you use social media?
8        What type of learning would you like?
Qh.2: The total time spent on social media.
Qh.3: To study the nature of SNS usage among the students, as each social medium has different features.
Qh.4: The device used by the students may cause addiction problems; as a mobile is easily accessible from anywhere at any time, it may cause SNS addiction.
Qh.5: We can extract the popular social media used for online education.
Qh.6: To study the awareness of SNS use among the parents.
Qh.7: Attending online lectures is mandatory nowadays, but how much extra time do students spend on SNS beyond educational activities?
Qh.8: Here we try to extract the interests and understanding of the students regarding online versus offline, i.e., classroom, learning.
4 Results

In the obtained results, we can observe that for Qh.1 we get a fairly even spread across the number of social media accounts used. Qh.2 shows that only 16.1% of students spend more than 2 h per day on social media, which indicates moderate use of SNS. According to Qh.3, WhatsApp is used by 38.2% and Instagram by 30.9% of students; they are the most used social networking sites. Qh.4 tells us that 85.6% of students prefer to use a smartphone. Qh.5 reveals that Zoom and Google Meet are the preferred applications for online education. From Qh.6, we come to know that 91.7% of parents allow their children to use a mobile phone. According to Qh.7, only 30% of students use social media for up to 1 h excluding online lectures. And the last question, Qh.8, gives a very strong input to the research: 79% of students prefer classroom learning to online learning (Tables 2, 3, 4, 5, 6, 7, 8, and 9).
Table 2 Frequency and percent for question, with how many social media sites do you have your account?

                    Frequency  Percent  Valid percent  Cumulative percent
Valid  1                   43     23.8           23.8                23.8
       2                   52     28.7           28.7                52.5
       3                   44     24.3           24.3                76.8
       More than 3         42     23.2           23.2               100.0
       Total              181    100.0          100.0
Table 3 Frequency and percent for question, how much time do you spend on social media for a day?

                       Frequency  Percent  Valid percent  Cumulative percent
Valid    Up to 30 min         53     29.3           29.4                29.4
         30–60 min            58     32.0           32.2                61.7
         1–2 h                40     22.1           22.2                83.9
         More than 2 h        29     16.0           16.1               100.0
         Total               180     99.4          100.0
Missing  System                1      0.6
Total                        181    100.0
Table 4 Frequency and percent for question, which social network do you use the most?

                    Frequency  Percent  Valid percent  Cumulative percent
Valid    Twitter            3      1.7            1.7                 1.7
         You Tube          52     28.7           29.2                30.9
         WhatsApp          68     37.6           38.2                69.1
         Instagram         55     30.4           30.9               100.0
         Total            178     98.3          100.0
Missing  Facebook           3      1.7
Total                     181    100.0
Table 5 Frequency and percent for question, which device do you use the most?

                    Frequency  Percent  Valid percent  Cumulative percent
Valid  Smartphone         155     85.6           85.6                85.6
       Laptop              18      9.9            9.9                95.6
       Desktop              7      3.9            3.9                99.4
       4                    1      0.6            0.6               100.0
       Total              181    100.0          100.0
Table 6 Frequency and percent for question, nowadays which application is used for online lecture/education?

                         Frequency  Percent  Valid percent  Cumulative percent
Valid    Zoom                   44     24.3           24.3                24.6
         Microsoft Teams         6      3.3            3.4                27.9
         You Tube               11      6.1            6.1                34.1
         Google Meet            99     54.7           55.3                89.4
         Other                  19     10.5           10.6               100.0
         Total                 179     98.9          100.0
Missing  System                  2      1.1
Total                          181    100.0
Table 7 Frequency and percent for question, do your parents allow you to use a mobile?

              Frequency  Percent  Valid percent  Cumulative percent
Valid  0              1      0.6            0.6                 0.6
       Yes          166     91.7           91.7                92.3
       No            14      7.7            7.7               100.0
       Total        181    100.0          100.0
Table 8 Frequency and percent for question, excluding online lectures of school/college how much time do you use social media?

                       Frequency  Percent  Valid percent  Cumulative percent
Valid    0–30 min             67     37.0           37.2                37.2
         30–60 min            54     29.8           30.0                67.2
         1–2 h                39     21.5           21.7                88.9
         More than 2 h        20     11.0           11.1               100.0
         Total               180     99.4          100.0
Missing  System                1      0.6
Total                        181    100.0
Table 9 Frequency and percent for question, what type of learning would you like?

                           Frequency  Percent  Valid percent  Cumulative percent
Valid  Online learning            38     21.0           21.0                21.0
       Classroom learning        143     79.0           79.0               100.0
       Total                     181    100.0          100.0
5 Conclusion and Future Scope

This paper investigates the impact of the use of social media on the education system during the COVID-19 pandemic. After analyzing the responses of 181 students from different schools and colleges, we conclude that the change in the education system due to the spread of COVID-19 negatively impacted the students: they had to use social media applications for academic activities, and parents were forced to allow their children to use smartphones and SNS because it became compulsory. Students prefer classroom learning to online learning platforms, as it is very difficult for school-level students to understand the concepts and concentrate on lectures in front of a mobile or laptop screen. As future scope, we will further study the psychological and physical effects of online education on students.
References

1. Rashid S, Yadav S (2020) Impact of Covid-19 pandemic on higher education and research. Indian J Hum Dev 1–4. https://doi.org/10.1177/0973703020946700
2. Jain S, Agarwal TS (2020) The impact of corona crisis on education sector in India. Int J Indian Psychol 8(3). https://doi.org/10.25215/0803.021, http://www.ijip.in
3. Raj U (2020) Indian education system in fight against Covid-19 pandemic. Int J Creative Res Thoughts (IJCRT) 8(7)
4. Koul PP, Bapat OJ (2020) Impact of Covid-19 on education sector in India. J Crit Rev 7(11)
5. Jena PK (2020) Impact of pandemic Covid-19 on education in India. Int J Curr Res 12(07):12582–12586
6. Tadesse S, Muluye W (2020) The impact of COVID-19 pandemic on education system in developing countries: a review. Open J Soc Sci 8:159–170
7. Carrillo C, Flores MA (2020) COVID-19 and teacher education: a literature review of online teaching and learning practices. Eur J Teacher Educ 43(4):466–487
8. Mahdy MAA (2020) The impact of COVID-19 pandemic on the academic performance of veterinary medical students. Front Vet Sci 7:594261
9. Gonzalez T, de la Rubia MA, Hincz KP, Comas-Lopez M, Subirats L, Fort S, Sacha GM (2020) Influence of COVID-19 confinement on students' performance in higher education. PLoS ONE 15(10):e0239490
10. Chaturvedi K, Vishwakarma DK, Singh N (2021) COVID-19 and its impact on education, social life and mental health of students: a survey. Child Youth Serv Rev 121:105866
11. Said GRE (2021) How did the COVID-19 pandemic affect higher education learning experience? An empirical investigation of learners' academic performance at a university in a developing country. Adv Hum Comput Interact 2021:6649524. https://doi.org/10.1155/2021/6649524
12. Engzell P, Frey A, Verhagen MD (2021) Learning loss due to school closures during the COVID-19 pandemic. PNAS 118(17):e2022376118
13. Talib MA, Bettayeb AM, Omer RI (2021) Analytical study on the impact of technology in higher education during the age of COVID-19: systematic literature review. Educ Inf Technol
14. Pokhrel S, Chhetri R (2021) A literature review on impact of COVID-19 pandemic on teaching and learning. High Educ Future 1–9. https://doi.org/10.1177/2347631120983481
A Framework for Analyzing Crime Dataset in R Using Unsupervised Optimized K-means Clustering Technique

K. Vignesh, P. Nagaraj, V. Muneeswaran, S. Selva Birunda, S. Ishwarya Lakshmi, and R. Aishwarya

Abstract At present, criminals are becoming more and more sophisticated in committing any sort of crime. Nowadays, the intelligence and law enforcement agencies and the police department face issues in analyzing large volumes of data and classifying crimes separately. Analysis of crime is very important, so that we can identify patterns or trends in the crimes committed. For this, we can use an unsupervised data mining technique known as K-means clustering. Data mining is the process of extracting unknown knowledge from a small or huge dataset, data warehouse, or repository. Clustering is a process in which data items are grouped based on a specific attribute; K-means clustering groups them based on the means of the data items. In this paper, one can understand K-means clustering and the procedure and implementation of the clustering method. This system can be used for analyzing crimes and understanding trends and the most crime-prone places.

Keywords Data mining · Crime dataset · Preprocessing · K-means clustering · Clustering in R
1 Introduction

Data mining is the process of extracting data that are unknown to us but potentially useful information. It is also known as knowledge discovery or knowledge extraction. Data mining helps in discovering hidden patterns and knowledge from data. At present, it is used in every single place where a large volume of data is stored and needs to be processed. Data mining has three phases, as described in Fig. 1.

K. Vignesh · P. Nagaraj (B) · V. Muneeswaran · S. Selva Birunda · S. Ishwarya Lakshmi · R. Aishwarya
Kalasalingam Academy of Research and Education, Krishnankoil, Virudhunagar, India
e-mail: [email protected]
K. Vignesh
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_43
Fig. 1 Steps in data mining
• Data preprocessing,
• Data extraction,
• Data evaluation and presentation.

There are two major techniques in data mining, known as supervised and unsupervised data mining. The supervised technique is used for predictions, whereas the unsupervised technique is a descriptive technique. K-means clustering is an unsupervised learning algorithm widely used to solve clustering problems: it groups the data into n different clusters. Using this technique, the crime dataset can be analyzed and clustered to discover insights from the data. The main aims of analyzing the crimes include:

• Extraction of crime patterns by analyzing the data,
• Prediction of crime by learning a model using data mining techniques,
• Detection of crime.
2 K-Means Clustering

2.1 Definition

It is an unsupervised algorithm whose main motive is to classify or group a given set of small or large data into k disjoint clusters, where the value of k is fixed at the beginning [1].
Fig. 2 Phases of K-means clustering
Step 1: Define k centroids, one for each cluster.
Step 2: Take each point given in the dataset and find the distance between it and each centroid.
2.2 Implementation

The K-means clustering algorithm consists of two phases, represented in Fig. 2, and the steps of K-means clustering are mentioned in Fig. 3: define the k centroids, and find the distance of each point to each centroid. The Euclidean distance is used to determine the distance between the points and the centroids [2]:

Distance, $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ units
When all the distances have been found, each point is assigned to the cluster of its least-distant centroid. When all the points are associated with a cluster, new centroids are computed, since the inclusion of new points can change the cluster centroids. After updating the k centroids, the two phases are repeated, resulting in a loop in which the k centroids may change their positions on each iteration. The loop continues until the centroids do not move anymore; this is the convergence criterion for clustering. At this point, the loop is terminated, the k clusters are defined, and the model is ready.
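The two-phase loop can be written out directly; the sketch below is our own minimal illustration in base R (the paper itself uses the built-in kmeans() function shown later), with no handling of empty clusters or multiple restarts:

```r
# Minimal two-phase K-means loop: assign points to the nearest centroid, then
# recompute each centroid as its cluster mean, until assignments stop changing.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  centroids <- x[sample(nrow(x), k), , drop = FALSE]  # random initial centroids
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Euclidean distance of every point to every centroid (n x k matrix)
    d <- sapply(seq_len(k), function(j)
      sqrt(rowSums(sweep(x, 2, centroids[j, ])^2)))
    new_cluster <- max.col(-d)                        # nearest centroid per point
    if (all(new_cluster == cluster)) break            # convergence: no point moved
    cluster <- new_cluster
    for (j in seq_len(k))                             # update step
      centroids[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
  }
  list(cluster = cluster, centers = centroids)
}
```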
2.3 Pros and Cons

It is a simple and efficient algorithm that is easy to explain. It has good flexibility, so easy adjustments can be made, and the results can be easily interpreted. The drawbacks are that the clusters are not guaranteed to be optimal and that the number of clusters must be decided before the analysis.
Fig. 3 Steps involved in K-means clustering
3 Literature Review

We have analyzed several other papers on this topic; short descriptions follow. Vijayarani et al. proposed data mining techniques for finding patterns and hidden trends, using datasets from various sources. Crime is detected using the apriori algorithm, decision tree concepts, K-means clustering, and other data mining methods, with visualization in the form of graphs, bars, and pie charts [3]. Chetan et al. proposed a framework whose main objective is to show the probability of crime existing in a country by visualizing it on Google Maps; no datasets are used in the article, and crime is analyzed and detected in an easy way [4]. Interpretation of data is used in various applications and services in healthcare
systems [5, 6], and data in images is visualized for medical image processing [7–18], data and image compression techniques [19–21], healthcare data systems [22–24], big data analysis [25], IoT [26], data mining [27], cyber-attack analysis [28–33], security threats [34–37], and artificial intelligence and deep learning [28, 38–41]. Al-Janabi et al. use knowledge discovery in databases (KDD), the process of finding valid and potential results; the tools used there for data mining and visualization are the WEKA mining software and Microsoft Excel [2]. The crime analysis domain is given in four steps:

• Data collection,
• Analysis,
• Dissemination, and
• Feedback and evaluation.
A brief description of the types of crime analysis is given there. Finally, data mining techniques such as decision trees and K-means clustering are used to obtain the solutions. Agarwal et al. used the Rapid Miner tool for crime analysis: the chosen database is first normalized and then loaded into the miner for further clustering operations. The dataset used is the offenses recorded by the police in England and Wales by offense and police force area; the results are displayed as bar charts, and the year with the most homicides is found [42]. Jain et al. demonstrated a system that can predict regions with a high probability of crime occurrence and can visualize crime-prone areas; a mathematical function known as the squared error function is used [1]. The steps of the K-means algorithm are promptly displayed, the problem is executed online using PHP scripts, and the result is rendered as a Google map, as mentioned in Fig. 4. Ali et al. created an analysis of crime data, portraying the main objective of crime analysis as an analytical process that provides crime patterns to assist personnel in preventing and suppressing criminal activities. Among the data mining tools R, WEKA, KNIME, Orange, Tanagra, and Rapid Miner, Rapid Miner is used for the K-means clustering, and a brief description of various papers is given. The dataset used is offenses recorded by the police in India, and the output is displayed in the form of bar charts, pie charts, and maps [43]. Krishnamurthy et al. proposed a system in which crime patterns are extracted by analysis of the available crime and criminal data, and prediction of crime based on the spatial distribution of the existing data is then done [44]. A dataset from GitHub is used, the tool used is R, and the visualizations are done as pie charts plotting the homicides per year. Win et al. give a clear idea of the K-means clustering algorithm and a clear note on the steps to be followed, with a flowchart explaining the process of crime detection; the mathematical process of K-means clustering is explained in a full-fledged manner, and the output is plotted in the form of cluster plots [45].
Fig. 4 Crimes in different places shown in a map
Soundarya et al. proposed a new recommendation system for predicting the criminal behavior of users on a social network based on the users' activities, describing the weblog dataset and the architecture of the recommendation system [46]. The recommendation system functions with nine parts, using nine factors to identify and define user behavior, and genetic weighted K-means clustering is used to group and identify behavior. Two categories of datasets are used:

• Facebook dataset,
• Weblog dataset.

A confusion matrix with recall and precision is displayed for identifying effective behavior detection, and the result is displayed in the form of bar charts. Nazeer et al., noting the many methods proposed for improving the K-means algorithm, propose another method to improve its efficiency and present the pseudocode of the K-means clustering algorithm [47]. The paper consists of four parts, namely:

• The K-means clustering algorithm,
• The enhanced method,
• Finding the initial centroids,
• Assigning data points to clusters.
The k-mean clustering algorithm, The enhanced method, Finding the initial centroids, Assigning data points to clusters.
It talks briefly about all these processes and also gives their time complexities, so that we can find the optimal time complexity and the optimal K-means clusters.

4 Dataset

We have used the "USArrests" dataset, which is available in the R library. It has four variables (attributes) and 50 observations. The attributes are murder, rape, assault, and urban population, and it contains the rates of these attributes.
5 Methodology

The K-means clustering algorithm is an iterative algorithm that partitions the given data into distinct non-overlapping clusters such that each data point belongs to only one group. From Fig. 5, the proposed system, the steps to be followed for optimal clusters are:

• Load the dataset to be clustered as a data frame.
• View the data frame and study the properties of the dataset, such as the number of attributes and number of rows.
Fig. 5 Proposed system for crime analysis using K-means clustering
• Preprocess the data so as to avoid any null values and reduce noisy data.
• Find the optimal number of clusters (value of k) using the elbow method.
• Define the centroids and find the Euclidean distances from each point to the centroids, repeating until the centroids no longer move.
• Finally, plot the obtained points in a graph, and the clusters are obtained.

This is the proposed system for crime analysis using K-means clustering.
6 Implementation

6.1 Tool Used

There are many data mining tools available: Rapid Miner, KNIME, Python, Orange, etc. Here, the R tool is used for clustering the data and visualizing the crimes. R is a free, open-source software environment for statistical computing, in which visualizations can be done in a simple and effective manner. There are many built-in packages for data analytics, such as dplyr and ggplot2, and we can even build our own packages using the software.
6.2 Packages Used

A package is an efficient way to organize and store our work and share it with others. There are many built-in packages in R, stored under the directory called "library." The different packages used for K-means clustering are (Fig. 6):
Fig. 6 Clustering in R
Fig. 7 Study on the data
• Corrplot: used to plot the graph of a correlation matrix
• Cluster: used for plotting clusters
• Factoextra: used for quick and easy visualization
• Ggplot2: used for graphs
6.3 Implementation of K-Means Clustering

First, the data is loaded into RStudio as a data frame, represented in Fig. 7. Then, the data frame is studied in terms of:

• Minimum values,
• Number of attributes,
• Median, mean, and mode of each attribute, etc.

Then, the data is preprocessed. In this step, we check for any null values in the data; if there are any, they should be removed, so that the data is noise-free and complete. The dataset is further studied by plotting a correlation graph using the corrplot package; the correlation graph is shown in Fig. 8. Then, the data is scaled, and the training data thus formed is stored in another data frame. Using this newly created dataset, the optimal number of clusters (value of k) is found using the elbow (bend) method, achieved by using the apply function. From Fig. 9, we can understand that by the bend method the optimal number of clusters would be 4, since the bend is at the x intercept 4. This is made clear by another graph for our understanding: from Fig. 10, we can clearly conclude that the optimal number of clusters is 4. Next, K-means clustering is applied using the function kmeans(data, k, …), so that the dataset is grouped into k clusters, in our case four. Finally, the clusters are plotted and displayed using the function fviz_cluster(km, data, …) from the factoextra package, which is used especially to plot clusters.
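A compact sketch of this whole pipeline on USArrests follows; it is our own reconstruction of the steps named above (the exact calls the authors used are not shown in the paper):

```r
# End-to-end sketch: load, preprocess, scale, elbow method, cluster, plot.
library(corrplot)    # correlation graph (Fig. 8)
library(factoextra)  # elbow plot and cluster plot (Figs. 9-11)

df <- na.omit(USArrests)        # drop any rows with null values
corrplot(cor(df))               # study correlations between the attributes
df <- scale(df)                 # scale the training data

# Elbow ("bend") method: within-cluster sum of squares for k = 1..10
fviz_nbclust(df, kmeans, method = "wss")

set.seed(123)                   # kmeans picks random initial centroids
km <- kmeans(df, centers = 4, nstart = 25)

fviz_cluster(km, data = df)     # final four clusters (Fig. 11)
```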
Fig. 8 Correlation graph
Fig. 9 Bend method
From Fig. 11, we can see that the data is visualized in the form of clusters by using K-means clustering.
Fig. 10 Optimal number of clusters (value of k)
Fig. 11 Final clusters
7 Result and Discussion

From Fig. 11, we can understand the visualization in an easy way: we can clearly see four clusters in different colors for easy interpretation.

• Cluster 1 is in dark blue. The crime rates in this cluster are very low, so we can conclude that the security needs of the cities in this cluster are low.
• Cluster 2 is in green. Here we can conclude the same as for cluster 1, since the crime rate is low.
• Cluster 3 is in red. The crime rates are a little higher compared to clusters 1 and 2, so we can conclude that medium security is needed in these cities.
• In cluster 4, the crime rates are very high compared to the other three clusters; hence, we can conclude that more security is needed in those cities.
• There are some points lying outside the clusters; they are called outliers and are the exceptional cases.

Thus, the clustering is done using the K-means algorithm and the statistical tool R.
8 Conclusion

The K-means algorithm is used for clustering large datasets, but this approach does not always result in optimal solutions, as the accuracy of the final clusters depends on the selection of the initial centroids. Yet it is a simple and efficient approach and can be used for crime analysis in an efficient manner. Crime prediction helps the police department identify crimes in a specific area and draw conclusions. This paper discussed K-means clustering, crime analysis, the R statistical tool, and the method of implementation of K-means clustering. It will be useful for researchers, students, and professionals who need help in any of the mentioned fields or who wish to help the police with crime analysis.
9 Future Enhancement

K-means clustering is a simple yet efficient and popular unsupervised machine learning algorithm, which is why we chose it for crime analysis in this paper. In our opinion, both crime and crime analysis will increase in the forthcoming future. Since technology will advance over time, the style of committing crime will also become more sophisticated, so crime analysis will likewise be enhanced, involving AI devices, automatic crime detection applications, sensors, etc.
References 1. Jain V, Sharma Y, Bhatia A, Arora V (2017) Crime prediction using K-means algorithm. GRD J Glob Res Dev J Eng 2(5) 2. Al-Janabi KBS (2011) A proposed framework for analyzing crime data set using decision tree and simple k-means mining algorithms. J Kufa Math Comput 1(3):8–24 3. Vijayarani S, Suganya E, Navya C (2020) A comprehensive analysis of crime analysis using data mining techniques. Int J Comput Sci Eng (IJCSE) 4. Wadhai CG, Kakade TP, Bokde KA, Tumsare DS (2018) Crime analysis using K-means clustering. Int J Eng Res Technol (IJERT) 07(04) 5. Junwei L, Ramkumar S, Emayavaramban G, Thilagaraj M, Muneeswaran V, Rajasekaran MP, Hussein AF (2018) Brain computer interface for neurodegenerative person using electroencephalogram. IEEE Access 7:2439–2452 6. Jialu G, Ramkumar S, Emayavaramban G, Thilagaraj M, Muneeswaran V, Rajasekaran MP, Hussein AF (2018) Offline analysis for designing electrooculogram based human computer interface control for paralyzed patients. IEEE Access 6:79151–79161 7. Muneeswaran V, Rajasekaran MP (2017, March) Beltrami-regularized denoising filter based on tree seed optimization algorithm: an ultrasound image application. In: International conference on information and communication technology for intelligent systems. Springer, Cham, pp 449–457 8. Muneeswaran V, Rajasekaran MP (2019) Local contrast regularized contrast limited adaptive histogram equalization using tree seed algorithm—an aid for mammogram images enhancement. Smart intelligent computing and applications. Springer, Singapore, pp 693–701 9. Muneeswaran V, Rajasekaran MP (2019) Automatic segmentation of gallbladder using bioinspired algorithm based on a spider web construction model. J Supercomputing 75(6):3158– 3183 10. Muneeswaran V, Rajasekaran MP (2016, December) Analysis of particle swarm optimization based 2D FIR filter for reduction of additive and multiplicative noise in images. In: International conference on theoretical computer science and discrete mathematics. Springer, Cham, pp 165–174 11. Muneeswaran V, Rajasekaran MP (2018) Gallbladder shape estimation using tree-seed optimization tuned radial basis function network for assessment of acute cholecystitis. Intelligent engineering informatics. Springer, Singapore, pp 229–239 12. Nagaraj P, Muneeswaran V, Reddy LV, Upendra P, Reddy MVV (2020, May) Programmed multi-classification of brain tumor images using deep neural network. In: 2020 4th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 865–870 13. Kanagaraj H, Muneeswaran V (2020, March) Image compression using HAAR discrete wavelet transform. In: 2020 5th international conference on devices, circuits and systems (ICDCS). IEEE, pp 271–274 14. Muneeswaran V, Rajasekaran MP (2019) Automatic segmentation of gallbladder using intuitionistic fuzzy based active contour model. Microelectronics, electromagnetics and telecommunications. Springer, Singapore, pp 651–658 15. Perumal B, Kalaiyarasi M, Deny J, Muneeswaran V (2021) Forestry land cover segmentation of SAR image using unsupervised ILKFCM. Materials today proceedings 16. Muneeswaran V, Nagaraj P, Godwin S, Vasundhara M, Kalyan G (2021, May) Codification of dental codes for the cogent recognition of an individual. In: 2021 5th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 1387–1390 17. Perumal B, Deny J, Devi S, Muneeswaran V (2021, May) Region based skull eviction techniques: an experimental review. 
In: 2021 5th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 629–634 18. Varma CG, Nagaraj P, Muneeswaran V, Mokshagni M, Jaswanth M (2021, May) Astute segmentation and classification of leucocytes in blood microscopic smear images using titivated K-means clustering and robust SVM techniques. In: 2021 5th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 818–824
19. Li L, Muneeswaran V, Ramkumar S, Emayavaramban G, Gonzalez GR (2019) Metaheuristic FIR filter with game theory based compression technique—a reliable medical image compression technique for online applications. Pattern Recogn Lett 125:7–12 20. Nagaraj P, Muneeswaran V, Kumar AS (2020, May) Competent ultra data compression by enhanced features excerption using deep learning techniques. In: 2020 4th international conference on intelligent computing and control systems (ICICCS). IEEE, pp 1061–1066 21. Nagaraj P, Rajasekaran MP, Muneeswaran V, Sudar KM, Gokul K (2020, August) VLSI implementation of image compression using TSA optimized discrete wavelet transform techniques. In: 2020 third international conference on smart systems and inventive technology (ICSSIT). IEEE, pp 667–670 22. Muneeswaran V, Nagaraj P, Dhannushree U, Lakshmi SI, Aishwarya R, Sunethra B (2021) A framework for data analytics-based healthcare systems. Innovative data communication technologies and application. Springer, Singapore, pp 83–96 23. Nagaraj P, Deepalakshmi P (2020) A framework for e-healthcare management service using recommender system. Electron Gov Int J 16(1–2):84–100 24. Vamsi AM, Deepalakshmi P, Nagaraj P, Awasthi A, Raj A (2020) IOT based autonomous inventory management for warehouses. EAI international conference on big data innovation for sustainable cognitive computing. Springer, Cham, pp 371–376 25. Muneeswaran V, Bensujitha B, Sujin B, Nagaraj P (2020) A compendious study on security challenges in big data and approaches of feature selection. Int J Control Autom 13(3):23–31 26. Nagaraj P, Muneeswaran V, Rajasekaran MP, Sudar KM, Sumithra M (2021) Implementation of automatic soil moisture dearth test and data exertion using internet of things. Emerging technologies in data mining and information security. Springer, Singapore, pp 511–517 27. Nagaraj P, Aakash M, Arunkumar M, Balananthanan K, Dharanidharan A, Rajkumar C (2020) Analysis of data mining techniques in diagnalising heart disease. Intell Syst Comput Technol 37:257 28. Muneeswaran V, Nagaraj MP, Rajasekaran MP, Chaithanya NS, Babajan S, Reddy SU (2021, July) Indigenous health tracking analyzer using IoT. In: 2021 6th international conference on communication and electronics systems (ICCES). IEEE, pp 530–533 29. Ramakala R, Thayammal S, Ramprakash A, Muneeswaran V (2017, December) Impact of ICT and IOT strategies for water sustainability: a case study in Rajapalayam-India. In: 2017 IEEE international conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–4 30. Perumal B, Muneeswaran V, Pothirasan N, Reddy KRM, Pranith KSS, Chaitanya K, Kumar RK (2021, July) Bee eloper: a novel perspective for emancipating honey bees from its comb using a contrivable technique. In: AIP conference proceedings, vol 2378, no 1. AIP Publishing LLC, p 020003 31. Nagaraj P, Muneeswaran V, Sudar KM, Ali RS, Someshwara AL, Kumar TS (2021, May) Internet of things based smart hospital saline monitoring system. In: 2021 5th international conference on computer, communication and signal processing (ICCCSP). IEEE, pp 53–58 32. Muneeswaran V, Rajasekaran MP (2016, March) Performance evaluation of radial basis function networks based on tree seed algorithm. In: 2016 international conference on circuit, power and computing technologies (ICCPCT). IEEE, pp 1–4 33. Sudar KM, Deepalakshmi P, Nagaraj P, Muneeswaran V (2020, November) Analysis of cyberattacks and its detection mechanisms. 
In: 2020 fifth international conference on research in computational intelligence and communication networks (ICRCICN). IEEE, pp 12–16 34. Sudar KM, Deepalakshmi P, Ponmozhi K, Nagaraj P (2019, December) Analysis of security threats and countermeasures for various biometric techniques. In: 2019 IEEE international conference on clean energy and energy efficient electronics circuit for sustainable development (INCCES). IEEE, pp 1–6 35. Sudar KM, Beulah M, Deepalakshmi P, Nagaraj P, Chinnasamy P (2021, January) Detection of distributed denial of service attacks in SDN using machine learning techniques. In: 2021 international conference on computer communication and informatics (ICCCI). IEEE, pp 1–5
36. Sudar KM, Nagaraj P, Deepalakshmi P, Chinnasamy P (2021, January) Analysis of intruder detection in big data analytics. In: 2021 international conference on computer communication and informatics (ICCCI). IEEE, pp 1–5 37. Sudar KM, Lokesh DL, Chowdary YC, Chinnasamy P (2021, January) Gas level detection and automatic booking notification using IOT. In: 2021 international conference on computer communication and informatics (ICCCI). IEEE, pp 1–4 38. Nagaraj P, Deepalakshmi P, Romany FM (2021) Artificial flora algorithm-based feature selection with gradient boosted tree model for diabetes classification. Diabetes Metab Syndr Obes Targets Ther 14:2789 39. Sujith Kumar V (2020) Perceptual image super resolution using deep learning and super resolution convolution neural networks (SRCNN). Intell Syst Comput Technol 37:3 40. Nagaraj P, Muneeswaran V, Ali RS, Kumar TS, Someshwara AL, Pranav J (2020, September) Flexible Bolus insulin intelligent recommender system for diabetes mellitus using mutated Kalman filtering techniques. In: Congress on intelligent systems. Springer, Singapore, pp 565– 574 41. Sharan ES, Kumar KS, Madhuri G (2021, July) Conceal face mask recognition using convolutional neural networks. In: 2021 6th international conference on communication and electronics systems (ICCES). IEEE, pp 1787–1793 42. Agarwal J, Nagpal R, Sehgal R (2013) Crime analysis using k-means clustering. Int J Comput Appl 83(4) 43. Ali WA, Alalloush H, Manasa KN (2020) Crime analysis and prediction using K-means clustering technique. EPRA Int J Econ Bus Rev (EPRA IJRD) 5(7) 44. Krishnamurthy R, Kumar JS (2012) Survey of data mining techniques on crime data analysis. Int J Data Min Tech Appl 01(02) 45. Win T, Phyo EE (2019) Predicting of crime detection using K-means clustering algorithm. Int J Eng Trends Appl (IJETA) 6(3) 46. Soundarya V, Kanimozhi U, Manjula D (2017) Recommendation system for criminal behavioral analysis on social network using genetic weighted K-means clustering. J Comput 12(3):212–220 47. Nazeer KA, Sebastian MP (2009, July) Improving the accuracy and efficiency of the k-means clustering algorithm. In: Proceedings of the world congress on engineering, vol 1. Association of Engineers, London, pp 1–3
Seed Set Selection in Social Networks Using Community Detection and Neighbourhood Distinctness Sanjeev Sharma and Sanjay Kumar
Abstract In recent years, the analysis of social networks has evolved greatly. A particular piece of information can be passed from one user to another, and since there are many links between the nodes of the network, the same information can be received by a large number of users simply through the ongoing process of information transmission between adjacent nodes. But a social network can have millions or perhaps billions of nodes, so sending a particular message to all users ourselves would be very time consuming and inefficient. It is therefore better to initially choose a small set of nodes, called the seed set, and let them pass the information to the major part of the remaining network. These selected nodes are also called spreader nodes, and such a set must be chosen from a large number of nodes. An approach using community detection and the local structure of nodes has been proposed to find the seed set.

Keywords Social networks · Seed set · Centrality · Community detection · Influence · Distinctness
1 Introduction
All the popular social Websites like 'Facebook', 'Twitter', 'Instagram', 'LinkedIn', 'Reddit', etc., use social networks for better user outreach, recommendations, and many other use cases. Even e-commerce Websites like 'Amazon', 'Flipkart', etc., use them for better product recommendations, understanding clients' sentiments, and so on. One of the most important uses of social networks is the 'information flow' that can happen within the network. But one cannot send the same message to all the nodes present in a social network; hence, we pass the message only to the spreader nodes and let the message flow within the network via the diffusion process.
S. Sharma (B) · S. Kumar Department of Computer Science and Engineering, Delhi Technological University, Main Bawana Road, New Delhi 110042, India S. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_44
This is also called influence maximization [1] because it is through the influence of the initially chosen seed nodes that the same message is transmitted to many other nodes. In computer science, the process of finding the optimal seed set, which has the maximum influence and minimum size, is termed an NP-hard problem: since social networks [2] can have billions of edges between the nodes, we cannot analyze each one of them. Influence maximization is also used by popular companies or people to spread their desired information to most of the network users. There are two things which are most important when it comes to influence maximization:
• Finding out the seed set or spreader nodes
• The information diffusion model [3, 4] which is used to simulate the process of information transmission.
The process of information diffusion can be compared with the spread of a communicable disease, where an infected person can transmit the disease to other people who are not yet infected. This process can go on and infect the majority of the population. In the proposed algorithm, we have used the 'susceptible infected recovered (SIR)' [5] model to simulate the information diffusion process. Node centrality measures [6] are used to rank the nodes of a social network on the basis of their importance. The greater the importance, the higher the chance of picking that node as one of the chosen spreader nodes from where the diffusion process starts. After ordering the nodes according to their importance, we pick some top nodes from the set and make them the seed set. There are basically three types of centrality measures available:
• Local structure-based: these measures exploit the local structure around a node to find the importance of that particular node. They have lower time complexities than the other measures as only the neighbourhood of a node is examined. They can keep track of local topologies, degrees, paths going through a particular edge, etc.
• Semi-local structure-based: these measures use not only the local structure around a node but also the global structure to rank the nodes according to their importance.
• Global structure-based: these measures exploit the global structure of the social network to find the importance of nodes and rank them accordingly. They can use all-pairs shortest paths to find the importance of an edge, which in turn helps to find the importance of the nodes attached to that edge. They are more effective than the local structure-based centrality measures but have higher time complexity.
One more important thing which needs to be considered whilst finding the seed set is that it should have the least interference, i.e. the least overlapped influence effect of the spreader nodes. Hence, the spreader nodes should be chosen such that they are reasonably far apart from each other, so that the seed set has maximum influence over the graph with minimum overlapping [7]. The spreader nodes along with the other normal nodes are demonstrated in Fig. 1. Let us now see a few of the popular centrality measures used in social networks. Degree centrality [8] is based on the degree of nodes. A node can have two types of degrees, i.e. in-degree and out-degree.
Fig. 1 Dummy graph showing seed set in green colour
It is a local centrality measure as it considers only the local structure around a node [9]. The higher the degree of a node, the higher its degree centrality. Closeness centrality calculates the distance between a particular node and all the other nodes. It basically helps in finding out how close a node is to the other vertices. It is a global structure-based centrality measure and has a time complexity of O(n^3). Betweenness centrality [10] also uses the global structure of the social network. It is based on counting the number of times a vertex lies on the shortest routes between any two other vertices. PageRank centrality is used to order Web pages according to their importance. A node is considered important if it is connected to other important nodes. It is used by 'Google' to rank Web pages: the Web pages can be thought of as nodes of a social network, and the links contained in the Web pages can be considered as its edges [11]. K-shell centrality [12] visualizes the network in terms of layers; the nodes present in the inner layers have higher K-shell centrality, and the nodes present on the outer layers have lower K-shell centrality. So, it is sometimes unable to detect hubs present on the periphery of the network. H-index centrality [13] considers the local structure around a node; it is equal to the largest value 'h' such that a node has at least 'h' adjacent nodes, each with a degree greater than or equal to 'h'. It therefore performs better than degree centrality as it takes care of a larger neighbourhood around a node. The different centrality measures focus on different aspects of the social graph: some focus on the local structure around a node, some on the global structure, some apply machine learning algorithms, and some combine multiple aspects of the social graph. It is better not to restrict ourselves to a single measure and to combine the effect of the global as well as the local structure around a node when we come up with a centrality measure.
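Several of these measures are available off the shelf. The following is a minimal sketch, assuming the NetworkX library, of how the centrality measures discussed above could be computed and used to pick a naive top-X seed set; the value of X and the choice of betweenness as the ranking measure are illustrative only.

import networkx as nx

G = nx.karate_club_graph()  # Zachary's Karate Club: 34 nodes, 78 edges

degree = nx.degree_centrality(G)            # local: normalized node degree
closeness = nx.closeness_centrality(G)      # global: closeness to all other nodes
betweenness = nx.betweenness_centrality(G)  # global: presence on shortest paths
pagerank = nx.pagerank(G)                   # importance propagated from neighbours
kshell = nx.core_number(G)                  # k-shell (core) index of every node

# Rank the nodes by one measure and take the top-X nodes as a naive seed set.
X = 5
seed_set = sorted(G.nodes(), key=lambda v: betweenness[v], reverse=True)[:X]
print("Top-%d nodes by betweenness centrality:" % X, seed_set)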
2 Related Work
After selecting the seed set, the next step is to find out the influence of that seed set over the social network. It helps to determine how many new nodes the original information, transmitted from the initially chosen spreader nodes, has reached. It also helps to measure the efficiency of an algorithm by comparing the performance metric 'infection scale' later on. Information diffusion can be thought of as an epidemic spreading situation. The susceptible infected recovered (SIR) [14] model is used in this paper for the proposed algorithm to simulate the information diffusion process. In the SIR model, there are three types of nodes: S-susceptible, I-infected, and R-recovered nodes. S-susceptible nodes: these have high chances of getting infected, or in other words, there is a high chance that the information flow will reach them. I-infected nodes: these are already infected or have the information which needs to be multicast to many nodes. R-recovered nodes: these nodes had been infected but have now recovered, or in other words, they cannot spread the information any more. Another popular information diffusion model is 'independent cascade' (IC) [3]. In the IC model, if a node 'u' is already infected, then there is a probability attached to every edge associated with node u, which gives the probability of successful transmission of the information from the infected node to its adjacent nodes. In a social network, there can be millions of nodes and billions of edges, so there can be regions within the network which contain nodes having strong connections between them, but not with other nodes outside that region. These regions can be termed communities. Community detection [15] also helps to select the seed set [16] in such a way that the spreader nodes are reasonably far apart from each other [17]. One or more nodes can be picked for the seed set from a particular reasonably sized community, and that may be enough to spread the information within that community. This can be done for all the communities. Community detection thus helps to find a seed set which has minimum interference and maximum influence over the network. Communities could be found by a brute-force method, but that would not be efficient as there can be a very large number of nodes in the social network. One good method to find the communities present in a social graph is the 'Girvan Newman' [18] method, which uses the betweenness centrality measure to remove edges one by one until the desired number of communities is obtained. In every iteration, it finds the betweenness centrality associated with every edge, then picks the edge having the largest betweenness centrality and removes it from the graph. This step is repeated until the communities are obtained: removing the edge with the largest value of betweenness centrality restructures the graph so that proper communities emerge; a sketch of this procedure is given below. The different communities present in a graph are demonstrated in Fig. 2.
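A minimal sketch of the Girvan-Newman method just described, assuming NetworkX's built-in implementation; the desired number of communities is an illustrative choice.

import itertools
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()

desired = 3  # stop once the graph has split into this many communities
splits = girvan_newman(G)  # each step removes edges until one more community appears
for communities in itertools.islice(splits, desired - 1):
    pass
print([sorted(c) for c in communities])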
Fig. 2 Dummy graph showing different communities present in it
3 Methods
The proposed algorithm can be used to find a seed set for a social network that has a good influence over the network. It tries to make sure that the spreader nodes are far away from each other by using community detection. It also computes a local structure-based centrality measure called 'distinctness centrality' which, along with a community score attached to each node, is used to rank the nodes according to their importance. The steps are as follows:
(1) First of all, community detection is performed using the 'Girvan Newman' method, which uses betweenness centrality to find the different communities.
(2) A community score is attached to each node; the idea is that a node belonging to a community containing a greater number of nodes should have a higher community score.
(3) A distinctness centrality is also associated with every node. It measures how many distinct nodes can be reached within 'k' hops starting from the current node. Depth-first search, limited to k hops, is used to compute it; during the search, a 'set' containing the new nodes reached from the source node is continuously updated. There can be regions whose nodes have high degree centrality but are actually strongly connected only within themselves, so fewer distinct nodes can be reached within k hops from them than from nodes in better-placed regions. This centrality helps to rank the nodes accordingly so that the latter nodes can be used in the seed set (a sketch follows this list).
(4) Now, two scores are associated with every node, i.e. the community score and the distinctness score.
(5) One or more nodes can be picked from each community which have the sum of both scores greater than some threshold value.
(6) Thus, community detection helps to spread out the spreader nodes, and distinctness centrality helps to find the more important nodes within a community using the local structure around that node.
(7) Finally, the top 'X' nodes are picked as the spreader nodes from where the information diffusion process starts.
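A minimal sketch of this ranking, under stated assumptions: distinctness is computed here with a hop-limited shortest-path traversal (which collects the same set of nodes within k hops as the paper's depth-first search), the community score is taken as the size of a node's community, and the way the two scores are combined, the value of k, and the one-node-per-community selection are illustrative choices, not the paper's exact settings.

import itertools
import networkx as nx
from networkx.algorithms.community import girvan_newman

def distinctness(G, v, k=2):
    # Number of distinct nodes whose distance from v is at most k, excluding v.
    return len(nx.single_source_shortest_path_length(G, v, cutoff=k)) - 1

def seed_set(G, X=5, k=2, n_communities=3):
    splits = girvan_newman(G)
    for communities in itertools.islice(splits, n_communities - 1):
        pass
    score = {}
    for community in communities:
        for v in community:
            score[v] = len(community) + distinctness(G, v, k)  # combined score
    # Pick the top-scoring node of each community, then keep the best X overall.
    best = [max(c, key=lambda v: score[v]) for c in communities]
    return sorted(best, key=lambda v: score[v], reverse=True)[:X]

print(seed_set(nx.karate_club_graph()))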
4 Datasets
For testing the efficiency of the algorithm, the popular 'Zachary's Karate Club' [19] dataset has been used, containing 34 nodes and 78 edges with an average degree of 4.5. It is a social network within a karate club at a US university in the 1970s, where the nodes represent students of the club and the edges represent friendships. The second dataset used is 'social circles: Facebook' [20], which contains the friend lists of users from Facebook. The third one is the Cond Mat dataset [21]. The description of the graphs is given in Table 1. The Zachary's Karate Club dataset is shown in Fig. 3.
Table 1 Information regarding the graph datasets used for testing our algorithm
Graph                    No. of edges    No. of nodes    Average degree
Zachary's Karate Club    78              34              4.5
Facebook                 88,234          4039            43
Cond Mat                 93,497          23,133          8
Fig. 3 Zakhary’s Karate Club dataset demonstration
5 Performance Metrics
In this paper, two performance metrics are used to judge the efficiency of the algorithm: infection scale and recovered nodes. They help in identifying the influence impact of an initially chosen seed set. A brief description of these metrics is as follows:
A. Infection Scale
The infection scale is the percentage of the network infected by the spreaders over a specific time period. Infections start with the spreaders and spread throughout the network. The infection scale rises over time and then gradually decreases as the nodes heal.
B. Recovered Nodes
A node is counted as recovered only after it has been infected and has had the chance to infect its neighbours. As a result, the total number of recovered nodes gives an idea of the proportion of the network that the infection has been able to affect. The number of spreaders initially chosen in the network affects the number of recovered nodes. The average over all recovered nodes is reported in our research.
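A simple discrete-time SIR simulation can be used to estimate both metrics. The sketch below is a minimal illustration; the per-step infection probability beta, the recovery probability gamma, and the seed choice are assumed values, not the paper's settings.

import random
import networkx as nx

def sir_simulation(G, seeds, beta=0.1, gamma=1.0, steps=30):
    infected, recovered = set(seeds), set()
    infection_scale = []
    for _ in range(steps):
        newly_infected = set()
        for u in infected:
            for v in G.neighbors(u):
                if v not in infected and v not in recovered and random.random() < beta:
                    newly_infected.add(v)
        newly_recovered = {u for u in infected if random.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        infection_scale.append(len(infected) / G.number_of_nodes())
    return infection_scale, len(recovered)

G = nx.karate_club_graph()
scale, n_recovered = sir_simulation(G, seeds=[0, 33])
print(scale, n_recovered)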
6 Results
The proposed algorithm has been tested based on the infection scale and recovered nodes performance metrics. The distinctness method was able to produce a larger infection scale and a better rate of recovered nodes per spreaders fraction. Figures 4, 5, and 6 show the infection scale versus time evaluation of the different algorithms.
Fig. 4 Infection scale versus time for Zachary's Karate Club dataset
Fig. 5 Infection scale versus time for Facebook social circles dataset
Fig. 6 Infection scale versus time for Cond Mat dataset
The compared algorithms are mentioned in the figures. Similarly, Figs. 7, 8, and 9 display the recovered nodes versus spreader fraction graphs of the various algorithms. The number of nodes chosen as spreaders as a percentage of the total number of nodes in the network is known as the spreaders fraction. The 'distinctness' centrality performed better than the usual centrality measures.
Fig. 7 Recovered nodes versus spreaders fraction for Zachary's Karate Club
Fig. 8 Recovered nodes versus spreaders fraction for Facebook social circles
7 Conclusions and Discussion
The distinctness method performs better in graphs which have a community structure amongst their nodes, as it tries to select fewer nodes from any particular community. For future work, the algorithm could be enhanced by introducing an additional measure alongside it, so that it also works well in social graphs where no community structure is present. Machine learning algorithms could also be used to improve the algorithm's efficiency [22].
Fig. 9 Recovered nodes versus spreaders fraction for Cond Mat dataset
References 1. Kempe D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: KDD, pp 137–146 2. Li Y, Fan J, Wang Y, Tan K-L (2018) Influence maximization on social graphs: a survey. IEEE Trans Knowl Data Eng 1–1. https://doi.org/10.1109/TKDE.2018.2807843 3. Goldenberg J, Libai B, Muller E (2001) Talk of the network: a complex systems look at the underlying process of word-of-mouth. Mark Lett 12(3):211–223 4. Kumar S, Saini M, Goel M, Panda BS (2021) Modeling information diffusion in online social networks using a modified forest-fire model. J Intell Inf Syst 56(2):355–377 5. Satsuma J, Willox R, Ramani A, Grammaticos B, Carstea AS (2004) Extending the SIR epidemic model. Physica A Stat Mech Appl 336(3–4):369–375 6. Das K, Samanta S, Pal M (2018) Study on centrality measures in social networks: a survey. Soc Netw Anal Min 8–13 7. Kumar S, Panda BS, Aggarwal D (2020) Community detection in complex networks using network embedding and gravitational search algorithm. J Intell Inf Syst 1–22 8. Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1:215–239 9. Kumar S, Lohia D, Pratap D, Krishna A, Panda BS (2021) MDER: modified degree with exclusion ratio algorithm for influence maximization in social networks. Computing 1–24 10. Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35–41 11. Kumar S, Panda A (2021) Identifying influential nodes in weighted complex networks using an improved WVoteRank approach. Appl Intell 1–15 12. Kitsak M et al (2010) Identification of influential spreaders in complex networks. Nat Phys 6(11):888 13. Zareie A, Sheikh Ahmadi A (2018) EHC: extended H-index centrality measure for identification of users’ spreading influence in complex networks. Physica A 14. Liu Y, Tang M, Zhou T, Do Y (2016) Identify influential spreaders in complex networks, the role of neighbourhood. Physica A Stat Mech Appl 452:289–298 15. Huang H, Shen H, Meng Z, Chang H, He H (2019) Community-based influence maximization for viral marketing. Appl Intell 49(6):2137–2150
16. Kumar S, Hanot R (2021) Community detection algorithms in complex networks: a survey. In: Advances in signal processing and intelligent recognition systems (SIRS). Communications in computer and information science, 2020, vol 1365. Springer, Singapore 17. Kumar S, Singhla L, Jindal K, Grover K, Panda BS (2021) IM-ELPR: influence maximization in social networks using label propagation-based community structure. Appl Intell 1–19 18. Newman MEJ (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69(6):066133. https://doi.org/10.1103/physreve.69.066133 19. https://networkx.org/documentation/stable//auto_examples/graph/plot_karate_club.html. Last accessed on 10 Oct 2021 20. https://snap.stanford.edu/data/egonets-Facebook.html. Last accessed on 10 Oct 2021 21. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data (ACM TKDD) 1(1) 22. Bhowmik A, Kumar S, Bhat N (2019) Eye disease prediction from optical coherence tomography images with transfer learning. In: Engineering applications of neural networks (EANN 2019). Communications in computer and information science. Springer
An Optimized Active Learning TCM-KNN Algorithm Based on Intrusion Detection System Reenu Batra, Manish Mahajan, and Amit Goel
Abstract A new network structure known as software-defined networking (SDN) can be designed for the optimization of network flow management. Many network technologies have moved from traditional networks to SDN because of the static architecture and decentralized nature of the former. Efficient network management coupled with network monitoring can be achieved with the help of an SDN-based network structure, and the overall network performance can be increased by configuring the network programmatically. Many applications rely on SDN-based network structures as SDN isolates the packet-forwarding mechanism from the routing task of the network. This reduces the load on any single module and produces an efficient network. With the rapid growth of Internet technology, the flow rate of data over the network is also increasing. This increase in flow rate results in a rapid increase in distributed denial of service (DDoS) attacks over the network. As a result, the performance of the network may degrade because of the non-availability of resources to the intended user. DDoS attacks consume network bandwidth and resources, resulting in disrupted network service to the devices connected to the Internet. Machine learning and data mining techniques can be used for the detection of attacks over the network. Simulation of OpenFlow switches, RYU controllers, and other modules over SDN can result in better network management and detection of attacks over the network.
Keywords Algorithm · Denial of service (DoS) · Attacks · Software-defined network (SDN) · Optimization · Intrusion
R. Batra (B) SGT University, Gurugram, Haryana, India M. Mahajan Department of Computer Science, Amity University, Mohali, India A. Goel Galgotias University, Greater-Noida, Uttar Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_45
1 Introduction
In the past, many networking applications relied on traditional network structures. Today, traditional networks have been replaced by software-defined networking (SDN) because of its functionalities. An SDN network is basically a high-bandwidth network that provides many benefits like load balancing, reduced network cost, and improved security [1]. Besides these features, SDN also provides more flexibility in network usage and in the operations to be performed on networks. The network control plane and the data forwarding plane are the two main components of the SDN architecture. The network control plane mainly consists of a single controller or several controllers; the controller is known as the brain of the SDN. The main characteristic of the network control plane is that it is directly programmable. Resources over SDN need to be managed by administrators, and these resources can be secured with the help of SDN programs. Data over SDN are basically handled by three layers: the application, control, and data layers. The work of these components/layers is supervised by a person called the system manager. Figure 1 depicts the complete architecture of SDN with its components. Allocation of different resources is done on the application plane. Management of all network entities, on the other hand, is done on the control plane [2]. The data plane is responsible for handling all networking devices. SDN-based organizations can create new services and models for running their dynamic applications.
Fig. 1 SDN architecture (application plane, SDN controller/control plane, SDN protocols, and physical network infrastructure plane)
The communication between the SDN controller and the various networking devices is carried out with the help of the OpenFlow (OF) protocol. Any device that wants to communicate with the network controller must support the OF protocol. SDN can make use of this interface to make changes in the flow table, which allows the network administrator to change the traffic flow when required in order to maintain efficient management of the network. There are interfaces between all three planes, and these interfaces are exposed as application programming interfaces (APIs). One API, named the southbound API (S-API), is used to enable communication between the controller, the switches, and other lower-level components. The higher-level components, on the other hand, communicate with the help of the northbound API (N-API) [3]. All the switches and controllers in SDN networks are OF supported; OpenFlow basically acts as the medium of communication between controllers, switches, and other devices. Some open-source controller platforms are available, such as Beacon, OpenDaylight, and Floodlight.
2 Related Work
In the past, several intrusion detection mechanisms and techniques have been used for finding anomalies in network flows, and data mining mechanisms were often preferred. One commonly used data mining method was mining audit data for automated models for intrusion detection (MADAM ID). This model uses association rules to attain high detection accuracy. Another commonly used method is audit data analysis and mining (ADAM) [4]. The main benefit of ADAM is that it can be used to find known attacks as well as unknown attacks. It also makes use of association rules and uses classification for anomaly detection in an offline environment. A real-time data-mining-based intrusion detection method developed after ADAM is intrusion detection using data mining techniques (IDDM). IDDM uses the concepts of association rules, meta rules, and characteristic rules [5], and mainly uses network data and network information to find deviations in network transactions. Later, machine learning intrusion detection algorithms came into the picture. These algorithms mainly include random forest (RF), decision tree (DT), neural networks, and support vector machines (SVMs). Some of these algorithms can be used only in a supervised environment, whereas others can be used in both supervised and unsupervised anomaly detection environments. With the advancement of various machine learning algorithms, better detection efficiency can be achieved. However, traditional machine learning algorithms do not give satisfactory results for anomaly detection in a real-time environment. This generates the need for a promising and effective anomaly detection mechanism for real-time environments.
3 Distributed Denial of Service Attacks (DDoS) Over SDN
Many algorithms have been used over SDN for the detection of attacks. The results show that the proposed algorithm performs well compared to other algorithms, giving high precision and a low false rate, and it also adapts better to SDN than the others [6]. DDoS attacks mainly affect the performance of the SDN network. DoS attacks are mainly directed toward the controller in SDN: the switches over SDN come under the control of the attacker, and the attacker keeps the controller busy resolving the queries of the network. A spoofing attack is an example of a DDoS attack in which the attacker sends overwhelming numbers of requests, thereby flooding the server. As a result, the server breaks down, and requests from authorized users are not handled by the server [7]. This leads to a DDoS attack on the server. This type of DDoS attack therefore obstructs the permissible user from using the network service and disturbs the network flow (Fig. 2). DDoS attacks result in wastage of computation time, as the attacker continuously sends requests to the controller so that it does not respond to legitimate user requests [8]. It becomes easy to detect a DDoS attack over SDN if the controller finds traffic coming continuously from a single source. Continuous monitoring and new policies over SDN make spoofing less effective. When the attacker takes control over a switch, it may lead to a man-in-the-middle (MiM) DDoS attack, which can be overcome if the controller verifies every switch before it enters the SDN. Based on data plane attacks and control plane attacks, there are two categories of DDoS attacks.
Fig. 2 DDoS attacks over SDN
3.1 Data Plane DDoS Attacks
All networking devices used over SDN are handled with the help of the data plane. Attacks over the data plane are mainly categorized into two classes, namely volumetric attacks and protocol exploitation attacks. In volumetric attacks, a large number of requests are sent to the victim by the attacker. As a result, the victim is flooded with requests and remains busy all the time [9]. Sometimes the victim crashes because of such a heavy load, and requests from legitimate users remain pending. Examples of this type of attack are the ICMP flood, the UDP flood, and the Smurf attack. Protocol exploitation attacks, on the other hand, intend to grab all network resources as well as application resources [10]. As a result, the intended user is not able to fulfil its requests because the resources are used up by the attacker. Resources like memory, bandwidth, hardware resources, and software resources get exhausted by the attacker, disrupting the functioning of the intended user. The SYN flood is an example of a protocol exploitation attack. In this attack, the attacker continuously sends SYN packets over the network and does not wait for any acknowledgment from the other side, i.e., the server. As a result, the network gets flooded and crashes, because the server is continuously busy allocating memory for the SYN packets coming from the attacker. The ping of death is another example of this type of attack [11].
3.2 Control Plane DDoS
The main module of the control plane is the controller, which controls all the switches used in the network. In control plane attacks, the attacker sends a huge amount of flow to a switch. As a result, the switch is unable to process packets from the intended source [12]. A large amount of bandwidth is also exhausted in processing the packets from the attacker's side. This results in a network breakdown because of the non-availability of resources. There may also be a case where the attacker takes overall control of the controller, and as a result the controller is no longer able to control the switches.
4 DDoS Detection Over SDN
When the number of packets from a source increases rapidly, it becomes possible to detect a DDoS attack over the network. Some of the strategies used for the detection of attacks over SDN are described below.
4.1 Detection of Volumetric Attacks
Since in volumetric attacks the attacker sends a huge number of packets, these attacks can be detected by setting up a threshold value that defines the number of packets allowed to flow per second [13]. If the number of packets flowing in one second crosses the threshold value, this indicates a volumetric DDoS attack over the network. The sFlow tool was used by the authors to monitor the flow over the network. The authors implemented a FlowTrApp mechanism for detection of attacks by setting up flow parameters like flow duration, flows per second, etc. These values need to be evaluated on every packet flow for the detection of attacks.
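A minimal sketch of the threshold idea just described: count packets per source over one-second windows and flag a source whose rate crosses the threshold. The threshold value and the window handling are illustrative assumptions, not the FlowTrApp parameters.

from collections import defaultdict

class RateDetector:
    def __init__(self, threshold_pps=1000):      # assumed packets-per-second limit
        self.threshold_pps = threshold_pps
        self.window = defaultdict(int)           # packets per source in the current second
        self.current_second = None

    def observe(self, src_ip, timestamp):
        second = int(timestamp)
        if second != self.current_second:        # a new one-second window begins
            self.window.clear()
            self.current_second = second
        self.window[src_ip] += 1
        # True signals a possible volumetric DDoS attack from this source.
        return self.window[src_ip] > self.threshold_pps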
4.2 Detection of Protocol Exploitation Attacks
Protocol exploitation attacks mainly target the nodes and resources over the network. Here, the number of packets is smaller than in volumetric attacks [14]. These types of attacks are easier to find, as packets can be analyzed properly before entering the network. Hashing modules and flow aggregators are some of the mechanisms developed for analyzing the flow tables. One effective mechanism, AVANT-GUARD, was developed, which creates a fake controller that first checks the flow connection and then approves the request.
5 Detection Using Machine Learning
Machine learning can be applied to classify packets into normal and abnormal classes. In machine learning, algorithms need to be trained for classification on some data set. Decision tree (DT), support vector machine (SVM), and Naïve Bayes (NB) are algorithms which can be used for anomaly detection over SDN [15]. In the network anomaly detection domain, TCM-KNN is a well-suited algorithm that combines the transductive confidence machines (TCM) method with K-nearest neighbours (KNN). KNN is a classical algorithm used for classification of data values, and the combination allows high confidence in the detection of traffic [10]. TCM-KNN is a machine learning and data mining method mainly used for outlier detection and pattern recognition [16]. Its high detection rate and low false rate prove its effectiveness in the detection of anomalies.
5.1 Methods Used for Anomaly Detection
The input space of TCM-KNN needs to be considered before it is used for detection. We measure five factors for every Web transaction, as specified: (1) delay in one-way data, (2) delay in request/answer, (3) loss of packets, (4) length of the total transaction, and (5) variation in latency (jitter). These parameters jointly capture a range of QoS needs for applications. Initially, DDoS attacks need to be mapped onto a feature vector. Afterward, a pattern can be created for the Web server's typical state using these feature vectors to draw the distinction between unusual and regular traffic [17]. Genetic algorithms (GA) can be used to carry out TCM-KNN's instance selection process in order to save costly computation. The training data set can be denoted as TR with n instances, and all sub-sets of TR constitute the search space associated with instance selection. The chromosomes must then represent sub-sets of TR. This is done by employing a binary representation [18]: a chromosome consists of genes with two potential states, 0 and 1, one gene for each instance in TR. If the value associated with a gene is 1, the corresponding instance is included in the chromosome's sub-set of TR; if it is 0, the instance is excluded. The selected chromosome would be considered as a reduced optimal training data set for TCM-KNN after the operations of the GA method. Four well-known GAs, i.e., the generation-genetic algorithm (GGA), population-based incremental learning (PIL), catastrophic mutation and heterogeneous recombination (CMHR), and steady state (SS), can be used for accomplishing instance selection requirements; a sketch of such a binary-encoded GA is given below. The TCM-KNN algorithm with confidence values is shown in Fig. 3.
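The following is a minimal sketch of a binary-encoded GA for instance selection, under stated assumptions: the fitness function is a stand-in that should score a chromosome by TCM-KNN's detection accuracy on the reduced training set (penalizing its size), and the population size, generation count, and mutation rate are illustrative.

import random

def evolve(n_instances, fitness, pop_size=30, generations=50, p_mut=0.01):
    # Each chromosome is a binary mask over TR: gene 1 keeps an instance, 0 drops it.
    population = [[random.randint(0, 1) for _ in range(n_instances)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]               # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_instances)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ 1 if random.random() < p_mut else g
                     for g in child]                   # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)                # best reduced-TR mask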
6 Background of TCM-KNN Algorithm
In the past decade, a method named transduction was used to provide trust mechanisms for deciding whether a point belongs to one of a number of pre-defined classes. The transductive confidence machines (TCMs) were the first to implement this, using random forest (RF), and to calculate the confidence of a classification. The TCM trust measure is based on uniform randomness measures or their approximation, and it provides a means of measuring the confidence of every classification. Let D_i^y denote the ascending sorted sequence of distances (Euclidean distances in this paper), computed as in the K-nearest neighbours (KNN) algorithm [19], from a point i to the points that carry the same label y (refer Fig. 3). Moreover, D_ij^y is the jth shortest distance in this sequence, and D_i^-y is the sorted sequence of distances from i to the points whose classification differs from y. For a given classification, every point has a number assigned to it: the individual strangeness measure. This metric defines the peculiarity of a point in comparison with the rest of the points. The strangeness measure for a point i with label y can be calculated as
Fig. 3 TCM-KNN algorithm for anomaly detection over SDN
α_i = ( Σ_{j=1}^{k} D_ij^y ) / ( Σ_{j=1}^{k} D_ij^-y )    (1)
where k is the number of nearest neighbours considered. The strangeness is thus the ratio of the summed distances to the k nearest neighbours from the same class to the summed distances to the k nearest neighbours from the other classes. It follows from the equation that the strangeness of an instance increases if the distances between the points of the same class become large, and it also increases if the distances from the point to points belonging to a different class are small. In a similar manner, the p-value of an instance point can be calculated as

p(α_new) = #{ i : α_i ≥ α_new } / (n + 1)    (2)

In Eq. (2), # stands for the set's cardinality, which is defined as the number of elements in a finite set, and α_new is the strangeness value of the new test point e_new.
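A minimal sketch of Eqs. (1) and (2), assuming Euclidean distances as in the paper and using numpy for brevity; the value of k and the data layout are illustrative.

import numpy as np

def strangeness(X, y, i, k=5):
    # Alpha_i: summed distances to the k nearest same-class points divided by
    # summed distances to the k nearest other-class points (Eq. 1).
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                             # exclude the point itself
    same = np.sort(d[y == y[i]])[:k]
    other = np.sort(d[y != y[i]])[:k]
    return same.sum() / other.sum()

def p_value(alphas, alpha_new):
    # Fraction of points at least as strange as the new point (Eq. 2);
    # the new point itself accounts for the +1 in numerator and denominator.
    alphas = np.asarray(alphas)
    return (np.sum(alphas >= alpha_new) + 1) / (len(alphas) + 1)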
7 TCM-KNN with Active Learning
The success of a machine learning method for intrusion detection is inextricably linked to the availability of data. Algorithms based on machine learning further reduce the need for a specialist and lead to more self-contained defense solutions [20]. To be efficient, the TCM-KNN algorithm (refer Fig. 3) also requires a mechanism that effectively minimizes the computational load by limiting the size of the training data. TCM-KNN requires a mechanism that reduces the computational cost and workload of the anomaly detection process without compromising detection performance. Active learning is such a mechanism: it can be used with TCM-KNN to enhance detection performance while reducing the training data. In the active learning
approach, data are divided into two types of instances: labeled and unlabeled. The labeled data are denoted by TR, and there is a pool of unlabeled data denoted by U. In the active learning approach, there is an actively trained learner L, which is trained on the basis of the training data. Another module in active learning is the query module, denoted by Q. Q's job is to decide which instance from U will be selected for labeling and then added to TR. Q always selects the most informative instances among all the instances stored in U. Active learning can be applied to different tasks; it helps to reduce the training data and thereby increases performance. In order to incorporate the active learning method into TCM-KNN, the concept of uncertainty sampling can be used. It measures the confidence of the classifier on unlabeled data instances by calculating the p-value for each classification i = 1, 2, 3, …, n (refer Fig. 4). After the p-values are found, they are sorted in descending order to obtain the two highest p-values, pj and pk, the p-values corresponding to classifications j and k [21]. The prediction for an unlabeled instance depends on pj: for the prediction task, the value of pj must be high enough. The value of pk, on the other hand, can be used to find the confidence level. There are four different cases of pj and pk that determine the quality of the predicted value of an instance. Case 1: If pj is high and pk is low, the prediction has high credibility and a high confidence value. Case 2: If pj is high and pk is also high, the prediction has high credibility and a low confidence value.
Fig. 4 TCM-KNN algorithm with active learning for anomaly detection
Case 3: If pj is low and pk is high, the prediction has low credibility and a low confidence value. Case 4: If pj is low and pk is also low, the prediction has low credibility and a high confidence value. After finding the values of pj and pk, the quality of information can be assessed by calculating a new value C_i = |pj − pk|, also known as the closeness factor. A threshold value µ can be defined with respect to the value of C_i: if C_i < µ, a decision is made to include the instance in the training data set and remove it from the unlabeled pool.
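A minimal sketch of this selection step; p_values_for is a hypothetical stand-in for the TCM-KNN p-value computation sketched earlier, and the threshold mu is an assumed value.

def select_informative(unlabeled, p_values_for, mu=0.2):
    selected = []
    for instance in list(unlabeled):
        p = sorted(p_values_for(instance), reverse=True)
        p_j, p_k = p[0], p[1]           # two highest p-values
        closeness = abs(p_j - p_k)      # C_i: a small gap means an uncertain prediction
        if closeness < mu:
            selected.append(instance)   # label it and add it to TR
            unlabeled.remove(instance)
    return selected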
8 Experimental Results
Step-by-step experiments have been performed on the NSL-KDD data set in order to demonstrate the virtue of the TCM-KNN algorithm, and the effectiveness of TCM-KNN is evaluated by employing active learning with it. To perform the experiments, a machine with the Windows operating system, an Intel Core i5 processor at 3 GHz, and 1 GB RAM is used. A machine learning framework named Weka is used to work with the data set. Weka is a framework consisting of machine learning algorithms and tools used for classification, association, clustering, and visualization tasks. The NSL-KDD data set is used for the intrusion detection system because of its relevance, so that different methods can be compared with the help of the same data. The NSL-KDD data set consists of four categories collectively containing 24 different attacks. The categories include probe attacks, denial of service (DoS) attacks, user to root (U2R) attacks, and remote to local (R2L) attacks. For evaluating the performance of TCM-KNN, the NSL-KDD data are sampled into two different data sets. The first is a training data set containing 49,400 instances, including 9470 normal instances, 39,286 denial of service (DoS) instances, 127 user to root (U2R) instances, 112 remote to local (R2L) instances, and 405 probe instances. For the untrained testing data set, 12,350 instances are used. Before performing the experiments, the data set values first need to be normalized. Parameters such as the detection rate can be used to evaluate the performance of TCM-KNN. The detection rate can be identified with the help of the true positive (TP) rate and the false positive (FP) rate. The TP rate is calculated as the ratio of the number of detected intrusion instances to the number of intrusion instances in the testing data set [22]. In the same manner, the FP rate is the ratio of the number of instances which are actually normal but detected as intrusion instances to the total number of normal instances in the testing data set. To evaluate the performance of active-learning-based TCM-KNN, an initial reduced data set with 12 instances has been taken, which includes one instance from the normal class and instances from the intrusion classes of the four different attack groups, together with a pool of 500 unlabeled instances for the testing data set. With these data values, the comparison of the performance of TCM-KNN with random sampling and TCM-KNN with active learning is depicted in Fig. 5.
Fig. 5 Random sampling versus active learning (accuracy % versus number of labeled instances)
Figure 5 shows the effectiveness of TCM-KNN with active learning in terms of detection accuracy. Significant accuracy can be achieved with TCM-KNN with active learning using far fewer instances than TCM-KNN with random sampling. An accuracy of 99.7% is reached with only 40 instances in the TCM-KNN active learning method, whereas attaining the same accuracy with TCM-KNN with random sampling requires around 2000 instances. So, it can be said that good accuracy can be attained by TCM-KNN with the active learning method with a smaller number of instances, i.e., it reduces the data set size and increases the detection rate [23]. After selecting 5600 data instances from the 98,000 original instances, an accuracy of 100% on anomalous points (all 2600 abnormal points are properly identified) is achieved in terms of true positives (TP) (refer Fig. 9), with only 1.28% false positives (FP) in the actual network environment. The TP rate is still high (99.38%) in a real network environment after GA-based instance selection optimization. In addition, the training and detection times are all significantly reduced (refer Figs. 6, 7, and 8). The most important and interesting result is that the time needed for the detection of an anomaly is reduced. This allows us to deal with large numbers of anomalies online in a real network environment, using our optimized TCM-KNN, and to take appropriate countermeasures to mitigate them as quickly as possible. The figures below describe the comparison of the original TCM-KNN and the optimized TCM-KNN.
9 Conclusion and Future Work
This paper presented a novel intrusion detection method, TCM-KNN with an active learning approach. This approach provides benefits in terms of high accuracy and a high detection rate as compared to random-sampling-based TCM-KNN.
Fig. 6 Training time improvement
Fig. 7 Detection time improvement
Fig. 8 True positive rate comparison
Fig. 9 False positive rate comparison
Based on the reduced data set, a series of experiments has been performed to evaluate the effectiveness of TCM-KNN with active learning. It gives improved training time, improved detection time, and a high TP rate with high accuracy.
The work can be extended in future by combining TCM-KNN with a fuzzy logic approach. This may further optimize the detection performance of intrusion detection methods in real environments. Also, the implementation of TCM-KNN with active learning can be carried out in the unsupervised learning domain for anomaly detection.
References 1. Anithaashri T, Ravichandran G, Baskaran R (2019) Security enhancement for software defined network using game theoretical approach. Comput Netw 157:112–121 2. Todorova MS, Todorova ST (2016) DDoS attack detection in SDN-based VANET architectures. Master Appl Sci 175 3. Behal S, Kumar K, Sachdeva M (2018) D-face: an anomaly based distributed approach for early detection of DDoS attacks and flash events. J Netw Comput Appl 111:49–63 4. Newman LH (2018) Github survived the biggest DDoS attack ever recorded. Wired 1 5. Kupreev O, Badovskaya E, Gutnikov A (2019) DDoS attacks in Q1 6. Hoque N, Kashyap H, Bhattacharyya DK (2017) Real-time ddos attack detection using FPGA. Comput Commun 110:48–58 7. Dayal N, Maity P, Srivastava S, Khondoker R (2016) Research trends in security and DDoS in SDN. Secur Commun Netw 9(18):6386–6411 8. Salloum SA, Alshurideh M, Elnagar A, Shaalan K (2020) Machine learning and deep learning techniques for cybersecurity: a review. In: Joint European-US workshop on applications of invariance in computer vision. Springer, pp 50–57 9. Prasad KM, Siva VS, Nagamuneiah J, Nelaballi S (2020) An ensemble framework for flowbased application layer DDoS attack detection using data mining techniques. In: ICT analysis and applications. Springer, pp 9–19 10. Chen W, Xiao S, Liu L, Jiang X, Tang Z (2020) A DDoS attacks traceback scheme for SDNbased smart city. Comput Electr Eng 81:106503 11. Agrawal N, Tapaswi S (2018) Low rate cloud DDoS attack defense method based on power spectral density analysis. Inf Process Lett 138:44–50 12. Yassin W, Udzir NI, Muda Z, Sulaiman MN et al (2013) Anomaly-based intrusion detection through k-means clustering and Naives Bayes classification. In: Proceedings of the 4th international conference on computer informatics ICOCI 13. Tan Z, Jamdagni A, He X, Nanda P, Liu RP, Hu J (2014) Detection of denial-of-service attacks based on computer vision techniques. IEEE Trans Comput 64(9):2519–2533 14. Saied A, Overill RE, Radzik T (2016) Detection of known and unknown DDoS attacks using artificial neural networks. Neurocomputing 172:385–393 15. Wang B, Zheng Y, Lou W, Hou YT (2015) DDoS attack protection in the era of cloud computing and software-defined networking. Comput Netw 81:308–319 16. Cui Y, Yan L, Li S, Xing H, Pan W, Zhu J, Zheng X (2016) SD-Anti-DDoS: fast and efficient DDoS defense in software-defined networks. J Netw Comput Appl 68:65–79 17. Fallahi N, Sami A, Tajbakhsh M (2016) Automated flow-based rule generation for network intrusion detection systems. In: 24th Iranian conference on electrical engineering (ICEE). IEEE, pp 1948–1953 18. Asadollahi S, Goswami B (2017) Experimenting with scalability of floodlight controller in software defined networks. In: International conference on electrical, electronics, communication, computer, and optimization techniques (ICEECCOT). IEEE, pp 288–292 19. Shaghaghi A, Kaafar MA, Buyya R, Jha S (2020) Software-defined network (SDN) data plane security: issues, solutions, and future directions. In: Handbook of computer networks and cyber security. Springer, pp 341–387 20. Dai Y, He J, Wu Y, Chen S, Shang P (2019) Generalized entropy plane based on permutation entropy and distribution entropy analysis for complex time series. Physica A 520:217–231
21. Oshima S, Nakashima T, Sueyoshi T (2010) DDoS detection technique using statistical analysis to generate quick response time. In: International conference on broadband, wireless computing, communication and applications. IEEE, pp 672–677 22. Thomas T, Vijayaraghavan AP, Emmanuel S (2020) Introduction to machine learning. In: Machine learning approaches in cyber security analytics. Springer, pp 17–36 23. Bansal A, Mahapatra S (2017) A comparative analysis of machine learning techniques for botnet detection. In: Proceedings of the 10th international conference on security of information and networks, pp 91–98
Ensemble Model of Machine Learning for Integrating Risk in Software Effort Estimation Ramakrishnan Natarajan and K. Balachandran
Abstract The development of software involves expending a significant quantum of time, effort, cost, and other resources, and effort estimation is an important aspect. Though there are many software estimation models, risks are not adequately considered in the estimation process, leading to a wide gap between the estimated and actual efforts. The higher the level of accuracy of the estimated effort, the better the compliance of the software project in terms of completion within budget and schedule. This study has been undertaken to integrate risk into the effort estimation process so as to minimize the gap between the estimated and the actual efforts. This is achieved through consideration of a risk score as an effort driver in the computation of effort estimates and the formulation of a machine learning model. It has been identified that the risk score reveals feature importance, and the predictive model with integration of the risk score in the effort estimates indicated an enhanced fit.
Keywords Risk score · Estimated effort · Actual effort · Effort driver · Predictive model
1 Introduction
There are various activities that are involved in software project management such as estimation, planning, tracking and monitoring, and close-out. The estimation of software development effort is a critically significant activity in the areas of software engineering as well as the management of software projects [1]. Effective estimation of a software project helps to manage and control the projects more efficiently and effectively. At a macro-level, project estimation involves estimation of size, effort, time, and cost.
R. Natarajan (B) School of Business and Management, CHRIST (Deemed To Be University), Bangalore, India e-mail: [email protected] K. Balachandran Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed To Be University), Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 M. Saraswat et al. (eds.), Congress on Intelligent Systems, Lecture Notes on Data Engineering and Communications Technologies 114, https://doi.org/10.1007/978-981-16-9416-5_46
The ability to accurately predict the effort required for the development of software influences the success of the project, and effort estimation of software development is a highly critical activity [2]. The linkage between project risk management practices and the creation of value has been perceived by practitioners [3]. The variation of the actual effort/schedule spent in comparison with the planned effort/schedule results in project overruns and also negatively impacts the success of the project. An understanding of the level of overall risk in a project facilitates taking proactive risk-based decisions regarding the project. A quantitative model for risk analysis could be deployed to predict the probability of attaining success in a software project. A holistic risk management approach would need to equally consider the risks in the project and the risks of the organization. Estimates are predictions of future project performance based on existing knowledge and methods of software effort estimation; hence, these estimates do not always appropriately reflect the actual outcome. Associating uncertainty with effort estimates may give the user a perception of the extent of accuracy of the estimates and facilitate taking the right decisions. The focus of this research is to enhance the software effort estimation process through the integration of a project risk score and to minimize the gap between the estimated and actual efforts.
2 Theoretical Background
Effort estimation in software development projects, and the ability to perform with reference to these estimations and achieve project success, is an important aspect. However, it continues to be an inadequately mastered discipline. The effectiveness with which tasks and resources for a software project can be scheduled depends on the accuracy of the estimated effort [4]. Better estimates in software projects enable project managers to adequately plan for the software life cycle activities [5]. Both over-estimation and under-estimation create challenges and have an impact on software projects [6]. Under-estimation of effort could lead to overruns in schedule and budget, which could result in cancelation of the project; over-estimation could act as a threat to organizational competitiveness. However, issues in effort estimation continue to exist due to uncertainty in the input information, and the accurate estimation of effort in a software project is indeed a challenge in industry. A large number of studies indicate bias in human judgment, and a lack of awareness of this substantiates the existence of systematic bias in estimates [7]. It may also be required to account for project uncertainty while managing expectations related to estimation accuracy. The investment of more time in the detailed planning process could help in the reduction of estimation errors and the extent of deviation from the planned effort estimates [8]. Effort estimation methods always underestimate the quantum of effort needed for completing a software development project. There are various estimation models in practice such as COCOMO, FPA, and SLIM. Effort estimates could also be arrived
at using experts’ judgment, wherein effort estimation is done by estimators utilizing logical reasoning and their expertise [9]. There are various models for estimation of effort, but it is challenging to arrive at a specific model having more accuracy on a specific dataset [10]. However, there is no specific single estimation method that provides the best estimates [11]. It has also been reported by researchers in the area of machine learning and statistics that arriving at an average of estimates from multiple methods is better than using a solo method. Further, techniques based on artificial intelligence have the potential to enhance the efficiency of software development [12]. The concept of risk is applicable to software projects [13]. There are many risks associated with a globally distributed software development environment. Risks get accumulated in a software project [14]. Risks need to be considered to effectively plan, execute, and deliver a software project in a timely manner and within budget while achieving the desired level of quality. Risk is one of the biggest challenges but it has not been adequately addressed by researchers in terms of budget, time, and resources [15]. The extent of deviation between the actual effort expended on the completion of a software project and the effort estimate depends on project characteristics and other factors/constraints that impact the method used for effort estimation. According to research studies, risk is inadequately conceived and performed in practice. Diversity, usage advanced technologies, and complexity lead to exponential diversification of risk factors [16]. Risk focus is needed to ensure performance of software development projects considering the relationship between risk and performance of software development projects. A project manager could take important decisions and also plan/allocate resources better if effort in a software development project could be predicted with accuracy. Based on statistical analysis, it was determined that risk identification has the highest influence on performance of the software product, followed closely by risk response. Hence, awareness of risk management practices is important for a project manager to have an improved project success rate [17]. Project risks have an influence on various parameters, and this illustrates the significance of considering project risks while arriving at the effort estimates [18]. A model is built to depict a real-world phenomenon in a domain. The existence of redundant and less relevant features in a model could make it ineffective computationally and consume more time and cost. This could be avoided using a feature selection approach wherein the most important features are included in the model which in turn, facilitates in enhancing the predictive power of the model. Hence, feature selection is done in the pre-processing stage of model building. Feature selection could be performed by using three methods, namely filter, wrapper, and embedded methods. In the filter method, verification of the relevant features with the output attributes obtained from different statistical methods is performed. The wrapper method relates to the generation of a sub-set of the features and utilizing this for training the model. Features get included or eliminated from the sub-set in an iterative manner until the most appropriate sub-set of features is determined. 
This is done to ensure that the chosen features/variables have the most relevance in predicting the target value [19]. The importance of predictors can also be ranked using sequential forward selection and backward selection [20], as sketched below.
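The following sketch illustrates wrapper-style sequential selection with scikit-learn; the feature matrix X, target y, and the choice of a linear base estimator are assumptions for illustration rather than details taken from this study.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

def wrapper_selection(X, y, n_features=3, direction="forward"):
    """Rank/select predictors by sequentially adding ("forward") or
    removing ("backward") features around a base estimator."""
    sfs = SequentialFeatureSelector(
        LinearRegression(),
        n_features_to_select=n_features,
        direction=direction,
        cv=5,  # cross-validated scoring of each candidate sub-set
    )
    sfs.fit(X, y)
    return sfs.get_support()  # boolean mask of the retained features
```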
The embedded method of feature selection combines elements of the filter and wrapper methods. It utilizes algorithms with built-in criteria for feature selection and offers a higher degree of prediction accuracy. Ridge and LASSO regression are examples of the embedded method of feature selection.
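As a minimal sketch of the embedded approach, the snippet below keeps only the features whose LASSO coefficients are not shrunk to zero; the fixed alpha value and variable names are illustrative assumptions (in practice alpha would be tuned, for example with LassoCV).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def lasso_selected_features(X, y, feature_names, alpha=0.1):
    """Embedded selection: LASSO's L1 penalty drives the coefficients
    of weak predictors to exactly zero."""
    X_std = StandardScaler().fit_transform(X)  # LASSO is scale-sensitive
    model = Lasso(alpha=alpha).fit(X_std, y)
    return [name for name, coef in zip(feature_names, model.coef_)
            if not np.isclose(coef, 0.0)]
```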
3 Methodology

In this research, the integration of a project risk score into the effort estimation process is proposed. A standard effort estimation process such as COCOMO or function point analysis involves computing the project size and then applying the parameters of the specific estimation method to that size, which yields the effort estimates for software development. The proposed process involves computing the project size, computing the project risk score from the probability of occurrence of each risk and the severity of its impact, applying this risk score as a function of the parameters for software development effort estimation, and arriving at risk-integrated effort estimates, as sketched below. The methodology focuses on building a model that integrates the risk score into effort estimates and that can learn over a period of time, thereby helping to minimize the gap between estimated and actual effort.

A quantitative research design involving data collection from software projects and subsequent analysis was chosen for this research. Data were collected from the NASA dataset and from organizational software projects. The industry-standard NASA dataset maintains a repository of validated information from submitted projects, published and formally available for research purposes. In total, 115 software projects, including those from the NASA dataset and organizational projects, were considered. Data pre-processing was done to identify missing values, and the dataset was subjected to a validation test to ensure that it contained no anomalies or missing information. Tests for normality, collinearity, autocorrelation, and heteroscedasticity were performed prior to model building, along with analysis of variance (ANOVA) and regression analysis.

A machine learning approach was adopted after dividing the dataset into training and test datasets. As part of the model building process, the important variables were identified through feature selection. Of the three approaches considered (filter, wrapper, and embedded), the embedded method was adopted for its higher prediction accuracy; specifically, least absolute shrinkage and selection operator (LASSO) regression was utilized, given its potential for enhanced prediction accuracy and model interpretability. Model building was done using an ensemble machine learning approach, finalized after comparing different models run in SPSS Modeler.
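The sketch below shows one plausible reading of this process, under stated assumptions: the project risk score is taken as the mean of probability times severity over the identified risks, and it adjusts the base estimate through an illustrative linear multiplier; the paper's exact functional form and calibration constant are not reproduced here.

```python
def project_risk_score(risks):
    """`risks` is a list of (probability, severity) pairs, e.g. with
    probability on a 0-1 scale and severity on a 1-5 scale; the score
    is the mean probability-weighted impact."""
    return sum(p * s for p, s in risks) / len(risks)

def risk_adjusted_effort(base_effort, risk_score, weight=0.05):
    """Apply the risk score as a function of the estimation parameters.
    `weight` is a hypothetical calibration constant: each unit of risk
    score inflates the base estimate by 5% in this illustration."""
    return base_effort * (1 + weight * risk_score)

# e.g. a base estimate of 100 person-months with risk score 2.42:
# 100 * (1 + 0.05 * 2.42) = 112.1 person-months
```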
Data pertaining to effort estimates and risks in projects, obtained from industry respondents, were considered for validation. The effort estimation techniques adopted in these projects included COCOMO and function point analysis. Neural network models (multilayer perceptrons) were built both without and with the inclusion of the risk score, and the resulting models were compared with reference to their area under the curve (AUC) values; a sketch of this comparison is given below.
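The following is a minimal sketch of such a comparison, assuming the outcome has been encoded as a binary label (for instance, whether the estimate fell within a tolerance of the actual effort), since AUC presupposes a classification target; the network size and variable names are illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

def compare_auc(X_without_risk, X_with_risk, y):
    """Fit one multilayer perceptron per feature set and report the
    held-out AUC of each, making the effect of the risk score visible."""
    aucs = {}
    for name, X in (("without risk score", X_without_risk),
                    ("with risk score", X_with_risk)):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                            random_state=0).fit(X_tr, y_tr)
        aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    return aucs
```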
4 Model Building and Analysis

The descriptive statistics report yielded mean and standard deviation values of 2.42 and 1.162, respectively, for the project risk score. When the dataset was subjected to the Anderson–Darling test, the data were found not to be normally distributed. The Spearman rank correlation test between project risk score and effort produced a correlation coefficient greater than zero (0.173), indicating movement in the positive direction. Regression analysis performed on the dataset with the risk score included yielded an R squared value of 0.910. Variance inflation factor (VIF) values showed that multi-collinearity does not exist, and statistical tests likewise indicated the absence of autocorrelation. An ANOVA test was then carried out with "estimated effort" as the response variable; the output is shown in Table 1, which summarizes the extent of variance in the data attributable to the individual factors (variables), as reflected in the generated F-values.
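Before the ANOVA output, the sketch below illustrates how the diagnostic tests just described could be reproduced; the DataFrame and its column names (`risk_score`, `effort`) are hypothetical, and numeric feature columns are assumed.

```python
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

def run_diagnostics(projects: pd.DataFrame) -> None:
    # Anderson-Darling test for normality of the risk score.
    ad = stats.anderson(projects["risk_score"], dist="norm")
    print("A-D statistic:", ad.statistic, "critical values:", ad.critical_values)

    # Spearman rank correlation between risk score and effort.
    rho, p = stats.spearmanr(projects["risk_score"], projects["effort"])
    print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")

    # Variance inflation factors to check for multi-collinearity.
    X = sm.add_constant(projects.drop(columns=["effort"]))
    for i, col in enumerate(X.columns):
        print(col, variance_inflation_factor(X.values, i))
```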
Table 1 Analysis of variance table

Variable                         Mean square value    F value
Size (KSLOC)                     137,891,726          1042.8807
Project risk                     15,980               0.1209
Staff application knowledge      270,631              2.0468
Software logical complexity      23,185               0.1753
Customer participation           1240                 0.0094
Staff team skills                385,120              2.9127
Tools use                        45,908               0.3472
Contingency allowance percent    292,400              2.2116
Residuals                        132,222              –
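Each F-value in Table 1 is the factor's mean square divided by the residual mean square, as the following check (using the contingency allowance row) shows:

\[
F = \frac{\mathrm{MS}_{\text{factor}}}{\mathrm{MS}_{\text{residual}}},
\qquad
F_{\text{contingency}} = \frac{292{,}400}{132{,}222} \approx 2.2116 .
\]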
Table 2 Selection of important variables through LASSO regression

Variable                                               Adjusted R squared
Project size (in kilo lines of code)                   0.885
Project size, contingency allowance %                  0.900
Project size, contingency allowance %, project risk    0.904
4.1 Feature Selection for Model Building

The software effort estimation dataset considered here has many independent variables, such as project size, project risk, staff application knowledge, software logical complexity, customer participation, staff team skills, tools use, and contingency allowance percentage. Retaining unimportant features can reduce model accuracy and cause the model to learn from features that are not relevant. Hence, it is important to identify the features that contribute most to the estimated effort (the prediction or dependent variable). LASSO regression was deployed on the dataset; during the backward step, all variables except project size, contingency allowance percentage, and project risk were removed. The adjusted R squared values for these three remaining variables are given in Table 2. The three predictors, namely project size in kilo source lines of code, contingency allowance percentage, and project risk, were chosen automatically by the regression model. The model had an adjusted R squared value of 0.885 when project size was the only chosen variable; adding contingency allowance percentage as a second predictor yielded an adjusted R squared value of 0.900; and the automatic selection of project risk as the third predictor increased the adjusted R squared value to 0.904.
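For reference, the adjusted R squared reported above follows the standard definition, which penalizes each added predictor, so the rise from 0.900 to 0.904 means project risk explains enough additional variance to offset the penalty:

\[
\bar{R}^{2} = 1 - \left(1 - R^{2}\right)\frac{n-1}{n-p-1},
\]

where \(n\) is the number of observations and \(p\) the number of predictors.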
4.2 Modeling Using Ensemble Machine Learning Approach

An ensemble machine learning approach was deployed for modeling, chosen because it can provide a systematic solution by combining the predictive power of multiple machine learning models.

Model selection after comparison of models. The extreme gradient boosting (XGBoost) algorithm, a refined implementation of gradient boosting, was considered for ensemble learning. The XGBoost model was chosen after comparing various models as part of the model selection process. SPSS Modeler was utilized to deploy a model selection node for the dataset. Various models, such as linear modeling, the generalized linear model, regression, and XGBoost, were run, and
Table 3 Comparison of models for model selection

Model         Build time (min)    Correlation    Relative error
Regression