Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
Ashish Kumar Tripathi · Darpan Anand · Atulya K. Nagar Editors
Proceedings of World Conference on Artificial Intelligence: Advances and Applications WCAIAA 2023
Editors Ashish Kumar Tripathi Department of Computer Science and Engineering Malaviya National Institute of Technology Jaipur Jaipur, Rajasthan, India
Darpan Anand Department of Computer Science and Engineering Sir Padampat Singhania University Udaipur, India
Atulya K. Nagar School of Mathematics, Computer Science and Engineering Liverpool Hope University Liverpool, UK
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-99-5880-1 ISBN 978-981-99-5881-8 (eBook) https://doi.org/10.1007/978-981-99-5881-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Preface
This book gathers outstanding research papers presented at the World Conference on Artificial Intelligence: Advances and Applications (WCAIAA 2023), held on March 18–19, 2023, at Sir Padampat Singhania University, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results between academia and industry researchers, to develop a comprehensive understanding of the challenges of advances in intelligence from a computational viewpoint. This book will help strengthen friendly networking between academia and industry. We have tried our best to enrich the quality of WCAIAA 2023 through a stringent and careful peer-review process. WCAIAA 2023 received many technical contributions from distinguished participants from home and abroad; after a rigorous peer review, only 43 high-quality papers were accepted for presentation and inclusion in the final proceedings. This book presents these 43 research papers on advances in artificial intelligence and its applications and serves as reference material for advanced research.
Jaipur, India
Udaipur, India
Liverpool, England
Ashish Kumar Tripathi Darpan Anand Atulya K. Nagar
Contents
1. A Pragmatic Study of Machine Learning Models Used During Data Retrieval: An Empirical Perspective (Ankush R. Deshmukh and P. B. Ambhore) . . . 1
2. Smart Knowledge Management for IT Services (Sishil Surendran and Rashmi Agarwal) . . . 13
3. Patient-Centric Electronic Health Records Management System Using Blockchain Based on Liquid Proof of Stake (Yash Jaiswal, Ayushi Maurya, Ashok Kumar Yadav, and Arun Kumar) . . . 25
4. AI Driving Game Changing Trends in Project Delivery and Enterprise Performance (Sashreek Krishnan and L. R. K. Krishnan) . . . 35
5. Prediction of Children Age Range Based on Book Synopsis (P. Baby Maruthi and Jyothsna Manchiraju) . . . 51
6. Predicting Credit Card Churn Using Support Vector Machine Tuned by Modified Reptile Search Algorithm (Marko Stankovic, Luka Jovanovic, Vladimir Marevic, Amira Balghouni, Miodrag Zivkovic, and Nebojsa Bacanin) . . . 63
7. Comparison of Deep Learning Approaches for DNA-Binding Protein Classification Using CNN and Hybrid Models (B. Siva Jyothi Natha Reddy, Sarthak Yadav, R. Venkatakrishnan, and I. R. Oviya) . . . 79
8. Exploring Jaccard Similarity and Cosine Similarity for Developing an Assamese Question-Answering System (Nomi Baruah, Saurav Gupta, Subhankar Ghosh, Syed Nazim Afrid, Chinmoy Kakoty, and Rituraj Phukan) . . . 87
9. Artificial Neural Network Modelling for Simulating Catchment Runoff: A Case Study of East Melbourne (Harshanth Balacumaresan, Md. Abdul Aziz, Tanveer Choudhury, and Monzur Imteaz) . . . 99
10. Effective Decision Making Through Skyline Visuals (R. D. Kulkarni, S. K. Gondhalekar, and D. M. Kanade) . . . 119
11. A Review Paper on the Integration of Blockchain Technology with IoT (Anjana Rani and Monika Saxena) . . . 127
12. Survey and Analysis of Epidemic Diseases Using Regression Algorithms (Shruti Sharma and Yogesh Kumar Gupta) . . . 139
13. Hybrid Pre-trained CNN for Multi-classification of Rice Plants (Sri Silpa Padmanabhuni, Abhishek Sri Sai Tammannagari, Rajesh Pudi, and Srujana Pesaramalli) . . . 151
14. Cauliflower Plant Disease Prediction Using Deep Learning Techniques (M. Meenalochini and P. Amudha) . . . 163
15. Disease Detection and Prediction in Plants Through Leaves Using Convolutional Neural Networks (Satyam R. D. Dwivedi, N. Sai Gruheeth, P. Bhargav Narayanan, Ritwik Shivam, and K. Vinodha) . . . 177
16. Classification of Breast Cancer Using Machine Learning: An In-Depth Analysis (Shweta Saraswat, Bright Keswani, and Vrishit Saraswat) . . . 191
17. Prediction of Age, Gender, and Ethnicity Using Haar Cascade Algorithm in Convolutional Neural Networks (D. Lakshmi, R. Janaki, V. Subashini, K. Senthil Kumar, C. A. Catherine Aurelia, and S. T. Ananya) . . . 205
18. Augmentation of Green and Clean Environment by Employing Automated Solar Lawn Mower for Exquisite Garden Design (T. Mrunalini, D. Geethanjali, E. Anuja, and R. Madhavan) . . . 221
19. A Lightweight Solution to Intrusion Detection and Non-intrusive Data Encryption (Mahnaz Jarin, Mehedi Hasan Mishu, Abu Jafar Md Rejwanul Hoque Dipu, and A. S. M. Mostafizur Rahaman) . . . 235
20. Efficiency of Cellular Automata Filters for Noise Reduction in Digital Images (Imran Qadir and V. Devendran) . . . 249
21. Effective Mutants' Classification for Mutation Testing of Smart Contracts (R. Sujeetha and K. Akila) . . . 263
22. Scheming of Silver Nickel Magnopsor for Magneto-Plasmonic (MP) Activity (Shruti Taksali, Sonam Maurya, and Amruta Lipare) . . . 281
23. Heart Stroke Prediction Using Different Machine Learning Algorithms (Tarun Madduri, Vimala Kumari Jonnalagadda, Jaswitha Sai Ayinapuru, Nivas Kodali, and Vamsi Mohan Prattipati) . . . 291
24. Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm (Yamini Gujjidi, Amogh Katti, and Rashmi Agarwal) . . . 303
25. Smart Air Pollution Monitoring System for Hospital Environment Using Wireless Sensor and LabVIEW (P. Sathya) . . . 317
26. Mining Optimal Patterns from Transactional Data Using Jaya Algorithm (Honey Sengar and Akhilesh Tiwari) . . . 329
27. Accurate Diagnosis of Leaf Disease Based on Unsupervised Learning Algorithms (S. Jacily Jemila, S. Mary Cynthia, and L. M. Merlin Livingston) . . . 339
28. Modified Teaching-Learning-Based Algorithm Tuned Long Short-Term Memory for Household Energy Consumption Forecasting (Luka Jovanovic, Maja Kljajic, Aleksandar Petrovic, Vule Mizdrakovic, Miodrag Zivkovic, and Nebojsa Bacanin) . . . 347
29. Chaotic Quasi-oppositional Chemical Reaction Optimization for Optimal Tuning of Single Input Power System Stabilizer (Sourav Paul, Sneha Sultana, and Provas Kumar Roy) . . . 363
30. Network Intrusion Detection System for Cloud Computing Security Using Deep Neural Network Framework (Munish Saran and Ritesh Kumar Singh) . . . 377
31. Comparison of Activation Functions in Brain Tumour Segmentation Using Deep Learning (Amisha Nakhale and B. V. Rathish Kumar) . . . 387
32. Detection of Alzheimer's Disease Using Convolutional Neural Network (D. J. Jovina and T. Jayasree) . . . 401
33. Performance Evaluation of Multiple ML Classifiers for Malware Detection (Md. Masroor Fahim, Mahbuba Sharmin Mim, Tahmid Bin Hasan, and Abu Sayed Md. Mostafizur Rahaman) . . . 413
34. An Analysis of Feature Engineering Approaches for Unlabeled Dark Web Data Classification (Ashwini Dalvi, Vedashree Joshi, and S. G. Bhirud) . . . 429
35. Anomaly Detection to Prevent Sensitive Data Exposure Using GMM Clustering Model (Shivangi Mehta, Lataben J. Gadhavi, and Harshil Joshi) . . . 439
36. Design Analysis and Fabrication of Mono Composite Leaf Spring by Varying Thickness (Mangesh Angadrao Bidve and Manish Billore) . . . 451
37. Real-Time Driver Drowsiness Detection System Using Machine Learning (Apash Roy and Debayani Ghosh) . . . 457
38. Nature-Inspired Information Retrieval Systems: A Systematic Review of Literature and Techniques (Bhushan Inje, Kapil Nagwanshi, and Radhakrishna Rambola) . . . 463
39. Deep Learning Based Smart Attendance System (Prabhjot Kaur, Mridul Namboori, Aditi Pandey, and Keshav Tyagi) . . . 485
40. Prediction of Clinical Depression Through Acoustic Feature Sampling Using Deep Learning and Random Forest Technique Based on BDI-II Scale of Psychiatry (Pratiksha Meshram and Radha Krishna Rambola) . . . 497
41. Optical Character Recognition and Text Line Recognition of Handwritten Documents: A Survey (Prarthana Dutta and Naresh Babu Muppalaneni) . . . 513
42. Neural Style Preserving Visual Dubbing (Masooda Modak, Anirudh Venugopal, Karthik Iyer, and Jairaj Mahadev) . . . 525
43. Advanced Pointer-Generator Networks Based Text Generation (Narayana Darapaneni, Anwesh Reddy Paduri, B. G. Sudha, Adithya Kashyap, Roopak Mayya, C. S. Thejas, K. S. Nagullas, Ashwini Kulkarni, and Ullas Dani) . . . 537
Author Index . . . 549
About the Editors
Dr. Ashish Kumar Tripathi (Member, IEEE) received his M.Tech. and Ph.D. degrees in Computer Science and Engineering from Delhi Technological University, Delhi, India, in 2013 and 2019, respectively. He is currently working as an Assistant Professor with the Department of Computer Science and Engineering, Malaviya National Institute of Technology (MNIT), Jaipur, India. His research interests include big data analytics, social media analytics, soft computing, image analysis, and natural language processing. Dr. Tripathi has published several papers in international journals and conferences, including IEEE transactions. He is an active reviewer for several journals of repute.
Prof. Darpan Anand holds a Ph.D. in Computer Science and has contributed over 20 years to teaching, research, and academics in Computer Science and Engineering at various top universities and engineering schools/institutes, along with more than 6 years of industrial experience, including over 2 years of onsite software development. He is currently Professor and Head of the Department of Computer Science and Engineering at Sir Padampat Singhania University, Udaipur, India. He has a flair for teaching, with a proven ability to apply practice-based and innovation-oriented teaching-learning methods and outcome-based engineering education, as well as a collaborative approach to research and a decentralized, supportive style of academic administration. As an academic leader, he is involved in teaching, curriculum design and improvisation, supervising departmental research efforts, technical evaluation of prospective faculty, budgetary and academic management of the department, and delivery of project outcomes. His current research interests include information security, software-defined networks, data science, AI and ML, and e-governance. He has guided several Ph.D. and PG dissertations. He is an author or co-author of more than 60 research papers (indexed in SCI, ESCI, Scopus, etc.), various books and book chapters (IET, Springer, and Elsevier), patents, etc. He is also a member of various esteemed research associations such as IEEE, ACM, IAENG, TAEI, CSI, AIS, and CSTA.
Dr. Atulya K. Nagar is the Foundation Chair and Professor of Mathematics at Liverpool Hope University and has been Pro-Vice-Chancellor for Research since October 2019. He was the Dean of the Faculty of Science from May 2014 to September 2019 and Head of the School of Mathematics, Computer Science, and Engineering from September 2007 to August 2022. A mathematician by training, he possesses multi-disciplinary expertise in nonlinear mathematics, natural computing, bio-mathematics and computational biology, operations research, and control systems engineering. He has an extensive background and experience of working in universities in the UK and India. He is an expert reviewer for the Biotechnology and Biological Sciences Research Council (BBSRC) grants peer-review committee for the Bioinformatics Panel and for the Engineering and Physical Sciences Research Council (EPSRC) High Performance Computing Panel, and serves on the Peer-Review College of the Arts and Humanities Research Council (AHRC) as a scientific expert member. He has edited volumes on intelligent systems and applied mathematics; he is the founding series editor of the Springer book series on Algorithms for Intelligent Systems (AIS) and was the Editor-in-Chief of the International Journal of Artificial Intelligence and Soft Computing (IJAISC) until 2021. Dr. Nagar has published over 200 publications in prestigious publishing outlets and journals such as the Journal of Applied Mathematics and Stochastic Analysis; the International Journal of Advances in Engineering Sciences and Applied Mathematics; the International Journal of Foundations of Computer Science; the IEEE Transactions on Systems, Man, and Cybernetics; Discrete Applied Mathematics; Fundamenta Informaticae; and IET Control Theory and Applications, to name a few.
Chapter 1
A Pragmatic Study of Machine Learning Models Used During Data Retrieval: An Empirical Perspective Ankush R. Deshmukh and P. B. Ambhore
1 Introduction

Data retrieval approach modelling is a multi-domain effort that includes data collection, pre-processing, data and information segmentation, feature representation with selection, data classification and categorization, and post-processing procedures. A typical data retrieval model [1] that uses feedback-based query matching is showcased in Fig. 1, wherein two operation layers can be observed. Layer 1 is the 'System' layer, which performs data acquisition, representation, and organization operations, while Layer 2 is the 'User' layer, which performs query formulation, representation, and matching operations. Based on these operations, objects (images, texts, audio, video, etc.) are retrieved, and feedback is given to different parts of the model. Based on this feedback, the model is able to tune its internal weights, which assists in continuous optimization of the retrieval process. Researchers offer a huge number of distinctively varied models [2–6], each with its own set of operating characteristics. The remainder of this paper analyses such models in terms of deployment-specific nuances, application-specific benefits, functional gaps, and context-specific future research opportunities. Based on the findings of this survey, researchers will be able to create models that are appropriate for their application-specific use cases. Section 3 contrasts the models under consideration in terms of retrieval delay, retrieval accuracy, computational complexity, scalability, and deployment cost. This section also proposes a novel Data Retrieval Metric (DRM), which combines these evaluation parameters and assists in the identification of high-performance models suited for large-scale deployments.
A. R. Deshmukh (B) · P. B. Ambhore Government College of Engineering, Amravati, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_1
Fig. 1 A typical data retrieval model with feedback-based update operations
Finally, this text concludes with various context-specific and application-specific conclusions about the reviewed models and recommends methods to further optimize their performance.
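To make the feedback loop described for Fig. 1 more concrete, the following minimal sketch illustrates one way such feedback-driven weight tuning can be realized. It is not taken from any specific model reviewed in this chapter; it assumes a Rocchio-style query update over vector representations, and all function and variable names are illustrative.

```python
# Minimal sketch of a feedback-based retrieval loop: a query is matched against
# stored object representations, and user relevance feedback is used to re-tune
# the query weights (a Rocchio-style update stands in for the generic
# "internal weight tuning" described above).
import numpy as np

def rank(query, objects):
    """Return object indices sorted by cosine similarity to the query."""
    q = query / (np.linalg.norm(query) + 1e-12)
    o = objects / (np.linalg.norm(objects, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(o @ q))

def feedback_update(query, objects, relevant, non_relevant,
                    alpha=1.0, beta=0.75, gamma=0.15):
    """Tune the query representation from relevance feedback (Rocchio update)."""
    pos = objects[relevant].mean(axis=0) if len(relevant) else 0.0
    neg = objects[non_relevant].mean(axis=0) if len(non_relevant) else 0.0
    return alpha * query + beta * pos - gamma * neg

# Toy example: four stored objects and one query in a 3-dimensional feature space
objects = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([0.5, 0.5, 0.0])

print("initial ranking:", rank(query, objects))
query = feedback_update(query, objects, relevant=[2], non_relevant=[0, 1])
print("ranking after feedback:", rank(query, objects))
```

In this toy run, marking object 2 as relevant and objects 0 and 1 as non-relevant shifts the tuned query toward the relevant item, which then ranks first; a deployed system would repeat this match-feedback-update cycle continuously, as sketched in Fig. 1.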
2 Literature Review

Each retrieval model has distinct internal operating qualities, according to researchers. According to [1], ensuring the periodic authenticity of actual data in cyber-physical devices is critical for successful decision making and system operation. The vast bulk of real-time data retrieval research assumes that the data is always retrievable, and scheduling algorithms focus on real-time choices under temporal validity restrictions. Because of intermittent data availability, this assumption is false in many real-time applications. In a similar spirit, the authors of [2] note that road safety concerns, such as congestion and accidents, have become a major worry due to the increase in automobiles. Safer driving requires rapid access to road safety data, and NDN, a unique kind of network, may improve data retrieval. According to [3], Vehicular Cloud (VC) is a novel automotive technology that intends to successfully and economically recover data. NDN's data-centric strategy might make VC simpler; therefore, researchers should employ NDN to realize VC. The rapid mobility of autos, as well as structural incompatibilities between NDN and vehicular networks, make installing NDN in vehicular circumstances difficult. Despite the fact that paired and richly labelled ground truth is expensive, researchers note in [4] that sketch-to-photo retrieval has numerous applications, while image retrieval data is simple to gather. According to [5], the internet has a massive amount of data that is constantly obtained, analyzed, and used by a large number of people. The majority of
Web material is unstructured. There are websites, books, journals, and files provided. Extracting useful information from such a big amount of data is challenging and time-consuming. According to [6], online video sharing has become one of the most important Internet services, with billions of videos saved in the cloud. To find interesting films in huge data, viewers need a personalized video retrieval system. Growing video big data and "cold start" concerns must be handled, and uncertain user demand for tailored retrieval results is another problem. According to [7], big data necessitates a network architecture that enables safe and effective information retrieval via in-network caching. Huge data places enormous strain on network infrastructure, and this desire is met by information-centric networking (ICN). The article [8] outlines how the land parameter retrieval model (LPRM) is utilized around the world to create microwave-based surface soil moisture (SSM) measurements. In the first version of the LPRM method, SSET was generated from Ka-band (36.5 GHz) BT, which was then used to compute SSM. SSET from AMSR-E Ka-band BT data was validated against MODIS LST data in this study. The RMSE climbs from 1.98 to 11.42 K as the FWS of an AMSR-E pixel grows from 0–0.01 to 0.15–0.4. RMSEs were kept below 2.0 K when validating SSET data obtained with K-band (18.7 and 23.8 GHz) BT for FWS values ranging from 0 to 0.4, regardless of FWS. To improve SSET predictions and SSM retrievals, the researchers replaced the K-band BT with the Ka-band BT in the first iteration of the LPRM. The correlation coefficients (R-values) between SSM collected using the enhanced LPRM and GLDAS measures are 0.78, 0.75, and 0.77, respectively, for FWS ranges of 0–0.01, 0.01–0.05, 0.05–0.15, and 0.15–0.4; the standard scheme's R-values are 0.78, 0.65, 0.56, and 0.34, in that order. Under water-land mixed situations, the enhanced LPRM may provide more accurate SSM retrievals than the standard LPRM. The demand for graphical data storage has grown over the past few years, according to [9]. This trend is being driven by the proliferation of multimedia instructional services and apps for portable devices in both private and commercial situations. This is elaborated on in [10], where it is stated that the exponential growth of multimedia instructional content in today's mobile social networks compels edge computing to deal with serious security and online massive data processing challenges. According to [11], cloud computing is a unique technology due to the growing desire for indefinite storage and retrieval services. Several studies on ranked multi-keyword search using encrypted data in the cloud have been conducted, and concerns regarding privacy prompted that work. The study in [12] investigates the impact of adding spaceborne SAR data to an Arctic regional ice monitoring system over a period of time. Over 7000 RADARSAT-2 HH-HV ScanSAR Wide images were taken over the Canadian Arctic and nearby waterways in 2013, and a quality-control technique was employed to reduce inaccurate SAR retrievals. 3D Cyber-Physical Systems (CPS) data has been created and used for autonomous driving, unmanned aerial vehicles, and more [13]. It is crucial for urban perception to recover 3D objects from enormous amounts of 3D CPS data. C3DOR-Net, an end-to-end domain adaptation network for cross-domain 3D object retrieval, is proposed by the researchers.
This system learns an embedding space from end to end for 3D objects from many
domains. To evaluate the methodology, the researchers employ two cross-domain scenarios: (1) monochromatic image-based 3D object retrieval at SHREC'19, and (2) CAD-to-CAD object retrieval on two 3D datasets (NTU and PSB). Experiments suggest that the stated method may enhance cross-domain retrieval. Current sketch-analysis work examines drawings of static objects or environments, according to [14]. That work proposes FG-SBVR, a novel cross-modal retrieval problem in which a sketch sequence is used to find a target video. According to [15], the enhanced computing capabilities of asset devices such as smartphones have enabled a wide range of research areas, including the extraction of pictures from enormous data archives for Internet of Things applications. Large-scale remote sensing image retrieval (RSIR) has emerged as one of the most demanding data mining jobs, gaining significant academic interest [16]. Deep learning has aided RSIR's recent success, but three problems remain. First, RSIR data sets have unevenly distributed categories. An innovative method for mining samples has been developed: instead of establishing a sample mining border threshold and minimizing the impact of artificial and objective elements on network training, this technique uses "misplaced" top-k samples along the top-k nearest neighbor decision boundary, referred to as "high-k MS". Second, there is a tension between optimization and loss minimization. Based on batch retrieval results, researchers introduced a unique result ranking loss (RRL); as a plug-in, RRL is paired with local losses, which enhanced retrieval accuracy. Finally, processing a large quantity of high-resolution pictures may cause training to take longer and the network to come to a standstill. This work offers a global optimization model that is based on feature space and retrieval results (FSRR). Work in [17] explains how to utilize hashing methods to retrieve comparable images from large datasets, where deep designs provide discriminative feature representations. Contributions include: (1) the researchers propose a semi-supervised loss on labelled and unlabeled data to reduce empirical error; this type of loss maintains semantic similarity while also capturing relevant neighbors for effective hashing; (2) the researchers describe an online graph building strategy that uses deep feature generation during training to encapsulate semantic neighbors in a semi-supervised deep hashing network with labelled and unlabeled input. This network makes extensive use of both labelled and unlabeled data, and in experiments on five standard datasets the methodology outperforms state-of-the-art hashing algorithms. The work in [18] investigated retrieval techniques based on the Internet of Things (IoT); the IoT uses the Internet to connect, monitor, control, and manage things, objects, and their surroundings. According to [19], hash coding is widely used in large-scale photo retrieval, including approximate nearest neighbour search. Hash algorithms may learn and create effective and compact binary codes with semantic annotations like class labels and pairwise similarities. Images are compressed to save space, according to [20], due to the rapid expansion of RS image archives. For large-scale content-based RS image retrieval (CBIR) challenges, existing solutions require decoded images, which is computationally intensive, and the authors aim to avoid this constraint. The proposed system begins by (1) decoding the code blocks associated with coarse wavelet resolution and (2) rejecting the pictures that
are least related to the query picture, based on similarities between the query image's coarse-resolution wavelet characteristics and those of the archive photos. The European Organization for the Exploitation of Meteorological Satellites plans to launch Metop-SG A1 in 2023, according to [21]. MWS is a passive microwave radiometer that operates between 23.8 and 229 GHz and functions similarly to the Advanced Technology Microwave Sounder (ATMS) on JPSS satellites. According to [22], the rise of big data has shifted information from a single medium to photographs, texts, films, and audio recordings, among others. The proliferation of multimedia data presents a challenge for cross-media retrieval technology, notably how to effectively recover multimedia data with multiple modalities but the same meaning. As more photographs are stored online, image retrieval is becoming increasingly important, as [23] explains. In recent years, picture hashing has been studied to improve image similarity calculations. A DAH retrieval model is proposed in that study: when an attention module and a convolutional neural network are combined, they produce highly representable hash codes. Using the CIFAR-10 dataset, researchers put DAH to the test against ten cutting-edge image retrieval algorithms, confirming its excellent accuracy. The DAH Mean Average Precision (MAP) is greater than 92% for 12-, 24-, 36-, and 48-bit CIFAR-10 hash codes, which is higher than current procedures. [24] expands on this by providing data and baseline methodologies for cross-lingual image annotation and retrieval. Researchers propose the COCO-CN dataset, which adds Chinese words and tags to MS-COCO, and develop a RACAS to improve annotation acquisition; this technique gives an annotator tags and statements pertinent to the picture's content. According to [25], medical technology, information systems, electronic medical records, portable devices, wearable and smart gadgets, and other technologies are transforming healthcare systems. Research in [26] indicated that most IoT data is stored and processed on the cloud. The open nature of the cloud causes major data security and privacy risks when retrieving data from the cloud, and this paper proposes PERT for the cloud-aided Internet of Things. Sketch-based image retrieval (SBIR) recovers natural pictures based on human-defined principles [27]. Image-based retrieval methods are also described in [28], where contours of the retrieved results are similar but lack semantic information. Because of the "one-to-many" semantic category matching link between hand-drawn and natural images, the same hand-drawn image might represent a range of different things from the perspective of the user. The work in [28] also examines how content-based human motion capture (MoCap) data retrieval makes it possible to reuse motion data in video sequences; realistic MoCap data retrieval requires excellent accuracy and a natural user interface. According to [29], more people and businesses are moving their data and applications to cloud servers. Customers encrypt critical data before uploading it to the cloud and should not rely on public cloud servers; as a result, ciphertext retrieval has developed as a critical area of research. With the emergence of big data, archiving and retrieving data have become a focus of academic investigation, according to [30].
Hashing methods, which compress high-dimensional data into binary codes, have
gained popularity. According to [31], conceptual and relational data models of OLTP applications are commonly built and kept up to date by adhering to the idea of normalization, which eliminates data duplication. According to the study in [32], the rapid proliferation of multimedia data and cloud computing (CC) has increased interest in Secure Image Archival and Retrieval Systems (SIARS) on the cloud. While [33] created a CNN semantic re-ranking approach to improve sketch-based image retrieval (SBIR), the focus there is on enhancing the accuracy of the rankings; unlike existing approaches, the recommended system uses CNN category information to measure image similarity. According to [33], social media networks like Twitter, Facebook, and Flickr, as well as digital image-capturing devices, have led to an influx of pictures, and the amount of digital photo archives accessible to academics has skyrocketed in the previous decade. Using the AT&T, MIT Vistex, and Brodatz Texture picture repositories, the provided descriptor achieves average retrieval precision (ARP) of 66, 92, and 83%, average retrieval recall (ARR) of 66, 98, and 76%, and average retrieval specificity (ARS) of 98, 98, and 76%. Studies show that a learning-based strategy increases the efficiency of the proposed DMLHP descriptor; this enhancement results in 95% AT&T, 92% BT, and 99% MIT Vistex scores (similarity measure). According to the trials, the proposed texture descriptor outperforms LNIP, LtriDP, LNDP, LDGP, LEPSEG, and CSLBP in CBIR. The study in [34] investigates how spatial and spectral compression affect statistically based retrieval. Experiments have demonstrated that a certain measure of compression may improve retrieval accuracy, even while information quality is lost during coding. Researchers have developed two separate algorithms, both with intriguing advantages: the first uses extremely high compression, which preserves retrieval performance comparable to uncompressed data; the second uses moderate to high compression, which increases performance. The researchers' second contribution focuses on the origins of these benefits. According to [35], due to the rapid development of multimodal data from the internet and social media, cross-modal retrieval has become a required and valuable activity in recent years. The study in [36] analyzes the privacy protection difficulty of designing a government cloud system for data storage and exchange; this challenge relates to the models' security characteristics. According to [37], manifold learning approaches are not well suited for picture retrieval: they cannot handle query pictures and have a high processing cost, which is troublesome for huge databases. Researchers offer the IME layer to investigate intrinsic manifolds with inadequate data, with the weights of this layer determined through unsupervised offline learning. The work in [38] offers further information. Because of the need to manage huge volumes of data, massive-scale remote sensing image retrieval (RSIR) is gaining prominence in remote sensing. Visual retrieval algorithms search a database for images that are similar to a query image and present the results to the user. Notwithstanding the success of various retrieval procedures, a doubt remains: can researchers obtain the semantic labels of returned similar photos to assist in image analysis and processing? In this article, researchers reframe image retrieval as retrieving visual and semantic information.
According to [39], efficient picture retrieval from various image collections is a must nowadays, and primitive image signatures are necessary for content-based image retrieval (CBIR). Picture signatures are algorithmically descriptive and easily identifiable visual contents that are used to index and retrieve comparable results. The method is tested on nine well-known image datasets: Caltech-101, ImageNet, Corel-10000, 17-Flowers, COIL, Corel-1000, Caltech-256, tropical fruits, and the Amsterdam Library of Textures (ALOT). These datasets are classified as shape, color, texture, spatial, and complex. Seven benchmark descriptors are thoroughly tested: maximally stable extremal regions (MSER), speeded-up robust features (SURF), difference of Gaussians (DoG), red-green-blue local binary pattern (RGBLBP), histogram of oriented gradients (HOG), scale-invariant feature transform (SIFT), and local binary pattern (LBP). For a wide range of image-semantic datasets, the presented technique provides high precision, recall, average retrieval accuracy and recall, and mean average precision and recall rates; the results, research methods, and enhancements are compared there. Social multimedia programs create millions of photographs every day [40]. According to [41], social networks that let users actively contribute photos with descriptive captions have led to an exponential surge in picture sharing. Multi-view hashing supports large-scale social picture retrieval by encoding multi-view characteristics into compact binary hash codes. According to [42], CBRSIR is an important approach for mining and analyzing massive volumes of remote sensing data. According to [43], multispectral radiometers need a spectral response function (SRF); this parameter influences radiometric calibration and quantification. In-flight SRF errors are commonly caused by pre-launch contamination or post-launch degradation. That study uses functional data analysis (FDA) to get SRFs from multispectral radiometers, and FDA is also used to compare against hyperspectral sounders.
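Since several of the reviewed works ([17, 19, 23, 30], among others) rely on compact binary hash codes ranked by Hamming distance, the short sketch below illustrates that generic mechanism. It is not drawn from any particular cited paper: random projections stand in for a learned deep hashing model, and all names are illustrative.

```python
# Generic illustration of hashing-based retrieval: items are mapped to compact
# binary codes, and retrieval ranks database items by Hamming distance to the
# query code. A random projection plays the role of a learned hashing model.
import numpy as np

rng = np.random.default_rng(0)

def binary_codes(features, projection):
    """Map real-valued features to {0,1} codes by the sign of projections."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query code."""
    distances = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(distances), np.sort(distances)

# Toy data: 1000 database items and one query with 64-dimensional features,
# hashed to 32-bit codes. The query is a near-duplicate of item 42.
projection = rng.normal(size=(64, 32))
database = rng.normal(size=(1000, 64))
query = database[42] + 0.05 * rng.normal(size=64)

db_codes = binary_codes(database, projection)
q_code = binary_codes(query[None, :], projection)[0]
order, dists = hamming_rank(q_code, db_codes)
print("top-5 matches:", order[:5], "distances:", dists[:5])
```

In a learned hashing method the projection would be replaced by a trained network, but the retrieval step, ranking stored codes by Hamming distance to the query code, stays the same.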
3 Pragmatic Analysis and Comparison

Based on a thorough examination of existing retrieval models, it was discovered that these models differ greatly in terms of their underlying functioning features. Hence, in order to assist readers in identifying the best retrieval models, this section compares them in terms of retrieval delay (D), retrieval accuracy (A), computational complexity (CC), scalability (S), and deployment cost (DC). According to each model's internal module deployment characteristics, these parameters were quantized as Low-Value-Range (LVR = 1), Medium-Value-Range (MVR = 2), High-Value-Range (HVR = 3), and Very-High-Value-Range (VHVR = 4). The resulting comparison can be seen in Table 1. Based on this examination, NDN VC [3], ICN [7], IES CBIR [9], C3D OR Net [13], RSIR [16], DTW [18], S3C MR [22], RAC AS [24], CRHM [29], CNN SBIR [33], and CI-GAN [42] have superior accuracy and can thus be employed for high-performance retrieval applications. According to Table 1, NDN [2], OIRS [5], ICN [7], IES CBIR [9], PPL TSL [10], RIOPS [12], FG SBVR [14], CNN [15],
Table 1 Performance evaluation of different reviewed models

Model            | A    | CC   | D    | DC   | S
TCT [1]          | HVR  | HVR  | MVR  | MVR  | HVR
NDN [2]          | HVR  | MVR  | HVR  | HVR  | VHVR
NDN VC [3]       | VHVR | HVR  | HVR  | HVR  | VHVR
IHDA [4]         | MVR  | VHVR | HVR  | HVR  | MVR
OIRS [5]         | HVR  | MVR  | HVR  | HVR  | MVR
SP [6]           | HVR  | VHVR | HVR  | MVR  | HVR
ICN [7]          | VHVR | MVR  | LVR  | MVR  | HVR
FH [44]          | MVR  | HVR  | MVR  | HVR  | LVR
LPRM [8]         | HVR  | VHVR | MVR  | HVR  | HVR
PIR [45]         | HVR  | VHVR | HVR  | HVR  | MVR
IES CBIR [9]     | VHVR | MVR  | HVR  | HVR  | MVR
PPL TSL [10]     | HVR  | MVR  | MVR  | HVR  | VHVR
ER MRS [11]      | MVR  | HVR  | HVR  | VHVR | HVR
RIOPS [12]       | LVR  | MVR  | MVR  | HVR  | HVR
C3D OR Net [13]  | VHVR | HVR  | HVR  | MVR  | HVR
FG SBVR [14]     | HVR  | MVR  | HVR  | MVR  | HVR
CNN [15]         | HVR  | MVR  | MVR  | HVR  | VHVR
RSIR [16]        | VHVR | HVR  | MVR  | HVR  | VHVR
SSDH [17]        | MVR  | MVR  | HVR  | HVR  | VHVR
DTW [18]         | VHVR | MVR  | LVR  | HVR  | MVR
ZSH [19]         | MVR  | HVR  | MVR  | MVR  | HVR
PMKS [20]        | HVR  | MVR  | HVR  | HVR  | MVR
MiRS [21]        | HVR  | MVR  | MVR  | HVR  | HVR
S3C MR [22]      | VHVR | MVR  | MVR  | LVR  | HVR
DAH [23]         | MVR  | HVR  | MVR  | HVR  | HVR
RAC AS [24]      | VHVR | HVR  | MVR  | MVR  | HVR
PERT [26]        | HVR  | MVR  | HVR  | MVR  | MVR
SBIR [27]        | MVR  | HVR  | MVR  | HVR  | HVR
MoCap [28]       | MVR  | HVR  | HVR  | HVR  | MVR
CRHM [29]        | VHVR | MVR  | HVR  | HVR  | VHVR
OLTP [31]        | HVR  | MVR  | HVR  | HVR  | HVR
RSIR [46]        | MVR  | HVR  | MVR  | MVR  | HVR
SIARS [32]       | HVR  | MVR  | HVR  | HVR  | MVR
CNN SBIR [33]    | VHVR | MVR  | HVR  | MVR  | VHVR
DML HP [33]      | MVR  | HVR  | LVR  | HVR  | MVR
TCRS [36]        | HVR  | MVR  | HVR  | LVR  | HVR
IME [37]         | MVR  | MVR  | HVR  | MVR  | HVR
RSIR [38]        | HVR  | MVR  | HVR  | HVR  | HVR
SIFT LBP [39]    | HVR  | MVR  | LVR  | MVR  | HVR
UAP MH [41]      | HVR  | MVR  | HVR  | HVR  | VHVR
CI-GAN [42]      | VHVR | HVR  | MVR  | MVR  | VHVR
FDA [43]         | MVR  | HVR  | MVR  | HVR  | HVR
SSDH [17], and DTW [18] have lesser complexity and can thus be employed for retrieval applications that require low-performance processing. Similarly, ICN [7], DTW [18], DML HP [33], and SIFT LBP [39] have lower latency and can thus be employed for high-speed extraction applications, as shown in Table 1. Table 1 and Fig. 1 show that S3C MR [22], TCRS [36], TCT [1], ICN [7], and SP [6] all have low deployment-costs and can thus be used for low-cost retrieval applications. According to Table 1, NDN [2], NDN VC [3], PPL TSL [10], CNN [15], RSIR [16], SSDH [17], CRHM [29], CNN SBIR [33], UAP MH [41], and CI-GAN [42] offer superior scalability and can thus be employed for highly scalable retrieval applications. These measures were merged to create a new Data Retrieval Metric (DRM) that can be calculated using Eq. 1, DRM =
A/4 + 1/CC + 1/D + 1/DC + S/4
(1)
Based on this assessment, it can be observed that ICN [7], S3C MR [22], SIFT LBP [39], DTW [18], CNN SBIR [33], TCRS [36], CI-GAN [42], RSIR [16], CRHM [29], PPL TSL [10], CNN [15], RAC AS [24], NDN VC [3], NDN [2], and C3D OR Net [13] highlight better DRM efficiency and can thus be used in application scenarios that demand good accuracy, low delay, low complexity, cost-effectiveness, and high scalability. Based on this evaluation, researchers can develop appropriate models for their application-specific and performance-specific retrieval applications.
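As an illustration of how the DRM can be computed from the quantized ratings in Table 1, the following sketch assumes the reconstructed form of Eq. 1 (DRM = A/4 + 1/CC + 1/D + 1/DC + S/4) and evaluates a few of the listed models. The rating values are copied from Table 1; the helper names are illustrative and not part of the original study.

```python
# Minimal sketch: computing the Data Retrieval Metric (DRM) of Eq. 1 from the
# quantized ratings used in Table 1. Model entries are taken from Table 1;
# function and variable names are illustrative.

RATING = {"LVR": 1, "MVR": 2, "HVR": 3, "VHVR": 4}

def drm(a, cc, d, dc, s):
    """DRM = A/4 + 1/CC + 1/D + 1/DC + S/4, with all inputs quantized 1-4."""
    return a / 4 + 1 / cc + 1 / d + 1 / dc + s / 4

# (A, CC, D, DC, S) ratings for a few models, as listed in Table 1
models = {
    "ICN [7]":     ("VHVR", "MVR", "LVR", "MVR", "HVR"),
    "S3C MR [22]": ("VHVR", "MVR", "MVR", "LVR", "HVR"),
    "DTW [18]":    ("VHVR", "MVR", "LVR", "HVR", "MVR"),
    "IHDA [4]":    ("MVR", "VHVR", "HVR", "HVR", "MVR"),
}

ranked = sorted(models.items(),
                key=lambda kv: -drm(*(RATING[r] for r in kv[1])))
for name, ratings in ranked:
    print(f"{name:12s} DRM = {drm(*(RATING[r] for r in ratings)):.2f}")
```

Under this mapping, models that combine high accuracy and scalability with low delay, complexity, and cost, such as ICN [7] and S3C MR [22], obtain the highest DRM values, which is consistent with the ranking reported above.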
4 Conclusion and Future Scope

This work analysed numerous cutting-edge retrieval models in terms of both qualitative and quantitative performance to facilitate readers in identifying the best models for their application-specific use cases. Based on this research, it was discovered that machine learning and bioinspired models outperform others due to their feature augmentation and representation capabilities. It was further observed that NDN VC, ICN, IES CBIR, C3D OR Net, RSIR, DTW, S3C MR, RAC AS, CRHM, CNN SBIR, and CI-GAN have higher accuracy, while NDN, OIRS, ICN, IES CBIR, PPL TSL, RIOPS, FG SBVR, CNN, SSDH, and DTW have lower complexity. Also,
ICN, DTW, DML HP, and SIFT LBP have lower delay and can thus be used for high-speed and better-performing retrieval applications. It was also observed that S3C MR, TCRS, TCT, ICN, and SP have low deployment costs, and that NDN, NDN VC, PPL TSL, CNN, RSIR, SSDH, CRHM, CNN SBIR, UAP MH, and CI-GAN have higher scalability, while ICN, S3C MR, SIFT LBP, DTW, CNN SBIR, TCRS, CI-GAN, RSIR, CRHM, PPL TSL, CNN, RAC AS, NDN VC, NDN, and C3D OR Net showcase better DRM performance and can thus be used for high-accuracy, low-delay, low-complexity, low-cost, and highly scalable data retrieval application scenarios. In future, researchers can combine these models via fusion techniques like data association, state estimation, decision trees, etc. to further improve their performance. A combination of these models will improve delay, accuracy, and other metrics. Moreover, researchers can also integrate Q-learning and incremental learning models to further optimize retrieval performance under large-scale database scenarios.
References

1. Fu C et al (2019) Real-time data retrieval in cyber-physical systems with temporal validity and data availability constraints. IEEE Trans Knowl Data Eng 31(9):1779–1793. https://doi.org/10.1109/TKDE.2018.2866842
2. Wang X, Cai S (2021) Efficient road safety data retrieval in VNDN. IEEE Syst J 15(1):996–1004. https://doi.org/10.1109/JSYST.2020.2977388
3. Wang X, Wang X, Wang D (2021) Cost-efficient data retrieval based on integration of VC and NDN. IEEE Trans Veh Technol 70(1):967–976. https://doi.org/10.1109/TVT.2021.3049795
4. Yang F, Wu Y, Wang Z, Li X, Sakti S, Nakamura S (2021) Instance-level heterogeneous domain adaptation for limited-labeled sketch-to-photo retrieval. IEEE Trans Multimedia 23:2347–2360. https://doi.org/10.1109/TMM.2020.3009476
5. Asim MN, Wasim M, Ghani Khan MU, Mahmood N, Mahmood W (2019) The use of ontology in retrieval: a study on textual, multilingual, and multimedia retrieval. IEEE Access 7:21662–21686. https://doi.org/10.1109/ACCESS.2019.2897849
6. Feng Y, Zhou P, Xu J, Ji S, Wu D (2019) Video big data retrieval over media cloud: a context-aware online learning approach. IEEE Trans Multimedia 21(7):1762–1777. https://doi.org/10.1109/TMM.2018.2885237
7. Li R, Asaeda H, Wu J (2020) DCAuth: data-centric authentication for secure in-network big-data retrieval. IEEE Trans Netw Sci Eng 7(1):15–27. https://doi.org/10.1109/TNSE.2018.2872049
8. Song P et al (2019) An improved soil moisture retrieval algorithm based on the land parameter retrieval model for water-land mixed pixels using AMSR-E data. IEEE Trans Geosci Remote Sens 57(10):7643–7657. https://doi.org/10.1109/TGRS.2019.2915346
9. Ferreira B, Rodrigues J, Leitão J, Domingos H (2019) Practical privacy-preserving content-based retrieval in cloud image repositories. IEEE Trans Cloud Comput 7(3):784–798. https://doi.org/10.1109/TCC.2017.2669999
10. Zhou P, Wang K, Xu J, Wu D (2019) Differentially-private and trustworthy online social multimedia big data retrieval in edge computing. IEEE Trans Multimedia 21(3):539–554. https://doi.org/10.1109/TMM.2018.2885509
11. Sun J, Hu S, Nie X, Walker J (2020) Efficient ranked multi-keyword retrieval with privacy protection for multiple data owners in cloud computing. IEEE Syst J 14(2):1728–1739. https://doi.org/10.1109/JSYST.2019.2933346
12. Komarov AS, Caya A, Buehner M, Pogson L (2020) Assimilation of SAR ice and open water retrievals in environment and climate change Canada regional ice-ocean prediction system. IEEE Trans Geosci Remote Sens 58(6):4290–4303. https://doi.org/10.1109/TGRS.2019.2962656
13. Liu A, Xiang S, Nie W, Song D (2019) End-to-end visual domain adaptation network for cross-domain 3D CPS data retrieval. IEEE Access 7:118630–118638. https://doi.org/10.1109/ACCESS.2019.2937377
14. Xu P, Liu K, Xiang T, Hospedales T, Ma Z, Guo J, Song Y-Z (2020) Fine-grained instance-level sketch-based video retrieval. IEEE Trans Circ Syst Video Technol. https://doi.org/10.1109/TCSVT.2020.3014491
15. Mehmood I et al (2019) Efficient image recognition and retrieval on IoT-assisted energy-constrained platforms from big data repositories. IEEE Internet Things J 6(6):9246–9255. https://doi.org/10.1109/JIOT.2019.2896151
16. Fan L, Zhao H, Zhao H (2021) Global optimization: combining local loss with result ranking loss in remote sensing image retrieval. IEEE Trans Geosci Remote Sens 59(8):7011–7026. https://doi.org/10.1109/TGRS.2020.3029334
17. SSDH: semi-supervised deep hashing for large scale image retrieval. https://doi.org/10.48550/arXiv.1607.08477
18. Younan M, Elhoseny M, Ali AE-MA, Houssein EH (2021) Data reduction model for balancing indexing and securing resources in the internet-of-things applications. IEEE Internet Things J 8(7):5953–5972. https://doi.org/10.1109/JIOT.2020.3035248
19. Zou Q, Cao L, Zhang Z, Chen L, Wang S (2022) Transductive zero-shot hashing for multilabel image retrieval. IEEE Trans Neural Netw Learn Syst 33(4):1673–1687. https://doi.org/10.1109/TNNLS.2020.3043298
20. Preethy Byju A, Demir B, Bruzzone L (2020) A progressive content-based image retrieval in JPEG 2000 compressed remote sensing archives. IEEE Trans Geosci Remote Sens 58(8):5739–5751. https://doi.org/10.1109/TGRS.2020.2969374
21. Lee Y-K et al (2021) Preliminary development and testing of an EPS-SG microwave sounder proxy data generator using the NOAA microwave integrated retrieval system. IEEE J Sel Top Appl Earth Observ Remote Sens 14:3151–3161. https://doi.org/10.1109/JSTARS.2021.3061946
22. Zheng X, Zhu W, Yu Z, Zhang M (2021) Semi-supervised learning based semantic cross-media retrieval. IEEE Access 9:75049–75057. https://doi.org/10.1109/ACCESS.2021.3080976
23. Li X et al (2020) Image retrieval using a deep attention-based hash. IEEE Access 8:142229–142242. https://doi.org/10.1109/ACCESS.2020.3011102
24. COCO-CN for cross-lingual image tagging, captioning and retrieval. https://doi.org/10.48550/arXiv.1805.08661
25. Nazir S et al (2020) A comprehensive analysis of healthcare big data management, analytics and scientific programming. IEEE Access 8:95714–95733. https://doi.org/10.1109/ACCESS.2020.2995572
26. Wang T, Yang Q, Shen X, Gadekallu TR, Wang W, Dev K (2022) A privacy-enhanced retrieval technology for the cloud-assisted internet of things. IEEE Trans Industr Inf 18(7):4981–4989. https://doi.org/10.1109/TII.2021.3103547
27. Qi Q, Huo Q, Wang J, Sun H, Cao Y, Liao J (2019) Personalized sketch-based image retrieval by convolutional neural network and deep transfer learning. IEEE Access 7:16537–16549. https://doi.org/10.1109/ACCESS.2019.2894351
28. Ren T, Li W, Jiang Z, Li X, Huang Y, Peng J (2020) Video-based human motion capture data retrieval via MotionSet network. IEEE Access 8:186212–186221. https://doi.org/10.1109/ACCESS.2020.3030258
29. He H, Chen R, Liu C, Feng K, Zhou X (2021) An efficient ciphertext retrieval scheme based on homomorphic encryption for multiple data owners in hybrid cloud. IEEE Access 9:168547–168557. https://doi.org/10.1109/ACCESS.2021.3135050
30. Sun H, Fan Y, Shen J, Liu N, Liang D, Zhou H (2020) A novel semantics-preserving hashing for fine-grained image retrieval. IEEE Access 8:26199–26209. https://doi.org/10.1109/ACCESS.2020.2970223
31. Kojić NM, Milićev DS (2020) Equilibrium of redundancy in relational model for optimized data retrieval. IEEE Trans Knowl Data Eng 32(9):1707–1721. https://doi.org/10.1109/TKDE.2019.2911580
32. Devaraj AFS et al (2020) An efficient framework for secure image archival and retrieval system using multiple secret share creation scheme. IEEE Access 8:144310–144320. https://doi.org/10.1109/ACCESS.2020.3014346
33. Wang L, Qian X, Zhang Y, Shen J, Cao X (2020) Enhancing sketch-based image retrieval by CNN semantic re-ranking. IEEE Trans Cybern 50(7):3330–3342. https://doi.org/10.1109/TCYB.2019.2894498
34. García-Sobrino J, Laparra V, Serra-Sagristà J, Calbet X, Camps-Valls G (2019) Improved statistically based retrievals via spatial-spectral data compression for IASI data. IEEE Trans Geosci Remote Sens 57(8):5651–5668. https://doi.org/10.1109/TGRS.2019.2901396
35. Cheng Q, Zhou Y, Fu P, Xu Y, Zhang L (2021) A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing. IEEE J Sel Top Appl Earth Observ Remote Sens 14:4284–4297. https://doi.org/10.1109/JSTARS.2021.3070872
36. Yong L, Hefei L, Xiujuan S, Bin Y, Kun W (2021) Keyword semantic extended top-k ciphertext retrieval scheme over hybrid government cloud environment. IEEE Access 9:155249–155259. https://doi.org/10.1109/ACCESS.2021.3128933
37. Xu J, Wang C, Qi C, Shi C, Xiao B (2019) Iterative manifold embedding layer learned by incomplete data for large-scale image retrieval. IEEE Trans Multimedia 21(6):1551–1562. https://doi.org/10.1109/TMM.2018.2883860
38. Deep hashing learning for visual and semantic retrieval of remote sensing images. https://doi.org/10.48550/arXiv.1909.04614
39. Ahmed KT, Afzal H, Mufti MR, Mehmood A, Choi GS (2020) Deep image sensing and retrieval using suppression, scale spacing and division, interpolation and spatial color coordinates with bag of words for large and complex datasets. IEEE Access 8:90351–90379. https://doi.org/10.1109/ACCESS.2020.2993721
40. Zhang Z, Zhou F, Qin S, Jia Q, Xu Z (2020) Privacy-preserving image retrieval and sharing in social multimedia applications. IEEE Access 8:66828–66838. https://doi.org/10.1109/ACCESS.2020.2984916
41. Zheng C, Zhu L, Cheng Z, Li J, Liu A-A (2021) Adaptive partial multi-view hashing for efficient social image retrieval. IEEE Trans Multimedia 23:4079–4092. https://doi.org/10.1109/TMM.2020.3037456
42. Xiong W, Lv Y, Zhang X, Cui Y (2020) Learning to translate for cross-source remote sensing image retrieval. IEEE Trans Geosci Remote Sens 58(7):4860–4874. https://doi.org/10.1109/TGRS.2020.2968096
43. Xu N et al (2022) In-flight spectral response function retrieval of a multispectral radiometer based on the functional data analysis technique. IEEE Trans Geosci Remote Sens 60:1–10, Art no 5604210. https://doi.org/10.1109/TGRS.2021.3073097
44. Xu Y, Zhao X, Gong J (2019) A large-scale secure image retrieval method in cloud environment. IEEE Access 7:160082–160090. https://doi.org/10.1109/ACCESS.2019.2951175
45. Guo T, Zhou R, Tian C (2021) New results on the storage-retrieval tradeoff in private information retrieval systems. IEEE J Sel Areas Inf Theory 2(1):403–414. https://doi.org/10.1109/JSAIT.2021.3053217
46. Shao Z, Zhou W, Deng X, Zhang M, Cheng Q (2020) Multilabel remote sensing image retrieval based on fully convolutional network. IEEE J Sel Top Appl Earth Observ Remote Sens 13:318–328. https://doi.org/10.1109/JSTARS.2019.2961634
Chapter 2
Smart Knowledge Management for IT Services Sishil Surendran
and Rashmi Agarwal
1 Introduction

IT service providers of infrastructure support focus on Core Enterprise Services, managing the mission-critical systems of their customers, including but not limited to compute, network, storage, compliance, and security. A managed infrastructure helps deliver better availability and uptime and reduces the expense and challenges of a self-managed infrastructure. One of the niche areas of expertise for Core Enterprise Services is iSeries-managed platform support services. iSeries, also known as IBMi (the terms are used interchangeably later in this paper) [1], is one of three operating systems that run on IBM Power Systems. The IBMi operating system provides a robust architecture and higher availability to help achieve business resilience and cost-efficient operation. Core IBMi applications have been an integral part of many organizations' success for decades in performing their business operations. The IBMi platform support ecosystem has multiple teams. The operations team (L1) supports, monitors, and reviews the alerts that the system generates, which are converted into tickets, also known as incidents, that are assigned to the team. L1 works on the incidents and fixes the issues reported; if the team is unable to resolve an issue, or if it is beyond their support boundary, it is escalated to L2. The L2 team engages vendor support or the Subject Matter Expert (SME) as the situation demands. All these teams refer to the KB articles, which include the SOPs, FAQs, troubleshooting guides, etc., to perform these duties.
S. Surendran (B) · R. Agarwal REVA Academy for Corporate Excellence, REVA University, Bengaluru, Karnataka 560064, India e-mail: [email protected] R. Agarwal e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_2
There is a shortage of skilled IBMi resources [2]. Training opportunities, both offline and online, are limited, and demand for IBMi certifications is lower than for other trending technologies in the market. It is also very important to have industry-, account-, and customer-specific knowledge of IBMi. All these factors limit the availability of trained talent from the open market. To address these concerns, service providers train new talent from campus (graduate hires) and infuse them into the support pool. When the pandemic impacted the whole world, IBMi's fresh-talent ecosystem was also affected: working from home prevented the offline training and knowledge-sharing sessions held at the office. There was an increase in the Mean Time to Resolve (MTTR) incidents [3]. Further root-cause analysis showed that some of this increase resulted from newly joined team members not following, or being delayed in accessing, the right processes to address an incident. In discussion with the larger team and the SMEs, it was identified that KB documents, including the SOPs, were organized in a complex and confusing way, which made information retrieval a time-consuming exercise for the team.
2 Literature Review
Knowledge is the main differentiator in gaining a competitive advantage for an organization in the market, and this fully applies to the infrastructure managed services and platform support space as well. It is one of the most important factors influencing organizational performance and can keep an organization competitive in a continuously changing market. To make knowledge more effective within the organization, a good knowledge management system needs to be set up. The idea of a knowledge management system is mostly technology oriented; this perspective, which treats knowledge management systems merely as technology or software tools, ends up undermining knowledge management in many organizations. Four important aspects are identified in this study (Technology, Human Resources, Process, and Context), which decide the success of knowledge management [4]. Studies have shown that a business-strategy-oriented knowledge management system can enhance management decision-making as a long-term benefit, as those decisions are based on the knowledge and experience of the organization; measures are also proposed to preserve the knowledge and industry experience developed over time [5]. Incorporating these concepts effectively depends on a pragmatic solution that can address these needs, be maintained with minimal effort, and be used over the longer term. Two approaches were considered while planning the solution: building an in-house solution from scratch, or identifying a user-friendly, readily available solution that would be easier to implement and manage [6]. Microsoft Azure offers Cognitive Services, a group of services that enables technical teams to incorporate AI capabilities within the applications they
create. Azure's AI capabilities empower users to enhance decision-making through the numerous development platforms the service offers. One of the Cognitive Services is 'Language', which uses Natural Language Processing (NLP) to build applications such as custom question answering, along with other NLP applications [7]. This adds a real-world conversational layer over the data. The solution can return the optimal response from a personalized KB for any user input. Social networking applications, chatbots, and speech-enabled desktop programs are a few examples of conversational client applications that use Azure question-answering. Responses are ranked for relevance using a deep learning ranker, with exact answers and end-to-end region support. The two key capabilities of the solution are listed below.
Custom question answering: This enables users to define synonyms and metadata and to customize the application, for example by editing question-and-answer pairs generated from the content source. It also offers recommended alternative questions based on the information it receives.
Prebuilt question answering: Users can query a text passage and receive a response, without any administrative overhead for maintaining KBs [8].
Chatbots, which are machine agents acting as natural-language user interfaces to data and services, have become quite popular. One study collected data through an online survey of 146 US citizens who used chatbots and were between the ages of 16 and 55, and pinpointed key elements that influence chatbot utilization. One of the main factors is that "productivity chatbots help users to obtain timely and efficient assistance or information. Chatbot users also reported motivations on entertainment, social and relational factors, and curiosity about what they view as a novel phenomenon" [9].
3 Objective
The objective of this paper is to build an effective knowledge management system that provides easy access to relevant information for the support teams to perform their tasks. The KB is to contain all required information: process documents, SOPs, troubleshooting guides, and vendor product support documents. The solution uses conversational AI with Azure Language Services and the Bot Framework, integrated into the MS Teams communication channel as a chatbot, called ProcessBot, for the support teams to test and use. Three key components are involved in this solution: data (the KB), technology (Azure AI services), and people (the support team). The aim is to enable the platform support team to quickly access the information required to fix customer issues, which improves the customer experience through increased availability of their business systems. When the platform support team is unable to resolve incident tickets on time, or makes human errors because the right process to fix an issue could not be found, the availability of the business systems is impacted and the SLA contracted with the customer may be breached, leading to potential penalties and damaging the trust and credibility of the service provider.
4 Methodology
4.1 Business Understanding
Large organizations have a complex IT estate that spans different data centers with varied technologies and types of systems. The managed service provider of their IT infrastructure manages the IT environment to complement the business. The SLAs are fully aligned with the customer's business requirements, for example by providing 99.9% availability of the critical systems. The technical teams that provide platform support are highly skilled in the technology, experienced, and possess industry knowledge of the customers' business. Noncompliance with the SLAs invites penalties, questions the credibility of the service provider as a trusted partner, and can adversely impact the customer's business, as access to their systems and data is put at risk. Successful managed service delivery for platform support is achieved by ensuring that the technical team performing the activities has all the required information, access to the systems, and skills to carry out the operations. The team is advised to follow the SOPs and refer to the KBs for any questions while working on these systems. An updated and well-maintained KB is one of the pillars of success in platform managed services.
4.2 Problem Description
The L1 team receives the tickets and works on them. For the analysis, the mean of the time from when a ticket is assigned until it is resolved is taken for tickets of similar severity, Medium (Severity 3) and Low (Severity 4). This MTTR is one of the Key Performance Indicators (KPIs) for the Service Level Agreement. The trend is not stable for these Medium and Low tickets over the evaluation period. Further analysis was made of the way each ticket was closed, to understand whether the correct process had been followed. This was manually verified based on the resolution code, and it was identified that the process was not followed for some tickets. Those tickets and their impact on the overall MTTR were tracked, as shown in Figs. 1 and 2, and it was identified that those incidents have a significant influence on increasing the MTTR.
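For concreteness, the weekly MTTR trend of Figs. 1 and 2 can be reproduced from an export of resolved tickets. The snippet below is a minimal sketch, assuming a hypothetical CSV export with columns named assigned_at, resolved_at, and severity; the actual field names and export format of the ticketing tool will differ.

```python
import pandas as pd

# Hypothetical export of resolved incidents (column names are assumptions).
tickets = pd.read_csv("incidents.csv", parse_dates=["assigned_at", "resolved_at"])

# Resolution time in hours for each ticket.
tickets["ttr_hours"] = (
    tickets["resolved_at"] - tickets["assigned_at"]
).dt.total_seconds() / 3600.0

# Keep only the severities used in the SLA analysis (Severity 3 and 4).
subset = tickets[tickets["severity"].isin([3, 4])]

# Weekly MTTR per severity, matching the weekly trend shown in Figs. 1 and 2.
weekly_mttr = (
    subset.set_index("resolved_at")
    .groupby("severity")["ttr_hours"]
    .resample("W")
    .mean()
    .rename("mttr_hours")
)
print(weekly_mttr)
```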
4.3 Proposed Solution
The root-cause analysis for the higher MTTR found that some of the newly joined team members were taking more time to resolve issues. When feedback was taken from the team members on their usage of the KBs, some challenges were identified. The first challenge was that information was spread across multiple documents: the team ended up checking several documents to get the required information, which took more time than usual.
Fig. 1 Weekly MTTR trend for medium-severity-3 tickets
Fig. 2 Weekly MTTR trend for low-severity-4 tickets
The second challenge was that searching the documents was not an easy task when the team worked on critical issues. These challenges directed the thought process towards a smart knowledge management system that can complement the technical teams working on the systems, where they can easily access the information required to perform their duties and prevent the mistakes that can happen when an SOP is not followed. Several solutions were reviewed. From a data mining perspective, the primary objective was to make the data available to users, per search request, in an easy-to-use way. Microsoft Azure AI services [10], using Language Services and Bot Services, turned out to be the most effective option to achieve these objectives with their built-in AI capability. They also provide easy integration with the multiple user-interface channels that the team uses to communicate. One of the other solutions evaluated was to create an independent web chat service; however, that was not as effective as placing the chat assistant on a frequently used communication and collaboration medium like MS Teams. One of the advantages of the Azure solution is that ProcessBot can be used in different ways on multiple MS Teams channels. This made the integration of the solution easier, more effective, and more meaningful in addressing the requirement. The KB feeds the data to cloud storage, as shown in Fig. 3, and a chatbot is used as an interface to communicate with the team. The team members can type the errors or questions about the issues they are working on into the chatbot.
Fig. 3 Proposed solution diagram
The chatbot then fetches the information from the KB stored on the cloud. The support team uses MS Teams for communication and collaboration, and they keep the process documents on SharePoint sites. Easy integration with these channels made Microsoft Azure AI services the first choice. Additional improvements can be made with the Azure Bot SDK for Python, as required. In case the services offered by ProcessBot are down or unavailable, the support teams are advised to follow the regular practice of directly accessing the SharePoint sites for the KB.
5 Data Understanding and Preparation
The data taken for this study is mainly the technical documents that assist the platform support engineers in troubleshooting and fixing the issues/incidents they receive while performing infrastructure platform support duties. The data from the SOPs, vendor websites, and ticketing tools like ServiceNow is the main source of the KB. Data preparation was performed on the text data to make it more effective for the Language Services in Azure. The different phases of data preparation are depicted in Fig. 4. All the documents to be uploaded to the KB in Azure Language Services were reviewed, and the data was changed into a format of proper questions and detailed answers. In the next phase of the study, efforts are to be made to automate this process [7] of converting data into the question-and-answer format.
The Language resource (Cognitive Services) in Azure is a cloud-based service that uses NLP capabilities to understand and analyze data in text format. The service is used to build applications through the web-based Language Studio, REST APIs, and client libraries [8]. The Language Service resource is created with the custom question-answering feature.
Fig. 4 Process flow of data preparation
It allows the user to create a natural conversational layer over the data and find the best answer for any user input from a custom KB of information. The two key capabilities of question answering (QA) are custom QA and prebuilt QA [8]. Custom QA enables users to customize various aspects, such as editing question-and-answer pairs extracted from the content source, defining synonyms and metadata, and accepting question suggestions. Prebuilt QA enables users to receive a response by querying a text passage without specific management of a knowledge base. Question answering gives the best results on static information: QA functions best when the information in the KB of responses is static. This custom-made KB can contain documents such as PDFs and URLs. The same request, question, or command is answered with the same response: when the same question is submitted by different users, the answer returned is the same. Static information can be filtered based on meta-information: metadata tags can be added to provide additional filtering options relevant to the client application. Commonly used metadata examples are chit-chat, content type or format, content purpose, and content freshness. Conversational bots can be built over this static information. All of this enables the KB to provide a response based on users' dialogue-based text or commands.
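The data-preparation step described above, converting SOP content into question-and-answer pairs with metadata tags, can be partially automated. The sketch below is illustrative only: the assumed SOP layout (repeated 'Problem:'/'Resolution:' blocks), the folder name, and the JSON output format are assumptions, not the exact import schema of the Azure question-answering service.

```python
import json
import re
from pathlib import Path

# Assumed SOP layout: repeated "Problem: ..." / "Resolution: ..." blocks (illustrative).
PAIR_PATTERN = re.compile(
    r"Problem:\s*(?P<question>.+?)\s*Resolution:\s*(?P<answer>.+?)(?=Problem:|\Z)",
    re.DOTALL,
)

def extract_qna_pairs(sop_text: str, source: str):
    """Turn one SOP document into question/answer records with metadata tags."""
    for match in PAIR_PATTERN.finditer(sop_text):
        yield {
            "question": " ".join(match["question"].split()),
            "answer": " ".join(match["answer"].split()),
            "metadata": {"source": source, "content_type": "sop"},
        }

records = []
for path in Path("sops").glob("*.txt"):  # folder of exported SOPs (assumption)
    records.extend(extract_qna_pairs(path.read_text(encoding="utf-8"), path.name))

Path("qna_pairs.json").write_text(json.dumps(records, indent=2), encoding="utf-8")
print(f"Prepared {len(records)} question/answer pairs")
```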
6 Data Modeling and Evaluation
6.1 Modeling
As part of the data modeling in this study, the data was reviewed and prepared to be used with the Bot Service. The solution is designed as shown in Fig. 4. A bot is a conversational app that end users can interact with using text, graphic
images/cards, or speech. Azure offers the Bot Service as a cloud platform that hosts bots and maps them to channels that end users can interact with, such as Microsoft Teams, Facebook, or Slack. The Bot Framework Service, included in the Azure Bot Service, enables interaction between the user's bot-connected app and the bot by relaying information; the individual channels add further information about the activities they need to perform. Before creating the bot, consider how a bot uses activity objects to facilitate communication with users. The Bot Framework Service initiates a chat request as a user joins the conversation. When a conversation is started with the Bot Framework Emulator, there are two conversation-update events: one from the user who joined the conversation and the other from the bot. To distinguish between these conversation-update events, it is possible to examine who is included in the members-added attribute of the activity. An activity is any communication between the user or a channel and the bot. Regarding structure and turns in bot applications: in a conversation, users take turns speaking one at a time, and a bot typically responds to user input. In the Bot Framework, a turn consists of the user's activity that the bot receives and any activities the bot sends back as an immediate response to the user.
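A minimal sketch of the turn handling described above, using the Bot Framework SDK for Python (botbuilder-core). The answer_from_kb helper is a placeholder standing in for the call to the Azure question-answering project; the greeting and fallback wording are assumptions.

```python
from typing import List

from botbuilder.core import ActivityHandler, MessageFactory, TurnContext
from botbuilder.schema import ChannelAccount

async def answer_from_kb(question: str) -> str:
    """Placeholder for the Azure custom question-answering lookup (assumed helper)."""
    return ""  # return an empty string when no answer is found

class ProcessBot(ActivityHandler):
    async def on_members_added_activity(
        self, members_added: List[ChannelAccount], turn_context: TurnContext
    ):
        # Greet only the user, not the bot itself, by checking the members-added list.
        for member in members_added:
            if member.id != turn_context.activity.recipient.id:
                await turn_context.send_activity(
                    MessageFactory.text("Hello! Ask me about any SOP or error message.")
                )

    async def on_message_activity(self, turn_context: TurnContext):
        # One turn: receive the user's activity, reply with the KB answer or a fallback.
        answer = await answer_from_kb(turn_context.activity.text)
        if not answer:
            answer = ("Sorry, unable to get an answer for the question now; "
                      "we will add this to our knowledge base soon.")
        await turn_context.send_activity(MessageFactory.text(answer))
```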
6.2 Evaluation
Evaluation of the model involves reviewing the whole chat dialogue and the responses to users' questions, demonstrating the chatbot's insights, and assessing the answers provided. ProcessBot greets the user with a welcome screen and a greeting message, and also provides information on how to ask questions. The evaluation is done based on the context of the questions asked and the responses. This was closely reviewed by the SMEs who prepare the knowledge base for the team; the answers were compared against the knowledge base, and the KB was updated with detailed answers for different scenarios. As part of the model evaluation, the SMEs prepared 100 questions, of which 50 were not part of the KB. ProcessBot was expected to display the answers to the questions asked; if it was unable to answer, it was to display the message "Sorry unable to get an answer for the question now, we will add this to our knowledge base soon". Table 1 gives the summary of the results of the test. Thirty questions that were already part of the KB returned the right answers; however, 20 questions whose answers were part of the KB returned no answer. The 50 questions that were not part of the KB returned no answer, as expected.
Table 1 Results from the first round of testing
                                      Answer from Azure
KB (answer as per the process doc)    Correct answer    No answer
Correct answer                        30                20
No answer                             0                 50
Table 2 Results from the second round of testing
                                      Answer from Azure
KB (answer as per the process doc)    Correct answer    No answer
Correct answer                        45                5
No answer                             0                 50
Follow-up actions were taken after the tests: the KB was reviewed, and the questions in the KB on the Azure Language Service were refined. Additional tests were carried out to identify special cases; it was found that even when the answer is a lengthy description, ProcessBot performed well by assembling the answer as the knowledge base was designed. Some questions have multiple turns, so users can select among options rather than receiving a single lengthy statement; the bot offers the choice of resolution with the numbers 1, 2, and 3. Once all the issues identified in the first round of testing were addressed, the test was repeated with another set of 100 questions, of which 50 had answers already added to the KB and the rest did not; the results are shown in Table 2.
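A test round like the ones summarized in Tables 1 and 2 can be tallied automatically. The sketch below is a hedged illustration: the ask_processbot call is stubbed out, the test-set format (question plus in-KB flag) is assumed, and a "correct answer" is approximated here by "not the fallback message", whereas in the study the SMEs verified answer correctness manually.

```python
from collections import Counter

FALLBACK = ("Sorry unable to get an answer for the question now, "
            "we will add this to our knowledge base soon")

def ask_processbot(question: str) -> str:
    """Stub for the call to the deployed question-answering endpoint (assumed)."""
    raise NotImplementedError

def run_test_round(test_set):
    """test_set: iterable of (question, in_kb) pairs prepared by the SMEs."""
    tally = Counter()
    for question, in_kb in test_set:
        reply = ask_processbot(question)
        # Approximation: any non-fallback reply is counted as an answer;
        # in the actual study, SMEs judged whether the answer was correct.
        answered = reply.strip() != FALLBACK
        row = "in KB" if in_kb else "not in KB"
        col = "correct answer" if answered else "no answer"
        tally[(row, col)] += 1
    return tally

# Example: print the four cells of the results table.
# for (row, col), count in sorted(run_test_round(test_set).items()):
#     print(f"{row:10s} | {col:15s} | {count}")
```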
7 Deployment, Analysis, and Results
Deployment consisted of making ProcessBot available to the L1 and L2 support teams. For this step, the Azure services were first reviewed and verified from the portal.azure.com home page to confirm that all the resources and applications were ready. Once the bot was ready and tested in web chat, it had to be brought into MS Teams, where it was primarily to be used; this information was available in the Channels section of the Bot Service page. In the settings, the Channels section was checked and Teams was added from the available list of channels, and the deployment completed in a few minutes. The Teams integration allowed the team members to access and communicate with the bot easily. The prototype link for the Teams integration was generated from the Channels page of the Bot Framework and shared with the L1/L2 engineers.
The effectiveness of the knowledge management solution was analyzed by verifying its usage and the feedback received from the team members; changes based on these were later added to the knowledge base management design. The team had a very positive reception of the solution: there was 90% positive feedback, along with constructive criticism to improve the solution further. Awareness meetings were scheduled to discuss the intention, the capabilities of the solution, and its limitations. The solution was also shared with the customer: the senior director was given a demo, shared her appreciation for the thoughtful initiative, liked the way the knowledge base was organized, and recommended improvements and new features that can be added
in future iterations. With further feedback, the necessary enhancements can be added to the solution. To make the communication flow appear more human and genuine, additional interactive sentences and discussion dialogues were added. Chit-chat is an option offered by Azure Language Services that can improve the personality of the bot interface; this helped in offering more engagement to the end users. More in-depth inquiries and error choices were included along with round-up navigation dialogues. Review suggestions are another useful feature of the Azure Language Service: the active-learning suggestions feature lets users further improve the quality of the knowledge base by recommending alternative questions, based on user submissions, for existing question-and-answer pairs. After evaluating these suggestions, a decision can be made on whether or not to include them in the existing questions. The knowledge base does not change automatically with the recommendations; the owner must accept them for the modifications to take effect. These suggestions only add more questions and never change or remove existing ones [11]. Chatbot analytics can estimate the success of the chatbot and of the knowledge management solution implemented, and can also offer excellent insights into business growth opportunities and retention strategy. It is crucial to be aware of the chatbot's advantages and potential by continuously observing and evaluating its performance, and to track the main chatbot metrics, which is an integral part of business success [12]. Refer to Fig. 5 for the monitoring metrics.
Fig. 5 Monitoring metrics of the bot usage [13]
8 Conclusion and Future Scope
The chatbot is being used by the L1 and L2 engineers. There is a visible improvement in the MTTR, and data is being collected to understand trends after the solution implementation. Work is also in progress to obtain approval of the cost case from upper management for a full-fledged implementation, and to review the solution provided in order to understand its limitations and the scope of further improvements that ProcessBot requires as an assistant to the platform support team. The feedback received from the technical teams, SMEs, and architects helped to improve the tool and to identify the scope of further enhancements. The chat responses and solutions received from the bot were only as good as those updated in the KB; the challenge is to capture all the questions being asked and add them as alternative questions where they are already addressed in the KBs. In the next iteration, it is planned to roll this out to other supporting teams, including the AIX, Linux, and multi-cloud support teams. There is a vast amount of information in the ServiceNow ticketing tool, and the options for retrieving that information and surfacing it through this communication channel need to be explored. One of the challenges experienced so far is governance and document control; a process needs to be put in place to review the knowledge content added to the Language Service managed-source KBs. The documents updated on SharePoint are compliant with the approved process for document control and governance. Automation needs to be employed wherever possible to reduce the human effort involved. The limitations of the free version of the Azure services prevented us from exploring more options to customize the knowledge management solution. One of the biggest gains from this endeavor was the realization that certain tasks handled by L2 support can be shared with L1 with the help of this knowledge management solution, ProcessBot. This would further help to optimize human effort and manage staffing accordingly.
References
1. IBM i operating system - power systems. IBM. https://www.ibm.com/it-infrastructure/power/os/ibm-i. Accessed 5 Aug 2022
2. Huntington T (2023) IBM i marketplace survey results [Online]. Available: https://www.fortra.com/resources/guides/ibm-i-marketplace-survey-results. Accessed 16 Mar 2023
3. Rumburg J (2012) Metric of the month: mean time to resolve
4. Zouari MBC, Dakhli SBD (2018) A multi-faceted analysis of knowledge management systems. Procedia Comput Sci 138:646–654. https://doi.org/10.1016/j.procs.2018.10.086
5. Córdova FM, Gutiérrez FA (2018) Knowledge management system in service companies. Procedia Comput Sci 139:392–400. https://doi.org/10.1016/j.procs.2018.10.275
6. Singh A, Ramasubramanian K, Shivam S (2019) Introduction to Microsoft Bot, RASA, and Google Dialogflow. In: Building an enterprise chatbot. Apress, pp 281–302. https://doi.org/10.1007/978-1-4842-5034-1_7
7. Moniz A, Gordon M, Bergum I, Chang M, Grant G (2021) Introducing cognitive services. In: Beginning Azure cognitive services. Apress, pp 1–17. https://doi.org/10.1007/978-1-4842-7176-6_1
8. Azure Cognitive Services, Microsoft Docs. What is question answering? https://docs.microsoft.com/en-us/azure/cognitive-services/language-service/question-answering/overview. Accessed 3 Aug 2022
9. Brandtzaeg PB, Følstad A (2017) Why people use chatbots. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and Lecture notes in bioinformatics), vol 10673. LNCS, pp 377–392. https://doi.org/10.1007/978-3-319-70284-1_30
10. Azure Cognitive Services, Microsoft Docs. What is Azure cognitive service for language. https://docs.microsoft.com/en-us/azure/cognitive-services/language-service/overview. Accessed 3 Aug 2022
11. Azure Cognitive Services, Microsoft Docs. Use active learning with knowledge base - QnA Maker. https://docs.microsoft.com/en-us/azure/cognitive-services/qnamaker/how-to/use-active-learning. Accessed 5 Aug 2022
12. Chatbot analytics: essential metrics & KPIs to measure bot success. https://www.revechat.com/blog/chatbot-analytics-metrics/. Accessed 3 Aug 2022
13. Bot Service, Microsoft Docs. Analyze the telemetry data from your bot. https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-telemetry-analytics-queries?view=azure-bot-service-4.0. Accessed 5 Aug 2022
Chapter 3
Patient-Centric Electronic Health Records Management System Using Blockchain Based on Liquid Proof of Stake
Yash Jaiswal, Ayushi Maurya, Ashok Kumar Yadav, and Arun Kumar
1 Introduction
Since electronic health records include sensitive data, if they are stored centrally it is quite conceivable that hackers will exploit the records for their own benefit. According to studies, the online availability of health information has been found to increase convenience and satisfaction among the healthcare industry's stakeholders, but there are difficulties with workload, security, and privacy. The goal of this work is to develop a patient-centric EHR management system that uses blockchain technology for its security, privacy, interoperability, auditability, availability, access control, and scalability [12, 15]. Electronic health records (EHRs) were originally designed with limited networks and accessibility, but the passage of time and advancements in technology have added many features to them. Traditional EHRs used a client-server architecture, whereas modern EHRs are adopting blockchain technology. Blockchain enhances transparency and communication between patients and healthcare providers. It also enables patients (the data owners) to control their data and share it with designated people. Blockchain-enabled EHRs ensure that medical records are authentic, private, and accurate; they also provide extra features such as non-repudiation, authorization, and easy patient access to the medical records [4, 5, 8–10].
Y. Jaiswal (B) · A. K. Yadav Rajkiya Engineering College, Azamgarh, UP, India e-mail: [email protected] A. Maurya · A. Kumar Centre for Advanced Studies, Lucknow, Uttar Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_3
1.1 Blockchain
Blockchain is a distributed ledger technology that enables users to interact, store, and retrieve data with ensured authenticity, immutability, and non-repudiation. Because of its distributed nature, which enables entities and devices to exchange data directly with their peers, blockchain eliminates the need for a centralized system. A blockchain is a chronological sequence of blocks containing the list of all complete and legitimate transaction records. Blockchains are used in cryptocurrencies such as Bitcoin and Ethereum. A chain is formed by each block containing a reference (hash value) that links it to the prior block in the chain. The block that comes before a given block is referred to as its parent block, and the very first block is called the genesis block; the genesis block is the only block in the blockchain that does not reference a parent [20, 21]. Taking control of healthcare data away from corporations and giving it back to patients is theoretically achievable, as is protecting healthcare records from the growing threat of ransomware and data breaches. Blocks are the basic building units of the blockchain and are added to the network by miners after presenting a Proof of Work (PoW) [17]. A transaction is the transfer of data or assets between nodes in the blockchain; for example, in the Bitcoin network, the transfer of bitcoins from one node to another is recorded as a transaction. A nonce is a random number that miners vary to compute the PoW and meet the network's target difficulty: a miner needs to find a nonce that yields a block hash less than the target hash value. Computing the nonce is difficult and requires high computational resources, but validating it is easy. Since all the transaction data cannot be stored in the block header, a Merkle tree is built from the hash values of all transactions, and the hash of the Merkle root is stored in the header. As the blockchain is made up of a network of nodes, each holding a copy of the blocks (a record of ownership), how can we establish confidence that each node is storing the same authoritative blocks as every other node? Emergent consensus, Satoshi Nakamoto's most important idea, solves this problem of trust in a decentralized way. It is called emergent consensus because there is no predetermined moment or point at which it is reached; it is a process that evolves over time. Consensus is the result of four different processes that take place independently on network nodes: every full node verifies each transaction using a detailed set of criteria; mining nodes independently aggregate the validated transactions into new blocks created with the respective consensus algorithm; every node independently verifies new blocks before assembling them into a chain; and each node independently selects the chain chosen by the rules of the consensus algorithm [1, 11, 19].
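To make the block structure described above concrete, the toy sketch below computes a Merkle root over transaction hashes, links each block to its parent hash, and searches for a nonce that satisfies a simplified difficulty target. It is an illustration of the concepts only, not the actual Bitcoin or Ethereum block format.

```python
import hashlib
import json

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(tx_hashes):
    """Fold the transaction hashes pairwise until a single root remains."""
    level = list(tx_hashes) or [sha256(b"")]
    while len(level) > 1:
        if len(level) % 2:                       # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def mine_block(parent_hash, transactions, difficulty=4):
    """Find a nonce so that the block hash starts with `difficulty` zero hex digits."""
    root = merkle_root(sha256(json.dumps(tx).encode()) for tx in transactions)
    nonce = 0
    while True:
        block_hash = sha256(f"{parent_hash}{root}{nonce}".encode())
        if block_hash.startswith("0" * difficulty):   # simplified difficulty target
            return {"parent": parent_hash, "merkle_root": root,
                    "nonce": nonce, "hash": block_hash}
        nonce += 1

genesis = mine_block(parent_hash="0" * 64, transactions=[{"coinbase": "genesis"}])
block1 = mine_block(genesis["hash"], [{"from": "A", "to": "B", "amount": 5}])
print(block1["hash"], "links to", block1["parent"])
```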
1.2 Cryptography
When EHRs are implemented, data often travels from one party to another, leaving its secure physical environment. The major concern is protecting the data in the network from malicious actors. Once data falls into the wrong hands, people or organizations with bad intentions can forge it, either for amusement or for their own benefit. Cryptography can reformat and transform the data, making it secure on its way between computers or nodes. This technology provides a robust level of data security; its foundation lies in the fundamentals of secret coding, combined with contemporary mathematics. At its core are the principles and processes of changing a message from its original, understandable form into an unintelligible one and back again. In public key cryptography, the encryption key is made public, but it is computationally infeasible to find the decryption key without the information known to the receiver. In symmetric key algorithms, both the sender and the receiver know the encryption and decryption keys; the decryption key is straightforward to derive from the shared encryption key, and in many situations the two keys are identical [16].
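A minimal sketch of the asymmetric pattern used later in the proposed model (encrypting a record with the patient's public key so that only the private-key holder can read it), using the Python cryptography package. RSA-OAEP can only encrypt small payloads, so a real system would wrap a symmetric key (hybrid encryption); treat this as illustrative only.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Key pair for the patient (in practice generated and kept in a wallet or key store).
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

record = b"blood group: O+; allergy: penicillin"   # toy health record (small payload)
ciphertext = public_key.encrypt(record, oaep)       # anyone with the public key can encrypt
plaintext = private_key.decrypt(ciphertext, oaep)   # only the private-key holder can decrypt
assert plaintext == record
```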
1.3 InterPlanetary File System (IPFS)
IPFS is a decentralized, distributed file system that stores files on nodes in the network. Each file added to the network receives a hash, which serves as the file's sole means of identification on the network; an edited file gets a new hash. These characteristics might suggest that IPFS is similar to a blockchain, yet it differs slightly: it is a straightforward decentralized system used to store data off-chain. Unlike a blockchain, it does not deal with transactions, lacks a consensus process, and does not use PoW [13].
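A minimal sketch of adding and retrieving content through a local IPFS daemon with the ipfshttpclient package. The package choice and file name are assumptions; the point is that the returned content hash (CID) identifies the file network-wide and changes whenever the content changes.

```python
import ipfshttpclient

# Connect to a locally running IPFS daemon (the library's default address).
client = ipfshttpclient.connect()

# Add an (already encrypted) record; the returned content hash identifies it network-wide.
result = client.add("record.enc")
cid = result["Hash"]
print("stored as", cid)

# Anyone with the CID can fetch the same bytes back; editing the file would yield a new CID.
data = client.cat(cid)
assert isinstance(data, bytes)
```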
2 Literature Review The MedRec proof-of-concept system demonstrates how decentralization and blockchain technologies may aid in the development of secure, interoperable EHR systems. Through the use of Ethereum-based contracts to coordinate a content-access system across various storage and provider locations, the MedRec authentication log restricts access to medical records while providing patients with complete record inspection, care traceability, and data sharing [7]. By using its hash function to desensitize a user’s identification and location information, Xu and colleagues demonstrated a viable blockchain platform that ensures the general public’s privacy in a
decentralized environment while also protecting the identity of COVID-19 patients. To achieve instant verification of tamper-proof test results, Eisenstadt and colleagues [6] used a consortium, Ethereum-based blockchain architecture coupled with a mobile application; widely distributed public/private key pairs were used to avoid concentrating ownership of sensitive keys or data. In [14], the authors stored the COVID-19 vaccination records of each recipient on-chain on a publicly accessible platform and used an iris extraction technique to authenticate users and locate vaccination records anonymously, concealing the input and preventing the leakage of personally identifiable information. In the study of [2], a consortium blockchain network was created for sharing COVID-19-related records, such as chest CT scans. Before being stored on-chain, these reports were recognized and verified by the blockchain: reports unrelated to COVID-19 were eliminated by comparing the perceptual hash of each report with already-existing on-chain perceptual hashes. Benchoufi and colleagues [3] employed blockchain's automatic execution to track clinical trial events, including voluntary consent, in a predetermined chronological order; this automatic execution could be extrapolated further to clinical consent for operations or treatment in order to thwart medical fraud.
After reviewing many works related to EHRs, we assess the main problems in storing and exchanging critical health records over a public network to be scalability, interoperability, and storing large volumes of data on-chain. Interoperability is the degree to which several providers can use and understand an EHR and read each other's data; it allows the quality of healthcare to be standardized and improved. The requirement for cost-effective data storage is anticipated to grow along with the anticipated increase in data. If interoperability goals are met, systems facilitating EHR implementations must be scalable and efficient enough to handle the volume of data for network effects to grow. Availability would be compromised if we attempted to store the data fully on-chain, because that would require too much computational power and be unsustainable; on the other hand, a purely off-chain approach would compromise the system's security. Electronic health records, being stored digitally, are easy to tamper with, so a tamper-proof solution for storing health records over the network is needed. The lack of unified interoperability standards has made it much harder for different groups to share data quickly and well.
3 Proposed Model
In this section, we describe the model proposed for the decentralized application. Patients, medical professionals, and administrative agencies are the three categories of users that interact with the decentralized architecture. Patients have the ability to upload, remove, and control access to their own medical records on the network, as well as the ability to grant or revoke access for registered physicians.
Fig. 1 Decentralized application architecture
After completing the necessary steps outlined in the law and going through the office of the governing authority (GA), medical professionals can become registered on the network. In the sections that follow, we discuss the architecture of the decentralized application as well as the different use cases. The architecture is depicted in Fig. 1. Users of the DApp access its features in accordance with the roles they have been assigned through the web application. It provides the capability to add documents to the network or to remove documents already present on the network. It also allows patients to grant doctors access to selected health records so that those records can be shared with them. In addition, the patient retains the ability to revoke a physician's access to documents once the course of treatment has been completed. Physicians are able to view the medical records of patients who have given them permission to do so; however, the web application can only be accessed by physicians who have been registered on the network by the GA office in accordance with the required legal procedures. If the patient revokes the doctor's access in the DApp, the doctor can no longer view any of the patient's documents. Governing authorities (GA) have access to the web portal that allows doctors to be added or removed. Because medical health records are extremely sensitive, doctors cannot be added to the network without first having their identity, qualifications, and supporting documents legally verified. Since the patients' records are encrypted, the GA is unable to access any of them. The web application, the blockchain application, and the blockchain network are connected to one another through edge services acting as the interface. To connect the web application to the blockchain services and the off-chain storage solution, we use Web3.js and the IPFS API. Web3.js is used to connect the web application to the blockchain network and the smart contracts, while the IPFS API is used to connect the web application to the off-chain storage, also known as
decentralized storage over IPFS. The identity and access management module authorizes users, defines their roles within the network, and manages their access, confining users to the roles they have been assigned. In this model, smart contracts are executed on the blockchain whenever authorized users invoke blockchain functions in accordance with the responsibilities assigned to them; if an unauthorized user attempts to run any function on the blockchain, the transaction is rolled back and is not added to a block. A significant amount of data cannot be stored on the blockchain itself, because doing so would be financially inefficient, so this model uses IPFS as the off-chain data store. Because IPFS is a public distributed file system, files are encrypted before being uploaded to the IPFS network. Whenever a transaction is carried out on the associated smart contract, the contract's storage state on the Tezos blockchain network is modified; new blocks on the Tezos blockchain are generated using Liquid Proof of Stake. As mentioned, three distinct user types are supported by the model: patients, medical professionals, and governing authorities.
When the patient chooses a document (a health record) to be uploaded to the network, the file is encrypted with the patient's public key and uploaded to IPFS for storage. The IPFS hash retrieved after uploading the file is encrypted with the patient's public key and sent to the smart contract so that its storage can be updated. Once the transaction completes successfully, the user is informed that the document was uploaded successfully.
To delete a file from IPFS and update the corresponding smart contract, the user first selects the file to be deleted and authenticates through the web application. After the identity and access management module has validated the user, the cryptographic module retrieves the encrypted hash of the chosen file from the smart contract, decrypts it, and sends a request to IPFS to delete the file. Once the file deletion is detected, the cryptographic module alerts the smart contract to update its storage.
To grant access, patients select the doctor from the list presented on the portal, along with the documents to be shared. Verifying the user's identity is a prerequisite for any of these processes. After the patient's identity is confirmed, the file is opened with the patient's private key and re-encrypted with both the patient's and the doctor's public keys; the process of uploading it to IPFS and updating the smart contract is then the same as the one discussed earlier for the patient's document upload.
To revoke a doctor's access, the patient's identity must first be confirmed.
After that, the cryptographic module accesses the patient's smart contract to retrieve the hash that was encrypted with both the patient's and the doctor's public keys. After the hash is decrypted with the patient's private key, a request is sent to IPFS to delete the corresponding file.
After IPFS notifies the cryptographic module that the document has been successfully removed, the smart contract is prompted to update its storage accordingly. Physicians cannot register themselves directly on this portal. The GA is in charge of conducting eligibility reviews of the doctors and then adding them to the network once they pass; it verifies the credentials of medical professionals against a set of established guidelines. A doctor who wishes to be registered on the network must first submit an application to the GA office. Once all of the predetermined requirements are met, the GA adds the doctor's public key to the smart contracts in order to grant access to the portal. The GA has the authority to remove any doctor who, at any point in time, is found to be ineligible or to have engaged in any form of malpractice. The web application requires the physician to verify his identity before allowing access to the patient medical records to which he has been granted access. After the doctor's identity is confirmed, the documents he has permission to access are displayed on his dashboard. When the physician selects a document, the encrypted hash of the file is retrieved from the smart contract and decrypted using the physician's private key, and the file is then retrieved from IPFS. Each time the doctor accesses a file, the transaction is recorded in the smart contract with the current timestamp. Storing the access timestamps makes the system more transparent and allows access to the health records to be tracked. The end goal is for the patient to be the sole owner of their health records [18].
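Putting the pieces from Sects. 1.2 and 1.3 together, the sketch below illustrates the patient upload flow described above: encrypt the record with the patient's public key, store the ciphertext on IPFS, and record the (encrypted) content hash on-chain. The update_contract_storage function is a stand-in for the Tezos smart-contract call, whose exact interface is not specified here, and direct RSA encryption of a whole record is a simplification (a real system would use hybrid encryption).

```python
import ipfshttpclient
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def update_contract_storage(encrypted_cid: bytes) -> None:
    """Stand-in for the Tezos smart-contract call (e.g. via pytezos); assumed interface."""
    raise NotImplementedError

def patient_upload(record_path: str, patient_public_key) -> None:
    # 1. Encrypt the record so only the patient's private key can open it.
    #    (Simplification: RSA-OAEP only handles small payloads; use hybrid encryption in practice.)
    plaintext = open(record_path, "rb").read()
    ciphertext = patient_public_key.encrypt(plaintext, OAEP)

    # 2. Store the ciphertext off-chain on IPFS; the CID identifies it on the network.
    with open("record.enc", "wb") as f:
        f.write(ciphertext)
    client = ipfshttpclient.connect()
    cid = client.add("record.enc")["Hash"]

    # 3. Encrypt the CID with the patient's public key and record it on-chain.
    encrypted_cid = patient_public_key.encrypt(cid.encode(), OAEP)
    update_contract_storage(encrypted_cid)
```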
4 Conclusion
To address the security, privacy, availability, access, and storage issues of current EHRs, this paper has proposed a solution that stores the health records off-chain. The proposed scheme uses asymmetric encryption and the Tezos blockchain network, which runs on Liquid Proof of Stake (LPoS). The solution keeps medical records secure in off-chain storage: IPFS is used as the off-chain store, so the problem of storing a significant amount of data on-chain is sidestepped. Additionally, the timestamp of each document access, as well as the total number of times each doctor has accessed the documents shared with him, is recorded by the solution. On-chain and off-chain solutions each have their advantages and disadvantages; consequently, to process large amounts of data, the negative effects of both approaches need to be investigated in depth, or new alternatives such as cloud computing explored. Since the confirmation of a transaction in a blockchain can take at least ten minutes, and at least five to six confirmations are needed to make a transaction practically irreversible, confirming a transaction can take up to an hour. To use these technologies in the real world, we need systems with lower latency and stronger security.
References
1. Abbas QE, Sung-Bong J (2019) A survey of blockchain and its applications. In: 2019 international conference on artificial intelligence in information and communication (ICAIIC). IEEE, pp 001–003
2. Ahmad RW, Salah K, Jayaraman R, Yaqoob I, Ellahham S, Omar M (2020) Blockchain and COVID-19 pandemic: applications and challenges. IEEE TechRxiv 1–19
3. Benchoufi M, Ravaud P (2017) Blockchain technology for improving clinical research quality. Trials 18(1):1–5
4. Cerchione R, Centobelli P, Riccio E, Abbate S, Oropallo E (2023) Blockchain's coming to hospital to digitalize healthcare services: designing a distributed electronic health record ecosystem. Technovation 120:102480
5. Chelladurai U, Pandian S (2022) A novel blockchain based electronic health record automation system for healthcare. J Ambient Intell Humanized Comput 1–11
6. Eisenstadt M, Ramachandran M, Chowdhury N, Third A, Domingue J (2020) COVID-19 antibody test/vaccination certification: there's an app for that. IEEE Open J Eng Med Biol 1:148–155
7. Ekblaw A, Azaria A, Halamka JD, Lippman A (2016) A case study for blockchain in healthcare: "MedRec" prototype for electronic health records and medical research data. In: Proceedings of IEEE open & big data conference, vol 13, p 13
8. Jiang Y, Xu X, Xiao F (2022) Attribute-based encryption with blockchain protection scheme for electronic health records. IEEE Trans Netw Serv Manag
9. Kondepogu MD, Andrew J (2022) Secure e-health record sharing using blockchain: a comparative analysis study. In: 2022 6th International conference on intelligent computing and control systems (ICICCS). IEEE, pp 861–868
10. Mahajan HB, Rashid AS, Junnarkar AA, Uke N, Deshpande SD, Futane PR, Alkhayyat A, Alhayani B (2022) Integration of healthcare 4.0 and blockchain into secure cloud-based electronic health records systems. Appl Nanosci 1–14
11. Monrat AA, Schelén O, Andersson K (2019) A survey of blockchain from the perspectives of applications, challenges, and opportunities. IEEE Access 7:117134–117151
12. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Decentralized Bus Rev 21260
13. Naz M, Al-zahrani FA, Khalid R, Javaid N, Qamar AM, Afzal MK, Shafiq M (2019) A secure data sharing platform using blockchain and interplanetary file system. Sustainability 11(24):7054
14. Ng WY, Tan TE, Movva PV, Fang AHS, Yeo KK, Ho D, San Foo FS, Xiao Z, Sun K, Wong TY et al (2021) Blockchain applications in health care for COVID-19 and beyond: a systematic review. Lancet Digit Health 3(12):e819–e829
15. Risius M, Spohrer K (2017) A blockchain research framework: what we (don't) know, where we go from here, and how we will get there. Bus Inf Syst Eng 59:385–409
16. Sun J, Zhu X, Zhang C, Fang Y (2011) HCPP: cryptography based secure EHR system for patient privacy and emergency healthcare. In: 2011 31st International conference on distributed computing systems. IEEE, pp 373–382
17. Vukolić M (2015) The quest for scalable blockchain fabric: proof-of-work vs. BFT replication. In: Open problems in network security: IFIP WG 11.4 international workshop, iNetSec 2015, Zurich, Switzerland, October 29, 2015, revised selected papers. Springer, pp 112–125
18. Vyas S, Shukla VK, Gupta S, Prasad A (2022) Blockchain technology: exploring opportunities, challenges, and applications
19. Yadav AK (2021) Significance of elliptic curve cryptography in blockchain IoT with comparative analysis of RSA algorithm. In: 2021 International conference on computing, communication, and intelligent systems (ICCCIS). IEEE, pp 256–262
20. Yadav AK, Singh K (2020) Comparative analysis of consensus algorithms of blockchain technology. In: Ambient communications and computer systems: RACCCS 2019. Springer, pp 205–218
21. Yadav AK, Singh K, Amin AH, Almutairi L, Alsenani TR, Ahmadian A (2023) A comparative study on consensus mechanism with security threats and future scopes: blockchain. Comput Commun 201:102–115
Chapter 4
AI Driving Game Changing Trends in Project Delivery and Enterprise Performance
Sashreek Krishnan and L. R. K. Krishnan
1 Introduction
Project management is key to revenue growth, customer satisfaction and retention, and business sustainability. AI plays a significant role in aiding leadership decision-making to ensure timely project delivery, cost optimization, and capital productivity. It replaces parts of the human interface, ensuring operational consistency, enhancing quality, and reducing waste. AI helps ensure optimum utilization of workforce and financial resources so that project commitments are delivered on time. The fundamental principle of good project management is to control project-related activities from start to finish by applying knowledge, skills, tools, and processes to meet stakeholder expectations. Each project has specific and unique requirements. The success or failure of a project depends on a few key factors: how the project and its scope are defined; whether the project can be delivered in a linear sequence or should follow an agile delivery approach; the cost limits for the project; whether agility has been built into the architecture to drive changes quickly; how well risk can be foreseen and mitigated at an early stage; and, last but not least, how well the team is equipped with the necessary skill set to execute the project. Many variables affect the timely delivery of projects within cost, quality, and timeline constraints. Although organizations have experience in managing a large number of projects, they still struggle to execute projects of varied types, sizes, and purposes effectively. This is where technology can play a crucial role.
S. Krishnan Vellore Institute of Technology, Chennai, Tamil Nadu, India L. R. K. Krishnan (B) Business School, VIT University, Chennai, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_4
AI is steadily gaining prevalence in the corporate world. It is being used and applied to assist employees in various tasks. Artificial intelligence in project management can be categorized into the following four types, dependent on context and process:
• Machine learning-based
• Autonomous management of projects
• Integration and automation
• Chatbot assistance
Among those mentioned above, machine learning-based project management has been proposed as the most beneficial [1]. Artificial intelligence is increasingly being integrated into project management tools and technology to handle everything from scheduling to evaluating a working team's behaviors and making recommendations. Studies have indicated that project managers spend more than fifty percent of their time on administrative tasks such as check-ins and managing updates. AI bots are capable of stepping up and handling the less intensive tasks for the project manager, cutting the time spent on busywork in half. This is a substantial time saver, allowing PMs to concentrate on the sophisticated processes that support management strategy. It also allows them to spend more time focusing on their people, empowering them, and finding more efficiency. Projects are often slowed down because project managers have to speak regularly to all team members regarding their needs [2]. With the ability to manage complicated analytics, AI is a clear asset for business efficiency: it helps keep projects within budgetary limits and time schedules. AI provides real-time data and project status updates through data visualization, which enables a system to monitor the progress of a project and make educated forecasts about the project's future. This helps the teams and management discuss the project status and allows informed decision-making on project duration, cost, and strategy. Using AI-powered data analysis that looks at past projects, it is possible to predict, with a much higher degree of confidence, how much a project will cost and how long it will take [3]. The application of AI in project management is on the rise, helping project managers make smart decisions and effectively manage constraints. Project management with AI helps to uncover new insights, automate mundane tasks, and understand key project performance parameters. Recently, there has also been an increased prevalence of artificial intelligence throughout multiple industries [1]. This growing popularity is most evident in the project management industry and is due to the multiple benefits of its applications, including support, accuracy, insight and strategy, elimination of information bias, use of emotional intelligence, and creativity [4].
2 Motivation of the Study
Artificial intelligence and machine learning are slowly but steadily being introduced in various industries and different areas of work. AI's capability to analyze and process data improves data projections and predictions. This unique capability helps in forecasting project outcomes and scenarios in a much more predictable way. AI is making steady strides in project management too, and this trend is bound to continue. It enables early detection of risk scenarios, which is a key factor for project success. By 2030, 80% of the work of today's project management (PM) discipline will be eliminated as AI takes on traditional PM functions [5]. According to a report from KPMG, "AI Transforming the Enterprise", organizations that have invested in AI report an average 15% increase in productivity. In PMI's "AI@Work" report, project leaders who are at the leading edge of AI and other technologies frequently report that the use of AI has cut the time they spend on activities like monitoring progress, managing documentation, and activity and resource planning [6]. Artificial intelligence converses closely with statistics and incorporates the ability to apply critical thinking in decision-making processes by asking the right questions and applying the appropriate delivery methods [7].
3 Organization Trends Project managers use AI to gain insight into various outcomes, improving decision-making quality. The system detects links and trends in data, reducing extraneous data and allowing managers to focus on the essential details. AI creates the possibility of automated processes and intelligent tools that will reduce manual work. To deliver deep insights into a project or program, AI needs an extensive dataset from which it can learn what works and what doesn't. One of the most challenging problems in successfully adopting an AI-based project management system is having vast historical datasets and current project information in a standardized form [8]. The application of AI is rising in PM and helps project managers to make intelligent decisions and effectively manage constraints. In the future, AI will help project managers with the following tasks: defining the scope of a project, aligning with other business areas, analyzing risks, developing project schedules, timelines, and budgets, assigning tasks, implementing software and other technical components, evaluating project outcomes, and tracking issues.
4 Contribution of AI AI is extensively used along with machine learning, driving project management and delivery into new terrain with the help of predictive analytics. Data science is a big enabler for project management, and the contribution of AI is significant. Current trends indicate that AI will continue to support leadership in decision-making, changing behaviors at work as rudimentary activities will no longer be handled by humans. Man–machine collaboration will drive superior performance and enhance productivity. Cutting-edge technology and path-breaking activities will be the future of the world of work. AI tools are helping project managers analyze the organization's project database and forecast more accurately the time by which the project can be completed. Once implemented, AI tools substantially reduce project costs and increase company revenues. They predict and prevent cost overruns, giving lead indicators that help project managers control project cost more efficiently. Furthermore, an AI tool aids in reducing human errors and brings in more predictability and consistency. There is a possibility of reducing delays by embedding risk identification and assessment, along with associated controls, in the development cycles. AI can play a crucial role in giving project managers the right indicators, with the right set of project parameters that should be controlled, to efficiently deliver the project on time, within cost, and at the expected quality.
5 Literature Review Artificial intelligence currently possesses an extensive range of applications in projects. The applications include processes to reduce risks, tracking projects, and identifying anomalies, outliers, or correlations within projects. Robotic process automation is also an artificial intelligence application gradually gaining traction within the management of projects [9]. Artificial intelligence is a complex field with many intricacies, but when implemented appropriately, the technology may significantly enhance productivity and performance and eliminate mistakes. One of the benefits of employing such technology is to decrease errors, particularly in software development projects where a range of flaws can be discovered at any time, which is an essential measure of project quality. Currently, intelligent systems that utilize AI often depend on machine learning and tend to learn from previous data, often referred to as training data. The dataset can be problem-specific training data, for example for recruitment purposes [10]. The large amounts of wide-coverage semantic knowledge and the ability to extract it using powerful statistical methods are significant [11]. Significant application advances require deep understanding capabilities, such as information retrieval using AI [12] and question-answering engines [13]. The well-known problem of high
cost and scalability discouraged the development of knowledge-based approaches in the past; more recently, the increasing availability of online collaborative knowledge has renewed interest in them. In a recent PwC "Annual Global CEO Survey," 85% of CEOs agreed that AI would significantly change how they do business in the next five years. In the Middle East, this number was even higher at 91%, and 78% of CEOs in the Middle East believe AI will have a more significant impact than the Internet. Yet, from an implementation perspective, only 43% have plans for the next three years, and another 23% have introduced AI in their business but only for limited uses [14]. Large enterprises need to create and enforce formal governance policies, processes, and controls around AI technologies in many different areas, including monitoring and managing risk, performance, and value; helping to ensure that the end-to-end lifecycle maintains appropriate levels of trust and transparency; creating new roles, responsibilities, and accountabilities; and training teams across the organization [15]. Virtual reality (VR), augmented reality (AR), and mixed reality (MR) solutions combined with AI tools are revolutionizing project and program management through immersive learning. Immersive learning, which relies on a dynamic environment, forces employees to think of out-of-the-box solutions for project execution. AI games like "Unlock: Project Management," which are designed to encourage players to think like project managers using high-pressure single-player scenario-based gaming, cover project management skills needed at level 2 training. They develop stakeholders in the areas of project scope, planning, issue management, risk management, reporting, and balancing, which are very important for successful project management [16]. AI provides project managers with a better understanding of potential outcomes, which improves decision-making quality. The system and processes will reduce extraneous data by detecting links and patterns in data, allowing managers to focus on critical facts. AI holds the potential to transform key areas in project management, as discussed hereunder: Business Growth and Sustainability: By using the organization database, AI can help in analyzing business-related data and provide better and more accurate insights into the business. This, in turn, can help build business portfolios in a reliable and sustainable way. The data predictability power of AI helps in better planning and in producing optimized schedules. Risk Management: AI is more capable of identifying risk early in the project than the traditional ways of determining risk. AI can help in identifying risk responses, probabilities, and their impact on the project lifecycle. By using the organization database, AI helps in suggesting corrective and preventive action based on historical data. AI can help in continuously monitoring and tracking progress and in giving early warning signs to project managers. Resource Management: AI helps plan and manage resources in an optimized way by analyzing the organization's database. One key factor for any project to succeed is the project team. AI identifies skill gaps within the team and supports filling them by identifying the proper training for an individual, which, in turn, will boost the
learning opportunities for an individual. AI can help in identifying risky employee behaviors. AI-enabled project management tools can direct project managers on when and in which skills a particular employee needs training. They provide feedback about the behavior and competency of project managers based on their decision-making abilities. AI-enabled systems can give early warning to project managers on possible resource capacity and utilization issues and suggest preventive actions. They can also help in flagging resource under- or overutilization, which can support resource optimization. AI can also help project managers in managing hardware and system resources in a similar way. Technological Improvements: AI can help in transforming PM by bringing in benefits from other technological breakthroughs such as analytics, robotic process automation (RPA), the IoT, blockchain, and quantum computing. AI can help perform specific analyses and provide insights or perform tasks such as updating project progress reports and schedules accurately. Project managers need to capitalize on the opportunities generated by technological disruption and, in many ways, be the champions of new technologies as they emerge. However, according to PwC's "Workforce of the Future" survey, 73% of people think technology can never replace the human mind. Indeed, AI tools rely heavily on data input from project leaders, and without their guidance, AI systems will not be able to perform effectively. Project planning could be made more robust by enabling auto-scheduling using programmed logic and rules [8]. The series of past studies indicates that AI's impact on project performance and business operations is significant and sustainable. Future trends indicate the significant role of AI, ML, and analytics in project performance and business sustainability. Artificial intelligence (AI) influences enterprise performance in multiple ways. The power of AI in predictive analytics is used to interpret vast amounts of data available within the business and identify patterns and trends that may be difficult, time-consuming, and sometimes quite impossible to detect using traditional methods. This can help enterprises make more informed decisions about future investments, products, and services. Another area where predictive analytics could help an enterprise is supply chain management: AI is used to predict demand and supply, thereby improving inventory management. AI is being used to automate repetitive tasks, freeing up employees to focus on more strategic initiatives. This helps in saving time and cost, improving quality and efficiency, and reducing errors in the enterprise's operations. Using AI-powered chatbots and virtual assistants, routine customer inquiries and support tasks can be handled in a consistent way. This can also improve response times and reduce dependence on workforce availability by offering 24/7 support. AI has the ability to analyze customers' data usage and give a more personalized customer experience. This creates a wow factor for the customer, which in turn can improve customer loyalty and thereby the revenue of the business. AI can also be put to use for fraud detection. AI assists enterprises in scouting for the best candidates for open positions, reducing the time and cost associated with hiring and thereby helping build a strong workforce. Consequently, AI can be a powerful tool for improving enterprise performance, but
the flip side is that it is important for any organization to implement these technologies responsibly and ethically. Organizations should be mindful of the potential impact on employees and customers and always keep an eye on alignment with the organization's belief system, values, and goals.
6 Areas of Application of AI in Project Management The application of AI is visible in the areas of workforce allocation, scope management, and cost and time forecasting. Workforce Allocation: AI mainly enables allocating resources based on workload, project complexity, need for specialized skills, incident management, etc. Scope Management: AI plays a critical role in defining the work breakdown structure (WBS), which is key to project management planning as it is the basis for scheduling, controlling, and assigning resources. AI can also be applied to develop a comprehensive WBS and reduce the probability of omitting critical tasks from the WBS [17]. Cost Management: Allocating costs on merit and monitoring spend is a critical activity. Time Management: Time management is the key to project delivery, and project managers must understand the implications of time and cost overruns. AI helps track all the relevant factors and ensures timely decision-making to avoid project overruns.
7 Conceptual Framework Figure 1 indicates that automation of tasks and digitization help to seamlessly implement AI in project management. Once AI is implemented, real-time progress tracking is possible, including project cost and program management, which aids business leaders in making accurate decisions based on hard facts. This results in the timely completion of projects with no time or cost overrun, and accurate leadership decision-making becomes possible, resulting in better control of results.
Fig. 1 Conceptual framework
8 Methodology A simple random sample was drawn from leading organizations involved in large-scale project management and rollout. Organizations were chosen based on size, value, and volume of projects, including the number of project employees and their exposure to AI in project management. A sample of 55 was drawn, and a structured questionnaire on a five-point Likert scale was administered. After that, personal interviews were conducted with all 55 respondents over a period of three months to verify the responses and to understand the nature of the usage of AI in their respective organizations and its positive rub-offs.
8.1 Hypothesis H1: AI significantly improves project delivery by shortening timelines and reducing cost. H2: AI enhances the decision-making of project managers, thereby enhancing project delivery.
8.2 Analysis and Discussion Various statistical tests were performed to analyze the data and test the hypotheses. The reliability and validity tests indicated a high degree of reliability of the questionnaire (refer Table 1). Cronbach's alpha was calculated to assess the covariance of all the variables in the questionnaire (refer Table 1); this test helps to understand the reliability of the questionnaire. The value arrived at is 0.896, which shows that the questionnaire is highly reliable, and Cronbach's alpha based on standardized items is 0.914, which is very significant. An inter-item correlation matrix test was performed, and the inter-item correlations were calculated to establish the internal consistency of the variables. The results show that the questions framed were consistent and provide appropriate inputs for the analysis. Most of the values obtained were positive, which shows consistency among the variables in the questionnaire. The correlation matrix test results are depicted in Table 2. Table 2 depicts the reproduced and residual correlations. In the reproduced correlation matrix, the data indicate positive correlation and internal consistency between the variables.

Table 1 Cronbach's alpha test
Cronbach's alpha | Cronbach's alpha based on standardized items | No of items
0.896 | 0.914 | 19
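For reference, the Cronbach's alpha summarized in Table 1 follows the standard textbook formulation below (this expression is not reproduced from the chapter itself), where k is the number of items (19 here), the numerator sums the individual item variances, and the denominator is the variance of the total score:

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)
```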
Table 2 Correlation matrix (inter-item correlation coefficients for questionnaire items Q1–Q18, together with their one-tailed significance values)
The values arrived at in the reproduced matrix are close to those of the original matrix, indicating that a great deal of variance is accounted for and that the items are highly correlated. When a value is more than 0.40, it indicates a correlation between the variables. Correlation values that are too high indicate multicollinearity; in the reproduced correlation matrix, items Q14, Q12, Q13, and Q7 show values above 0.60, while in the residual matrix most of the values are significant and positive.
Model Summary
The hypotheses were tested using regression analysis (refer Table 3). The R-square value of 74% predicted from the independent variables shows the strength of association between the variables and is also called the coefficient of determination. The adjusted R-square of 59.4% is the more conservative estimate of R-square. Table 4, the ANOVA table, shows the regression, residual, and total sums of squares: of the total sum of squares of 19.176, 14.199 is explained by the independent variables, while the residual of 4.977 is not explained by them. The degrees of freedom are associated with the sources of variance; the model has 19 variables (19 − 1 = 18), so the regression has 18 degrees of freedom. F values were calculated to test the significance of the predictors, and the calculated value is statistically significant. Thus, it is established that artificial intelligence creates a positive impact on business sustainability.
Coefficients
Referring to Table 5, the parameter estimates show the confidence intervals which help to estimate the coefficient values. The p-value obtained here is less than alpha, indicating statistical significance, and the estimates help to establish the relationship between the dependent and independent variables.

Table 3 Regression analysis
Model | R | R Square | Adjusted R Square | Std. error of the estimate
1 | 0.860 | 0.740 | 0.594 | 0.3944
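The adjusted R-square in Table 3 is consistent with the usual correction for the number of predictors. Taking R² = 0.740 together with the total and residual degrees of freedom reported in Table 4 (50 and 32), the standard formula gives:

```latex
R^{2}_{\text{adj}} = 1 - (1 - R^{2})\,\frac{n-1}{n-k-1} = 1 - (1 - 0.740)\times\frac{50}{32} \approx 0.594
```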
Table 4 ANOVA
Model 1 | Sum of squares | df | Mean square | F | Sig
Regression | 14.199 | 18 | 0.789 | 5.072 | 0.001
Residual | 4.977 | 32 | 0.156
Total | 19.176 | 50
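The F value reported in Table 4 can likewise be reproduced from the tabulated sums of squares and degrees of freedom:

```latex
F = \frac{MS_{\text{regression}}}{MS_{\text{residual}}} = \frac{14.199/18}{4.977/32} \approx 5.07
```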
Table 5 Coefficients
Model | B (unstandardized) | Std. error | Beta (standardized) | t | Sig
(Constant) | 0.791 | 0.577 | – | 1.371 | 0.180
1. Which industry describes your most relevant work experience? | −0.008 | 0.058 | −0.017 | −0.142 | 0.888
2. Project managers have improved their decision-making with the help of AI tools | −0.018 | 0.094 | −0.027 | −0.186 | 0.853
3. In your current role you are empowered to take decisions about implementing AI technologies in project management | 0.076 | 0.081 | 0.128 | 0.942 | 0.353
4. Are you trained in using AI tools in project management? | −0.115 | 0.097 | −0.180 | −1.193 | 0.242
5. Does your organization use AI project tools for decision-making? | 0.020 | 0.073 | 0.040 | 0.279 | 0.782
6. With the use of AI in decision-making you are willing to take accountability | 0.193 | 0.100 | 0.252 | 1.930 | 0.062
(continued)
8.3 Limitations of the Study AI employs machine learning algorithms that have to go through years of historical data to learn their purpose; hence the predictability of an AI tool depends on how good the organization's historical data is. Our research also shows some concern among project managers that AI might lack the essential soft skills needed to deal with team members. The analysis was restricted to leading information technology companies, and a simple random sample was collected across various companies. The study did not cover non-IT sectors or small and medium companies. The study is limited to AI's impact on project planning, implementation, and delivery and did not cover other organizational aspects directly impacting project performance.
Table 5 (continued)
Model | B (unstandardized) | Std. error | Beta (standardized) | t | Sig
7. AI helps in managing risk efficiently | 0.369 | 0.150 | 0.434 | 2.467 | 0.019
8. AI can help manage time efficiently | 0.012 | 0.164 | 0.012 | 0.073 | 0.943
9. AI can help manage cost efficiently | 0.024 | 0.122 | 0.026 | 0.199 | 0.844
10. AI promotes transparent processes in the management of people | 0.130 | 0.118 | 0.176 | 1.101 | 0.279
11. AI tools ensure business sustainability | −0.246 | 0.155 | −0.303 | −1.583 | 0.123
12. AI ensures compliance of system and processes | 0.146 | 0.135 | 0.170 | 1.084 | 0.286
13. AI assists in skill gap analysis for project deployment | 0.106 | 0.106 | 0.141 | 1.006 | 0.322
14. AI integrates customer and user interfaces | −0.044 | 0.113 | −0.055 | −0.389 | 0.700
15. AI tools deliver value to customer | 0.167 | 0.146 | 0.166 | 1.139 | 0.263
16. AI helps in resource allocation in project delivery | 0.038 | 0.129 | 0.045 | 0.293 | 0.771
17. AI directly contributes to business results | −0.062 | 0.155 | −0.068 | −0.402 | 0.691
18. How would you rate your readiness for AI? | 0.060 | 0.076 | 0.110 | 0.789 | 0.436
9 Impact on Project Managers AI can be an enabler but cannot replace the necessary skills that require a human presence to understand the current project situation and take the necessary action. There are apprehensions that project managers might have difficulty learning and coping with fast-improving technological advancements. In addition, cost factors are an essential aspect of introducing artificial intelligence: small- and medium-scale industries cannot afford AI as a recruitment tool, hence proper pricing is necessary for affordability [18]. The data sample shows that managers are optimistic about the change AI will bring to project management practices. It will help to reduce a significant amount of workload and decision-making risk and let them focus on other essential responsibilities. It is evident from the survey that the professionals are confident they will be able to adapt by learning about AI tools' functionalities and educating themselves for smooth project delivery.
10 Conclusion The study established the business criticality of adopting AI in project execution for business success and sustainability. Based on the statistical analysis and in-depth discussions on the impact of artificial intelligence on project delivery and implementation, we conclude that AI is making giant strides in the areas of project delivery and ensuring business sustainability. AI tools are helping optimize costs, project completion deadlines, and capital productivity. The study indicates the readiness of industry to adopt AI in project management, which enhances project performance. Discussions with project managers indicate that they believe AI can help in analyzing the complexity of a project and aid them in executing the project in a much more controlled way. The study concludes, from the empirical evidence, that AI significantly improves project delivery by shortening timelines and reducing costs. It also enhances the decision-making of project managers, thereby enhancing project delivery and implementation. The range of past studies aligns significantly with this study's findings. The major outcomes of the study indicate that AI has a substantial impact on performance and on driving business sustainability. The novelty of the study is retained by highlighting the essence of project management, which is cost optimization, on-time performance, and leadership decision-making, with a focus on enterprise performance and business sustainability.
References
1. Wang Q (2019) How to apply AI technology in project management. PM World J VIII(III). https://pmworldlibrary.net
2. Schmelzer R (2019) AI in project management. https://www.forbes.com/sites/cognitiveworld/2019/07/30/ai-in-project-management/?sh=3a614a4b4a00
3. Diffendal J (2021) 5 implications of artificial intelligence for project management. https://www.pmi.org/learning/publications/pm-network/digital-exclusives/implications-of-ai
4. Manyika J, Sneader K (2018) As machines increasingly complement human labor in the workplace, we will all need to adjust to reap the benefits. https://www.mckinsey.com/
5. Marques C (2021) How artificial intelligence can help in project management. https://www.modis.com/en-be/insights/blog/
6. Hill S et al (2019) AI transforming the enterprise: 8 key AI adoption trends. https://assets.kpmg/content/dam/kpmg/tr/pdf/2021/03/ai-trends-transforming-the-enterprise.pdf
7. Rose D (2016) Data science: create teams that ask the right questions and deliver real value. Apress, Berkeley, CA
8. PWC (2018) A virtual partnership? How artificial intelligence will disrupt project management and change the role of project managers
9. Ong S, Uddin S (2020) Data science and artificial intelligence in project management: the past, present and future, vol 7, no 4, Issue 22. https://doi.org/10.19255/JMPM02202
10. Janiesch C, Zschech P, Heinrich K (2021) Machine learning and deep learning. Electron Mark 31:685–695. https://doi.org/10.1007/s12525-021-00475-2
11. Weikum G et al (2009) Database and information-retrieval methods for knowledge discovery. Commun ACM 52(4):56–64. https://doi.org/10.1145/1498765.1498784
12. Chu-Carroll J, Prager J (2007) An experimental study of the impact of information extraction accuracy on semantic search performance. In: Proceedings of the sixteenth ACM conference on information and knowledge management, Lisbon, Portugal, 6–9, pp 505–514
13. Ferrucci DA et al (2010) Building Watson: an overview of the DeepQA project, vol 31, no 3. https://doi.org/10.1609/aimag.v31i3.2303
14. PWC (2017) Sizing the prize: What's the real value of AI for your business and how can you capitalise? https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report
15. Hill S (2021) AI transforming the enterprise, KPMG. https://advisory.kpmg.us/articles/2019/
16. Totem (2021) Unlock: project management. https://www.totemlearning.com/onlinetrainingcourses/projectmanagementskills
17. Babu AJ (2017) Reinventing the role of project manager in the artificial intelligence era. Project Management National Conference, India. https://www.pmi.org.in
18. Bughin J, Hazan E, Ramaswamy S et al (2017) Artificial intelligence: the next digital frontier? https://www.mckinsey.com
Chapter 5
Prediction of Children Age Range Based on Book Synopsis P. Baby Maruthi and Jyothsna Manchiraju
1 Introduction Books are a means to learning. Books invoke curiosity and are helpful in nurturing imagination, especially in children and young adults. Traditionally, books are available for purchase at bookstores, and they are also available in public and private libraries. With the advent of the internet and desktop publishing technologies, books started to be available for viewing and purchase online. In recent times, e-commerce companies such as Amazon, Flipkart, Snapdeal, Paytm Mall, and BooksWagon.com have become online bookstores where people can purchase a variety of books. In addition, libraries have also moved online [1], bringing e-books to readers. A book can be described by many attributes, such as book title, author, publisher, language, cataloguing information (ISBN number), book content, and book synopsis. These attributes are useful for organizing and indexing books in various ways. For example, books can be organized alphabetically, by language, by genre, by age group, etc. Organizing the books according to these various criteria provides a lot of flexibility to meet the needs of customers and readers. When a book is made available online, the bookseller enters a few standard details regarding the book, such as 'Book Title', 'Author Name', 'Book Synopsis', and 'Price'. In some cases, the suggested age group may also be included as part of these details; typically, it serves as a guide for readers and those wishing to purchase the book. The availability of this information can also help readers and booksellers to arrange books based on the age factor. While the suggested age group can be
P. Baby Maruthi (B) Dayananda Sagar University, Bangalore, India e-mail: [email protected] J. Manchiraju Upgrad Education Pvt. Ltd, Chennai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_5
entered manually, it is tedious to do so. Therefore, there is a requirement for developing a system which can recommend the age range for a given book.
2 Review of Literature The problem of predicting age from text content has been studied on English blog posts, telephone transcripts, and breast cancer forum posts, where the age of the author/source is predicted using a linear regression approach [2]. Domain adaptation techniques are used to train the text corpora jointly and separately. An analysis of differences in predictive features across joint and specific aspects of the regression model is also undertaken. Pentel [3] proposed an approach for detecting the age of the author of very short texts collected from social media platforms, blog comments, and internet forums. Text features based on the complexity of words, count of words per sentence, frequency of complex words, count of syllables, and frequency of commas were extracted. Text classification was performed with popular algorithms such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), C4.5 [5], Logistic Regression (LR), AdaBoost [4], and Naïve Bayes (Webb, 2010). Marquardt et al. [6] introduce an approach for guessing the author's age and gender based on a dataset of English and Spanish language posts from various genres such as blog posts, Twitter posts, hotel reviews, and social media posts. Features from the MRC psycholinguistic database for the English datasets, the Linguistic Inquiry and Word Count (LIWC) dictionary for the hotel review corpus, sentiment from the SentiStrength tool, stylistic features, spelling and grammar, emoticons, and other features were utilized. A closely related work is that of Blandin et al. [7], which proposes an approach for recommending an age range for French texts. The dataset is a collection of texts derived from tales, novels, magazines, and newspapers. Embeddings, lexical information, typography, morphosyntax, verbal tenses, genders, syntactic dependencies, logical connectors, phonetics, sentiments, and emotions were considered for feature extraction. Standard regression, multi-task regression, and a combination of classification and standard regression were the machine learning models used. Text classification methods are also studied in that research. Zhang et al. [8, 9] propose processing texts through a Bidirectional Long Short-Term Memory (BiLSTM) network. The text is first converted to features using n-gram techniques, and the features are then processed through the BiLSTM model to generate predictions in a one-vs-one and one-vs-rest manner. Yao et al. [10] employed the fastText algorithm for generating high-quality text representation features. The fastText algorithm is more accurate than the traditional bag-of-words models. The fastText model forms a hidden layer which holds the average of the vectors corresponding to the words and n-grams; the hidden layer is then mapped to the output layer. The paper compares the performance of classification models like Support Vector Machine, K-Nearest Neighbor, and Naïve Bayes with the fastText algorithm. TF-IDF is a technique used in statistical feature extraction. Liu et al. [5, 11, 12] worked on improving the TF-IDF algorithm. The words with high TF-IDF weights are identified using the word frequency and
document frequency scores. A new term called the weighting factor is introduced. The paper used the Word2vec model to represent words as vectors; a weighted Word2vec representation is obtained by combining the word weights with the word vectors.
3 Proposed System This study predicts the recommended age range of a book, taking the book synopsis as input. Feature extraction techniques are applied to the input synopsis, the extracted features are processed through machine learning models, and the recommended age range of the book is generated as output. The diagrammatic representation of the proposed system is shown in Fig. 1.
3.1 Data Collection The dataset is sourced from Kaggle and has three tabular structures in comma-separated value (csv) format, namely the children_stories.csv, children_books.csv, and books_data.csv files. These files contain the records of book title, author, book description, and reading age values which are necessary for this study.
3.2 Data Cleaning The data are checked for completeness and the necessary data cleaning has been performed. The data with respect to 'reading age' and 'book description' (synopsis) have been cleaned.
Fig. 1 Flow diagram of proposed system
Table 1 Splitting of data according to age range
Age range (years) | Total number of records | Number of records in training set | Number of records in validation set | Number of records in test set
0–5 | 691 | 448 | 104 | 139
6–10 | 678 | 441 | 101 | 136
11–18 | 212 | 137 | 32 | 43
For example, the children_stories.csv dataset contains some entries for age without specifying an upper age, e.g., 7+. Since the maximum age in the dataset table is 10, we assume that for all such entries the upper boundary age is 10; thus, 7+ is changed to the range 7–10 years. Similarly, for the book description, we remove redundant letters, extra spaces, symbols, etc., to keep the data ready for feature extraction.
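A minimal sketch of this cleaning step is given below; it assumes a pandas DataFrame loaded from children_stories.csv with 'Reading_age' and 'Description' columns (the exact column names are illustrative, not quoted from the files):

```python
import re
import pandas as pd

df = pd.read_csv("children_stories.csv")

def clean_age(value):
    """Map open-ended entries such as '7+' to an explicit range, taking 10 as the upper bound."""
    value = str(value).strip()
    if value.endswith("+"):
        return f"{value[:-1]}-10"
    return value

def clean_text(text):
    """Drop symbols and redundant characters, then collapse extra whitespace."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", str(text))
    return re.sub(r"\s+", " ", text).strip()

df["Reading_age"] = df["Reading_age"].apply(clean_age)
df["Description"] = df["Description"].apply(clean_text)
```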
3.3 Splitting of Dataset Grouping the values of the Reading_age column, the dataset is partitioned into three subsets for training, testing, and validation. There are three categories of age range values, i.e., 0–5 years, 6–10 years, and 11–18 years. From each category, the data are partitioned such that 65% of the data goes into the training set, 20% into the testing set, and 15% into the validation set. The splitting of data based on age range is shown in Table 1.
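The per-class 65/15/20 split can be reproduced with two stratified calls to scikit-learn's train_test_split; the snippet below is an illustrative sketch that assumes the cleaned synopses and age-range labels are available as lists named texts and labels:

```python
from sklearn.model_selection import train_test_split

# First carve off the 20% test portion, stratified by age-range class.
X_rest, X_test, y_rest, y_test = train_test_split(
    texts, labels, test_size=0.20, stratify=labels, random_state=42)

# Split the remaining 80% into training (65% overall) and validation (15% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.80, stratify=y_rest, random_state=42)
```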
3.4 Feature Extraction Numerical representation of words and sentences as vectors is useful in classification tasks. Statistical feature extraction techniques such as TF-IDF, Word2Vec, Doc2Vec, and BERT are applied over the 'Description' column; an illustrative implementation sketch is given after this list. • Word2Vec—Word2Vec [13, 14] refers to the vector representation of words. For every unique word in the corpus, a unique vector is assigned. To obtain this representation, a large corpus of text sentences is employed. The representation for a word is obtained by utilizing the context in which it appears across the sentences in the corpus. The representation learning is achieved by using a neural network. There are two popular approaches for obtaining the distributed vector representation of words: one is continuous bag of words (CBOW), and the other is the Skip-gram method. Typically, words are represented as a 300-dimensional vector. Word2Vec word embeddings have been shown to be capable of capturing semantic and syntactic similarity.
• Doc2Vec—Doc2Vec is a method similar to Word2Vec, but for obtaining vector-like representations of paragraphs or documents. As with Word2Vec, Doc2Vec representations are learnt under the assumption that the document vector should be capable of predicting words in the document. Doc2Vec generates embeddings irrespective of the length of the document. • BERT stands for Bidirectional Encoder Representations from Transformers. It is useful in understanding the meaning of a group of words together. The idea of bidirectional training is a crucial part of the popular attention-based sequence-to-sequence model called the Transformer. This idea is adapted for the task of language modeling within BERT by considering the words in a sentence as a token sequence. The optimization within BERT involves taking a sequence of tokens which are embedded into vectors and processed by a neural network, with certain input tokens masked. The model tries to predict the masked words from the context provided by the non-masked words in the sequence.
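As an illustration of the Word2Vec route, document vectors can be formed by averaging the word vectors of each synopsis using gensim; the training parameters below are assumptions made for this sketch, not the chapter's settings:

```python
import numpy as np
from gensim.models import Word2Vec

tokenized = [text.lower().split() for text in X_train]   # simple whitespace tokenization

# Train a 300-dimensional skip-gram model on the training synopses.
w2v = Word2Vec(sentences=tokenized, vector_size=300, window=5,
               min_count=2, sg=1, epochs=20)

def doc_vector(tokens, model):
    """Average the vectors of in-vocabulary words; fall back to a zero vector."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

train_features = np.vstack([doc_vector(t, w2v) for t in tokenized])
```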
3.5 Classification Models In this section, the machine learning models (KNN, SVM, XGBoost, LGBM, and MLP classifiers), used with a combination of handcrafted and statistical features, are discussed in detail, and the overall flow is shown in Fig. 2; a brief library-level sketch is given after the list. • K-Nearest Neighbor (KNN)—The K-Nearest Neighbors algorithm performs classification by examining the distance of the test sample to all the training samples. The training samples are first sorted in order of increasing distance. The majority label from among the labels of the K nearest (by distance) training samples is considered the predicted label for the test sample. The integer K is the hyperparameter of the K-Nearest Neighbors algorithm. While simple to implement, the K-Nearest Neighbors algorithm requires all training data to be available during prediction. • Support Vector Machine (SVM)—For a two-class classification problem, the linear hyperplane which separates samples of the two classes can be considered as the classifier. Even if the samples are separable, it is desirable to obtain a classifier which provides a margin between samples of the two classes for improved generalization. To achieve this, two 'support' hyperplanes parallel to the classifier hyperplane are constructed. The distance between the support hyperplanes is called the margin. Subsequently, the coefficients of the classifier hyperplane (weights) are optimized to not only produce correct classification, but also to maximize the margin mentioned above, possibly at the cost of some misclassifications. Additionally, to enable classification for nonlinearly separable classes, the so-called 'kernel trick' is utilized to indirectly project the features into a very high-dimensional space. These notions form the basis for Support Vector Machines. A multi-class formulation of Support Vector Machines is used when
Fig. 2 Flow diagram of the classification algorithms
the number of classes is greater than two. The choice of kernel ('linear', 'polynomial', 'sigmoid', 'radial basis function'), the kernel hyperparameters, and the margin trade-off term ('C') form the hyperparameters of the SVM classifier. • Extreme Gradient Boosting (XGBoost) is a leading machine learning algorithm which can be used to solve classification problems. It is influenced by the Gradient Boosting framework and is designed to be used for large datasets. The process
called boosting is used to improve the results. Gradient-Boosted Decision Trees (GBDT) build a model of multiple decision trees which are ensembled to obtain accurate results. • Multi-layer Perceptron (MLP)—A collection of neurons is organized into a layer. Neurons in each pair of adjacent layers are connected in a dense manner across the layers. These connections are associated with trainable parameters of the neural network called weights. A sequence of layers connected in this manner is referred to as a Multi-layer Perceptron. The input features are presented at the first layer of the MLP. The transformed features are multiplied by the weights between the layers. The output from the last layer is considered the model prediction. The gap between prediction and ground-truth, called the loss, is used to optimize the MLP weights using a procedure called backpropagation. The layers in MLP, other than the input and output layer, are referred to as hidden layers. The crucial hyperparameters in MLP are the number of hidden layers and the number of neurons in each hidden layer. Other hyperparameters include the choice of activation function, the optimizer used, and the learning rate.
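The four classifiers described above map directly onto standard library implementations; the sketch below is illustrative, and the hyperparameter values shown are placeholders to be tuned on the validation set rather than the chapter's final settings:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

# Encode the three age-range classes as integers (XGBoost expects numeric labels).
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=1),
    "SVM": SVC(kernel="rbf", C=10, gamma=1),
    "XGBoost": XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1),
    "MLP": MLPClassifier(hidden_layer_sizes=(225, 150, 75, 6), max_iter=500),
}

for name, model in models.items():
    model.fit(train_features, y_train_enc)   # features from the extraction step above
```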
4 Results The performance of the machine learning models is evaluated in the following subsections.
4.1 KNN The results of KNN are evaluated using the F1-score and are shown in Table 2. The F1-score is defined as the harmonic mean of Precision and Recall, i.e., F1-score = (2 × Precision × Recall) / (Precision + Recall).
Table 2 Validation set F1-scores for KNN classifier
KNN | Best K with respect to validation set | F1-score on validation set | F1-score on test set
Handcrafted | 11 | 0.464 | 0.494
Word2Vec | 1 | 0.692 | 0.648
Doc2Vec | 1 | 0.536 | 0.475
BERT | 21 | 0.679 | 0.629
Precision is the ratio of true-positive predictions (TP) to the total number of positive predictions (TP + FP), and Recall is the ratio of true-positive predictions (TP) to the total number of positive samples (TP + FN). The KNN model is evaluated on the test set with the best hyperparameter values retrieved from Table 3. The results in Table 3 show that the best performance for the KNN classifier is obtained using Word2Vec features with K set to 1.
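The per-K sweep summarized in Tables 2 and 3 can be expressed as a small validation loop; the averaging mode of the F1-score (macro below) is an assumption, since the chapter does not state it explicitly, and val_features denotes validation-set vectors built in the same way as train_features:

```python
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier

best_k, best_f1 = None, -1.0
for k in range(1, 22, 2):                                   # K = 1, 3, ..., 21 as in Table 3
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_features, y_train_enc)
    f1 = f1_score(le.transform(y_val), knn.predict(val_features), average="macro")
    if f1 > best_f1:
        best_k, best_f1 = k, f1
```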
4.2 SVM The F1-scores of the SVM classifier for various features on the validation set are given in Tables 4, 5, 6, 7, and 8 in this section. The SVM classifier is validated using the Kernel parameter taking the values 'RBF', 'Polynomial', and 'Sigmoid'. For each value of Kernel, the parameters C and Gamma are varied. The tables portray the F1-scores obtained for varying values of Kernel, C, and Gamma. The results in Table 8 show that the best performance for the SVM classifier is obtained using Word2Vec features with the hyperparameters (Kernel, C, Gamma) set to ('RBF', 10, 1).
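The RBF part of this sweep can be written as a small grid evaluated on the validation set; the grid values below mirror the ranges shown in the tables, but the snippet itself is an illustrative sketch rather than the authors' code:

```python
from itertools import product
from sklearn.svm import SVC
from sklearn.metrics import f1_score

results = {}
for C, gamma in product([0.1, 1, 10, 100], [1, 0.1, 0.01, 0.001]):
    svm = SVC(kernel="rbf", C=C, gamma=gamma).fit(train_features, y_train_enc)
    pred = svm.predict(val_features)
    results[(C, gamma)] = f1_score(le.transform(y_val), pred, average="macro")

best_C, best_gamma = max(results, key=results.get)   # e.g. (10, 1) for Word2Vec features
```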
4.3 XGBoost The results in Table 9 show that the best performance on the validation set for the XGBoost classifier is obtained using Word2Vec and BERT features.
4.4 MLP The results in Table 10 show that the best performance for the MLP classifier is obtained using Word2Vec features with the neurons per hidden layer set to [225, 150, 75, 6].
5 Conclusion and Future Recommendations The prediction of children's age range based on the book synopsis started by determining three age range categories: 0–5, 6–10, and 11–18 years. As the first stage of evaluation, various combinations of features and classifiers, over a selected range of hyperparameters, were trained on the training set. The combination, along with the hyperparameters, which provided the best performance on the validation set was noted. To determine the best combination of features and
Table 3 Test set F1-score for KNN classifier
K | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 | 21
Handcrafted | 0.435 | 0.443 | 0.426 | 0.456 | 0.435 | 0.464 | 0.460 | 0.460 | 0.451 | 0.447 | 0.426
Word2Vec | 0.692 | 0.667 | 0.667 | 0.629 | 0.654 | 0.675 | 0.662 | 0.641 | 0.667 | 0.646 | 0.684
Doc2Vec | 0.536 | 0.515 | 0.502 | 0.502 | 0.489 | 0.468 | 0.468 | 0.456 | 0.447 | 0.447 | 0.443
BERT | 0.650 | 0.629 | 0.620 | 0.667 | 0.671 | 0.658 | 0.654 | 0.654 | 0.662 | 0.667 | 0.679
Table 4 F1-scores of SVM classifier with RBF Kernel, C = 0.1
(Kernel, C, Gamma) | ('rbf', 0.1, 1) | ('rbf', 0.1, 0.1) | ('rbf', 0.1, 0.01) | ('rbf', 0.1, 0.001)
Handcrafted | 0.435 | 0.435 | 0.439 | 0.498
word2Vec | 0.662 | 0.435 | 0.435 | 0.435
Doc2Vec | 0.435 | 0.435 | 0.624 | 0.435
BERT | 0.435 | 0.688 | 0.684 | 0.435
Table 5 F1-scores of SVM classifier with RBF Kernel, C = 1
(Kernel, C, Gamma) | ('rbf', 1, 1) | ('rbf', 1, 0.1) | ('rbf', 1, 0.01) | ('rbf', 1, 0.001)
Handcrafted | 0.447 | 0.460 | 0.464 | 0.515
word2Vec | 0.717 | 0.692 | 0.435 | 0.435
Doc2Vec | 0.439 | 0.460 | 0.734 | 0.726
BERT | 0.692 | 0.709 | 0.679 | 0.684
Table 6 F1-scores of SVM classifier with RBF Kernel, C = 100
(Kernel, C, Gamma) | ('rbf', 100, 1) | ('rbf', 100, 0.1) | ('rbf', 100, 0.01) | ('rbf', 100, 0.001)
Handcrafted | 0.443 | 0.426 | 0.388 | 0.481
word2Vec | 0.734 | 0.722 | 0.738 | 0.692
Doc2Vec | 0.439 | 0.468 | 0.717 | 0.658
BERT | 0.700 | 0.688 | 0.709 | 0.700
Table 7 F1-scores of SVM classifier with polynomial Kernel
(Kernel, Degree) | (Polynomial, 1) | (Polynomial, 2) | (Polynomial, 3) | (Polynomial, 4) | (Polynomial, 5)
Handcrafted | 0.506 | 0.515 | 0.519 | 0.527 | 0.532
word2Vec | 0.713 | 0.730 | 0.738 | 0.717 | 0.717
Doc2Vec | 0.722 | 0.734 | 0.667 | 0.506 | 0.464
BERT | 0.679 | 0.692 | 0.684 | 0.688 | 0.700
Table 8 F1-scores obtained on test set for SVM classification
SVM | Best parameters with respect to validation set | F1-score on validation set | F1-score on test set
Handcrafted | Kernel = 'Polynomial', Degree = 5 | 0.532 | 0.487
Word2Vec | Kernel = 'RBF', C = 10, Gamma = 1 | 0.743 | 0.752
Doc2Vec | Kernel = 'RBF', C = 1, Gamma = 0.01 | 0.734 | 0.686
BERT | Kernel = 'RBF', C = 1, Gamma = 0.1 | 0.709 | 0.676
Table 9 F1-scores obtained for XGBoost classifier
XGBoost | F1-score on validation set | F1-score on test set
Handcrafted | 0.608 | 0.579
word2Vec | 0.713 | 0.704
Doc2Vec | 0.633 | 0.648
BERT | 0.759 | 0.679
Table 10 F1-scores obtained for MLP classifier over test set
MLP | Best hidden layer configuration | F1-score on validation set | F1-score on test set
Handcrafted | [2, 6, 6, 9] | 0.489 | 0.428
word2Vec | [225, 150, 75, 6] | 0.726 | 0.742
Doc2Vec | [150, 75, 45] | 0.684 | 0.660
BERT | [576, 192, 115] | 0.700 | 0.676
classification models, the feature types handcrafted, word2vec, doc2vec, and BERT were considered and compared. Classification models such as KNN, SVM, LGBM, XGBoost, and MLP were considered. In the second stage of evaluation, the training and validation sets were merged, and the classifiers were retrained on this merged set with the best hyperparameter settings determined using the validation set. The resulting models were evaluated using the test set. The analysis showed that using word2vec features and an SVM classifier with the Radial Basis Function kernel provides the best performance for the task of age range prediction. In this paper, age range prediction was treated as a classification task, with the set of all age ranges grouped into three distinct age range classes. In future, the problem could be modeled as a multi-output regression task wherein the lower and upper ages are predicted simultaneously as numerical quantities.
References 1. Puritat K, Intawong K (2020) Development of an open source automated library system with book recommendation system for small libraries. In: 2020 Joint International conference on digital arts, media and technology with ECTI Northern section conference on electrical, electronics, computer and telecommunications engineering (ECTI DAMTNCON), pp 128–132. https://doi.org/10.1109/ECTIDAMTNCON48261.2020.9090753
2. Nguyen D, Smith NA, Rose C (2011) Author age prediction from text using linear regression. In: Proceedings of the 5th ACL-HLT workshop on language technology for cultural heritage, social sciences, and humanities, pp 115–123
3. Pentel A (2015) Automatic age detection using text readability features. In: EDM (Workshops)
4. Schapire RE (2013) Explaining Adaboost. In: Empirical inference. Springer, pp 37–52. Siddiqui H, Siddiqui S, Rawat M, Maan A, Dhiman S, Asad M (2021) Text summarization using extractive techniques. In: 2021 3rd International conference
5. Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY et al (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14:1–37
6. Marquardt J, Farnadi G, Vasudevan G, Moens M-F, Davalos S, Teredesai A, de Cock M (2014) Age and gender identification in social media. In: Proceedings of CLEF 2014 evaluation labs, vol 1180, pp 1129–1136
7. Blandin A, Lecorvé G, Battistelli D, Étienne A (2020) Age recommendation for texts. In: Language resources and evaluation conference (LREC)
8. Chen Y, Zhang Z (2018) Research on text sentiment analysis based on CNNs and SVM. In: 2018 13th IEEE conference on industrial electronics and applications (ICIEA), pp 2731–2734. https://doi.org/10.1109/ICIEA.2018.8398173
9. Zhang Y, Rao Z (2020) n-BiLSTM: BiLSTM with n-gram features for text classification. In: 2020 IEEE 5th information technology and mechatronics engineering conference (ITOEC), pp 1056–1059. https://doi.org/10.1109/ITOEC49072.2020.9141692
10. Yao T, Zhai Z, Gao B (2020) Text classification model based on FastText. In: 2020 IEEE International conference on artificial intelligence and information systems (ICAIIS), pp 154–157. https://doi.org/10.1109/ICAIIS49377.2020.9194939
11. Liu C, Sheng Y, Wei Z, Yang Y-Q (2018) Research of text classification based on improved TF-IDF algorithm. In: 2018 IEEE International conference of intelligent robotic and control engineering (IRCE), pp 218–222. https://doi.org/10.1109/IRCE.2018.8492945
12. Wang Z, Liu J, Sun G, Zhao J, Ding Z, Guan X (2020) An ensemble classification algorithm for text data stream based on feature selection and topic model. In: 2020 IEEE International conference on artificial intelligence and computer applications (ICAICA), pp 1377–1380. https://doi.org/10.1109/ICAICA50127.2020.9181903
13. Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: Proceedings of the 31st International conference on machine learning, vol 32, ICML'14. JMLR.org, pp II-1188–II-1196
14. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (n.d.) Distributed representations of words and phrases and their compositionality
Chapter 6
Predicting Credit Card Churn Using Support Vector Machine Tuned by Modified Reptile Search Algorithm Marko Stankovic , Luka Jovanovic , Vladimir Marevic , Amira Balghouni , Miodrag Zivkovic , and Nebojsa Bacanin
1 Introduction Churn is a major concern for businesses, as it can significantly impact a company’s revenue and profitability. The ability to retain customers is a major challenge for banks, as customer churn can lead to significant revenue losses. Perceived value, trust, and customer satisfaction are the major drivers of customer loyalty, and a lack of these factors leads to increased churn. To attract new customers banks often offer welcome bonuses coupled with several other benefits when opening a new account or a new credit card. However, this is not a perfect system, as customers have found ways of abusing the system in a practice dubbed card churning. Malicious users would open several credit cards to gain benefits, only to close them immediately once the bonuses expire, negatively impacting credit providers. What is especially alarming is that one of the main factors that undermine a company’s enterprise value is attrition [15]. Past research suggests that acquiring new customers to offset the loss M. Stankovic (B) · L. Jovanovic · V. Marevic · M. Zivkovic · N. Bacanin Singidunum University, Danijelova 32, 11010 Belgrade, Serbia e-mail: [email protected] L. Jovanovic e-mail: [email protected] V. Marevic e-mail: [email protected] M. Zivkovic e-mail: [email protected] N. Bacanin e-mail: [email protected] A. Balghouni Modern College of Business and Science, Muscat, Oman e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_6
of existing ones would be financially destructive [27]. Therefore, a major concern for companies is preventing customer churn, or at least reducing it. Due to the immense volumes of data generated in the financial sector every day [12], novel, automated, and optimized techniques are needed to detect and handle potentially malicious customer practices. Due to the increased availability of affordable computational power, the application of artificial intelligence (AI) is a promising approach. Novel AI algorithms are capable of responding to changing environments and adapting solutions to complex problems without explicit programming [5, 36, 37]. By gaining experience from observed data, AI algorithms are capable of detecting subtle patterns in the available data indicating malicious behavior. A popular approach for tackling complex data-driven tasks is the use of machine learning (ML) algorithms, a subgroup of AI that simulates learning processes using discrete computing machines. Many algorithms have been developed for tackling different challenges. Experimentation is needed to determine the best approach suited to a specific task, given that no solution works best for all tasks, as the no free lunch (NFL) theorem states [30]. A promising ML approach for tackling big data problems, such as those faced by credit card companies and banks, is the application of the support vector machine (SVM) [14]. However, its potential has not yet been fully explored and applied to detecting credit card churn. This work aims to address this research gap. Most novel ML techniques are designed to attain good general performance. As a trade-off, algorithms present sets of parameters that control their internal workings. These hyperparameters require adequate adjustment to attain acceptable performance when matched to a specific problem. This process is known as hyperparameter optimization [13]. To ensure the best possible performance of SVMs applied to detecting credit card churn, hyperparameter tuning has been performed using a state-of-the-art metaheuristic algorithm. Metaheuristic algorithms are popular among researchers for their relative simplicity and admirable performance when addressing optimization problems. A popular approach when tackling hyperparameter tuning is to formulate the process as a maximization task and apply a metaheuristic algorithm as an optimizer [17, 19]. A novel reptile search algorithm (RSA) [2] has shown great potential applied to optimization tasks. However, the performance of this algorithm has yet to be explored when addressing SVM hyperparameter optimization applied to credit card churn detection. Furthermore, potential improvements for the original RSA are explored through a novel modified RSA algorithm proposed in this work. This paper's contributions to academia may be stated as follows: • An introduction of a novel SVM-based approach for predicting credit card churn • A proposal for a modified RSA-based metaheuristic algorithm tackling hyperparameter optimization of an SVM • The application of the proposed modified RSA-SVM approach for tackling the pressing economic issue of credit card churn on real-world data. The rest of this paper is structured as follows: Sect. 2 discusses research tackling similar issues and presents preceding works, while Sect. 3 elaborates on the introduced approach in depth.
introduced approach is elaborated on in depth. Subsequently, in Sect. 4 the experiments used for evaluation are described, and the metrics and utilized dataset are presented. The outcomes are demonstrated in Sect. 4.3, and Sect. 5 gives a few closing remarks on the work and presents prospective subsequent research.
2 Background and Related Works

More businesses are discovering the advantages of big data and adopting technology that enables them to comprehend a variety of sources, especially in an era where information is generated at a staggering pace and volume [25]. Recognizing the customers most likely to leave the service might also be a significant additional source of revenue if addressed early enough [24]. The application of AI in the financial sector has seen great success in recent years. The availability of affordable computational resources has caused a rapid increase in the capabilities of AI, and the financial sector has reaped many benefits. Several approaches have been proposed by researchers for stock-related forecasting [9, 18], mitigating cryptocurrency fluctuations [28], and predicting changes in oil prices [16], and the results have indicated great potential for the application of AI in the financial sector. However, there is a research gap present when considering the application of SVM to predicting credit card churn. Furthermore, the optimization potential of the recently proposed RSA has yet to be fully explored. In this work, we attempt to tackle this research gap.
2.1 Support Vector Machine (SVM)

A favored ML method predominantly applied to solve binary classification tasks is the SVM [26]. Samples from specific classes are segregated via decision boundaries called hyperplanes, provided that the dataset contains linearly separable samples. The ideal hyperplane partitions the dataset by correctly assigning all samples of a particular class to one side of the hyperplane, while maximizing the margin between the decision boundary and the closest-positioned members of every category on either side of the hyperplane, also known as support vectors. Given a training set with n instances, populated with vectors x_i ∈ R^d, where d represents the dimensionality of the feature space, i = 1, 2, ..., n, while the corresponding category identifiers for each sample y_i take on a value of either −1 or 1, the hyperplanes positioned on the boundary of each class are mathematically modeled in Eq. (1):

$$(w \cdot x_i + b) = 1, \qquad (w \cdot x_i + b) = -1 \tag{1}$$
Minimizing the parameter w will result in greater separation of these hyperplanes, which is crucial in producing the optimal hyperplane. Each sample must exist outside the region bounded by the two hyperplanes specified in Eq. (2):

$$y_i (w \cdot x_i + b) \ge 1 \quad \text{for } 1 \le i \le n \tag{2}$$
For real-world datasets with outliers and incorrectly labeled instances, the given model is effectively worthless because it does not tolerate examples inside the margin or improperly classified samples. This theoretical paradigm is adapted to practical applications through the so-called soft margin [11], which introduces the notion of inexact classification, defined in Eq. (3):

$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad 1 \le i \le n \tag{3}$$
where ξ_i controls the offset of proper variable classification. The SVM model employing the soft margin methodology is produced by solving Eq. (4):

$$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \tag{4}$$
in which C denotes the hyperparameter implemented to control the impact of mislabeled samples by introducing a penalty value. If samples in the training set are to be correctly labeled, the decision boundary may need to be quite near one specific class. In this scenario, though, the decision boundary is overly vulnerable to noise and to slight variations in the independent variables, so accuracy on the test dataset may be reduced. In contrast, a decision boundary could be positioned as close to each class as feasible at the risk of certain instances being incorrectly classified. The C parameter determines how this trade-off is handled. By utilizing the kernel approach, SVM has the ability to categorize even more complex datasets. In certain circumstances, kernel functions are used to transform data points that are not linearly separable into linearly separable ones by mapping the data to a feature space of higher dimensions. The radial basis function (RBF) is one of many functions that are frequently used:

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \tag{5}$$
in which γ is another hyperparameter that governs the contribution of each sample to the SVM model, significantly affecting its accuracy. The focus of this research will be the tuning of SVM hyperparameters C and γ . Obtaining individual parameter values is insufficient; instead, the best pair of values for each dataset under consideration has to be discovered. However, this task cannot be undertaken using a deterministic approach due to the boundless nature of the parameters’ search space. Grid search is one of the fundamental and most straightforward strategies for configuring SVM parameters: SVM configurations are
produced for various ordered pairs (C, γ ) where the parameters have preset potential values. Finally, the SVM model with the highest degree of accuracy is selected. This can serve as the preliminary step for a pursuit of the ordered pair’s desired values, but as classification accuracy is a multi-modal function, a more advanced search method, such as swarm intelligence metaheuristics, is required.
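To make the grid-search baseline described above concrete, the following minimal Python sketch tunes C and γ for an RBF-kernel SVM with scikit-learn. The data arrays, candidate value grids, and split sizes are illustrative placeholders, not the configuration used in this study.

```python
# Illustrative sketch only: plain grid search over preset (C, gamma) pairs for an RBF-kernel SVM.
# X and y are placeholder arrays standing in for any prepared dataset.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = np.random.rand(200, 10), np.random.randint(0, 2, 200)      # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],                     # preset candidate values for C
    "gamma": [0.0001, 0.001, 0.01, 0.1, 1],     # preset candidate values for gamma
    "kernel": ["rbf"],
}
search = GridSearchCV(SVC(), param_grid, cv=3)   # keeps the configuration with the best CV accuracy
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```

As the text notes, such a grid can only be a preliminary step; a metaheuristic search over the continuous (C, γ) space is used instead in this work.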
2.2 Swarm Intelligence

A popular subgroup of metaheuristic algorithms that usually takes inspiration from groups cooperating in nature is swarm intelligence. These algorithms enable sophisticated patterns to arise on a global search scale by creating comparatively simple rule sets that regulate the activities of a population. Furthermore, swarm intelligence algorithms retain great flexibility and are even capable of addressing tasks considered NP-hard, something considered impossible using traditional deterministic methods. Since these algorithms have an inherent metaheuristic nature, it is unlikely an optimal solution will be attained in just one iteration. Nevertheless, with each subsequent execution, the likelihood of finding the true optimum increases. Swarm intelligence algorithms have drawn inspiration from numerous sources, with notable instances including the artificial bee colony (ABC) [20] and the firefly algorithm (FA) [35], which take their inspiration from foraging and reproductive behaviors observed in groups of insects. However, not all such algorithms are inspired by nature, with notable exceptions being the sine cosine algorithm (SCA) [22] and the arithmetic optimization algorithm (AOA) [3], which draw inspiration from abstract mathematical concepts. It is crucial to emphasize the importance of experimentation when working with metaheuristic algorithms; as stated by the NFL theorem [30], no solution works equally well for all problems. Hence, continuous experimentation is necessary to identify problem-algorithm pairs that yield optimal performance. The demonstrated capabilities of swarm intelligence algorithms, along with their capacity to handle NP-hard problems using reasonable computational resources and within practical time constraints, have established them as a favored option among researchers when confronted with challenging optimization tasks. Some notably interesting proposals with promising results have been made for medical and healthcare applications [23, 29], optimization of wireless sensor networks [6, 7], and cloud computing [8, 10, 31–34], as well as for tackling problems related to the recent COVID-19 pandemic [1, 38].
3 Proposed Method

3.1 Original Reptile Search Algorithm

Crocodiles' intelligent predatory behavior, which is centered on surrounding and then striking the prey, served as the primary source of inspiration for the RSA [4]. Optimization begins by populating matrix X with stochastic agents x_{i,j} according to Eq. (6), where i indexes a solution, j denotes its momentary location, N represents the population size, while n signifies the dimension of the specific challenge:

$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,n} \\ x_{2,1} & \cdots & x_{2,j} & \cdots & x_{2,n} \\ \vdots & & \vdots & & \vdots \\ x_{N-1,1} & \cdots & x_{N-1,j} & \cdots & x_{N-1,n} \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,n} \end{bmatrix} \tag{6}$$
Equation (7) is used to generate these solutions at random. Here, rand denotes a randomly generated number within the range [0, 1], LB denotes the lower boundary, and UB the upper boundary of the problem in question:

$$x_{i,j} = rand \times (UB - LB) + LB, \quad j = 1, 2, \ldots, n \tag{7}$$
The search technique is comprised of two primary processes (encircling and hunting). These can be viewed as four distinct behaviors to underline the importance of exploration and exploitation. Two walking techniques are used during exploration: elevated walking and stomach walking. The main objectives of this phase are to expand the search scope and facilitate the subsequent hunting phase. The elevated walking strategy comes into effect when t ≤ T/4, while the stomach walking behavior is conditioned on t > T/4 and t ≤ 2T/4. Equation (8) updates a crocodile's location:

$$x_{(i,j)}(t + 1) = \begin{cases} Best_j(t) \times \left(-\eta_{(i,j)}(t)\right) \times \beta - R_{(i,j)}(t) \times rand, & t \le \frac{T}{4} \\ Best_j(t) \times x_{(r_1,j)} \times ES(t) \times rand, & t > \frac{T}{4} \text{ and } t \le 2\frac{T}{4} \end{cases} \tag{8}$$

$$\eta_{(i,j)} = Best_j(t) \times P_{(i,j)} \tag{9}$$
where Best_j is the current best solution in position j, t signifies the current iteration, and T denotes the maximum iteration count. The hunting operator η_{(i,j)} is provided in Eq. (9), where β denotes a sensitive parameter locked at 0.1. The search area is narrowed down by employing a reduction function, as per Eq. (10):
$$R_{(i,j)} = \frac{Best_j(t) - x_{(r_1,j)}}{Best_j(t) + \epsilon} \tag{10}$$
where r_1 denotes a random number in the range [1, N], x_{(r_1,j)} is a random location of the ith solution, while ε represents some small value. Equation (11) is utilized to determine the evolution likelihood, referred to as "Evolutionary Sense," which stochastically shifts from −2 to 2 as iterations progress:

$$ES(t) = 2 \times r_2 \times \left(1 - \frac{1}{T}\right) \tag{11}$$
where r_2 signifies a stochastic number in the range [−1, 1]. Equation (12) calculates the relative difference in percent between the corresponding locations of the current and best-obtained solution:

$$P_{(i,j)} = \alpha + \frac{x_{(i,j)} - M(x_i)}{Best_j(t) \times \left(UB(j) - LB(j)\right) + \epsilon} \tag{12}$$
where α denotes the sensitive parameter, preset to 0.1, that controls the fluctuation between potential solutions for the hunting cooperation. The corresponding limits of the jth location are denoted as UB(j) for the upper and LB(j) for the lower limit. The average location M(x_i) of the ith solution is described in Eq. (13):

$$M(x_i) = \frac{1}{n} \sum_{j=1}^{n} x_{(i,j)} \tag{13}$$
The exploitation phase of the RSA is founded on hunting coordination (when t ≤ 3T/4 and t > T/2) and cooperation (when t ≤ T and t > 3T/4) strategies, with the aim of intensifying the local examination of the search area and approaching the best potential solution. The hunting behavior of crocodiles is given in Eq. (14):

$$x_{(i,j)}(t + 1) = \begin{cases} Best_j(t) \times P_{(i,j)}(t) \times rand, & t \le 3\frac{T}{4} \text{ and } t > \frac{T}{2} \\ Best_j(t) - \eta_{(i,j)}(t) \times \epsilon - R_{(i,j)}(t) \times rand, & t \le T \text{ and } t > 3\frac{T}{4} \end{cases} \tag{14}$$

RSA exhibits O(N × (T × D + 1)) time complexity, where N denotes the number of potential solutions, T represents the iteration count, and D is the size of the solution space.
3.2 Modified RSA

Despite the admirable performance of the original RSA algorithm, following extensive testing using CEC benchmark functions [21], it has been observed that certain executions of the original RSA tend to overly focus on less promising regions
within the problem area. This suggests a lack of exploration power in the base algorithm, and that further improvement on the already admirable performance is possible. To overcome this limitation, an extra parameter called the trial parameter is assigned to every potential solution in the population. If a solution fails to improve in a particular iteration, the trial parameter is incremented. When the trial parameter surpasses a predefined threshold, the solution is replaced with a new solution generated pseudo-randomly using the same method as in the initialization phase of the original RSA algorithm. This modified version of the RSA algorithm is referred to as MRSA (Modified RSA). The pseudocode for the novel MRSA approach is given in Algorithm 1.

Algorithm 1 Pseudocode of the MRSA
  Parameter (α, β, etc.) initialization.
  Stochastic generation of solutions X_i, where i = 1, ..., N.
  Assign a trial parameter to each solution in X.
  Define the threshold parameter value.
  while t < T do
    Determine the candidate solutions' quality.
    Increment the trial parameter of solutions that have not improved.
    for k in N do
      if trial of solution k > threshold then
        Replace solution k with a new stochastically generated solution
      end if
    end for
    Store the current best-performing agent.
    Update Evolutionary Sense as per Eq. (11).
    Exploration and exploitation stage:
    for a in N do
      for b in n do
        Update values η, R, P as per Eqs. (9), (10), (12), respectively.
        if t ≤ T/4 then
          Apply elevated walking strategy
        else if t > T/4 and t ≤ 2T/4 then
          Apply stomach walking strategy
        else if t ≤ 3T/4 and t > T/2 then
          Apply hunting coordination strategy
        else
          Apply hunting cooperation strategy
        end if
      end for
    end for
    Increment t
  end while
  Return the best agent.
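The trial/threshold mechanism that distinguishes MRSA from the base RSA can be illustrated with a short Python sketch. This is not the authors' implementation: the population, fitness handling, and bounds are placeholders, and only the stagnation-driven re-initialization step is shown.

```python
# Minimal sketch of the MRSA trial/threshold mechanism described above (not the authors' code).
# Solutions that fail to improve for more than `threshold` iterations are re-initialized
# with the same uniform rule as in Eq. (7).
import numpy as np

def reinitialize(lb, ub, n):
    return np.random.rand(n) * (ub - lb) + lb            # Eq. (7)-style random solution

def mrsa_trial_step(population, fitness, prev_fitness, trials, lb, ub, threshold=3):
    """Update trial counters and replace stagnating solutions (minimization assumed)."""
    for k in range(len(population)):
        if fitness[k] >= prev_fitness[k]:                 # no improvement this iteration
            trials[k] += 1
        else:
            trials[k] = 0
        if trials[k] > threshold:
            population[k] = reinitialize(lb, ub, population[k].size)
            trials[k] = 0
    return population, trials
```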
4 Experiments and Discussion This section presents the utilized dataset, experimental setup, solution encoding, and evaluation metrics, followed by the attained results and their discussion.
Fig. 1 Class distribution pie chart and features heatmap
4.1 Dataset Description

To assess the proposed approach and evaluate the improvements made by the novel methods, a publicly available credit card churn dataset,¹ acquired on December 17, 2022, has been used. Four different credit card categories are available to customers: blue, silver, gold, and platinum. Clients who choose to switch banks are classified as churn customers. Overall, there are 10,127 clients, of which 1627 are churning ones, making the dataset highly class-imbalanced. The class distribution is shown in Fig. 1. Twenty features make up the original dataset: 1 dependent (attrition flag) and 19 independent.² The correlations between the features in the original dataset can be seen in Fig. 1. To adapt the original dataset to the proposed SVM methods, the required preprocessing is performed. The variables were first separated into continuous and categorical. Categorical variables have been converted into numerical values through label encoding and later one-hot encoded. The resulting dataset used in this research presents a total of 36 utilized features.
4.2 Experimental Setup and Solutions Encoding The SVM parameters optimized include C, γ as well as the type of utilized kernel. Accordingly, each potential swarm algorithm solution is encoded with a length of three, where the first element represents the SVM C parameter, the second represents the SVM γ parameter, and the final parameter is the type of utilized kernel. The 1 2
https://leaps.analyttica.com. https://leaps.analyttica.com.
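As an illustration of the pre-processing described in Sect. 4.1, the following sketch separates continuous and categorical variables and one-hot encodes the latter with pandas. The file name and the target column name are hypothetical, and pd.get_dummies is used as a simplified stand-in for the label-encoding-then-one-hot procedure reported above.

```python
# Illustrative preprocessing sketch for the churn dataset (column names are hypothetical).
import pandas as pd

df = pd.read_csv("churn.csv")                                    # assumed local copy of the dataset
y = (df["Attrition_Flag"] != "Existing Customer").astype(int)    # hypothetical target column

categorical = df.select_dtypes(include="object").drop(columns=["Attrition_Flag"])
continuous = df.select_dtypes(exclude="object")

# One-hot encode categorical variables; continuous variables are kept as-is.
X = pd.concat([continuous, pd.get_dummies(categorical)], axis=1)
print(X.shape)   # the paper reports 36 features after this kind of expansion
```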
values for the C and γ parameters are continuous, while the latter parameter is an integer, making the task of selecting optimal hyperparameter values a mixed NP-hard optimization problem. The value of C is selected form a range of [0.1, 100], and for gamma the range is [0.0001, 10]. The kernel function has four potential values in the range of [0, 3] representing kernel types. Value 0 signifies a polynomial (poly), value 1 indicated a radial basis function (RBF), 2 signifies the os of a sigmoid, and finally, 3 denotes a linear kernel type. During testing the dataset was divided in two, with the former 70% utilized for training, and the latter 30% used for testing. The fitness of each solution is evaluated on the training set, and once the best solution is selected, its performance is evaluated on training data which represents the final results of the run. The proposed novel MRSA algorithms have been subjected to a comparative analysis with other contemporary metaheuristic algorithms including the original RSA [4], ABC [20], FA [35] as well as the SCA [22]. All the metaheuristics have been independently implemented for this research. Furthermore, a baseline SVM implementation has been as well and included in the comparative analysis. The implementations have been done in Python, using supporting libraries including Pandas, Numpy, and Sklearn. The objective function used to guide the optimization process was Cohen’s kappa score κ due to the function’s super ability to handle imbalanced datasets such as the one utilized for this research. The method for determining the Cohen’s kappa score κ is shown in Eq. (15) 1 − po po − pe =1− (15) k= 1 − pe 1 − pe where po represents an observed value while pe is the expected. By utilizing Cohee’s kappa score the optimization problem can be formulated as a maximization problem. Apart from the objective function score, additional metrics are utilized to better demonstrate the improvements made. These additional metrics include the error rate, precision, recall, F1-score, receiver operating characteristic curve (ROC), area under the ROC (AUC), and precision-recall AUC (PR-AUC). During the optimization process, each tested metaheuristic was assigned a population of 10 agents and limited to 15 optimization iterations to improve potential solutions. To overcome the randomness inherent in metaheuristics algorithms the optimization process is repeated over five independent runs.
4.3 Results and Discussion The results in Table 1 show the best, worst, mean, and median results for the objective function over 5 independent runs. Additionally, Table 1 shows the selected C and γ values as well as the selected best-performing kernel types.
Table 1 Overall objective function values over 5 independent runs

Method        Best      Worst     Mean      Median    Std       Var       C            γ         kernel
SVM-MRSA      0.656526  0.611919  0.629577  0.615213  0.020361  0.000415  0.100000     0.017755  1
SVM-RSA       0.647604  0.611919  0.631576  0.639867  0.014945  0.000223  41.666351    0.019366  1
SVM-ABC       0.641074  0.611919  0.625996  0.628591  0.012168  0.000148  100.000000   0.010784  1
SVM-FA        0.636478  0.619211  0.628291  0.630829  0.007527  0.000057  1.242665     0.023652  1
SVM-SCA       0.627383  0.611919  0.617522  0.614787  0.006069  0.000037  71.682387    0.019977  1
Baseline SVM  0.598945  N/A       N/A       N/A       N/A       N/A       N/A          N/A       N/A
Table 2 Overall error values over 5 independent runs

Method        Best      Worst     Mean      Median    Std       Var       C            γ         kernel
SVM-MRSA      0.084430  0.095395  0.090789  0.094298  0.005208  0.000027  0.100000     0.017755  1
SVM-RSA       0.086623  0.095395  0.090132  0.087719  0.003886  0.000015  41.666351    0.019366  1
SVM-ABC       0.086623  0.095395  0.091228  0.089912  0.003563  0.000013  100.000000   0.010784  1
SVM-FA        0.088816  0.089912  0.089254  0.088816  0.001641  0.000003  1.242665     0.023652  1
SVM-SCA       0.088816  0.095395  0.092325  0.092105  0.002721  0.000007  71.682387    0.019977  1
Baseline SVM  0.096491  N/A       N/A       N/A       N/A       N/A       N/A          N/A       N/A
Similarly, Table 2 shows the best, worst, mean, and median results for the objective error over 5 independent runs, as well as the selected C and γ values and the selected best-performing kernel types. As shown in Table 1 as well as Table 2, the novel proposed MRSA metaheuristic attained the best results in the best execution run, while the original RSA metaheuristic attained the best results in the median and mean results. The best performance in the worst run was attained by the FA algorithm. Interestingly, all metaheuristics selected kernel type 1 (RBF) as the best performing. Furthermore, it is important to note that, compared to the baseline SVM, all tuned methods attained significantly better results. Detailed metrics for each tested metaheuristic are shown in Table 3. The results shown in Table 3 indicate that the newly introduced MRSA attained the best results on the accuracy, precision-0, weighted average precision, recall-1, weighted average recall, and all F1-score metrics, only being slightly outperformed by the SCA algorithm on the precision-1 and recall-0 metrics. A visual comparison of the objective function and error convergence, as well as box plots, is shown in Fig. 2. As shown in Fig. 2, the changes introduced in the MRSA positively impact objective and error convergence rates. Further graphics for the ROC AUC and PR-AUC of the introduced MRSA algorithm are provided in Fig. 3.
Table 3 Detailed metrics attained by each tested metaheuristic

Metric            SVM-MRSA   SVM-RSA    SVM-ABC    SVM-FA     SVM-SCA    Baseline SVM
Accuracy (%)      91.55700   91.33770   91.33770   91.11840   91.11840   77.63160
Precision 0       0.932246   0.930991   0.927771   0.928661   0.924411   0.838554
Precision 1       0.800000   0.791304   0.807339   0.787611   0.80952    0.146341
W.Avg. Precision  0.911075   0.908629   0.908491   0.906080   0.906019   0.727739
Recall 0          0.969974   0.968668   0.972585   0.968668   0.97389    0.908616
Recall 1          0.630137   0.623288   0.602740   0.609589   0.582192   0.082192
W.Avg. Recall     0.915570   0.913377   0.913377   0.911184   0.911184   0.776316
F1-score 0        0.950736   0.949456   0.949649   0.948243   0.948506   0.872180
F1-score 1        0.704981   0.697318   0.690196   0.687259   0.677291   0.105263
W.Avg. F1-score   0.911393   0.909092   0.908114   0.906462   0.905088   0.749406
Fig. 2 Objective function and error convergence and box plots
Fig. 3 Introduced MRSA methods ROC AUC and PR-AUC curves
5 Conclusion

The research put forth tackles a complex and pressing economic issue faced by banks and credit companies internationally, known as credit card churn. To address this difficult task, as well as several research gaps present in this domain, the popular SVM algorithm has been tasked with detecting potential cases of credit card churn on a real-world dataset. To ensure the best possible performance of the SVMs, a recently introduced metaheuristic, the RSA, is applied to selecting optimal hyperparameter values. Additionally, a novel modified RSA metaheuristic is introduced to further improve the admirable performance of the original. Furthermore, the tuned approaches attained better results than the baseline SVM implementation. However, it is important to note that this research only evaluated the potential of SVM applied to this specific problem and that other methods may exist that present even better performance. Future work will focus on further exploring the potential of different approaches applied to this real-world dataset, as this task presents a pressing economic issue. Taking into account their great potential when dealing with large datasets with many features, hybrid deep learning models combining convolutional neural networks (CNNs) and traditional models, e.g., Extreme Gradient Boosting (XGBoost), SVM, etc., will be applied. Additionally, for results interpretation, SHapley Additive exPlanations (SHAP), Shapley Additive Global importancE (SAGE), and Local Interpretable Model-agnostic Explanations (LIME) may be investigated.
References 1. Abdulkhaleq MT, Rashid TA, Hassan BA, Alsadoon A, Bacanin N, Chhabra A, Vimal S (2022) Fitness dependent optimizer with neural networks for covid-19 patients. In: Computer methods and programs in biomedicine update, p 100090 2. Abualigah L, Abd Elaziz M, Sumari P, Geem ZW, Gandomi AH (2022) Reptile search algorithm (RSA): a nature-inspired meta-heuristic optimizer. Expert Syst Appl 191:116158 3. Abualigah L, Diabat A, Mirjalili S, Abd Elaziz M, Gandomi AH (2021) The arithmetic optimization algorithm. Comput Meth Appl Mech Eng 376:113609
4. Abualigah L, Elaziz MA, Sumari P, Geem ZW, Gandomi AH (2022) Reptile search algorithm (RSA): a nature-inspired meta-heuristic optimizer. Expert Syst Appl 191(116158):116158 5. AlHosni N, Jovanovic L, Antonijevic M, Bukumira M, Zivkovic M, Strumberger I, Mani JP, Bacanin N (2022) The xgboost model for network intrusion detection boosted by enhanced sine cosine algorithm. In: International conference on image processing and capsule networks. Springer, pp 213–228 6. Bacanin N, Arnaut U, Zivkovic M, Bezdan T, Rashid TA (2022) Energy efficient clustering in wireless sensor networks by opposition-based initialization bat algorithm. In: Computer networks and inventive communication technologies. Springer, pp 1–16 7. Bacanin N, Sarac M, Budimirovic N, Zivkovic M, AlZubi AA, Bashir AK (2022) Smart wireless health care system using graph LSTM pollution prediction and dragonfly node localization. Sustain Comput Inform Syst 35:100711 8. Bacanin N, Zivkovic M, Bezdan T, Venkatachalam K, Abouhawwash M (2022) Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neural Comput Appl 34(11):9043–9068 9. Bacanin N, Zivkovic M, Jovanovic L, Ivanovic M, Rashid TA (2022) Training a multilayer perception for modeling stock price index predictions using modified whale optimization algorithm. In: Computational vision and bio-inspired computing. Springer, pp 415–430 10. Bezdan T, Zivkovic M, Bacanin N, Strumberger I, Tuba E, Tuba M (2022) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. J Intell Fuzzy Syst 42(1):411–423 11. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297 12. Fang B, Zhang P (2016) Big data in finance. In: Big data concepts, theories, and applications. Springer, pp 391–412 13. Feurer M, Hutter F (2019) Hyperparameter optimization. In: Automated machine learning. Springer, Cham, pp 3–33 14. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18–28 15. Jain H, Yadav G, Rajapandy M (2021) Churn prediction and retention in banking, Telecom and IT sectors using machine learning techniques, pp 137–156 16. Jovanovic L, Jovanovic D, Bacanin N, Jovancai Stakic A, Antonijevic M, Magd H, Thirumalaisamy R, Zivkovic M (2022) Multi-step crude oil price prediction based on LSTM approach tuned by Salp swarm algorithm with disputation operator. Sustainability 14(21):14616 17. Jovanovic L, Jovanovic G, Perisic M, Alimpic F, Stanisic S, Bacanin N, Zivkovic M, Stojic A (2023) The explainable potential of coupling metaheuristics-optimized-xgboost and shap in revealing vocs’ environmental fate. Atmosphere 14(1):109 18. Jovanovic L, Milutinovic N, Gajevic M, Krstovic J, Rashid TA, Petrovic A (2022) Sine cosine algorithm for simple recurrent neural network tuning for stock market prediction. In: 2022 30th telecommunications forum (TELFOR). IEEE, pp 1–4 19. Jovanovic L, Zivkovic M, Antonijevic M, Jovanovic D, Ivanovic M, Jassim HS (2022) An emperor penguin optimizer application for medical diagnostics. In: 2022 IEEE zooming innovation in consumer technologies conference (ZINC). IEEE, pp 191–196 20. Karaboga D, Basturk B (2008) On the performance of artificial bee colony (abc) algorithm. Appl Soft Comput 8(1):687–697 21. Li X, Tang K, Omidvar MN, Yang Z, Qin K, China H (2013) Benchmark functions for the cec 2013 special session and competition on large-scale global optimization. Gene 7(33):8 22. 
Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. KnowlBased Syst 96:120–133 23. Prakash S, Kumar MV, Ram SR, Zivkovic M, Bacanin N, Antonijevic M (2022) Hybrid GLFIL enhancement and encoder animal migration classification for breast cancer detection. Comput Syst Sci Eng 41(2):735–749 24. Qureshi SA, Rehman AU, Qamar AM, Kamal A, Rehman A (2013) Telecommunication subscribers’ churn prediction model using machine learning. In: Eighth international conference on digital information management (ICDIM 2013), pp 131–136
25. Raguseo E (2018) Big data technologies: an empirical investigation on their adoption, benefits and risks for companies. Int J Inf Manage 38(1):187–195 26. Ranjan N, George S, Pathade P, Anikhindi R, Kamble S (2022) Implementation of machine learning algorithm to detect credit card frauds. Int J Comput Appl 184(1):17–20 27. Sabbeh SF (2018) Machine-learning techniques for customer retention: a comparative study. Int J Adv Comput Sci Appl 9 28. Stankovic M, Bacanin N, Zivkovic M, Jovanovic L, Mani J, Antonijevic M (2022) Forecasting ethereum price by tuned long short-term memory model. In: 2022 30th telecommunications forum (TELFOR). IEEE, pp 1–4 29. Stankovic M, Gavrilovic J, Jovanovic D, Zivkovic M, Antonijevic M, Bacanin N, Stankovic M (2022) Tuning multi-layer perceptron by hybridized arithmetic optimization algorithm for healthcare 4.0. Proc Comput Sci 215:51–60 30. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82 31. Yadav MP, Pal N, Yadav DK (2021) Workload prediction over cloud server using time series data. In: 2021 11th international conference on cloud computing, data science and engineering (Confluence), pp 267–272. https://doi.org/10.1109/Confluence51648.2021.9377032 32. Yadav MP, Rohit, Yadav DK (2021) Maintaining container sustainability through machine learning. Cluster Comput 24(4):3725–3750 33. Yadav MP, Rohit, Yadav DK (2022) Resource provisioning through machine learning in cloud services. Arab J Sci Eng 47(2):1483–1505 34. Yadav MP, Yadav DK (2021) Workload prediction for cloud resource provisioning using time series data. In: Advances in intelligent systems and computing. Springer Singapore, Singapore, pp 447–459 35. Yang XS, Slowik A (2022) Firefly algorithm. In: Swarm intelligence algorithms. CRC Press, pp 163–174 36. Zivkovic M, Jovanovic L, Ivanovic M, Bacanin N, Strumberger I, Joseph PM (2022) Xgboost hyperparameters tuning by fitness-dependent optimizer for network intrusion detection. In: Communication and intelligent systems. Springer, pp 947–962 37. Zivkovic M, Jovanovic L, Ivanovic M, Krdzic A, Bacanin N, Strumberger I (2022) Feature selection using modified sine cosine algorithm with COVID-19 dataset. In: Evolutionary computing and mobile sustainable networks. Springer, pp 15–31 38. Zivkovic M, Petrovic A, Venkatachalam K, Strumberger I, Jassim HS, Bacanin N (2023) Novel chaotic best firefly algorithm: COVID-19 fake news detection application. In: Advances in swarm intelligence. Springer, pp 285–305
Chapter 7
Comparison of Deep Learning Approaches for DNA-Binding Protein Classification Using CNN and Hybrid Models B. Siva Jyothi Natha Reddy, Sarthak Yadav, R. Venkatakrishnan, and I. R. Oviya
B. Siva Jyothi Natha Reddy · S. Yadav · R. Venkatakrishnan · I. R. Oviya (B)
Department of Computer Science and Engineering (Artificial Intelligence), Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, India
e-mail: [email protected]

1 Introduction

Proteins having DNA-binding domains are known as DNA-binding proteins. They have an affinity, specific or general, for single-/double-stranded Deoxyribonucleic Acid (DNA). A DNA-binding protein recognizes single-/double-stranded DNA because it contains at least one structural motif. A structural motif in a protein describes the relationship between its secondary structural parts. DNA-binding proteins include transcription factors, which modulate the process of transcription; different types of polymerases; histones, which are involved in packaging and transcription in the nucleus of the cell; and nucleases, which sever DNA molecules. Each transcription factor binds to a specific set of DNA sequences and regulates the transcription of genes having those sequences near their promoters. First, transcription factors bind, directly or through other mediator proteins, the RNA polymerase that oversees transcription. By doing this, they locate the polymerase at the promoter and permit it to start transcription. On the other hand, enzymes that alter histones at the promoter can bind to transcription factors. This modifies the availability of DNA templates to the polymerase. In this paper, classification is performed to decide whether a DNA sequence corresponds to a binding protein. As the number of DNA sequences increases exponentially, machine learning techniques are used to classify them. DNA is made up of four nucleotides, also called the building blocks of DNA, which are Adenine (A), Thymine (T), Guanine (G), and Cytosine (C) [1]. DNA is either single- or double-stranded. These nucleotides form bonds with their complementary nucleotides only. Adenine forms bonds with
Thymine, whereas Guanine forms bonds with Cytosine. Ribonucleic Acid (RNA) is also either single- or double-stranded. The difference between RNA and DNA is just one nucleotide: in RNA, Thymine (T) is replaced by Uracil (U). The sequence of nucleotides in RNA (A, G, C, U) and DNA (A, G, C, T) is called the genome [2]. For feature extraction, the raw DNA sequence cannot be given directly as input to the CNN. So, before being processed by the CNN, it must be converted into a numerical representation. The encoding method plays a significant role in classification accuracy. One-hot encoding is one of these encoding methods, and it is used in this work. Every unique index value preserves the positional information by which each nucleotide in the DNA sequence is identified. Feature extraction is crucial for any model to achieve good accuracy. In ML models, feature extraction is carried out manually. As data complexity rises, manual feature selection may give rise to several challenges, such as selecting features that do not result in the finest outcome and skipping some necessary features. To counter this challenge, automatic feature selection can be used. To retrieve key attributes from the raw dataset, CNN is among the finest DL techniques [3].
2 Literature Review

In recent years, Convolutional Neural Networks (CNNs) and CNN-Bidirectional Long Short-Term Memory (CNN-Bi-LSTM) models have garnered attention in the field of computational biology for their ability to analyze and classify DNA sequences [4]. The utilization of these models is centered on the idea of capturing long-term dependencies between different motifs present in DNA sequences, which are crucial in understanding the functional and structural aspects of DNA. Data for these models is commonly obtained from large-scale projects such as the Encyclopedia of DNA Elements (ENCODE) project, and the results have shown high levels of accuracy. Specifically, sensitivity and specificity scores have been reported to reach 87.12% and 91.06%, respectively, while overall accuracy has been observed to be up to 89.19% [5]. In [6], Deep DBP-ANN achieved a train accuracy of 99.02% and a test accuracy of 82.80%. On the other hand, Deep DBP-CNN achieved a train accuracy of 94.32% and was able to excel at identifying test instances, with an accuracy of 84.31%. In order to further improve the accuracy of DNA sequence categorization, the application of K-Mer and label encoding techniques in conjunction with CNN-LSTM and CNN-Bi-LSTM architectures has been explored [7]. A study showed that the use of these techniques resulted in accuracies of 93.16% and 93.13% for CNN-LSTM and CNN-Bi-LSTM models, respectively [5]. The utilization of hybrid models, such as CNN/RNN designs, for classification purposes has also been investigated. However, these models were not found to outperform the traditional CNN-Bi-LSTM models [7]. To address this issue, a novel framework using the Pseudoinverse Learning Autoencoder (PILEA) algorithm has been proposed and was found to be computationally effective in DNA sequence classification [8]. It eliminates the need for specifying the number of hidden layers and
learning control parameters during the training process, thus simplifying the process and reducing computational overhead. In medical image processing, a machine learning-based hybrid CNN has been applied for tumor identification. The paper used a pseudo-data-generation algorithm that replaces information randomly in the whole domain. The algorithm in this study is used to obtain an F1 value of 74.68 when a thousand pieces of pseudo-labeled data are added to the training set [9]. Another method is used [10] to address word-level and sentence-level disturbances in electronic medical records; the AFKF model of fusion Kalman filtering is ALBERT-based. They used CNN and RNN for the purpose of text classification and obtained an accuracy of 76%. Another methodology, the PDBP-Fusing approach [11], has been developed to identify DNA-binding proteins using only primary sequences, by fusing both local characteristics and long-term dependencies in the analysis. In this approach, a CNN was employed to learn local features, while a Bi-LSTM was utilized to capture key long-term dependencies in context. In addition, a nucleotide-level hybrid deep learning technique, combining the strengths of both CNN and LSTM networks, was proposed in [12, 13] to address the limitations of previous models. This novel approach showed promising results, with an accuracy rate of 95% in DNA sequence classification.
3 Methodology

The model architecture used in this work can be seen in Fig. 1. Here we can see that the input is given as protein sequences. We pre-process the data and split the data into training and testing sets. We use a CNN model for feature extraction and LSTM/Bi-LSTM for classification.
3.1 Data Collection The data used in this study was obtained from Kaggle. The class distribution of the sample was carefully examined and found to be perfectly balanced, meaning that the number of samples for each class is equal. This balance is critical in ensuring that the data is not biased toward any particular class and provides a fair representation of all classes. Maintaining a balanced class distribution is crucial when constructing models for classification problems, as it helps to prevent over-fitting and promotes the model’s generalization ability.
Fig. 1 Process of the proposed model for categorizing DNA sequences
3.2 Data Pre-processing The dataset analyzed in this study contains categorical data in the form of genomic sequences, which must be transformed into numerical data for processing. This transformation can be achieved using various techniques, such as encoding. In this study, one-hot encoding was used to convert the nucleotide data from categorical to numerical representation. One-hot encoding [14] assigns a unique binary vector to each category, where the length of the vector is equivalent to the number of categories. For example, if the data contains four categories, each category would be represented as a binary vector of length 4, with a single 1 representing the corresponding category and the rest being 0s.
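A minimal sketch of such a one-hot encoding for nucleotide sequences is given below; the exact encoding used by the authors may differ in detail (for example, in the handling of ambiguous symbols).

```python
# Minimal sketch of one-hot encoding a nucleotide sequence, as described above.
import numpy as np

NUCLEOTIDES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot_encode(sequence):
    """Return a (len(sequence), 4) binary matrix, one row per nucleotide."""
    encoded = np.zeros((len(sequence), 4), dtype=np.float32)
    for i, base in enumerate(sequence.upper()):
        if base in NUCLEOTIDES:                 # unknown symbols stay all-zero
            encoded[i, NUCLEOTIDES[base]] = 1.0
    return encoded

print(one_hot_encode("ACGT"))
```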
3.3 Classification Models In the classification of DNA sequences along with traditional CNN [15], three models are widely used. These models combine CNNs with either LSTM or Bi-LSTM. This combination has proven to be effective in identifying the characteristics of DNA sequences, resulting in a deeper understanding of the biology and genetics involved. The use of these models has transformed the analysis of DNA sequences, leading to numerous scientific advancements and discoveries in the field of bioinformatics.
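To make the combined architecture concrete, the following hedged Keras sketch stacks Conv1D feature extractors on a Bi-LSTM classifier. The layer arrangement is an assumption based on the description in this section and the settings reported in Sect. 4 (128/64/32 filters, kernel size 2, binary cross-entropy); the sequence length is a placeholder, and the authors' exact model may differ.

```python
# Hedged sketch of a CNN + Bi-LSTM classifier of the kind described above (not the authors' exact model).
# Input: one-hot encoded sequences of shape (SEQ_LEN, 4).
from tensorflow.keras import layers, models

SEQ_LEN = 500  # assumed sequence length, not specified in the paper

model = models.Sequential([
    layers.Conv1D(128, kernel_size=2, activation="relu", input_shape=(SEQ_LEN, 4)),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, kernel_size=2, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(32, kernel_size=2, activation="relu"),
    layers.Bidirectional(layers.LSTM(32)),      # Bi-LSTM head for long-range context
    layers.Dense(1, activation="sigmoid"),      # binary output: binding vs. non-binding
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.1)
```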
4 Result and Discussion

The dataset is split into 80% training data and 20% testing data. During learning, the logistic-loss function is utilized as the cost function. The discrepancy between the intended label and the actual output is computed by the binary cross-entropy function; based on that, the weights are updated during training. By varying the settings of various model parameters, such as layer count, filter size, and number of filters, we examined algorithms such as CNNs, CNN + LSTM, and CNN + Bi-LSTM (see Fig. 2). To choose the finest parameters for the models, a cross-validation generator was used as the framework optimization method. The finest parameters for these three models are 128, 64, and 32 filters in the respective layers. The dimension of the filters is 2 × 2, the size of the training batch is 128, and training runs for 10 epochs. The classification models are examined based on different classification metrics, such as accuracy, recall, precision, and F1 score, computed from the confusion matrix. Figure 2 shows the performance of the model over the course of training and validation; it represents the accuracy plot for the classification of DNA-binding proteins using the CNN + Bi-LSTM model. From the above results, we can differentiate between the performances of the three models. First, considering the model summaries, we get to know that the CNN model has six layers with 60,813 trainable parameters, CNN + LSTM has eight layers with 83,775 trainable parameters, and finally, CNN + Bi-LSTM has nine layers with 189,887 trainable parameters. According to empirical studies, the more parameters, the better the result of the neural network. CNN + Bi-LSTM has the maximum number of trainable parameters, so it can yield better results as compared to CNN and CNN + LSTM. A comparison between the accuracy, precision, recall, and F1 score of all three models can be seen in Table 1. From Table 1, we can clearly conclude that CNN + Bi-LSTM came out as the best model of the three.

Fig. 2 Accuracy plot CNN + Bi-LSTM

Previous works in this concept
Table 1 Comparison chart of all hybrid models

Model           Accuracy (%)  Precision (%)  Recall (%)  F1 Score (%)
CNN             94.5          96.2           92.3        94.2
CNN + LSTM      97.7          96.5           98.9        97.7
CNN + Bi-LSTM   98.75         97.5           99.87       98.7
yielded some quite good results. The models implemented in this paper have given even better results. Our CNN + LSTM model gives an accuracy of 97.7%, whereas a previous method implementing CNN + LSTM with K-Mer encoding gave an accuracy of 93.16% [5]. Another model yielded an accuracy of 94.3% using CNN + LSTM [12]. Here we can clearly see that our CNN + LSTM model has outperformed its predecessors. CNN + Bi-LSTM also outperformed previous models, giving an accuracy of 98.75%. Previously, a method using CNN + Bi-LSTM with K-Mer encoding gave an accuracy of 93.13% [5], and another work yielded an accuracy of 94.5% [16]. Hence, the CNN + Bi-LSTM model proposed in this paper is better than the previous works. In terms of robustness, all three models have shown promising results in predicting DNA-protein binding. However, the CNN + Bi-LSTM model has been reported to outperform the other two models in terms of accuracy and robustness.
5 Conclusion

This work has evaluated three deep learning methods, CNNs, CNN + LSTM, and CNN + Bi-LSTM [2], using one-hot encoding as the encoding method. The results show that CNN + Bi-LSTM exceeds the other algorithms, with an accuracy approaching 99%. The classification was evaluated on different metrics: the precision of CNN + Bi-LSTM was 97.5%, its recall was nearly 100%, and its F1 score was 98.7%.
References 1. Schmidt MF (2022) DNA: Blueprint of the proteins. In: Chemical biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64412-6 2. Bailey J (2022) Nucleosides, nucleotides, polynucleotides (RNA and DNA) and the genetic code. In: Inventive geniuses who changed the world. Springer, Cham. https://doi.org/10.1007/978-3-030-81381-9 3. Aslan MF, Unlersen MF, Sabanci K, Durdu A (2021) CNN based transfer learning-BiLSTM network: a novel approach for COVID-19 infection detection. Appl Soft Comput 98:106912 4. Zhang YQ, Ji S, Li S, Yizhou (2020) DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding. Int J Mach Learn Cybern 11. https://doi.org/10.1007/s13042-019-00990-x
5. Gunasekaran H, Ramalakshmi K, Rex Macedo Arokiaraj A, Deepa Kanmani S, Venkatesan C, Suresh Gnana Dhas C (2021) Analysis of DNA sequence classification using CNN and hybrid models. Comput Math Methods Med 2021:1835056. PMID: 34306171; PMCID: PMC8285202. https://doi.org/10.1155/2021/1835056 6. Shadab S, Alam Khan MT, Neezi NA, Adilina S, Shatabda S (2020) DeepDBP: deep neural networks for identification of DNA-binding proteins. Inf Med Unlocked 19:100318 7. Trabelsi A, Chaabane M, Ben-Hur A (2019) Comprehensive evaluation of deep learning architectures for prediction of DNA/RNA sequence binding specificities. Bioinformatics 35(14):i269–i277. https://doi.org/10.1093/bioinformatics/btz339 8. Mohammed A, Mahmoud B, Guo P. DNA sequence classification based on MLP with PILAE algorithm 9. Dhiman G, Juneja S, Viriyasitavat W, Mohafez H, Hadizadeh M, Islam MA, El Bayoumy I, Gulati K (2022) A Novel machine-learning-based hybrid CNN model for tumor identification in medical image processing. Sustainability 14:1447. https://doi.org/10.3390/su14031447 10. Li J, Huang Q, Ren S, Jiang L, Deng B, Qin Y (2023) A novel medical text classification model with Kalman filter for clinical decision making. Comput Methods Programs Biomed 200:105917. https://doi.org/10.1016/j.cmpb.2021.105917 11. Li G, Du X, Li X, Zou L, Zhang G, Wu Z (2021) Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning. PeerJ 9:e11262. PMID: 33986992; PMCID: PMC8101451. https://doi.org/10.7717/peerj.11262 12. Tasdelen, A., Sen, B. A hybrid CNN-LSTM model for pre-miRNA classification.Sci Rep 11, 14125 (2021). https://doi.org/10.1038/s41598-021-93656-0 13. Abraham MA, Srinivasan H, Namboori C, Krishnan (2019) Healthcare security using blockchain for pharmacogenomics. J Int Pharm Res 6:529–533 14. Nguyen N, Tran V, Ngo D, Phan D, Lumbanraja F, Faisal M, Abapihi B, Kubo M, Satou K (2016) DNA sequence classification by convolutional neural network. J Biomed Sci Eng 9:280–286. https://doi.org/10.4236/jbise.2016.95021 15. Oviya IR, Spandana C, Krithika S, Priyadharshini AR (2022) Chest X-ray pathology detection using deep learning and transfer learning. In: 2022 IEEE 7th International conference on recent advances and innovations in engineering (ICRAIE), Mangalore, India, pp 25–30. https://doi. org/10.1109/ICRAIE56454.2022.10054329 16. Hu S, Ma R, Wang H (2019) An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences. PLoS ONE 14(11):e0225317. https://doi.org/10.1371/journal.pone.0225317
Chapter 8
Exploring Jaccard Similarity and Cosine Similarity for Developing an Assamese Question-Answering System Nomi Baruah, Saurav Gupta, Subhankar Ghosh, Syed Nazim Afrid, Chinmoy Kakoty, and Rituraj Phukan
N. Baruah (B) · S. Gupta · S. Ghosh · S. N. Afrid · C. Kakoty · R. Phukan
Dibrugarh University, Dibrugarh, Assam 786004, India
e-mail: [email protected]

1 Introduction

The information age is currently underway. As the amount of information available grows and the world becomes more informational, the virtual information retrieval system, which is an artificial question-answering system, maintains its importance [1]. Users frequently look for answers to specific questions. They like straightforward, explicit responses, and they always prefer to ask questions in their native tongue as opposed to being constrained by a query language, query formation guidelines, or even a specific knowledge subject. A linguistic analysis of the inquiry and an attempt to understand the user's true intentions are the new approaches to meeting user needs [2]. The Assamese question-answering system (AQAS) is made up of three primary modules in Assamese NLP: data collection, information and user question processing, and establishing relationships between them [3]. For the purpose of NLP pre-processing, the approaches used are tokenization and stop word removal. We have used the NLTK tokenizer, which divides strings into lists of substrings. We employed the NMF to lower the dimension of a question and information while simultaneously reducing the program's execution time [4]. It also makes it easier to comprehend and calculate in a straightforward manner. We used cosine and Jaccard similarity to generate user-requested replies. These methods help in the construction of a connection between user questions and data. Here is an overview of the contributions of the proposed research work:
• For information retrieval, we have presented mathematical and statistical techniques.
• Stop word removal and tokenization were performed as part of the data pre-processing.
• To minimize the complexity of time and space and to provide rapid answers to inquiries, we have used the NMF with cosine similarity.
• Cosine similarity and Jaccard similarity were implemented to develop the question-answering system.
The rest of the paper is structured as follows: the next section provides the literature study done during the development of this research work. Section 3 explains the background study for the project, which is followed by Sect. 4, which consists of our proposed work. In Sect. 5, the pre-processing stage of the work is described. Section 6 covers the establishment of the relationship using the two algorithms being used. Section 7 provides the experiments and results of the project, and lastly, Sect. 8 explains the conclusion and future work of the project.
2 Related Study

As stated in the introduction, question-answering (Q-A) systems are a preferable choice for knowledge exchange, textual information retrieval, and discovery. This is the rationale behind the recent development of a vast variety of Q-A systems [5]. These systems cater to a wide range of international languages. However, some are better served than others. Furthermore, large-scale Q-A techniques for a much larger environment, such as the World Wide Web, have received significant attention. Some search engines include a question-answering feature that works well [6]. Many different Q-A systems have been developed in various settings when it comes to languages. Q-A systems have been extensively examined in the instance of Latin languages. English is particularly well served. This is partly due to the fact that the great majority of documents on the Internet are written in English. "Baseball" is one of the oldest question-answering systems. It supports a finite number of questions on corpora having a defined collection of documents [7]. The web is used as a resource by Q-A systems such as "start" and "swingly," which both employ search engines to get answers. There is also "qalc," a Q-A system for English factoids in the open domain. For each query, this system performs a syntactic and semantic analysis. Nonetheless, it has some faults as a result of insufficient syntactical rules. Edipe is a morpho-syntactic pattern-based Q-A system created by the LIC2M in France. However, in terms of the tools employed, it takes a minimalist approach [8]. To improve the accuracy of question-answering, Gomes et al. presented a Hereditary Attentive Template-based Approach for Complex Knowledge Base Question-Answering Systems (KA-HAT). This approach combines the benefits of templates and deep learning [9]. On a benchmark dataset, the proposed approach yields an F1 score of 82.5%. A BERT-based triple classification model with knowledge graph embeddings for question-answering is presented by Do et al. [10]. The model achieves an accuracy of 84.2% on the WebQuestionsSP dataset, outperforming several previously existing models. Sentiment analysis [11] is being used for improving the performance of question-answering systems. There are numerous studies that
concentrate on using sentiment analysis to enhance the effectiveness of question-answering systems [12, 13]. These approaches show promising results in predicting the sentiment and topic of queries and selecting the best answer in CQAS, respectively. The accuracy achieved by the systems demonstrates the potential of sentiment analysis in improving the performance of Q-A systems. Short-text clustering [14] is also used in developing Q-A systems [15]. The picture is bleaker for the Assamese language. Despite being a commonly used language on the Internet, the Assamese language lacks computerized tools and resources. Q-A systems created exclusively for Assamese are one type of these missing tools [16]. The number of developed Assamese Q-A systems is still small as compared to those established for English or French, for example. This is owing to two factors: a lack of access to linguistic resources and tools, such as corpora and fundamental Assamese NLP tools, and the language's extremely complicated character (for instance, Assamese is inflectional and not concatenative, and there is no capitalization as in the case of English) [5].
3 Background 3.1 Tokenization Tokenization is a natural language processing (NLP) approach that divides lengthy text sequences into tokens. These “tokens” could be single words or whole sentences. Tokenization is done to make it easier to build NLP models and to improve contextual comprehension of the text. Tokenization assists with meaning interpretation and the extraction of insightful information from the text by examining the word order inside these tokens. We have used the NLTK Tokenizer that divides strings into lists of substrings [17].
3.2 Stop Word Removal Stop words are terms that are excluded from natural language data processing in computing. A common pre-processing method in many natural language processing (NLP) applications is the elimination of stop words. The main goal is to get rid of words that are used a lot throughout the entire corpus. Pronouns and articles are frequently categorized as stop words. These overused terms can be eliminated, allowing the reader to concentrate on the text’s more significant and informative content [18].
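The two pre-processing steps can be sketched with NLTK as follows. NLTK does not ship an Assamese stop word list, so the small stop word set below is a placeholder rather than an official resource.

```python
# Illustrative sketch of tokenization followed by stop word removal, using NLTK.
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)

ASSAMESE_STOP_WORDS = {"aru", "ei", "sei"}   # placeholder entries, not an official list

def preprocess(text):
    tokens = word_tokenize(text)             # split the text into word-level tokens
    return [t for t in tokens if t.lower() not in ASSAMESE_STOP_WORDS]

print(preprocess("prathame byaktijon teor ghorloi goisil"))
```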
3.3 Cosine Similarity

Cosine similarity is a metric for comparing two nonzero vectors in an inner product space that calculates the cosine of the angle between them. As indicated in [19], the cosine of two nonzero vectors can be calculated as follows:

$$A \cdot B = \|A\| \, \|B\| \cos\theta \tag{1}$$
3.4 Jaccard Similarity

The Jaccard similarity is a statistical technique for figuring out how similar different sample sets are to one another. The Jaccard index formula, which is mentioned in [19], is given as follows if P and Q are two sets:

$$J_{index}(P, Q) = \frac{|P \cap Q|}{|P \cup Q|} \times 100 = \frac{|P \cap Q|}{|P| + |Q| - |P \cap Q|} \times 100 \tag{2}$$
4 Proposed Work

In this research paper, we have presented an Assamese question-answering system using Assamese natural language processing. The procedure is divided into three parts: informative document collection, pre-processing of data, and establishing relationships between information and user questions. The cosine similarity and Jaccard similarity algorithms are employed to obtain the relationship between the questions and answers. A group of question and answer pairs collected on a specific topic is referred to as an informative document collection. Following the user's question, the cosine or Jaccard algorithms will be used to determine the relationship between the question from the user and the questions that are in the dataset. The answer associated with the question of greatest similarity will be returned. The user will provide input in the form of a question. The question will then be pre-processed, which means it will be cleansed to see if there is any undesirable data. Following that, tokenization and stop word removal will be performed. The questions in the training dataset will also be pre-processed in the same way. Following pre-processing, the Jaccard and cosine similarity algorithms will be used to compare the query question and the dataset questions for similarity. The algorithms will check all of the questions in the training dataset for similarity. The user will receive the answer connected with the question that has the greatest similarity. A pictorial diagram of the proposed work is shown in Fig. 1.
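A high-level sketch of this retrieval loop is given below; the dataset format and the similarity and pre-processing functions are placeholders for the components described in the following sections.

```python
# High-level sketch of the retrieval loop described above (illustrative only).
def answer(query, qa_pairs, preprocess, similarity):
    """qa_pairs: list of (question, answer); returns the answer of the most similar question."""
    query_tokens = preprocess(query)
    best_score, best_answer = -1.0, None
    for question, ans in qa_pairs:
        score = similarity(query_tokens, preprocess(question))
        if score > best_score:
            best_score, best_answer = score, ans
    return best_answer
```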
Fig. 1 Pictorial diagram of the proposed work
5 Pre-processing

To run the algorithms, the dataset needs to be pre-processed. The AQAS uses two pre-processing methods. The term "cleaning words" refers to the removal of undesirable characters from the dataset, such as colons, commas, exclamation points, semicolons, question marks, and other punctuation. A common approach for separating a given text into individual words is called word tokenization. The text is divided into discrete tokens at the word level by this algorithm using particular delimiters. The choice of delimiter affects how different word-level tokens are formed. For example: (prathame byaktijon teor ghorloi goisil). After applying tokenization, the sentence is split into word-level tokens. The second method is stop word removal. Stop words refer to words that do not have any impact on the meaning of a document, but help to complete the sentence. After removing the stop words, the tokens left are
6 Establishment of Relationship 6.1 Jaccard Similarity The mathematics behind the Jaccard similarity algorithm is displayed in the following example. Initially, we took two sentences as doc_1 and doc_2. Later, we took the intersection and union of the two sentences, as expressed in Eq. 3. After pre-processing, the questions and data can be written as
doc_1 =
doc_2 =
Let us get the set of unique words for each document.
words_doc1 =
words_doc2 =
Now, we will calculate the intersection and union of these two sets of words and calculate the Jaccard similarity between both documents.
(3)
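A small sketch of this calculation (not the authors' code) follows; it applies Eq. (2) to two token lists and returns the percentage used to rank the dataset questions.

# Jaccard similarity as in Eq. (2): intersection over union of the two word sets,
# expressed as a percentage.
def jaccard_similarity(words_doc1, words_doc2):
    p, q = set(words_doc1), set(words_doc2)
    return len(p & q) / len(p | q) * 100

# Example with placeholder English tokens (the paper's example uses Assamese words).
print(jaccard_similarity(["the", "first", "man"], ["the", "first", "woman"]))  # 50.0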
6.2 Cosine Similarity Cosine similarity is one of the measures used in natural language processing to compare the text-similarity of two documents, regardless of their size. Vector representations of words are used. The text documents are visualized as vector objects in an n-dimensional space. The cosine similarity metric calculates the cosine of the angle formed by two vectors in n dimensions that are projected into a multidimensional space. The two documents’ cosine similarity will fall between 0 and 1. The orientation of two vectors is the same if the cosine similarity score is 1. A value that is nearer 0 means that there is less resemblance between the two documents [19]. The cosine similarity between two nonzero vectors can be computed using the following mathematical equation:
similarity = cos θ = (A · B) / (||A|| · ||B||) = Σ_{i=1}^{n} A_i B_i / (√(Σ_{i=1}^{n} A_i²) · √(Σ_{i=1}^{n} B_i²)).   (4)

Let us examine an illustration of how to compute the cosine similarity between two text documents:
doc_1 =
doc_2 =
Vector representation of the documents:
doc_1_vector = [1, 1, 1, 0, 1, 0, 1]
doc_2_vector = [1, 0, 0, 1, 1, 1, 1]
A · B = Σ_{i=1}^{n} A_i B_i = (1 ∗ 1) + (1 ∗ 0) + (1 ∗ 0) + (0 ∗ 1) + (1 ∗ 1) + (0 ∗ 1) + (1 ∗ 1) = 3
√(Σ_{i=1}^{n} A_i²) = √(1 + 1 + 1 + 0 + 1 + 0 + 1) = √5
√(Σ_{i=1}^{n} B_i²) = √(1 + 0 + 0 + 1 + 1 + 1 + 1) = √5
cosine similarity = cos θ = (A · B) / (|A| · |B|) = 3 / (√5 · √5) = 3/5 = 0.6.
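The same computation can be written as a short sketch (illustrative, not the authors' code); applied to the two binary term vectors above, it reproduces the hand-computed value of 0.6.

import math

# Cosine similarity as in Eq. (4).
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_1_vector = [1, 1, 1, 0, 1, 0, 1]
doc_2_vector = [1, 0, 0, 1, 1, 1, 1]
print(cosine_similarity(doc_1_vector, doc_2_vector))  # approximately 0.6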
7 Experiments In this section, we offer a set of experiments that were carried out to assess the efficacy of our proposed model as well as the mathematical and statistical methodologies used in AQAS. In order to set the stage for these investigations, we first describe the precise questions we hope to answer and give information on the setting. We then examine and assess the effectiveness and outcomes of our suggested technique.
7.1 Data Preparation Since this project can in the future be used as a bot on any commercial website, we have built the dataset in the context of a particular topic or domain (e.g., travel agency booking, course subscription). The developed dataset has 200 question-answer pairs in the training set and 50 question-answer pairs in the testing set.
7.2 Experimental Setup We implemented our proposed model in the Anaconda distribution with the Python 3.7 programming language and executed it on a Windows 10 PC with an Intel Core i7 CPU (3.20 GHz) and 8 GB memory. Python is a high-level object-oriented programming (OOP) language that is suitable for scientific work and tool development. We used Anaconda as the distribution of Python; Anaconda provides a convenient platform for open-source data science powered by Python.
7.3 Result and Analysis A retrospective analysis was used to evaluate the effectiveness of these techniques using top-1 accuracy measurements. Top-1 accuracy is the standard accuracy, which requires that the model's output match the expected output.
Accuracy = (total no. of correctly classified news articles / total no. of news articles) ∗ 100.   (5)
In this study, we evaluated the performance of two similarity metrics, Jaccard similarity and cosine similarity, for answering 250 questions. Our results showed that Jaccard similarity achieved an accuracy of 87.38%, while cosine similarity achieved an accuracy of 93.8%. Our findings indicate that cosine similarity outperforms Jaccard similarity in answering the given questions. The higher accuracy of cosine similarity can be attributed to its ability to capture the semantic similarity between words, as opposed to Jaccard similarity, which only considers the overlap of words between two sentences. Our results suggest that cosine similarity is a more effective similarity metric for answering questions in our dataset. Future work could explore the use of other similarity metrics and their performance on larger datasets. As a point of comparison, we compared our approach to work that has been done in Bengali (Bangla), since the Assamese and Bangla scripts differ in only a few letters. There are a few small typographical differences in Bengali/Bangla: the letter rô is written differently in Assamese, the character pronounced wô in Assamese is written as bô in the Bengali script, and the character khyo of the Assamese alphabet is missing from the Bengali script. Interestingly, the Bengali/Bangla language, which also has the Subject-Object-Verb (SOV) word order, uses a very similar script, yet dialectal differences occur between the two languages. Table 1 presents the comparison of the proposed work with previously existing work. To determine the similarity between sentences within an article, we compared each sentence against itself and all other articles. Since the similarity of a sentence to itself is always 1, we replaced all diagonal values in the similarity matrix with 0.
Table 1 Comparison of accuracy with previous work
Algorithm            Our accuracy (%)   [19] (%)
Cosine similarity    93.80              93.22
Jaccard similarity   87.38              84.64

Table 2 Similarity matrix of five documents
       D1       D2       D3       D4       D5
D1     1        0.2236   0.1884   0.2674   0.1722
D2     0.2236   1        0.1350   0.2450   0.2308
D3     0.1884   0.1350   1        0.1817   0.2840
D4     0.2674   0.2450   0.1817   1        0.2495
D5     0.1722   0.2308   0.2840   0.2495   1

Table 3 Replacing the diagonal values to 0
       D1       D2       D3       D4       D5
D1     0        0.2236   0.1884   0.2674   0.1722
D2     0.2236   0        0.1350   0.2450   0.2308
D3     0.1884   0.1350   0        0.1817   0.2840
D4     0.2674   0.2450   0.1817   0        0.2495
D5     0.1722   0.2308   0.2840   0.2495   0

Table 4 Most similar document in terms of similarity value
       D1       D2       D3       D4       D5
D1     0        0.2236   0.1884   0.2674   0.1722
D2     0.2236   0        0.1350   0.2450   0.2308
D3     0.1884   0.1350   0        0.1817   0.2840
D4     0.2674   0.2450   0.1817   0        0.2495
D5     0.1722   0.2308   0.2840   0.2495   0
This adjustment was made to ensure that we could identify the most similar document for each article, excluding self-comparisons. To find the most similar news article for a given article, we computed the maximum similarity value from the similarity matrix. The following tables show the illustration of the procedure explained above. D1, D2, D3, D4, and D5 represent sentence-level documents. Tables 2 and 3 represent the cosine similarity values between the documents, whereas Table 4 represents the highlighted values of the similar documents.
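A compact sketch of this procedure (an illustration assuming cosine similarity over document vectors, not the authors' code) is given below: the pairwise similarity matrix is built, its diagonal is set to 0 to exclude self-comparisons, and the column index of the maximum value in each row identifies the most similar document.

import numpy as np

def most_similar_documents(doc_vectors):
    vectors = np.asarray(doc_vectors, dtype=float)
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T               # cosine similarity matrix (Table 2)
    np.fill_diagonal(sim, 0.0)        # replace diagonal values with 0 (Table 3)
    return sim.argmax(axis=1)         # most similar document per row (Table 4)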
7.4 Comparison Between English Chatbot and AQAS
1. How to go to your office?
Mitsuku: Take the metro and stop at the terminus, we're 5 min away from there.
AQAS:
2. How can I call you?
Mitsuku: You can't call me, you can only talk to me here.
AQAS:
3. When can I check in?
Mitsuku: Anything you want.
AQAS:
4. What can you do?
Mitsuku: Anything you want.
AQAS:
8 Conclusion and Future Work The AQAS is designed to provide exact responses to inquiries posed by users in natural language. The ability of Q-A systems to deliver precise answers is one of their most crucial characteristics. The user asks a question of these systems in natural language, without knowing the structure of the sources to be queried. In the proposed work, the NMF is used to reduce the time complexity and the space complexity, as it allows us to reduce the dimension of the data. The highest result achieved in our work is 93.80%, for cosine similarity. The proposed question-answering system is capable of answering questions asked in the Assamese language. After a few improvements, the proposed Q-A system can be used for the following applications:
1. The proposed system can be used in a search engine.
2. Individuals with reading disabilities can benefit from the Q-A system by incorporating a speech recognition feature into the system.
3. The system can be used on websites for addressing user queries.
4. The proposed system can be used in online lectures for a more interactive experience.
5. The existing system can be used for various analytical tasks involving information gathering and analysis.
6. The system can expedite text review in situations where a comprehensive assessment is time-consuming.
In future studies, the dataset size could be increased for better accuracy. Future research on this work can explore new ways to improve the Q-A system's performance by implementing neural network, deep learning, and machine learning algorithms. Another possible direction is incorporating user feedback into the system, which can help the model learn from its mistakes and refine its predictions.
References 1. Lai Y, Jia Y, Lin Y, Feng Y, Zhao D (2018) A Chinese question answering system for single-relation factoid questions. In: Natural language processing and chinese computing: 6th CCF international conference, NLPCC 2017, Dalian, China, Nov 8–12, 2017, Proceedings 6. Springer, pp 124–135 2. Sahu S, Vasnik N, Roy D (2012) Prashnottar: a Hindi question answering system. Int J Comput Sci Inf Technol 4(2):149 3. Hammo B, Abu-Salem H, Lytinen SL, Evens M (2002) Qarab: A: question answering system to support the Arabic language. In: Proceedings of the ACL-02 workshop on computational approaches to Semitic languages 4. Gupta P, Gupta V (2012) A survey of text question answering techniques. Int J Comput Appl 53(4) 5. Gupta V, Lehal GS (2011) Named entity recognition for Punjabi language text summarization. Int J Comput Appl 33(3):28–32 6. Uddin MM, Patwary NS, Hasan MM, Rahman T, Tanveer M (2020) End-to-end neural network for paraphrased question answering architecture with single supporting line in Bangla language. Int J Future Comput Commun 9(3) 7. Mishra A, Jain SK (2016) A survey on question answering systems with classification. J King Saud Univ-Comput Inf Sci 28(3):345–361 8. Dhanjal GS, Sharma S, Sarao PK (2016) Gravity based Punjabi question answering system. Int J Comput Appl 147(3):21 9. Gomes Jr J, de Mello RC, Ströele V, de Souza JF (2022) A hereditary attentive template-based approach for complex knowledge base question answering systems. Expert Syst Appl 205:117, 725 10. Do P, Phan TH (2022) Developing a Bert based triple classification model using knowledge graph embedding for question answering system. Appl Intell 52(1):636–651 11. Pradhan R, Sharma DK (2022) An ensemble deep learning classifier for sentiment analysis on code-mix Hindi–English data. Soft Comput 1–18 12. Oh JH, Torisawa K, Hashimoto C, Kawada T, De Saeger S, Wang Y et al (2012) Why question answering using sentiment analysis and word classes. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 368–378 13. Pessutto L, Moreira V (2022) Ufrgsent at semeval-2022 task 10: structured sentiment analysis using a question answering model. In: Proceedings of the 16th international workshop on semantic evaluation (SemEval-2022), pp 1360–1365 14. Pradhan R, Sharma DK (2022) A hierarchical topic modelling approach for short text clustering. Int J Inf Commun Technol 20(4):463–481
15. da Silva JWF, Venceslau ADP, Sales JE, Maia JGR, Pinheiro VCM, Vidal VMP (2020) A short survey on end-to-end simple question answering systems. Artif Intell Rev 53(7):5429–5453 16. Gupta D, Kumari S, Ekbal A, Bhattacharyya P (2018) Mmqa: a multi-domain multi-lingual question-answering framework for English and Hindi. In: Proceedings of the Eleventh international conference on language resources and evaluation (LREC 2018) 17. Choo S, Kim W (2023) A study on the evaluation of tokenizer performance in natural language processing. Appl Artif Intell 37(1):2175, 112 18. Kochhar TS, Goyal G (2022) Design and implementation of stop words removal method for Punjabi language using finite automata. In: Advances in data computing, communication and security: proceedings of I3CS2021. Springer, pp 89–98 19. Kowsher M, Rahman MM, Ahmed SS, Prottasha NJ (2019) Bangla intelligence question answering system based on mathematics and statistics. In: 2019 22nd international conference on computer and information technology (ICCIT). IEEE, pp 1–6
Chapter 9
Artificial Neural Network Modelling for Simulating Catchment Runoff: A Case Study of East Melbourne Harshanth Balacumaresan, Md. Abdul Aziz, Tanveer Choudhury, and Monzur Imteaz
1 Introduction In recent times, urban flooding is being pondered as a major global environmental issue, primarily due to the strong intensity exhibited by these events [1], diminutive response period [2] and the countless repercussions being imposed upon urban ecosystems, leading to myriads of casualties, substantial infrastructural damage, transmission of public health risks and colossal socio-economic cataclysms [3, 4]. Majority of the urban landscape is covered by high proportions of impervious surfaces, renowned for their low infiltration rates and accelerated runoff generation capacity [4], where the prevalence of a short duration intense rainfall event can trigger an incessant residual urban flooding risk [3, 5]. The urban flood forecasting process is contemplated to be decidedly complex and nonlinear [6, 7], pertaining to the complex topographical features and limited data availability associated with urban catchments and the influential hydrological processes being inherently nonlinear [6, 8], showcasing high degree of spatial and temporal variability, respectively [6, 9, 10]. In the last decade, researchers have employed numerous types of conventional hydrological rainfall–runoff, simple conceptual and empirical models to explore the urban catchment response exhibited to extreme rainfall intensity events [2, 8]. However, amongst other drawbacks, the general assumption of linear relationships between input and output variables [2, 11], requirement of detailed datasets to work H. Balacumaresan (B) · Md. A. Aziz · M. Imteaz Department of Civil and Construction Engineering, Swinburne University of Technology, Melbourne, Australia e-mail: [email protected] T. Choudhury School of Engineering, Information Technology and Physical Sciences, Federation University Australia, Churchill, VIC 3353, Australia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_9
with [6] and the high computation costs associated with these models have limited their prediction accuracy, reliability and favourability [6, 11–13]. In recent times, Artificial Neural Network (ANN) models have gained significant prominence within the hydrological research community as a feasible alternative to conventional models [1, 5, 7, 12, 14]. The reliability of these models has been widely evaluated in numerous studies [12, 15–18], where they have displayed numerous favourable benefits such as the ability to identify and simulate highly complex and nonlinear relationships between input and output variables without the need to understand the nature of the physical processes involved [15, 16], low computation cost, higher prediction accuracy rate [19], to state some, thereby indicating these models to be a feasible option to be considered for application in urban catchments. Limitations associated with the Victorian Water Industry’s current modelling practices have led to substantial inaccuracies and underestimation of the expected flow and flood quantiles, resulting in dire consequences, such as the 2022 Maribyrnong River flood debacle [20, 21]. This strongly highlights the need for improvements in the accuracy of the current flood flow estimation process, which if left unaddressed could further worsen, in lieu of the adverse impacts of climate change. Hence, the primary aim of this research study is focused upon enhancing the accuracy of the hydrological flow estimation process, incorporating ANN modelling techniques. Majority of the past research conducted on incorporating ANN modelling techniques in rainfall–runoff estimation have utilised multiple physical, geomorphological, or hydro-climatic parameters besides the localised rainfall and flow [6, 16], as input parameters in rainfall–runoff estimation process [18, 19, 22–24]. However, there has been limited research where the effectiveness of ANN model, in predicting catchment runoff, based upon minimal input data providence has been evaluated. Especially, the research dimension where the localised rainfall was used as the primary input variable to estimate the flow in the catchment’s most upstream location, followed by adopting the estimated flow from the upstream catchment and re-incorporating it as a supplementary input variable [25], alongside the localised rainfall for accurate estimate of the expected flow in the catchment locations further downstream has been scarcely explored. Therefore, this research intends to explore this research dimension, as part of the key objectives, using the localised rainfall and flow data for the study area (Gardiners Creek catchment), accordingly, to enhance the accuracy of the ANN models’ flow estimation. Two of the most powerful supervised learning algorithms—Levenberg– Marquardt (LM) and Bayesian Regularisation (BR)—were employed, as part of the ANN model development process, where their predictive abilities, in terms of accurately estimating the expected flood flow, were comparatively assessed and validated against the gauging station flow records. The key objectives of the research project are the development of an ANN-based improved accuracy flood flow estimation model and application of the developed ML model in effective prediction of urban catchment runoff, in comparison against the actual gauging station observations, based upon minimal data providence—localised rainfall and flow data. 
Amongst the multiple simulations conducted, the most optimum model dimensions, in terms of statistical performance, were plotted out and proposed as a potential alternative, that
can be considered for accurate flow estimation of urban catchments across the state of Victoria.
2 Study Area The highly urbanised Gardiners Creek catchment, situated in the eastern suburbs of Melbourne, with an average fraction imperviousness of 46% was selected as the study area for this case study (as seen in Fig. 1). The creek originates at the Middleborough Road retarding basin, flowing over a total length of over 16.5 km, and outlets to the Yarra River, approximately 6 km from the Melbourne Central Business District (CBD) at Heyington, encompassing a total catchment area of 111 km2 [15]. The Gardiners Creek catchment area functions as a multifaceted urban environment, uniting and delivering accessibility for traversing across a multitude of Eastern municipalities, namely Boroondara, Stonnington, Whitehorse and Monash [15]. The mixed land-use distribution within the catchment area comprises of residential (64%), public use (11%), roads (9%) and other uses (16%) [15]. Three streamflow gauging stations—Great Valley Road at Gardiner (Station ID 229624A), High Street Road at Ashwood (Station ID 229625A) and Eley Road East Drain at Eley Road retarding basin (Station ID 229638A) with good-quality, complete datasets, located at close proximity to the catchment area—were selected for the ANN model calibration and performance validation processes.
Fig. 1 Study area (Gardiners Creek catchment) and locations of the gauging stations
3 Methodology 3.1 Description of Datasets Quality-checked daily rainfall (in mm) and stream discharge (in m3 /s) data available at 6-min time intervals, ranging from 8 April 1977 to 1 December 2021, recorded at the various rainfall and streamflow gauging stations located in close proximity to the Gardiners Creek catchment area, were provided by Melbourne Water Corporation. The rainfall and flow datasets were assessed for completeness (complete dataset with no missing values) and data quality, where three streamflow gauging stations were selected—Great Valley Road at Gardiner (Station ID 229624A), High Street Road at Ashwood (Station ID 229625A) and Eley Road East Drain at Eley Road retarding basin (Station ID 229638A) for the model calibration and performance validation. Two recent historical storm events—4 February 2011 and 6 November 2018, that transpired within the catchment over the last decade, were selected based upon the flood history and rainfall records from the Victoria State Emergency Services (SES) local flood guide [26]. The storm event selection is restricted within the last decade to comprehend the catchment’s current state of response, integrating the impacts of recent catchment characteristics modifications, environmental issues and climate change. The storm event data selection involves infusing three days prior to and after the storm event prevalence date, to obtain a better overall understanding of the current catchment response. There are two main input variables—the localised rainfall data and the flow from the upstream catchment—and one output variable which is the estimated flow/runoff. Rainfall is the primary input and is initially considered as the only input variable for the most upstream catchment location (Station ID 229638A). As the analysis moves further downstream, it is necessary to consider the flow from the upstream catchment, given the holistic complexity of upstream–downstream linkages of hydrological processes [25, 27]. Accordingly, the upstream flow was considered as a supplementary input variable, in addition to rainfall for the other two downstream catchment locations—High Street Road at Ashwood and Great Valley Road at Gardiner. MathWorks MATLAB software was utilised in the development of the Artificial Neural Network (ANN) model using the Neural Network (NN) toolbox, where the datasets were segregated into training and testing datasets with an 80:20 ratio, respectively.
3.2 Datasets Preprocessing The initial step was catchment and sub-catchment delineation, using GIS software, to identify the catchment characteristics and comprehend downstream impacts. This was followed by plotting the gauging station’s locations, using Geocoding tool in GIS software, based upon their coordinates provided by Melbourne Water Corporation.
The time lag was tested and adjusted accordingly for the maximum rainfall to correspond with the maximum discharge, to avoid over/(under) estimation, primarily due to the increased imperviousness in urban catchments allowing overland flow sources to reach the main channel at a more intensified pace, than under natural conditions [28, 29]. The datasets also underwent noise treatment and outliers were filtered off and removed.
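The lag adjustment can be illustrated with the following sketch (an assumed, simplified procedure; the paper does not give its exact implementation): the rainfall series is shifted so that its peak coincides with the peak of the observed discharge.

import numpy as np

def align_peaks(rainfall, discharge):
    # Positive lag means the discharge peak occurs after the rainfall peak.
    lag = int(np.argmax(discharge) - np.argmax(rainfall))
    # np.roll wraps values around the ends; adequate for a short storm-event window.
    return np.roll(rainfall, lag), lag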
3.3 Data Normalisation Prior to the ANN simulation, the rainfall and discharge data undergo the process of normalisation, to ensure all the numeric columns in the dataset share a common scale, thereby assuring that all the process parameters are treated equally by the ANN model and deterring calculation errors associated with varying parameter magnitudes [30]. The data are normalised using the following normalising equation, seen in Eq. 1 below:
X_Norm = (X_O − X_min) / (X_max − X_min),   (1)
where X_Norm denotes the normalised value, X_O denotes the original value and X_min and X_max represent the minimum and maximum values. Following the normalisation, the data are prepared and ready to be used for calibration and validation of the ANN model.
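A minimal sketch of Eq. (1), and of the corresponding denormalisation mentioned in Sect. 3.4, is given below (illustrative Python; the study itself used MATLAB).

def min_max_normalise(x):
    # Scale a series to the [0, 1] range as in Eq. (1).
    x_min, x_max = min(x), max(x)
    return [(v - x_min) / (x_max - x_min) for v in x], x_min, x_max

def denormalise(x_norm, x_min, x_max):
    # Restore normalised values to their original scale.
    return [v * (x_max - x_min) + x_min for v in x_norm]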
3.4 Development of ANN Model The ANN model being developed adopts a feedforward multilayer perceptron neural network architecture, with a hyperbolic tangent sigmoid function (tansig) set as the neural transfer activation function for all layers. The ANN model is trained using two supervised learning algorithms, Levenberg–Marquardt (LM) and Bayesian Regularisation (BR), for a total of 1000 epochs, and a comparative analysis of the predictive ability and performance of both algorithms in accurately estimating the expected flood flow is carried out. With the trained network performance being highly sensitive to the number and dimensions of hidden layers, it is essential for the hidden layers to be varied, with varying numbers of neurons within each layer, and the process continuously repeated until the optimal hidden layer dimensions can be deduced [26]. This is vital in this case study, as urban catchments are known for the high complexity and nonlinearity associated with the underlying physical processes, thus making the determination of the optimum number of hidden layers with the optimal number of neurons crucial in offsetting the physical processes' nonlinearity [30–32]. The variation was conducted in the following format: for one hidden layer, the number of
neurons were varied from 1 to 30, whereas for two hidden layers, the neurons were varied from 2 and 1 to 30 and 29, with an increment of one neuron in each layer. This was continued for a total of up to six hidden layers. Initially, the localised rainfall data is used as the only input parameter at all three catchment locations, based upon which the ANN model is trained and tested, in comparison with the observed flow data from the streamflow station using the two learning algorithms—LM and BR. This is followed by re-training of the two downstream catchment locations, High Street Road at Ashwood and Great Valley Road at Gardiners, through re-incorporating the estimated flow from the respective catchment upstream (Eley Road for Ashwood and Ashwood for Gardiners), as a supplementary input variable, alongside the rainfall data, in accurately estimating the flow. The inclusion of the upstream catchment flow primarily serves its purpose of accounting for the upstream flow impacts on downstream locations, thereby improving the flow estimation accuracy and obtaining a more realistic representation of the catchment response. The model performance and prediction accuracy are assessed for both the training and testing datasets, using the statistical indices of coefficient of correlation (R), coefficient of determination (R2 ) and mean absolute error (MAE), which depict how well the predicted output matches with the actual output (strength of linear relationship), goodness-of-fit achieved and the magnitude of difference coexisting between the predicted and actual outputs, respectively [30–32]. Based upon the R, R2 and MAE values, the most suitable model with the optimum hidden layer dimensions is selected, ensuring that there has been no incidence of network overfitting and the values have been properly generalised. Following this, the simulated values are denormalised, resorting them back to the actual value and plotted out.
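As a rough Python analogue of this set-up (a sketch only: the study used MATLAB's Neural Network toolbox, and scikit-learn offers neither the LM nor the BR training algorithm, so a stand-in optimiser is used here), a feedforward network with tanh activations can be trained on the 80:20 split and scored with R, R2 and MAE as follows.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

def train_and_score(X, y, hidden_layers=(10, 10)):
    # X: normalised input columns (rainfall, plus upstream flow where applicable);
    # y: normalised flow at the gauging station.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
    model = MLPRegressor(hidden_layer_sizes=hidden_layers, activation="tanh",
                         max_iter=1000).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    r = np.corrcoef(y_te, pred)[0, 1]                  # coefficient of correlation R
    return r, r2_score(y_te, pred), mean_absolute_error(y_te, pred)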
4 Results and Discussion For the first location—Eley Road East Drain at Eley Road retarding basin (Station ID 229638A), both ANN models based on LM and BR algorithms are trained using only rainfall data as the input parameter and the flow data as the output variable for a recent storm event, November 2018, with the intention of comprehending the current catchment response state. The ANN model training is carried out of up to six hidden layers, with a desired hidden layer neuron number of 30 for a total of 1000 epochs. The models’ performance and prediction accuracy are assessed using three statistical indices, coefficient of correlation, R, coefficient of determination, R2 , and Mean Absolute Error, MAE, and the most suitable model with the optimum hidden layer dimensions is selected. The model performance for the training and testing datasets for Eley Road East Drain catchment location for the November 2018 storm event has been summarised below in Table 1. Referring to Table 1, the performance of both the models trained using LM and BR algorithms, being interpreted in terms of their R and R2 values, indicates that there is a moderate correlation coexisting between the rainfall and estimated flow, indicated
Table 1 ANN model performance for Eley Road East Drain catchment location
Station name: Eley Road East Drain at Eley Road retarding basin (Station ID 229638A)

Levenberg–Marquardt (LM):
No. of hidden layers   Training: R   R2      MAE       Testing: R   R2      MAE
1                      0.848         71.92   0.1343    0.9022       81.39   0.1080
2                      0.8482        71.95   0.1347    0.8944       80      0.1152
3                      0.8477        71.87   0.1347    0.8991       80.83   0.1135
4                      0.8482        71.95   0.1346    0.8938       79.88   0.1064
5                      0.8479        71.9    0.1343    0.8953       80.16   0.1055
6                      0.8466        71.68   0.1339    0.9050       81.91   0.1001

Bayesian regularisation (BR):
No. of hidden layers   Training: R   R2      MAE       Testing: R   R2      MAE
1                      0.8500        72.24   0.1376    0.8912       79.43   0.1285
2                      0.8509        72.26   0.1386    0.8888       78.99   0.1277
3                      0.8499        72.24   0.1388    0.8894       79.10   0.1251
4                      0.8502        72.28   0.1388    0.8850       78.32   0.1241
5                      0.8501        72.27   0.1388    0.8852       78.35   0.1239
6                      0.8501        72.27   0.1379    0.8855       78.41   0.1203
by the approximated R values being close to 0.85 and 0.9, in the training and testing datasets. When looking at the R2 value, a moderate level of goodness-of-fit has been achieved in both the training and testing datasets, further depicting that 71 and 81% of the variance can be attributed to the relationship between the rainfall and flow, with the remaining 29 and 19% suggesting the involvement of other environmental factors. The MAE value, approximately rounded off to 0.13 and 0.12 in the training and testing datasets, showcases the ANN model’s prediction accuracy to be good, with the ANN model better fitting the dataset and providing stronger confidence on the model predictions and result reliability. When comparing the performance of the various learning algorithms used in training the ANN model, the BR-ANN (ANN model trained with BR) only outperformed the LM-ANN (ANN model trained with LM) in terms of the training dataset, where it had a higher R and R2 values of 0.85 and 72%, in comparison with the LM-ANN, where the R and R2 values were 0.84 and 71%. However, the LM-ANN model had a lower MAE value of 0.134 in comparison to the BR-ANN model which had a higher MAE value of 0.139, indicating that the LM-ANN model predictions were more accurate. When comparing the testing dataset, the LM-ANN model outperformed the BR-ANN in terms of a higher R, R2 and lower MAE value. Based upon the results as summarised in Table 1, the ANN model specification with two hidden layers is selected as the most optimum choice, based upon the fairly moderate positive relationship exhibited in terms of the R and R2 value and the fairly lower MAE value, in both cases (training and testing using both BR and LM algorithms), in comparison with the other models. Figure 2 provides a virtual representation of the model performance, in terms of comparing the simulated output with the actual output/observed discharge data at the gauging station, for both the training and testing datasets using both the LM and BR algorithms. Referring to Fig. 2, it can be inferred that a moderate correlation and goodness-offit has been achieved between the actual observed flow records and both the LM-ANN and BR-ANN models, where the LM-ANN model clearly has a much better correlation and goodness-of-fit in comparison to the BR-ANN model predictions, indicated by the close alignment of the LM-ANN and actual outputs plots, in most cases. The few discrepancies that can be noticed can be attributed to be the effect of other environmental factors, besides rainfall, which will be further investigated. Being located upstream, close to the origin point of the Gardiners Creek, the inclusion of upstream flow is considered to have minimal impact upon this location and therefore is not modelled, given the strong R2 values attributing majority of the variance (71 and 81%) to rainfall in both cases (training and testing) for both LM and BR models. For the other two downstream locations, High Street Road at Ashwood and Great Valley Road at Gardiners, both the LM-ANN and BR-ANN are initially trained using only rainfall as the input variable, and their performance and prediction accuracy are assessed. 
This is followed by incorporating the flow from the upstream catchment— Eley Road for High Street Road at Ashwood and Ashwood for Great Valley Road at Gardiners, as a second input variable and the models are re-trained with the two input variables, and the performance and prediction accuracy are assessed in comparison to the actual observations from gauging station flow records. The inclusion of upstream catchment flow as a secondary input variable accounts for the effects of upstream flow
Fig. 2 Performance evaluation plots of ANN model for Eley Road East Drain at Eley Road retarding basin discharge station for training dataset (above) and test dataset (below) for November 2018 storm event using both LM and BR algorithms
on downstream locations, given the holistic complexity associated with upstream– downstream linkages of hydrological processes [25, 27], while also providing a more realistic representation of the catchment response and contributing to improvements in prediction accuracy. When referring to the comparative assessment summary as seen in Table 2, it can be inferred that both the LM-ANN and BR-ANN models performed much better when the flow from the upstream catchment was considered as a second input variable, along with rainfall, in comparison to the models’ performance with rainfall as the only input variable, in both training and testing. This is reflected in terms of the high R and R2 values indicating that a very strong positive correlation co-exists between the input and output variables, while also indicating a high goodness-of-fit has been achieved, with more than 97 and 84% of the variance being attributed to the input variables. The MAE values are also found to be very low, indicating that the model
has high prediction accuracy and fits the dataset, providing stronger confidence in the reliability of the model predictions. The high R and R2 values and the low MAE values insinuate that the use of upstream catchment flow as a second input variable has greatly influenced and improved the prediction accuracy and performance of the ANN models, while also providing a more realistic representation of the urban catchment response and the expected flow. When comparing the performance of the ANN models, the LM-ANN model showcases slightly better performance in comparison to the BR-ANN model, in most cases as inferred from Table 2, at both the downstream locations, which is reflected by the higher R and R2 values and lower MAE values in the testing and training datasets, respectively. The BR-ANN model has a higher R and R2 values, only in the case of the training dataset where rainfall was used as the only input variable for the High Street at Ashwood catchment location. Also, at the same location, when both rainfall and upstream catchment flow are used as input variables, the LM and BR algorithms have identical R and R2 values of 0.987 and 97.4%. In all the other cases, the performance of LM-ANN was better than the BR-ANN, especially in terms of the training datasets, where the LM-ANN performed much better as reflected by the higher R and R2 values and lower MAE values. This indicates that the LMANN model exhibits a more superior performance than the BR-ANN model and has outperformed it in majority of the cases modelled. The following Figs. 3, 4, 5 and 6 serve as graphical representations of the ANN models’ results for both High Street Road at Ashwood and Great Valley Road at Gardiners Catchment locations for performance evaluation in comparison with the actual observations/discharge data. When referring to Figs. 3, 4, 5 and 6 depicting the performance evaluation of the developed LM and BR-based ANN models, in comparison to the gauging station data for both the downstream catchment locations, High Street Road at Ashwood and Great Valley Road at Gardiners when using a single input variable and multiple input variables, it can be observed that the LM-ANN models share a strong positive correlation and goodness-of-fit with the actual output, outperforming the BR-ANN model, which is clearly exhibited by the goodness-of-fit and close alignment between the actual and LM-ANN plots. The inclusion of a second input parameter, the upstream flow, has greatly contributed towards improving the model performance and prediction accuracy, in comparison to using only a single input variable, proved by the improved correlation, goodness-of-fit and minimum absolute error. The LM-ANN and BR-ANN models’ performances and prediction accuracy were further validated using another storm event, 4 February 2011, that transpired within the catchment, where the same procedure was repeated and the results were comparatively analysed. The performance evaluation plots of LM and BR-based ANN models, in comparison with the gauging station data for the February 2011 storm event have only been plotted out, for when the modelling used two input variables for the two downstream locations—High Street Road at Ashwood and Great Valley Road at Gardiners only for both the training and testing datasets and can be seen in Figs. 7 and 8. Referring to Figs. 7 and 8, where the developed LM and BR-based ANN models’ performances were validated using another storm event, February 4th 2011, it can
Table 2 Summary of comparative assessment of ANN model performance at Ashwood and Gardiners using only one input variable—rainfall—and using two input variables—rainfall and u/s flow for both LM-ANN and BR-ANN models

Rainfall only as input variable:
Station name                     Train/test   LM: R    R2     MAE      BR: R    R2     MAE
High Street Road at Ashwood      Train        0.773    59.8   0.154    0.779    60.7   0.148
High Street Road at Ashwood      Test         0.781    60.9   0.155    0.710    50.4   0.167
Great Valley Road at Gardiners   Train        0.905    81.8   0.078    0.901    81.1   0.083
Great Valley Road at Gardiners   Test         0.741    54.9   0.091    0.724    52.4   0.097

Rainfall and U/S flow from catchment as input variables:
Station name                     Train/test   LM: R    R2     MAE      BR: R    R2     MAE
High Street Road at Ashwood      Train        0.987    97.4   0.028    0.987    97.4   0.013
High Street Road at Ashwood      Test         0.915    83.7   0.112    0.893    79.7   0.116
Great Valley Road at Gardiners   Train        0.986    97.1   0.021    0.981    96.2   0.028
Great Valley Road at Gardiners   Test         0.984    96.8   0.033    0.978    95.6   0.057
Fig. 3 Performance evaluation plots of LM and BR-based ANN model for High Street Road at Ashwood catchment location for training dataset for November 2018 storm event using single input variable (above) and two input variables (below)
Fig. 4 Performance evaluation plots of LM and BR-based ANN model for High Street Road at Ashwood catchment location for testing dataset for November 2018 storm event using single input variable (above) and two input variables (below)
Fig. 5 Performance evaluation plots of LM and BR-based ANN model for Great Valley Road at Gardiners Catchment location for training dataset for November 2018 storm event using single input variable (above) and two input variables (below)
Fig. 6 Performance evaluation plots of LM and BR-based ANN model for Great Valley Road at Gardiners Catchment location for testing dataset for November 2018 storm event using single input variable (above) and two input variables (below)
be inferred that the LM-ANN model displays a very strong positive correlation and a high goodness-of-fit has been achieved between the LM-ANN simulated value and the actual gauging station data. Also, the LM-ANN model clearly outperforms the BR-ANN model, which can be noticed from the plots. As discussed by Ndehedehe et al. [25] and Berhanu et al. [27], given the holistic complexity of upstream–downstream linkages of hydrological processes, the flow from the catchment location upstream was incorporated as an input variable alongside the localised rainfall in estimating the expected flow downstream. As depicted by the results, the upstream catchment flow has a major impact upon the flow being generated at the downstream location, given how its inclusion has contributed towards enhancing the flow estimation accuracy of the ANN models and provides a more realistic representation of the urban catchment response to a major storm event [25, 27]. This practice of incorporating the upstream catchment flow alongside the localised rainfall as input variables in estimating the expected flow can be considered for
Fig. 7 Performance evaluation plots of LM and BR-based ANN models for High Street Road at Ashwood streamflow station for training dataset (above) and test dataset (below) for February 2011 storm event
implementation in data scarce areas as well as in ungauged catchment locations, to obtain a better estimate of the expected flood flow and flood quantiles. Considering the performances of the LM-based ANN model and BR-based ANN model, as showcased by the results, it can be inferred that the LM-based ANN model outperforms the BR-based ANN model in accurately estimating the expected flow at the three catchment locations, when compared against the gauging station flow records, in terms of its strong positive correlation, the goodness-of-fit achieved between the LM-ANN simulated values and the actual observations and the low MAE values obtained, which were relatively much lower than the MAE values generated by the BR-ANN model simulations, providing further confidence on the LM-ANN’s predictive capability and results accuracy. As previously investigated by Tabassum & Dar [33] in their research, where the performance of LM-based ANN model was evaluated against a BR-based ANN model in the flow prediction of an alluvial Himalayan river, the LM-based ANN model in this research project also displays superior performance over the BR-ANN model and outperforms it, providing further confidence that the LM-based ANN model is the ideal model type to be considered for implementation in accurate estimation of the expected flow and studying the catchment response of urban catchments. Therefore, the LM-based ANN model can be considered as a potential alternative that can be implemented towards enhancing the accuracy of the current flow estimation process of urban catchments and effectively estimating the catchment runoff based upon minimal data providence. The simple methodology investigated in this study, involving the incorporation of the upstream catchment flow alongside the localised rainfall as input variables for estimating the flow, can also be adapted as a potential alternative solution for implementation in ungauged urban catchments located in data scarce regions.
Fig. 8 Performance evaluation plots of LM and BR-based ANN models for Great Valley Road at Gardiners streamflow station for training dataset (above) and test dataset (below) for February 2011 storm event
5 Future Works The future scope of work in this research project will involve the development of a hydrological model of Gardiners Creek catchment using the Victorian Water Industry’s standard hydrological practice, RORB software, which will be calibrated using the same storm events, 4 February 2011 and 6 November 2018, so that the modelled flood hydrograph(s) from RORB closely resemble the observed hydrograph(s) at the streamflow gauging stations [23]. Further investigation on the developed ANN model’s performance, prediction accuracy and results’ reliability by comparatively assessing and validating against actual observations and flood flow estimates from the calibrated RORB model will be conducted, employing multiple statistical indices attesting to the strength of the linear relationship ®, goodness-of-fit being achieved (R2 ) and the magnitude of difference (MAE) coexisting between the predicted and actual values. This comparative assessment is envisioned to further showcase the suitability of the developed ANN model for application in enhancing the accuracy of the catchment runoff estimation process, in lieu of the favourable benefits the model has to offer, such as optimum model performance, nonlinearity and high prediction accuracy, which overcome most of the limitations associated with the conventional model (RORB) and provide more reliable and accurate flow estimates. Future work will also investigate the applicability potential of the developed ANN model in the estimation of expected future flood flow in the respective urban catchments (under analysis), for extreme rainfall intensity increases, under the future climate change scenario [34]. Future projected rainfall data for future timescales (until 2100), using two emissions scenarios—RCP4.5 (medium emissions) and RCP8.5 (high emissions) are being considered for use in the future flood flow estimations. Given how accurate estimation of expected flood flow is the most critical component in any flood forecasting applications, the expected future flood flow estimates under climate change scenario can then be considered for employment in driving a suitable hydraulic modelling software in determining the expected future flood quantiles (expected flood level, flow velocity). These flood quantiles can then be considered by the relevant water authorities and policymakers in making informed decisions towards flood planning activities and devising suitable mitigation measures to counteract the residual flood risk.
6 Conclusion The primary aim of this research project was to assess and comprehend the effectiveness of ANN models in accurately predicting the catchment runoff based upon minimal data providence, as a bid towards producing an ANN model that is capable of accurately estimating catchment runoff in gauged/ungauged urban catchments located in data scarce regions with limited data availability across Australia. This research utilises the localised rainfall as the primary input parameter to estimate
the catchment runoff generated at the first catchment location (upstream), following which it incorporates the upstream flow from the previous catchment location as an additional input parameter, alongside the localised rainfall, in estimating the runoff in the following catchment locations, located further downstream. Two different learning algorithms—LM and BR—are used to train the ANN model. The developed ANN model was calibrated and validated for multiple historical storm events— 6 November 2018 and 4 February 2011, respectively at three catchment locations with gauging stations, Great Valley Road at Gardiner (Station ID 229624A), High Street Road at Ashwood (Station ID 229625A) and Eley Road East Drain at Eley Road retarding basin (Station ID 229638A). When referring to the results, a strong positive correlation and goodness-of-fit can be observed between the actual observations and the LM-based ANN model simulated output, with low error values ranging between 0 and 0.1, in both the calibration and validation datasets, at all three locations, showing superior performance and outperforming the BR-based ANN model The results thus suggest that the developed LM-based ANN model is highly effective and capable of accurately estimating catchment runoff, based upon minimal data providence (only two parameters—rainfall and flow from upstream), and can be considered a feasible option that is both cost-effective and capable of working with limited data availability, towards the application of accurate flood flow estimation in urban catchments, located in data scarce regions across Australia. The future scope of work in this research project will involve the developed ANN model’s performance, prediction accuracy and results’ reliability, being validated in comparison against actual observations (flow records from the gauging station) and flood flow predictions of the Victorian Water Industry’s standard hydrological model, RORB. The developed ANN model is also planned to be adopted for estimation of expected future flood flow in urban catchments for extreme rainfall intensity increases, under the future climate change scenario. The expected results can then be interpreted towards making informed decisions towards flood planning activities and devising suitable mitigation measures to counteract the expected residual flooding risk.
References 1. Mosavi A, Ozturk P, Chau KW (2018) Flood prediction using machine learning models: literature review. Water (Switzerland) 10:1–40. https://doi.org/10.3390/w10111536 2. Teng J, Jakeman AJ, Vaze J et al (2017) Flood inundation modelling: a review of methods, recent advances and uncertainty analysis. Environ Model Softw 90:201–216. https://doi.org/ 10.1016/j.envsoft.2017.01.06 3. Khosravi K, Pham BT, Chapi K et al (2018) A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, Northern Iran. Sci Total Environ 627:744–755. https://doi.org/10.1016/j.scitotenv.2018.01.26 4. Zhang S, Pan B (2014) An urban storm-inundation simulation method based on GIS. J Hydrol 517:260–268. https://doi.org/10.1016/j.jhydrol.2014.05.04
Chapter 10
Effective Decision Making Through Skyline Visuals R. D. Kulkarni, S. K. Gondhalekar, and D. M. Kanade
1 Introduction A skyline query raised against a multidimensional dataset takes multiple preferences from a user and summarizes the best objects in the dataset accordingly. Skyline queries find application in decision support systems, where the goal is to select an optimum set of objects that satisfies the user's preferences on various performance-indicator parameters. For example, suppose a person going on holiday to Goa is looking for a homestay that is both low-priced and near the beach. As one would expect, homestays nearer to the beach tend to be more expensive. The person is therefore interested in all homestays that are not worse than any other homestay on both dimensions, 'price' and 'distance'. This set of interesting homestays is the 'skyline', and it helps the user weigh personal inclinations. Skyline computation is also known as the maximal vector problem: for every point 'p' in the skyline there exists a monotone scoring function such that 'p' maximizes that function. Since database engines offer no direct support for skyline queries, they are executed by writing multiple nested queries, and the level of nesting depends upon the number of preferences a user has. A large number of user preferences produces highly nested queries, resulting in unacceptable response times. To overcome this problem, many techniques have been proposed by the database research community. These recent skyline computation techniques are computation oriented and aim to
improve the response time of skyline computation by leveraging parallel programming, data mining, modern computing architectures, and hardware advances. The ultimate aim of all such skyline computation algorithms is to provide a precise cut of the database that helps the user take an effective decision in the shortest possible time. However, in this era of big data, datasets tend to be very large, and hence the resultant skylines also tend to be huge. When huge skylines are presented to the user, they make the process of decision making tedious and time-consuming. Big data scientists therefore use data visualization techniques to visualize the data, draw insights from the visuals, and make effective decisions. The same concept can be extended to skyline queries. Through this paper, • We present a couple of visualization solutions for exploring the skyline, called 'skyline visuals'. These visuals ease the job of the end user who wishes to take an optimum decision from a huge skyline. • We present various types of skyline visuals that help the user understand other aspects of the data and make a meaningful decision. The remainder of the paper is organized as follows. Section 2 explores the research work in this area, Sect. 3 details the techniques related to skyline visuals, and Sect. 4 highlights the conclusions and discusses the future scope.
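As an illustration of the skyline concept sketched above, the following short Python sketch computes a skyline by pairwise dominance checks in the spirit of a block-nested-loop scan. The homestay tuples and the 'smaller is better' criterion on price and distance are illustrative assumptions based on the Goa example, not data or code from this work.

def dominates(a, b):
    # a dominates b if it is no worse in every dimension and strictly better
    # in at least one (smaller values are better here).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    # Block-nested-loop style: keep every point that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical homestays as (price, distance_to_beach_km) tuples.
homestays = [(40, 2.0), (55, 0.5), (70, 0.2), (65, 1.5), (45, 2.5)]
print(skyline(homestays))  # [(40, 2.0), (55, 0.5), (70, 0.2)]

Every pair of points is compared, which is why naive skyline computation becomes expensive on large datasets and why the optimized algorithms surveyed in the next section are needed.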
2 Related Work The concept of the skyline operator [1] was introduced in 2001. Skyline computation uses the notion of 'dominance', where a point is said to dominate another point if it is as good or better in all dimensions and better in at least one dimension. The set of all points not dominated by any other is the skyline generated from the user's preferences. This research proposed algorithms such as Block Nested Loop (BNL) and Divide-and-Conquer (D&C). Since then, following advances in technologies and computational architectures, many solutions have been proposed by researchers for skyline computation. The initial algorithms focused on reducing the computational complexity by reducing the search space and the number of scans. Some of these algorithms are Sort Filter Skyline (SFS) [2], Sort and Limit Skyline (SaLSa) [3], and Linear Elimination Sort of Skyline (LESS) [4]. As the popularity of distributed networks and parallel computing grew, skyline computing solutions focused on improving the response time of computation by methods such as selecting skyline-contributing peers, reducing communication costs within the network, and using the map-reduce paradigm. Some of the related research efforts are Distributed Skyline (DSL) [5–7], the use of FPGAs [8], and map-reduce-based approaches [9, 10]. Some skyline computing techniques utilize the features of the underlying network to reduce the computational complexity by reducing the subspace required for the computation; a few such techniques are SSP [11], iSky [12], and Skyframe [13]. For high-dimensional datasets, data mining concepts and algorithms have also been applied to skyline computation, for example Compressed SkyCube (CSC) [14–16]. With the advent
of interactive, graphical user interfaces to capture user preferences, data visualization solutions have been adopted in many areas. A few of the research efforts in this area can be found in [17, 18]. In this paper, we present a couple of solutions, namely 'skyline visuals', to visualize the skyline. The importance of these solutions is that they allow the user to interact with them and make a better decision or choice with their multidimensional preferences. They also provide a means to compute a couple of types of skyline queries. The next section details these techniques.
3 Visualizing the Skyline A skyline query is a multi-preference query which identifies the 'best' objects in a multi-attribute dataset. The computation of a skyline is based on a scoring function: the points in the skyline are those which maximize this scoring function [1]. For example, consider the skyline query: 'Find a hotel in Juhu having min distance to beach, min crowd, min tariff, min street noise'. This is a 4D, i.e., four-dimensional, skyline query. As the number of user preferences and the size of the dataset grow, skylines tend to be huge. Now imagine that the user is driving and navigating to such a hotel using maps. It will be a tedious task for such a user to read the textual recommendations produced by a skyline computing algorithm. Similarly, when the user wishes to selectively know the details of other parameters of the entity being queried (like the entity 'hotel' in the above example), it will not suffice to report only the queried dimensions of the entity. Also, given an optimum range of all hotels satisfying the required preferences, the user has the freedom to make a flexible choice. To suit all such requirements, we propose visualizing skylines through 'skyline' visuals. They benefit in the following ways: • The user is able to visualize skyline recommendations in the form of a graphic, which is a better choice than a textual report. • The user is able to interact with the skyline query dimensions and get (drill down to) more detailed information about the entity. • The user is able to explore and choose from an optimum range of skylines. • The user is able to interact with the skyline to make an optimum decision. To achieve all this, data visualization tools prove to be of great support. With the availability of these tools, the user is able to perform decision making more effectively and interactively. In the next sub-section, we present how the concept of 'skyline visuals' helps to achieve the above stated features.
3.1 Skyline Visuals When a skyline query results in a very large number of data points, the job of the end user becomes difficult, as the user then has to make a decision from a larger dataset. To address this, different variants of the skyline query have been proposed, as given below. Top-k Skyline: The top-k (also called range) skyline queries return the most promising 'k' objects from the dataset, where 'k' is specified by the user [6, 19–21]. Dynamic Skyline Queries: A dynamic skyline query retrieves, for a query point 'q', all data objects that are not dynamically dominated by any other point. This means that, instead of a strict skyline, data objects near the skyline points are also included in the dynamic skyline set [22]. Constrained Skyline Queries: Here the user specifies constraints on one or more query dimensions. Such queries are called constrained skyline queries, and they return the best, most promising data objects under the constraints specified on the query dimensions [23]. Representative Skyline Queries: Top-k skyline queries may return skyline points belonging only to a particular cluster and may fail to represent the extreme points, in contrast to the general skyline query. To get an overall idea of the available choices, the user specifies another value 'k', and the skyline points that are nearest neighbors within distance 'k' are accumulated; the accumulated skyline then represents the overall dataset. This is the representative skyline [24]. Using such variants of skyline queries, the end user is in a better position to make effective decisions. The user experience and the efficiency of decision making can be enhanced further when the resultant skyline is visualized and the end user has an option to interact with it before making a final decision. Finally, ease, efficiency, and a richer user experience have always been the parameters that attract the end user community toward any application, service, or platform, and these are exactly the benefits of the 'skyline' visuals. The next sub-section details this.
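The following self-contained Python sketch shows, under simplifying assumptions, how the top-k and constrained variants described above can be derived once a set of skyline points is available. The scoring function, the value of k, and the constraint bounds are hypothetical user preferences; a full constrained skyline would recompute dominance inside the constrained region rather than merely filter, so this is only an approximation.

skyline_points = [(40, 2.0), (55, 0.5), (70, 0.2), (48, 1.1)]  # (price, distance)

def score(p):
    # A monotone scoring function: lower price and lower distance give a lower score.
    price, distance = p
    return price + 20 * distance

# Top-k skyline: the k most promising skyline points under the scoring function.
k = 2
top_k = sorted(skyline_points, key=score)[:k]

# Constrained skyline (simplified): skyline points satisfying user-given bounds.
constrained = [p for p in skyline_points if p[0] <= 60 and p[1] <= 1.5]

print(top_k)        # [(55, 0.5), (48, 1.1)]
print(constrained)  # [(55, 0.5), (48, 1.1)]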
3.2 Creating the Skyline Visuals In this sub-section, we explore how the skyline visuals can be created. In the examples given next, we have utilized the popular dataset of NBA players available at https://www.kaggle.com/justinas/nba-players-data. This dataset has seven dimensions and contains a total of 3921 tuples. We have used Google Data Studio (GDS) as the data visualization tool, and all the figures in this section are part of the GDS report. In all the skyline visuals presented next, we have assumed the 'min' constraint on all the query dimensions.
To get the skyline of 'min. height, min. weight', we chose a table chart in GDS, setting the 'metrics' to min. height and min. weight, and the 'dimensions' to player name, birth city, and college. To obtain the top-k skyline, the 'rows per page' property, which indicates 'k', was set to 5, and the 'sort' properties were set to ascending on both of the above metrics. The result is depicted in Fig. 1. Figure 1a shows the top five tuples as per the query dimensions; this is a top-k skyline query. In GDS, by enabling a visual property such as 'drill-down', the user can select the player according to other dimensions such as player city, as depicted in Fig. 1b. Also, by using other visual properties such as a 'slider' and adding a report-wide control such as an 'input box', the user is able to see a constrained skyline as per the given input. Further, the 'advanced filters' option enables the user to dig into the skyline and browse other values; this is the dynamic skyline. The latter two solutions are shown in Fig. 1c and d, respectively. The GDS tool also allows the skyline to be designed using other visual types such as a line chart or a scatter plot, shown in Figs. 2 and 3, respectively. These enable the user to detect outliers and find representative skylines or sky clusters. This is how the skyline visuals ease the task of the end user by giving better options to explore around the skyline. The next section concludes the paper with a discussion on future scope.
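For readers who want to reproduce the table and scatter views outside GDS, the following Python sketch approximates them with pandas and matplotlib. The local file name and the column names ('player_name', 'player_height', 'player_weight') are assumptions about the Kaggle NBA players export and may differ from the actual file.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("all_seasons.csv")  # hypothetical local copy of the Kaggle dataset

# Top-k table view: sort ascending on both metrics and keep k rows,
# mirroring the 'rows per page' = 5 setting of the GDS report.
top_k = df.sort_values(["player_height", "player_weight"]).head(5)
print(top_k[["player_name", "player_height", "player_weight"]])

# Scatter-plot view of the whole dataset, on which skyline points and
# outliers can be inspected visually (cf. Fig. 3).
plt.scatter(df["player_height"], df["player_weight"], s=5)
plt.xlabel("Height")
plt.ylabel("Weight")
plt.title("Player height vs. weight")
plt.show()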
4 Conclusion and Future Scope The skyline research community aims to optimize query performance for users. However, in the era of big data and data visualization, presenting the skyline as a neat and easy-to-understand graphic is also important. This need becomes particularly significant when the resultant skyline is huge and complicates the end user's task of decision making. The proposed concept of 'skyline visuals' assists here. It enables the end user to explore the skyline in an easy-to-interpret manner using a pictorial view instead of the traditional textual outcomes of skyline queries. Another benefit of skyline visuals is that they help the user interactively select the option of their choice with ease and better exploration. These visuals offer ease of understanding and the flexibility to discover variants of skylines. Hence, they enhance the end user experience, making the decision-making process easy and hassle-free. Such features are offered by most of the BI tools used by business heads. The future scope of this work can be creating skyline visuals that accept vocal commands from the end user for the skyline query dimensions. AI-based data visualization tools such as Microsoft Power BI need to be explored for this option. Also, various other types of skyline queries, such as continuous, real-time, location-dependent, or spatial skyline queries, need to be explored. Skyline visuals over geographical maps can be of great help to a navigator.
Fig. 1 a Skyline min. height, min. weight. b Drilling into skyline for better choice. c User interaction with skyline using sliders. d User interaction with skyline using search
Fig. 2 Skyline using the line-chart visual
Fig. 3 Skyline using the scatter plot visual
References 1. Borzsonyi S, Kossmann D, Stocker K (2001) The skyline operator. In: IEEE International conference on data engineering. Heidelberg, pp 421–430 2. Chomicki J, Godfrey P, Gryz J, Liang D (2003) Skyline with presorting. In: IEEE International conference on data engineering, pp 717–719 3. Bartolini I, Ciaccia P, Patella M (2006) SaLSa: computing the skyline without scanning the whole sky. In: ACM International conference on information and knowledge management, pp 405–411 4. Godfrey P, Shipley R, Gryz J (2005) Maximal vector computation in large data sets. In: International conference on very large databases, pp 229–240 5. Wu P, Zhang C, Feng Y, Zhao B, Agrawal D, Abbadi A (2006) Parallelizing skyline queries for scalable distribution. In: International conference on extending database technology, pp 112–130 6. Vlachou A, Doulkeridis C, Norvag K (2012) Distributed top-k query processing by exploiting skyline summaries. J. Distrib Parallel Databases 30(3–4):239–271 7. Rocha-Junior J, Vlachou A, Doulkeridis C, Norvag K (2011) Efficient execution plans for distributed skyline query processing. In: Proceedings of ACM International conference on extending database technology, pp 271–282 8. Woods L, Alonso G, Teubner J (2013) Parallel computation of skyline queries. In: IEEE 21st annual international symposium on field-programmable custom computing machines, pp 1–8
9. Anisuzzaman Siddique Md, Tian H, Qaosar M, Morimoto Y (2019) MapReduce algorithm for variants of skyline queries: Skyband and dominating queries, J on algorithms. MPDI, pp 1–14 10. Choudhury ZZ, Zaman A, Hamid ME (2018) Efficient processing of area skyline query in MapReduce framework. In: IEEE International conference on electrical and computer engineering, pp 79–82 11. Wang S, Ooi B, Tung A, Xu L (2007) Efficient skyline query processing on peer-to-peer networks. In: Proceedings of IEEE International conference on data engineering, pp 1126–1135 12. Chen L, Cui B, Lu H, iSky: efficient and progressive skyline computing in a structured P2P network. In: Proceedings of IEEE International conference on distributed computing systems, pp 160–167 13. Wang S, Vu SQ, Ooi B, Tung A, Xu L (2009) Skyframe: a framework for skyline query processing in peer-to-peer systems. J VLDB 18(1):345–362 14. Xia T, Zhang D (2005) Refreshing the sky: the compressed Skycube with efficient support for frequent updates. In: Proceedings of ACM SIGMOD International conference on management of data, pp 493–501 15. Zhang N, Li C, Hassan N, Rajasekaran S, Das G (2014) On Skyline groups. IEEE Trans Knowl Data Eng 26(4):942–956 16. Yuan Y, Lin X, Liu Q, Wang W, Yu JX, Zhang Q (2005) Efficient computation of the skyline cube. In: Proceedings of IEEE International conference on very large databases, pp 241–252 17. Kim W, Shim C, Chung YD (2021) SkyFlow: a visual analysis of high-dimensional skylines in time-series. J Vis 24:1033–1050 18. Gogolou T, Tsandilas TP, Bezerianos A (2019) Comparing similarity perception in time series visualizations. IEEE Trans Vis Comput Graph 25(1):523–533 19. Mouratidis K, Li K, Tang B (2021) Marrying top-k with skyline queries: relaxing the preference input while producing output of controllable size. In: International conference on management of data, pp 1317–1330 20. Zheng Z, Zhang M, Yu M, Li D, Zhang X (2021) User preference-based data partitioning top-k skyline query processing algorithm. In: IEEE International conference on industrial application of artificial intelligence (IAAI), Harbin, China, pp 436–444 21. Han X, Wang B, Li J et al (2019) Ranking the big sky: efficient top-k skyline computation on massive data. J Knowl Inf Syst 60:415–446 22. Chen L, Lian X (2008) Dynamic skyline queries in metric spaces. In: International conference on extending database technology: advances in database technology, pp 333–343 23. Chen L, Cui B, Lu H (2011) Constrained skyline query processing against distributed data sites. IEEE Trans Knowl Data Eng 23(2):204–217 24. Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of IEEE international conference on data engineering. Istanbul, Turkey, pp 86–95
Chapter 11
A Review Paper on the Integration of Blockchain Technology with IoT Anjana Rani and Monika Saxena
1 Introduction The Internet of Things (IoT) is one of the growing revolutions in which physical gadgets are allowed to communicate via diversified networks [1]. The wide availability of wireless connectivity and the decrease in processing costs are two important factors behind the exponential growth in the usage of IoT devices [2]. According to Gartner [3], the total number of devices connected to the IoT worldwide is anticipated to jump to 30.9 billion units by 2025, a sharp rise from the 13.8 billion units projected in 2021. The IoT is increasingly used in multiple sectors such as industrial control, home automation, health care, travel, and wearables [4]. With the recent proliferation of IoT devices, security has become the main problem, as it is crucial to protect the information produced by these devices. To interconnect IoT nodes, robustness, authentication, privacy, and security are necessary, and all these characteristics are offered by blockchain technology, a concept presented by Satoshi Nakamoto with the creation of the first cryptocurrency, Bitcoin [5]. Blockchain technology is a distributed, encrypted database model: a peer-to-peer (P2P) transaction system that enables the secure execution of operations on a shared ledger database distributed among the nodes that constitute the blockchain network. It is a remarkable security structure built on cryptography, consensus algorithms, and communication technology [6].
The rest of the paper is arranged as follows: the literature review is discussed in Sect. 2, Sect. 3 provides a detailed explanation of IoT and its security issues, Sect. 4 describes blockchain technology, Sect. 5 describes the amalgamation of blockchain technology and the IoT, and lastly, the conclusion is provided in Sect. 6.
2 Literature Review Hassan et al. [7] explains the privacy problems which arises after combining blockchain technology with IoT devices and applications, and it also discussed 5 protection techniques for privacy like encryption, mixing, private contract differential privacy, and minimization. Tan et al. [8], in this paper authors discussed the importance as well as the role of the blockchain technology in the IIoT ecosystem and also explains that how it assists multiple attacks. Zhao et al. [9], uses the blockchain technology so that a plan to check the remote data integrity to preserve the privacy can be built for information management systems of IoT without trusting the third party. This framework helps to stop the data loss to the third party as this process doesn’t require the third-party authentication and, in this authentication, privacy and security are met easily. Alfandi et al. [2], presents a thorough literature review to resolve the privacy as well as security related issues of IoT, and to address multiple IoT challenges, research has been done on the blockchain technology and authors outlined the problems created by introducing Blockchain with IoT devices. Mistry et al. [10], discussed detailed research in respect of the IoT technology with 5G as supporting Blockchain- based industries automation for different applications such as smart city, smart agriculture, supply chain management and also healthcare 4.0. Datta et al. [11], IoT privacy as well as authentication is presented in this paper by utilizing Blockchain within the forest fire. Gang et al. [12], proposed an identity authentication mechanism for IoT systems with respect to the blockchain. In this framework device will identify the data and that data is store by using the blockchain and BCOT gateway i.e., Blockchain of Things Gateway which is proposed to record authenticated transaction. Shukla et al. [13], presented a three-tier architecture based on the Blockchain as well as Fog computing that issues the secure services for the transactions and also secure transmission near the edge which is helpful for IoT applications in healthcare to supply authenticity, reliability and security. Li et al. [14], put forward the concept of Eunomia which is a blockchain based VDF (Vehicular digital forensics) scheme which provides a secure procedure to share information for forensic purposes and has the capability to chase the fake users. Idrees et al. [15], a comprehensive study of the blockchain technology is done in this paper with the detailed solutions of the blockchain security issues arises in
the industrial applications. The article also discussed the potential of blockchain in revolutionizing various applications such as IoT. Uddin et al. [16] designed CBCIoT, a consensus algorithm that combines blockchain technology and IoT, for blockchain-based IoT devices and applications. Mutar et al. [17] present a system for cooperative data sharing that uses cutting-edge cloud computing and blockchain technology to complete data-sharing tasks by bringing together numerous data sources and users.
3 Background 3.1 Internet of Things Over the past few years, IoT has become one of the predominant technologies of the twenty-first century. The Internet of Things is a group of physical objects equipped with software, sensors, processing ability, and other technologies: interrelated computing devices, digital and mechanical gadgets, objects, animals, and even humans, all provided with unique identifiers (UIDs) and with the ability to transmit information over a network without human-to-human or human-to-computer interaction [18]. Whenever communication between IoT devices takes place, it involves the connections shown in Fig. 1 [19, 20].
Fig. 1 Connections in IoT
Fig. 2 Architecture of IoT
3.2 Architecture of IoT Many researchers [21] define the IoT architecture as three-layered, consisting of a perception layer, a network layer, and an application layer, as shown in Fig. 2. This architecture explains the functioning of IoT networks: information gathered from physical objects such as cameras and sensors is transmitted up to the application layer, and information from the perception layer can also be transmitted straight to the users via a gateway over the Internet [22]. The perception layer [22] is the lowest layer of the IoT architecture. Its main responsibility is to gather data from the environment and transmit that data to the next layer. The network layer [22] is the connecting layer between the perception layer and the application layer, i.e., it receives the information from the perception layer and transmits it to the application layer using technologies such as 4G and Wi-Fi; it is also known as the communication layer. The application layer [22] is the third and uppermost layer of the IoT. This layer is an interface between the users and the IoT devices. Some major protocols of this layer are HTTP and MQTT.
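As a minimal illustration of this layered flow, the sketch below shows a perception-layer sensor reading being sent over the network layer to an application-layer MQTT broker. The broker address, topic, and payload format are hypothetical, and the snippet assumes the third-party paho-mqtt client library (1.x-style constructor); it is not taken from any of the surveyed systems.

import json
import time
import paho.mqtt.client as mqtt  # third-party MQTT client library

client = mqtt.Client()                       # paho-mqtt 1.x style client
client.connect("broker.example.com", 1883)   # hypothetical broker (network layer)

# Perception-layer reading packaged for the application layer.
reading = {"device_id": "sensor-01", "temperature_c": 24.7, "ts": time.time()}
client.publish("farm/field1/temperature", json.dumps(reading))  # hypothetical topic
client.disconnect()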
3.3 Applications of IoT The IoT can be used in many different facets of life, in both the public and private sectors. An attractive feature of IoT is its versatility for organizations, businesses, and government branches. IoT is regarded as the future of the connected
and smart world. Smart grids, smart agriculture, and smart industries are some of the IoT-based applications [23], as shown in Fig. 3.
Fig. 3 Applications of IoT
3.4 Security Aspects of IoT Transforma Insights' research projects that there will be 24 billion active devices by 2030. Some IoT gadgets, such as wireless and wired sensors or other resource-limited devices, are deployed in places that are easy for malicious actors to reach, and such devices could even be replaced entirely. For instance, a sensor used in fields to monitor temperature and humidity can easily be tampered with and can then report fake data. Because all systems and devices are connected in the IoT system, the physical accessibility of some devices renders other devices vulnerable to attacks. By placing their devices in difficult-to-reach or clearly visible locations, users can boost trust as well as security.
3.5 Attacks and Vulnerabilities in IoT IoT attacks are cyber-attacks that gain entry to the user's sensitive data through an IoT system, either by installing malware on the system or by damaging it. The zones or attack surfaces from which attackers can originate attacks and obtain sensitive data are the devices, the communication channels, and the applications and software [24]. The general IoT architecture has three layers, and these layers are vulnerable to different types of attacks [25]. Figure 4 shows some of the important attacks, such as the phishing attack and the sniffing attack. Phishing Attack: the practice of sending forged communications that appear to come from a reputable source, typically via email.
Fig. 4 Characteristics of blockchain
Man-in-the-Middle Attack: occurs when an invader secretly intercepts and relays messages between two parties who believe they are communicating directly with one another. Sniffing Attack: a data theft that occurs when packet sniffers capture network traffic and are able to illegally access and read unencrypted data. DoS Attack: an attack that attempts to block access to a computer or network for its intended users. Data Transit Attack: targets data moving across the network from one place to another. Sybil Attack: the perpetrator pretends to be a number of different identities simultaneously. Sinkhole Attack: the attacker uses forged routing information to lure neighbouring nodes and then performs selective forwarding or alters the data that passes through them. Spoofing Attack: someone impersonates another party in an attempt to gain trust, access systems, steal data or money, or spread malware. Replay Attack: a hacker eavesdrops on a secure network communication, intercepts it, and then purposely delays or resends it to trick the recipient into doing what the impostor wants. DDoS Attack: a large number of devices attack a single server.
4 Blockchain Technology Blockchain technology is a distributed database technology that can be defined as a continuously growing chain of interlinked blocks holding ledger records in a distributed manner; these records are resistant to change and are secured using cryptographic algorithms [22]. The blockchain concept was first introduced by Haber and Stornetta in 1991, and Nakamoto (the pseudonym of a developer) implemented the first blockchain, in which transactions are committed with the help of an electronic currency, Bitcoin, and a public ledger is used to store the records of the transactions [6]. In this technology, security is established through strong cryptographic hashes, public-key cryptography, and decentralization [26]. Figure 4 shows the important properties of blockchain technology [27].
4.1 Types of Blockchain On the basis of current trends, researchers define four types of blockchain, as discussed below [28, 29]: 1. Public Blockchain: a blockchain that functions in a distributed and permissionless manner, without any restriction on mining, joining, or transmitting transactions through the blockchain network. 2. Private Blockchain: a blockchain in which only authorized nodes or participants can join the network and execute tasks on it; examples are Hyperledger Fabric and Corda. 3. Consortium Blockchain: a variant of the private blockchain, but here control is shared among a group rather than held by a single administrator. 4. Hybrid Blockchain: a blockchain that uses the features of both private and public blockchains; it helps decide what kind of information should be private and what should be public.
4.2 Structure of Blockchain A blockchain is a sequence of blocks, each of which mainly has two parts: a block header and a block body. The block body consists of all the transactions that take place, whereas the block header consists of the hash of the previous block, a timestamp, a nonce, and the Merkle root [22, 28, 29], as shown in Fig. 5.
Fig. 5 Structure of blockchain
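A minimal sketch of this structure is given below: each block header stores the previous block's hash, a timestamp, a nonce, and a Merkle root, and blocks are linked by hashing the header. The field layout and the heavily simplified Merkle-root computation are illustrative assumptions, not a production blockchain design.

import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    # Heavily simplified Merkle root: hash of the concatenated transaction hashes.
    tx_hashes = [sha256(tx.encode()) for tx in transactions]
    return sha256("".join(tx_hashes).encode())

def make_block(prev_hash, transactions, nonce=0):
    header = {
        "prev_hash": prev_hash,
        "timestamp": time.time(),
        "nonce": nonce,
        "merkle_root": merkle_root(transactions),
    }
    block_hash = sha256(json.dumps(header, sort_keys=True).encode())
    return {"header": header, "body": transactions, "hash": block_hash}

genesis = make_block("0" * 64, ["genesis"])
block1 = make_block(genesis["hash"], ["A pays B 5", "B pays C 2"])
print(block1["header"]["prev_hash"] == genesis["hash"])  # True: the blocks are linked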
5 Integration of Blockchain and IoT The Internet of Things (IoT) is digitizing manual workflows by transforming and improving them. By providing trustworthy sharing services, blockchain can help the Internet of Things: data can be backed up and verified, the source of any piece of information can be identified at any time, and the data will not change over time [30, 31]. It also improves security in settings where a large number of IoT participants must securely share data. This integration represents a significant shift.
5.1 Blockchain Solution to IoT Blockchain technology may be able to address the issues IoT systems face more effectively. IoT systems are likely to have more interacting devices in the future, and the Internet becomes the medium as the number of gadgets attempting to communicate with one another grows. Since most of the information collected by IoT devices is stored on central servers, this creates various difficulties: if devices wish to access information, they must communicate through a centralized network, with data flowing through a central server [32]. One of the most effective strategies for dealing with this is to use distributed or decentralized, peer-to-peer networks that feature capabilities for ADC, DFS, and PPN [33].
5.2 Blockchain Scalability to IoT The use of Bitcoin for online transactions that do not require a trusted third party has contributed to the rise in interest in blockchain technology. Scalability, on the other hand, presents blockchain service providers with the greatest obstacle. To integrate IoT and blockchain, scaling issues must be addressed. IoT devices will,
on the one hand, generate transactions at a rate that current blockchain solutions [34] cannot handle due to their sheer number. On the other hand, blockchain peers cannot be implemented on IoT devices due to resource limitations. In their current state, the two technologies cannot be directly integrated. Several strategies, including SegWit, sharding, increasing the block size, proof of stake (PoS), and off-chain state, have been put forward to address the issue of scalability. The scalability solution known as SegWit, or segregated witness, increases the number of transactions per block while maintaining the same block size: a segregated witness frees up space for new Bitcoin transactions by excluding the signature information from the transaction. Many domains deploy IoT frameworks for the benefits they provide, for example the capacity to capture data and communicate with companion devices without any human or machine mediation. Data leakage is very likely to occur during these interactions, and a number of approaches have been taken to address this aspect of security.
5.3 Security in Blockchain and IoT The structure of the Internet of Things (IoT) is made up of machine-to-machine (M2M) connections that do not involve humans in any way. As a result, building trust among machines is a huge challenge that IoT equipment has yet to fully address. The blockchain (BC) can act as a medium in this process for improved scalability, data security, dependability, and privacy. BC technology can track all connected devices in the IoT environment and is then used to make transaction processing possible and synchronized. By utilizing blockchain, a single point of failure (SPF) in the IoT structure can be fully removed [35]. Various algorithms, such as cryptographic algorithms and hashing techniques, are used to protect data in BC. As a result, BC technology helps reshape the digital market and provides enhanced security services in IoT. Of particular interest to controllers is BC's ability to offer secure, private, and readily traceable transaction monitoring. Consequently, BC can help safeguard industrial IoT devices and prevent data from being altered or spoofed [36].
5.4 A Comparative Study A comparison of the blockchain-based IoT system and the conventional IoT system is included here. IoT and blockchain are currently emerging technologies that have the potential to rapidly transform civilization [36]. Table 1 compares the blockchain-enabled IoT system with the conventional IoT system with respect to several features.
Table 1 Comparison between IoT and blockchain–IoT systems

Feature                           | IoT                        | IoT–Blockchain
1. Decentralization               | Completely centralized     | Entirely decentralized
2. Storage, privacy and security  | Intermediate               | Considerably higher
3. Reliability                    | Data tampering is possible | Data tampering is not possible
4. Immutability                   | Not immutable              | Fully immutable
5. Real-time                      | Yes, real time             | Nearly real time
6. Interoperability               | Intermediate               | High
7. Maintenance                    | Medium                     | Effectively high
According to the observations in Table 1, the integrated system of blockchain technology and the IoT is the better one for ensuring security, immutability, interoperability, and reliability, among other things. As a result, we can say that the most suitable technology for IoT systems is blockchain technology.
6 Conclusion This review paper has given a detailed overview of the interconnection between blockchain technology and the IoT model, which is expected to revolutionize the upcoming generation of IoT. Appropriate restrictions are essential for incorporating IoT and blockchain into government infrastructure, and this recognition will accelerate the engagement of citizens, governments, and businesses. It is essential to carry out research to guarantee the privacy and security of crucial technologies such as blockchain and the IoT. One of the biggest concerns regarding blockchain is that individuals take advantage of the situation, particularly in light of the instability of digital currencies. The paper has also introduced blockchain technology, the Internet of Things, IoT security with blockchain, and the scalability of blockchain in IoT.
References 1. Sadrishojaei M, Navimipour NJ, Reshadi M, Hosseinzadeh M (2021) A new preventive routing method based on clustering and location prediction in the mobile internet of things. IEEE Internet Things J 8(13):10652–10664 2. Alfandi O, Khanji S, Ahmad L, Khattak A (2021) A survey on boosting IoT security and privacy through blockchain. Clust Comput 24(1):37–55 3. Pettey LC, van der Meulen R (2010) Gartner reveals top predictions for IT organizations and users for 2011 and beyond 4. Ouaddah A (2019) A blockchain based access control framework for the security and privacy of IoT with strong anonymity unlinkability and intractability guarantees. In: Advances in computers, vol 115. Elsevier, pp 211–258
5. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Decentralized Bus Rev 21260 6. Alsunaidi SJ, Alhaidari FA (2019) A survey of consensus algorithms for blockchain technology. In: 2019 International conference on computer and information sciences (ICCIS). IEEE, pp 1–6 7. Hassan MU, Rehmani MH, Chen J (2019) Privacy preservation in blockchain based IoT systems: integration issues, prospects, challenges, and future research directions. Futur Gener Comput Syst 97:512–529 8. Yu K, Tan L, Aloqaily M, Yang H, Jararweh Y (2021) Blockchain-enhanced data sharing with traceable and direct revocation in IIoT. IEEE Trans Industr Inf 17(11):7669–7678 9. Zhao Q, Chen S, Liu Z, Baker T, Zhang Y (2020) Blockchain-based privacy-preserving remote data integrity checking scheme for IoT information systems. Inf Process Manage 57(6):102355 10. Mistry I, Tanwar S, Tyagi S, Kumar N (2020) Blockchain for 5G-enabled IoT for industrial automation: a systematic review, solutions, and challenges. Mech Syst Signal Process 135:106382 11. Datta S, Das AK, Kumar A, Sinha D (2020) Authentication and privacy preservation in IoT based forest fire detection by using blockchain–a review. In: International conference on internet of things and connected technologies. Springer, Cham, pp 133–143 12. Gong L, Alghazzawi DM, Cheng L (2021) BCoT sentry: a blockchain-based identity authentication framework for IoT devices. Information 12(5):203 13. Shukla S, Thakur S, Hussain S, Breslin JG, Jameel SM (2021) Identification and authentication in healthcare internet-of-things using integrated fog computing based blockchain model. Internet Things 15:100422 14. Li M, Chen Y, Lal C, Conti M, Alazab M, Hu D (2021) Eunomia: anonymous and secure vehicular digital forensics based on blockchain. IEEE Trans Dependable Secure Comput 15. Idrees SM, Nowostawski M, Jameel R, Mourya AK (2021) Security aspects of blockchain technology intended for industrial applications. Electronics 10(8):951 16. Uddin M, Muzammal M, Hameed MK, Javed IT, Alamri B, Crespi N (2021) CBCIoT: a consensus algorithm for blockchain-based IoT applications. Appl Sci 11(22):11011 17. Mijwil MM, Aggarwal K, Mutar DS, Mansour N, Singh RS (2022) The position of artificial intelligence in the future of education: an overview. J Appl Sci 10(2) 18. Farhan L, Kharel R, Kaiwartya O, Hammoudeh M, Adebisi B (2018) Towards green computing for Internet of things: energy oriented path and message scheduling approach. Sustain Cities Soc 38:195–204 19. Farhan L, Alzubaidi L, Abdulsalam M, Abboud AJ, Hammoudeh M, Kharel R (2018) An efficient data packet scheduling scheme for Internet of things networks. In: 2018 1st International scientific conference of engineering sciences-3rd scientific conference of engineering science (ISCES). IEEE, pp 1–6 20. Farhan L, Kharel R, Kaiwartya O, Quiroz-Castellanos M, Alissa A, Abdulsalam M (2018) A concise review on Internet of Things (IoT)-problems, challenges and opportunities. In: 2018 11th International symposium on communication systems, networks & digital signal processing (CSNDSP). IEEE, pp 1–6 21. Jamali MJ, Bahrami B, Heidari A, Allahverdizadeh P, Norouzi F (2020) IoT architecture. In: Towards the internet of things, pp 9–31 22. Kumar R, Sharma R (2022) Leveraging blockchain for ensuring trust in IoT: a survey. J King Saud Univ Comput Inf Sci 34(10):8599–8622 23. Khanna A, Kaur S (2020) Internet of things (IoT), applications and challenges: a comprehensive review. Wireless Pers Commun 114(2):1687–1762 24. 
Ashraf I, Park Y, Hur S, Kim SW, Alroobaea R, Zikria YB, Nosheen S (2022) A survey on cyber security threats in IoT-enabled maritime industry. IEEE Trans Intell Transp Syst 25. Mohanta BK, Jena D, Ramasubbareddy S, Daneshmand M, Gandomi AH (2020) Addressing security and privacy issues of IoT using blockchain technology. IEEE Internet Things J 8(2):881–888
26. Rawat DB, Chaudhary V, Doku R (2020) Blockchain technology: emerging applications and use cases for secure and trustworthy smart systems. J Cybersecur Privacy 1(1):4–18 27. What is blockchain technology? A step-by-step guide for beginners (WWW Document). https:// blockgeeks.com/guides/what-is-blockchaintechnology/. Accessed 4 Oct 2021 28. Zheng Z, Xie S, Dai H, Chen X, Wang H (2017) An overview of blockchain technology: architecture, consensus, and future trends. In: 2017 IEEE international congress on big data (BigData Congress). IEEE, pp 557–564 29. Velliangiri S, Karthikeyan P (2020) Blockchain technology: challenges and security issues in consensus algorithm. In: 2020 International conference on computer communication and informatics (ICCCI). IEEE, pp 1–8 30. Sammy F, Vigila S (2016) A survey on CIA triad for cloud storage services. Int J Control Theory Appl 9(14):6701–6709 31. Grammatikis PIR, Sarigiannidis PG, Moscholios ID (2019) Securing the Internet of things: challenges, threats and solutions. Internet of Things 5:41–70 32. Karamitsos I, Papadaki M, Al Barghuthi NB (2018) Design of the blockchain smart contract: a use case for real estate. J Inf Secur 9(3):177–190 33. Nartey C, Tchao ET., Gadze JD, Keelson E, Klogo GS, Kommey B, Diawuo K (2021) On blockchain and IoT integration platforms: current implementation challenges and future perspectives. Wireless Commun Mobile Comput 2021 34. Wang Z, Jin H, Dai W, Choo KKR, Zou D (2021) Ethereum smart contract security research: survey and future research opportunities. Front Comp Sci 15:1–18 35. Singh D, Mishra PM, Lamba A, Swagatika S (2020) Security issues in different layers of IoT and their possible mitigation. Int J Sci Technol Res 9(04):2762–2771 36. Mohammad AS, Brohi MN, Khan IA (2021) Integration of IoT and blockchain
Chapter 12
Survey and Analysis of Epidemic Diseases Using Regression Algorithms Shruti Sharma
and Yogesh Kumar Gupta
1 Introduction Healthcare practitioners require help from innovative technologies to manage the global challenges of a pandemic like COVID-19. Epidemic diseases have always been a major source of morbidity and mortality. The COVID-19 pandemic is a severe acute respiratory syndrome that emerged in the city of Wuhan, China, in late December 2019 and is referred to as COVID-19 [1]. The pandemic is so severe that it is affecting the population globally due to its chain-like transmission mechanism [2]. To control the adverse effects of this pandemic, many countries declared lockdowns for months, and government health and related officials performed their duties in such an exposed environment to save the lives of many people. After the countries reopened, it became the moral responsibility of every citizen to abide by all the sanitary precautions and to help themselves and others break the virus chain; otherwise, the repercussions would be precarious. Meanwhile, artificial intelligence is also helping government health officials to be well prepared in advance with all the required and mandatory resources to fight the pandemic [3–5]. As per the situation at the time of this study, the recorded confirmed cases till June 7, 2020, were more than 6.7 million, while the reported deaths were approximately 4 lakh. The death rate is decreasing daily while the recovery rate is increasing, but the confirmed cases are also increasing. Therefore, the situation is still not under control; although various vaccines are available, the disease can also be prevented through proper sanitization and awareness. Now is the time when artificial intelligence can help medical practitioners handle these challenges; indeed, enhancements in
Fig. 1 Machine learning process: get data → clean, prepare, and manipulate data → train model → test and validate → improve
technology have led to the detection of COVID-19 with 96% accuracy [6]. Machine learning is a technique that works on the principle of training on data; based on that training, the model learns to predict values [7]. The basic workflow of machine learning is depicted in Fig. 1. Thus, to predict epidemic cases or disease values, machine learning (ML) uses the historical data of patients, climatic variables, and social media data [8]. The primary contribution of this study is the application of regression methods to a COVID-19 data set in order to forecast the number of cases. The research also includes a survey of epidemic diseases such as dengue, malaria, and influenza-like illness. The remainder of the work is divided into the following sections: Sect. 2, literature review; Sect. 3, data, methods, and results; and Sect. 4, conclusion and future work.
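A minimal sketch of the kind of regression comparison carried out later in Sect. 3 is shown below, assuming a scikit-learn environment and a hypothetical CSV file of Indian case counts. The file name, column names, and hyper-parameters are illustrative assumptions and do not reproduce the exact experimental setup of this study.

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("covid_india.csv")   # hypothetical data file
X = df[["day_index"]]                 # e.g. days since the first reported case
y = df["confirmed"]                   # daily confirmed-case counts

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

models = {
    "boosted trees": GradientBoostingRegressor(n_estimators=100, learning_rate=0.2),
    "decision forest": RandomForestRegressor(n_estimators=8, max_depth=32),
    "poisson": PoissonRegressor(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")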
2 Literature Review Big data analytics is playing a vital role in the healthcare sector. Research supports the use of data analytics and shows that it can improve healthcare outcomes at minimum cost. Many researchers have worked on the early prediction of epidemic diseases such as dengue, ILI, malaria, and COVID-19 using machine learning and big data, considering important data sources such as social media (e.g., Twitter), climatic variables, and hospital records [9]. Aduragba and Cristea [10] analyzed two important issues: in what ways social media data is vital for the prediction of COVID-19, and how public health education is connected to it, which reflects the disease spread. With this analysis, the research concluded that prediction of disease from Twitter data is a vital source. Moreover, the study highlights the high-frequency impact of infectious diseases in middle- and low-income countries.
2.1 Dengue Kittisak et al. [11] introduced a data-driven approach for forecasting the dengue infection in two cities of Bangkok and Thailand. With the Chi-Squared Automatic
Interaction Detection (CHAID) model; they first compared it with the autoregressive integrated moving average (ARIMA) model and found that CHAID performs much better than ARIMA. They then compared it with other machine learning models such as linear regression, the generalized linear model, the support vector machine (SVM), and the artificial neural network (ANN), and found that the CHAID model has the least error compared with all other models in both cities. The data covered comes from the meteorological department, including rainfall and the Oceanic Nino Index (ONI), a remote sensing index, and historical data from hospitals. Patsaraporn [12] used time series analysis implemented with the seasonal autoregressive integrated moving average (SARIMA) model and used the mean absolute percentage error (MAPE) to evaluate the performance. The Box–Jenkins method of time series analysis with the SARIMA model predicted values close to the observed ones for data taken from Thailand in the year 2019 and performed well on dengue case patterns from one, two, and twelve months of past records. The MAPE calculated in these cases was 0.05, and the research shows that the SARIMA model can predict the number of dengue cases for the next year based on past records.
2.2 Influenza Ya et al. [13] integrated the data from air quality index (AQI) and doctors visit for treatment to search the correlation between Influenza-like Illness (ILI) and AQI with the data from 2007 to 2020. First, Chao developed a linear regression model to find the correlation and saw a vital correlation between PM2.5 and ILI. Furthermore, the goal is to search the relation between data by deep learning algorithms of RNN, LSTM, and used TensorFlow Library with Keras tool. In their findings, a strong correlation between historical data and air quality index has been seen to predict the Influenzalike Illness with minimum (less than 20) mean absolute percentage error (MAPE). Perhaps, the model will not evaluate the same results with smaller data, so further research may be possible in the future. Xue et al. [14] formulated three models of Improved Particle Swarm Optimization Algorithm for optimized results from support vector regression for predicting the Influenza-like Illness (ILI) and inferred that the third model performed very well. The models are categorized as historical data from Twitter worked with model 1, model 2 depicted the empirical network model with Twitter data with its regional impacts. So, the third model integrated the data from center of disease control (CDC) and Twitter data. The third model inculcated the new and historical data from Twitter to fill the gap with efficacy. These three models are tested for ten different regions of United States (US) and used the key features like Mean Square Error (MSE), MAPE, and Root Mean Squared Error (RMSE) to evaluate the accuracy of the models. Eventually, model 3 depicted the least values of MSE, RMSE, and MAPE with IPOS-SVR.
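The error metrics used throughout this section (MSE, RMSE, and MAPE) can be computed in a few lines of Python; the sketch below uses made-up observed and predicted weekly counts purely for illustration.

import numpy as np

y_true = np.array([120, 150, 180, 210, 260], dtype=float)  # hypothetical observed cases
y_pred = np.array([110, 155, 170, 220, 245], dtype=float)  # hypothetical model output

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # in percent

print(f"MSE={mse:.2f}, RMSE={rmse:.2f}, MAPE={mape:.2f}%")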
2.3 Malaria Using malaria data and meteorological data, Wang et al. developed an ensemble algorithm for predicting malaria outbreaks. The procedure involves three steps: bagging, boosting, and stacking. The base algorithms ARIMA, STL-ARIMA, BP-ANN, and LSTM are employed independently at the same level to produce individual forecasts of the malaria outbreak. These models are then combined via gradient-boosted regression trees (GBRT) to produce a stacked structure, and the results are evaluated using MASE, RMSE, and mean absolute deviation (MAD). The ensemble model has lower RMSE, MAD, and MASE values compared with the other four models [15]. Laxmi Lydia et al. [16] utilized a deep learning model to label images of parasitized and uninfected blood cells for the diagnosis of malaria. The model learned the data set using convolutional neural networks and attained an accuracy of 97% in correctly classifying the images.
2.4 COVID-19 Jiang et al. [17] employ the machine learning models that are very predominant for the prediction of COVID-19, which helps in diagnosing the patients with acute respiratory distress syndrome (ARDS). According to their study, the algorithms support vector machine (SVM), decision tree, k-nearest neighbor (KNN), and Random Forest predicted accuracy up to 70–80%. The imperative features they used for the prediction were ALT (Alkaline Amino transfer), Myalgia, hemoglobin, gender, temperature, Na+, K+, Lymphocyte Count, Creatinine, age and WBC. Huang et al. [18] used the major inequality allying cure ratio and mortality ratio for assorted and diverse regions of China for a period of Jan 23–Feb 11, 2020. Santosh [19] proposed AI driven model for the prediction of COVID-19. This research tries to impress active learning technology in place of traditional machine learning methods as the world is facing a pandemic where waiting for huge historical data would not work. So, the active learning method will contemplate with the Multitudinal and Multimodal data. Sumit Ghosal et al. [20] applied the linear regression analysis for the prediction of death due to COVID-19 for India. The prediction for the 5th and 6th week is 211 and 467 respectively from March 14, 2020.
2.5 Impact of Dengue, Malaria, and Influenza on COVID-19
In the pre-vaccination era, an infectious disease may show several outbreak periods that regularly occur in a particular season. Rapid identification of a seasonal outbreak is essential so that healthcare professionals can respond more rapidly
and effectively. Seasonal outbreaks can lead to serious illnesses, such as influenza or Influenza-like Illness (ILI), which can prove fatal when the epidemic spreads across a region [14, 15].
3 Data and Methods
This paper surveys epidemic diseases such as dengue, malaria, influenza-like illness, and COVID-19, and then applies regression algorithms to COVID-19 data. The data for implementing the algorithms were accumulated from authentic sources such as covidindia.org and Kaggle for India, with details for every province, and a comparison of the severity of COVID-19 with the other epidemic diseases is made. The other epidemics are not as severe as the coronavirus, which has a long incubation period of 2 to 14 days (about 2 weeks), during which the sufferer can transmit the infection to others without showing any symptoms. The data analysis for confirmed cases, deaths, and recovered cases using the Indian data set is presented below in Fig. 2. The exponential growth of the virus leads to treating it as an emergency crisis. To reach the objective, the data was trained using three regression algorithms, namely boosted decision tree, decision forest, and Poisson regression, together with an additive nonlinear regression model for forecasting. The Boosted Decision Tree Regression algorithm assembles regression trees using boosting; here, boosting means that each tree depends on the preceding trees, and the algorithm learns by fitting the residuals of the trees that precede it. Boosting in a decision tree ensemble therefore tends to enhance accuracy with little added risk. For this model, 100 trees are built, the minimum number of training instances required to form a leaf is 10 (with 10-fold cross-validation), and the learning rate is 0.2.
Fig. 2 Confirmed, deaths, and recovered in India till Aug 2021
Decision Forest Regression is coherent and well organized for the prediction of cases and provides efficient memory usage during training. The regression model is built as an ensemble of decision trees, and every tree in a regression decision forest outputs a Gaussian distribution as its prediction. An aggregation over the trees is carried out to find the Gaussian distribution closest to the combined distribution of all trees in the model. This model uses eight decision trees with a maximum depth of 32, configured in single-parameter trainer mode. Poisson Regression is used for the prediction of numeric values, specifically counts. This regression method is based on the Poisson distribution, in which counts cannot be negative, and it is applicable wherever the count of event occurrences must be modelled; we use it here to predict the count of confirmed positive cases of the COVID-19 pandemic. The Additive Nonlinear Regression Model is a decomposable time series model with three primary components: trend, seasonality, and events. They are merged in the following equation, Eq. 1:

y(t) = g(t) + s(t) + h(t) + εt,   (1)
where g(t) is the long-run growth (trend), s(t) represents periodic changes, h(t) is the impact of events, and εt covers irregular changes not captured by the model. This article focuses on reviewing the research done on epidemic diseases such as Influenza-like Illness, dengue, and malaria with machine learning and Big Data. The regression algorithms, cross-validated with 10 folds, show that the results are better with decision forest regression than with the boosted decision tree and Poisson regression algorithms. However, the coefficients of determination of decision forest regression and boosted decision tree regression are comparably similar, with values of 0.999868 and 0.999876, respectively. Therefore, predictions of COVID-19 confirmed cases are comparable across the machine learning algorithms if they are trained with labelled and complete data. We analyzed the data set using decision forest regression, boosted decision tree regression, and Poisson regression; a detailed summary of the applied algorithms is given in Tables 1 and 2 and Fig. 3. The results indicate that the decision forest regression algorithm outperformed boosted decision tree and Poisson regression, with a mean absolute error (MAE) of 11.2626; it also has the lowest RMSE and relative absolute error, while boosted decision tree regression has the lowest relative squared error. Figure 3 shows that Poisson regression performed worst among the three, with a very high root mean squared error, and that it also does not perform well in terms of MAE. The future trend of COVID-19 cases in India is presented below in Fig. 4; the additive nonlinear regression model fits the cases by date and predicts the future trend until October 2021.
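The exact modelling toolchain used for these experiments is not specified here, so the following is only a minimal sketch, assuming scikit-learn, of how the three regression setups with the stated settings (100 boosted trees with at least 10 instances per leaf and a 0.2 learning rate, an 8-tree forest of depth 32, and a Poisson regressor) could be reproduced and cross-validated with 10 folds; the feature matrix X and the target y (daily confirmed cases) are assumptions.

```python
# Hedged scikit-learn approximation of the three regression setups described above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import mean_absolute_error, mean_squared_error

models = {
    # Boosted decision tree: 100 trees, at least 10 samples per leaf, learning rate 0.2
    "boosted_decision_tree": GradientBoostingRegressor(
        n_estimators=100, min_samples_leaf=10, learning_rate=0.2),
    # Decision forest: 8 trees with a maximum depth of 32
    "decision_forest": RandomForestRegressor(n_estimators=8, max_depth=32),
    # Poisson regression for non-negative count targets (confirmed cases)
    "poisson": PoissonRegressor(max_iter=1000),
}

def evaluate(X, y):
    """10-fold cross-validated MAE and RMSE for each model (X, y are assumed inputs)."""
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    for name, model in models.items():
        pred = cross_val_predict(model, X, y, cv=cv)
        mae = mean_absolute_error(y, pred)
        rmse = np.sqrt(mean_squared_error(y, pred))
        print(f"{name}: MAE={mae:.2f} RMSE={rmse:.2f}")
```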
Table 1 Summarized review of literature
Author details | Methods | Data type | Data source | Validation method | Results
Ribeiro et al. [21] | Support vector regression and stacking-ensemble | Clinical | Kaggle | Holdout | Errors in the range of 0.87–3.51% (one day ahead), 1.02–5.63% (three days ahead), and 0.95–6.90% (six days ahead)
Yan et al. [22] | XGBoost | Clinical | Online sources, Kaggle | Cross-validation | Accuracy 90%
Chimmula et al. [23] | LSTM | Demographic | John Hopkins University and Canadian Health authority, data containing infected cases up to March 2020 | Cross-validation | Ending point of the pandemic outbreak in Canada was predicted on June 2020
Chakraborty and Ghosh [24] | Hybrid Wavelet-ARIMA | Demographic | John Hopkins University | Cross-validation | Real-time forecast and 10 days ahead; observed seven key features associated with death rate
Buczak et al. [25] | Fuzzy Association Rule Mining | Clinical, Metrological | Hospital records | None | Positive predictive value = 0.686, Sensitivity = 0.615, Specificity = 0.982
Karim et al. [26] | ANOVA | Climate | Metrological Department | None | Accuracy 95%
Huy et al. [27] | Decision Based | Clinical | Metrological Department | None | AUC is 0.73
Yang et al. [13] | RNN, LSTM | Clinical | Historical data from doctors | None | MAPE less than 20
Table 2 Summary of results for the regression algorithms: mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE), relative squared error (RSE), and coefficient of determination
Algorithms adopted | Mean absolute error | Root mean squared error | Relative absolute error | Relative squared error | Coefficient of determination
Decision forest regression | 11.26260 | 64.224548 | 0.004026 | 0.000132 | 0.999868
Boosted decision tree regression | 29.17924 | 65.889239 | 0.009846 | 0.000124 | 0.999876
Poisson regression | 72.47962 | 1413.546926 | 0.193179 | 0.05718 | 0.94282
The model fits the actual data until April 2021 and forecasts the future with a high peak in October 2021.
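The following is a minimal sketch of a decomposable trend-seasonality-events forecast in the spirit of Eq. 1, using the Prophet library; whether this particular library was used is not stated in the text, and the file name, column names, and forecast horizon below are assumptions.

```python
# Hedged sketch: fit an additive time-series model and forecast several months ahead.
import pandas as pd
from prophet import Prophet

df = pd.read_csv("india_confirmed_cases.csv")      # hypothetical file name
df = df.rename(columns={"date": "ds", "confirmed": "y"})   # Prophet expects ds/y columns

model = Prophet()                                   # decomposes into g(t) + s(t) + h(t) + error
model.fit(df)                                       # fit on observed cases (e.g. up to Apr 2021)

future = model.make_future_dataframe(periods=180)   # extend roughly six months ahead
forecast = model.predict(future)                    # yhat gives the predicted trend (cf. Fig. 4)
print(forecast[["ds", "yhat"]].tail())
```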
4 Conclusion and Future Work
COVID-19 is the most dangerous pandemic of the twenty-first century and continues to threaten many lives across the globe with the highest level of risk. Dengue and malaria are also at their peak with the climatic changes in a country like India, and the combination of these diseases is evidently a major concern for governing bodies and healthcare service providers. This research presents an extensive review of diseases such as dengue, malaria, ILI, and COVID-19, together with regression algorithms applied to COVID-19 data. The regression algorithms were applied to data aggregated from online sources, and the decision forest regression algorithm performed best. Therefore, predictions of COVID-19 confirmed cases are comparable across the machine learning algorithms if they are trained with labelled and complete data. This research aims to allow clinicians, administrative staff, and financial experts to maintain vigilance for events beforehand and thus to make decisions about how to tackle the situation. In our future work, details of dengue and malaria together with COVID-19 will be considered since, according to recent surveys, COVID-19 co-occurs with these diseases.
Fig. 3 Comparison of various regression algorithms
Fig. 4 Future trend predicted with additive nonlinear model
References 1. Sarkodie SA, Owusu PA (2020) Investigating the cases of novel coronavirus disease (COVID19) in China using dynamic statistical techniques. Heliyon 6(4):e03747 2. Wang Y et al (2020) Abnormal respiratory patterns classifier may contribute to large-scale screening of people infected with COVID-19 in an accurate and unobtrusive manner. arXiv preprint arXiv:2002.05534 3. Sajid M, Dhar B, Almohaimeed AS (2022) Differential order analysis and sensitivity analysis of a CoVID-19 infection system with memory effect. AIMS Math 7(12):20594–20614 4. Misra AK, Maurya J, Sajid M (2022) Modeling the effect of time delay in the increment of number of hospital beds to control an infectious disease. Math Biosci Eng 19(11):11628–11656 5. Rajput A et al (2021) Optimal control strategies on COVID-19 infection to bolster the efficacy of vaccination in India. Sci Rep 11(1):1–18 6. Technology Org, AI algorithms detects corona virus infections in patients from CT scans with 96% accuracy. https://www.technology.org/2020/03/01/aialgorithm-detects-coronavirusinfections-in-patients-from-ctscans-with-96-accuracy/. Accessed on 13 Apr 2021 7. Xu X et al (2020) A deep learning system to screen novel coronavirus disease 2019 pneumonia. Engineering 6(10):1122–1129 8. Kohavi R, Longbotham R (2017) Online controlled experiments and A/B testing. Encyclopedia Mach Learn Data Mining 7(8):922–929 9. Harapan H et al (2020) Coronavirus disease 2019 (COVID-19): a literature review. J Infect Public Health 13(5):667–673 10. Aduragba OT, Cristea AI (2019) Research on prediction of infectious diseases, their spread via social media and their link to education. In: Proceedings of the 4th International conference on information and education innovations 11. Kerdprasop K, Kerdprasop N, Chuaybamroong P (2019) Forecasting dengue incidence with the chi-squared automatic interaction detection technique. In: Proceedings of the 2019 2nd artificial intelligence and cloud computing conference 12. Somboonsak P (2019) Time series analysis of dengue fever cases in Thailand utilizing the Sarima model. In: Proceedings of the 2019 7th International conference on information technology: IoT and smart city 13. Yang C-T et al (2020) Influenza-like illness prediction using a long short-term memory deep learning model with multiple open data sources. J Supercomput 76:9303–9329 14. Xue H et al (2019) Regional level influenza study based on Twitter and machine learning method. PLoS ONE 14(4):e0215600 15. Wang M et al (2019) A novel model for malaria prediction based on ensemble algorithms. PLoS ONE 14(12):e0226910 16. Laxmi LE, Jose MG, Sharmili N, Shankar K, Andino M (2019) Image classification using deep neural networks for malaria disease detection. Int J Emerg Technol 10(4):66–70 17. Jiang X et al (2020) Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity. Comput Mater Continua 63(1):537–551 18. Huang R, He L, Zhou P (2020) Epidemic characteristics of 2019-nCoV in China, Jan 23, 2020–Feb 11, 2020. Available at SSRN 3542179 19. Santosh KC (2020) AI-driven tools for coronavirus outbreak: need of active learning and cross-population train/test models on multitudinal/multimodal data. J Med Syst 44:1–5 20. Ghosal S et al (2020) Linear regression analysis to predict the number of deaths in India due to SARS-CoV-2 at 6 weeks from day 0 (100 cases-March 14th 2020). Diab Metabolic Syn: Clin Res Rev 14(4):311–315 21. 
Ribeiro MHDM et al (2020) Short-term forecasting COVID-19 cumulative confirmed cases: perspectives for Brazil. Chaos, Solitons Fractals 135:109853 22. Yan L et al (2020) Prediction of criticality in patients with severe Covid-19 infection using three clinical features: a machine learning-based prognostic model with clinical data in Wuhan. MedRxiv 2020–02
23. Chimmula VKR, Zhang L (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos, Solitons Fractals 135:109864 24. Chakraborty T, Ghosh I (2020) Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: a data-driven analysis. Chaos, Solitons Fractals 135:109850 25. Buczak AL et al (2012) A data-driven epidemiological prediction method for dengue outbreaks using local and remote sensing data. BMC Med Inf Decis Making 12(1):1–20 26. Karim MN et al (2012) Climatic factors influencing dengue cases in Dhaka city: a model for dengue prediction. Indian J Med Res 136(1):32 27. Huy NT et al (2013) Development of clinical decision rules to predict recurrent shock in dengue. Crit Care 17(6):1–8
Chapter 13
Hybrid Pre-trained CNN for Multi-classification of Rice Plants Sri Silpa Padmanabhuni, Abhishek Sri Sai Tammannagari, Rajesh Pudi, and Srujana Pesaramalli
1 Introduction
Different models and techniques have been introduced to detect and separate rice seeds of distinct varieties. Many have served their purpose, and a few are still used daily. These older models and techniques may offer lower accuracy and poorer-quality results than recently developed ones. When separating and identifying rice seeds, it is crucial to know whether the seed being sold is genuine and whether it belongs to a particular rice plant family that is nutritious and healthy. The system described here uses an enhanced vanilla CNN and the VGG16 model to identify the rice plant family and to establish which of the two predicts the rice plant family more accurately. The proposed system works on a dataset containing images of various rice plant families planted and harvested in different parts of India. Convolutional neural networks are a network architecture used alongside deep learning algorithms; their suitability for image recognition and image processing makes them reliable here. This model mainly involves processing the pixel data of the provided images. The vanilla CNN can be viewed as an extension of the supervised linear regression algorithm in machine learning; the key difference is the hidden layer, which performs the additional computations in a vanilla neural network. This extra layer is introduced between the inputs and outputs. The hidden layer, denoted H, contains three neurons, namely H0, H1, and H2, although any number of neurons can be added to the hidden layers. Backpropagation through the hidden layer is also used in the vanilla neural network.
S. S. Padmanabhuni (B) · A. S. S. Tammannagari · R. Pudi · S. Pesaramalli Department of Computer Science and Engineering, PSCMR College of Engineering and Technology, Vijayawada, AP, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_13
Using a vanilla neural network in this model shows that reasonable accuracy can be reached in this context. Its architecture is straightforward: each layer's output is generated by applying a nonlinear activation function to a weighted sum of its inputs plus a bias. When the ReLU function is used as the activation function, the vanilla neural network computes this weighted sum at every layer. In this study, we additionally report the accuracy attained by the VGG16 architecture. VGG16 is a convolutional neural network (CNN) that was used to win the ILSVRC (ImageNet) competition in 2014 and is regarded as one of the most prominent vision model architectures of its time. The key feature that sets VGG16 apart from other networks is its consistent use of 3 × 3 convolution filters with stride 1 and the same padding, together with 2 × 2 max pooling layers with stride 2. The number 16 refers to the 16 weight layers that comprise VGG16. The network stacks convolution layers with ReLU activations and max pooling, followed by fully connected layers with ReLU, and ends with a softmax output layer. VGG16 is used in this model to evaluate and compare the accuracy achieved by the vanilla CNN model. This paper uses a dataset from Kaggle, which contains the rice seed types Arborio, Ipsala, Basmati, Jasmine, and Karacadag. Using this dataset, we determine the accuracies of the two models.
2 Literature Survey
Zhengjun Qiu et al. proposed a model for identifying rice seed varieties from hyperspectral images classified by a convolutional neural network. The proposed system compares a support vector machine (SVM), K-nearest neighbour (KNN), and convolutional neural network (CNN). The spectral variation between the preprocessed and unpreprocessed average spectra is not significant for the seed varieties. The accuracy on the training and testing sets is 86.9 and 84.0% with SVM and 89.6 and 87.0% with the CNN model [1]. Samson Damilola Fabiyi et al. described a way of classifying rice seed varieties using RGB and hyperspectral imaging. In this model, the authors use a dataset containing 90 species of rice seeds. A new approach for inspecting rice seeds that combines hyperspectral and traditional RGB imaging is proposed. The model eliminates impure samples by combining spatial characteristics obtained from photos with higher spatial resolution and spectral features from hyperspectral data cubes. The average recall, F1-score, and precision for the dataset with all parts are 79.64, 78.80, and 78.27% [2]. Cinar and Koklu described various methods for identifying rice varieties using machine learning algorithms. The algorithms used in this model are K-nearest neighbour (KNN), random forest, support vector machine, multilayer perceptron, logistic regression, and decision tree. While some cultivars are classified more accurately than Arborio and Basmati, others have a lower accuracy rate. Of all the models, random forest produced the most promising results in accuracy and precision in identifying the rice seed varieties, reaching an accuracy of 98.04% [3].
Cinar and Koklu also compared methods for classifying rice varieties using artificial intelligence techniques. The algorithms used are logistic regression, multilayer perceptron, SVM, RF, decision tree, KNN, and Naïve Bayes. Success criteria such as efficiency, precision, selectivity, clarity, F1 measure, negative predictive value, false positive rate, false discovery rate, and false negative rate are calculated for the two-class classification performance assessment. The logistic regression method achieved higher accuracy than the other models, at 93.02% [4]. Xu Ma et al. proposed a model using a fully convolutional network to segment rice seedling and weed images in paddy fields. The methods used in this model are FCN, U-Net, and SegNet. The percentage of pixels correctly assigned to a specific class is reported as the "pixel accuracy" (PA). The model can directly extract characteristics from RGB photos and identify and classify the pixels. SegNet performed well and gives a higher accuracy rate than the other two models, reaching 93.6% [5]. Table 1 summarizes the methodology, merits, and demerits of these previous works.
Table 1 Existing system analysis
S. No. | Author | Algorithm | Merits | Demerits
1 | Zheng jun | SVM, KNN, CNN | As the training set increases, CNN outperforms the other two models | In spectral information, some bands are noisy, like the first and second, so they cannot be trusted
2 | Paul Murray | RGB, hyperspectral imaging | Eliminating impure species from rice seed samples using high spatial resolution images | Decrease in performance due to level of similarity
3 | Ilkay CINAAR | KNN, DT, LR, MLP, RF, SVM | The random forest algorithm performed better than the other algorithms | The models only work properly if the instances are correctly recognized
4 | Ilkay CINAAR | KNN, DT, LR, MLP, RF, SVM, NB | The logistic regression method has achieved more accuracy than other algorithms | Only some models can determine the importance of the variables in new predictions
5 | Xu Ma | FCN, U-Net, and SegNet | Extract features directly from RGB images and classify and recognize the pixels | The picture patch has an impact on and a restriction on the algorithm's performance
3 Proposed Methodology
3.1 Dataset Description
This model uses the rice seed image dataset publicly available on Kaggle. The dataset consists of five types of rice seeds, namely Arborio, Ipsala, Basmati, Karacadag, and Jasmine. These five types of rice seeds are split into training and testing sets used to train the defined models and measure their accuracy. The rice seeds are divided into five different classes and are then passed on for preprocessing. Figure 1 shows a sample Arborio rice seed used in this model. The dataset contains 75,000 images in total; a detailed description is given in Table 2.
3.2 Preprocessing
In machine learning, data augmentation and preprocessing play a crucial role in transforming the data into a format that the model can readily process and learn from. The preprocessing in this model is done using Keras's Image Data Generator class, which performs translations, rotations, shearing, changes in scale, image flipping, and zooming of the image dataset.
Fig. 1 Sample Arborio rice seed from the dataset
Table 2 Dataset description
Rice seed | No. of images
Arborio | 15,000
Basmati | 15,000
Ipsala | 15,000
Jasmine | 15,000
Karacadag | 15,000
Fig. 2 Preprocessing of image data generator class
Figure 2 illustrates the preprocessing performed on these classes. The Image Data Generator class completes the following steps during preprocessing:
1. It takes a batch of images that the model uses for training.
2. It applies random transformations to each image contained in the batch.
3. The newly altered batch then replaces the original batch of images.
4. The deep learning model is then trained on this transformed batch.
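A minimal sketch of this augmentation pipeline, assuming Keras's ImageDataGenerator as stated above; the directory layout, image size, and parameter values shown here are illustrative assumptions rather than the authors' exact settings.

```python
# Hedged sketch of the augmentation steps described above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
    rotation_range=20,        # random rotations
    width_shift_range=0.1,    # translations
    height_shift_range=0.1,
    shear_range=0.1,          # shearing
    zoom_range=0.2,           # zooming
    horizontal_flip=True,     # image flipping
    validation_split=0.2,     # hold out part of the data for validation
)

# Assumed layout: one sub-folder per rice variety (Arborio, Basmati, Ipsala, Jasmine, Karacadag)
train_batches = datagen.flow_from_directory(
    "rice_image_dataset/", target_size=(175, 175), batch_size=32,
    class_mode="categorical", subset="training")
val_batches = datagen.flow_from_directory(
    "rice_image_dataset/", target_size=(175, 175), batch_size=32,
    class_mode="categorical", subset="validation")
```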
3.3 Classification
Dividing a dataset into classes is known as classification in machine learning. It can be performed on both structured and unstructured data. Predicting the category of a set of given data points is the first step in the classification process. The created categories are frequently called targets, labels, or classes. The classification algorithms employed in this model are an enhanced vanilla convolutional neural network and the VGG16 architecture.
Vanilla CNN. A convolutional neural network consists of three layer types; Fig. 3 depicts the CNN model.
1. Pooling layer.
2. Convolution layer.
3. Fully connected layer.
Fig. 3 CNN layers
Pooling Layer. The pooling layer replaces the network's output at specific locations with a statistical summary of the neighbouring outputs. This reduces the spatial size of the representation, which means fewer computations and weights are required. The pooling operation is applied to each slice of the representation separately [6].
Convolution Layer. The convolution layer is the essential element of CNN design and carries the bulk of the network's computational load. This layer evaluates the dot product of two matrices: the first matrix is the kernel, a set of learnable parameters, and the second is the restricted receptive field of the input [7]. The kernel is spatially smaller than the image but extends through its full depth. As a result, if a picture contains three (RGB) channels, the kernel's height and width are small, but its depth covers all three channels.
Fully Connected Layer. As in a traditional fully connected neural network, all neurons in adjacent layers are completely connected, so the layer can be computed as a matrix multiplication followed by a bias offset. The FC layer maps the representation between the input and the output [8]. Several nonlinear activation functions are in common use. The most popular are:
Sigmoid. The sigmoid nonlinearity has the mathematical form σ(κ) = 1/(1 + e^(−κ)); it squashes a real-valued number into the range between 0 and 1. A particularly undesirable characteristic of the sigmoid is that when the activation lies at either tail, the gradient practically disappears, and a neuron can effectively "die" if its local gradient shrinks to an extremely small value during backpropagation. If the data entering a neuron is always positive, the sigmoid outputs are all positive, which produces a zigzag dynamic of gradient updates for the weights.
Tanh. Tanh maps a real-valued number to the interval [− 1, 1] [9]. Although its output is zero-centred, unlike the sigmoid, its activations saturate in the same way.
ReLU. The rectified linear unit (ReLU) has received much attention recently. It computes the function f(κ) = max(0, κ); in other words, the activation is simply thresholded at zero. ReLU reaches convergence about six times faster than sigmoid and tanh and is also more dependable. However, ReLU units can be fragile during training: a large gradient flowing through a unit can update its weights so that the neuron never activates again and receives no further updates. This can be mitigated by selecting a suitable learning rate.
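A minimal sketch of a vanilla CNN built from the three layer types just described (convolution with ReLU, max pooling, and fully connected layers); the filter counts and the 175 × 175 input size are assumptions chosen to match the data generators sketched earlier, not necessarily the authors' configuration.

```python
# Hedged sketch of a small vanilla CNN for the five rice-seed classes.
from tensorflow.keras import layers, models

vanilla_cnn = models.Sequential([
    layers.Input(shape=(175, 175, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution layer + ReLU
    layers.MaxPooling2D((2, 2)),                    # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),           # fully connected layer
    layers.Dense(5, activation="softmax"),          # one output per rice variety
])
vanilla_cnn.compile(optimizer="adam",
                    loss="categorical_crossentropy",
                    metrics=["accuracy"])
```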
Fig. 4 VGG16 architecture
VGG16. Convolutional neural networks, a kind of artificial neural network, are also known as ConvNets. A convolutional neural network consists of an input layer, an output layer, and several hidden layers. VGG16, a CNN variant, is one of the most influential computer vision models. Figure 4 describes how the data is processed in the VGG16 model. The model's creators used an architecture [10] with very small (3 × 3) convolution filters to analyze the networks and increase the depth, significantly improving over previous approaches. The increase in depth to 16–19 weight layers results in roughly 138 million trainable parameters. The number 16 refers to the 16 weight layers of VGG16. VGG16 includes 21 layers in total, comprising 13 convolutional layers, five max pooling layers, and three dense layers, of which only 16 are weight layers, also known as learnable-parameter layers. VGG16 has an input tensor size of 224 × 224 with three RGB channels. The design keeps the convolution and max pooling layers in a fixed sequence. The Conv-1 block has 64 filters, Conv-2 has 128 filters, Conv-3 has 256 filters, and Conv-4 and Conv-5 have 512 filters each. Three fully connected (FC) layers follow the stack of convolutional layers: the first two have 4096 channels each, and the third performs 1000-way ILSVRC classification [11].
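The layer shapes reported later in Table 4 suggest a classifier made of a frozen VGG16 base followed by global average pooling, a 1024-unit dense layer, and a 5-way softmax. The sketch below reconstructs that head under those assumptions; the optimizer, loss, and training settings are further assumptions and not necessarily the authors' exact setup.

```python
# Hedged reconstruction of the VGG16-based classifier implied by Table 4.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(175, 175, 3))
base.trainable = False                       # keep the pretrained filters fixed

vgg_model = models.Sequential([
    base,                                    # (None, 5, 5, 512) feature maps
    layers.GlobalAveragePooling2D(),         # (None, 512)
    layers.Dense(1024, activation="relu"),   # 525,312 trainable parameters
    layers.Dense(5, activation="softmax"),   # 5,125 trainable parameters
])
vgg_model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
```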
4 Results and Discussions
The results of the proposed model are described below. Figure 5 shows the number of pictures in the provided dataset and the classes into which they are divided, represented as a bar chart. This data preprocessing step helps the user visualize the data on which the model is working and helps in generating accurate results. Figure 6 plots the model's accuracy and loss at different epochs. The model's accuracy kept improving over the iterations on the training data, while accuracy on the validation data decreased. The model's loss on the training data kept descending as the iterations proceeded, and the loss on the validation data kept changing over the iterations as well.
Fig. 5 Data preprocessing
Fig. 6 Epoch versus accuracy graph for model accuracy and loss using vanilla CNN
On further analysis of the vanilla CNN model, the F1-score, accuracy, precision, recall, and support values are generated in Table 3 for each class of the dataset. Macro averages and weighted averages are also computed for these values. The output values vary from run to run and depend on the model trained on the dataset. The next step analyzes the VGG16 model on the dataset and produces the summary described in Table 4. The model summary helps characterize the model in various aspects: it includes detailed information about the layers used in the model and reports how many parameters are trainable, non-trainable, and total. After classification, validation, and data analysis, a graph of accuracy versus epochs is plotted, as shown in Fig. 7. In this model, the accuracy kept improving and the loss decreased as the number of epochs increased.
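A hedged sketch of how the per-class precision, recall, F1-score, and support figures of Table 3 can be produced with scikit-learn's classification_report; the arrays x_test and y_test (a held-out batch of images and one-hot labels) and the model object vanilla_cnn from the earlier sketch are assumptions.

```python
# Hedged sketch: per-class metrics in the format of Table 3.
import numpy as np
from sklearn.metrics import classification_report

class_names = ["Arborio", "Basmati", "Ipsala", "Jasmine", "Karacadag"]

probs = vanilla_cnn.predict(x_test)          # class probabilities for held-out images
y_pred = np.argmax(probs, axis=1)            # predicted class indices
y_true = np.argmax(y_test, axis=1)           # true class indices from one-hot labels

print(classification_report(y_true, y_pred, target_names=class_names))
```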
Table 3 Output values for vanilla CNN
Class | Precision | Recall | F1-score | Support
Arborio | 0.000000 | 0.000000 | 0.000000 | 15.000000
Basmati | 0.000000 | 0.000000 | 0.000000 | 15.000000
Ipsala | 1.000000 | 0.266667 | 0.421053 | 15.000000
Jasmine | 0.223881 | 1.000000 | 0.365854 | 15.000000
Karacadag | 0.000000 | 0.000000 | 0.000000 | 15.000000
Accuracy | 0.253333 | 0.253333 | 0.253333 | 0.253333
Macro avg | 0.244776 | 0.253333 | 0.157381 | 75.000000
Weighted avg | 0.244776 | 0.253333 | 0.157381 | 75.000000
Table 4 Model summary to evaluate using VGG16
Layer (type) | Output shape | Param #
input_2 (InputLayer) | [(None, 175, 175, 3)] | 0
vgg16 (Functional) | (None, 5, 5, 512) | 14,714,688
global_average_pooling2d (GlobalAveragePooling2D) | (None, 512) | 0
dense_3 (Dense) | (None, 1024) | 525,312
dense_4 (Dense) | (None, 5) | 5125
Total params: 15,245,125; Trainable params: 530,437; Non-trainable params: 14,714,688
Fig. 7 Epoch versus accuracy graph for model accuracy and loss using VGG16
The precision, recall, F1-score, and support values obtained with the VGG16 model are elaborated in Table 5, which gives a detailed overview of the predicted values.
Table 5 Output values using VGG16
Layer (type) | Output shape | Param #
input_2 (InputLayer) | [(None, 175, 175, 3)] | 0
vgg16 (Functional) | (None, 5, 5, 512) | 14,714,688
global_average_pooling2d (GlobalAveragePooling2D) | (None, 512) | 0
dense_3 (Dense) | (None, 1024) | 525,312
dense_4 (Dense) | (None, 5) | 5125
Figure 8 represents the final output of the model predicting the type of seed; it shows the true label and the predicted name of the rice seed, which indicates whether the model can predict accurately.
Fig. 8 Prediction output for both VGG16 and vanilla CNN model
5 Conclusion
The model proposed in this paper compares the prediction accuracies of two different models, vanilla CNN and VGG16. This accuracy comparison helps define which model produces better results when trained and tested with an image dataset. When the models were trained and their accuracies evaluated, the VGG16 model performed well and predicted the outputs better than the vanilla CNN model. Because VGG16 passes the input through many layers of pooling and convolution, the model becomes more accurate; hence, the VGG16 model outperforms the vanilla convolutional neural network. Furthermore, these models can be compared on various datasets and against newer, more enhanced models in the future.
References 1. Qiu Z, Chen J, Zhao Y, Zhu S, He Y, Zhang C (2018) Variety identification of single rice seed using hyperspectral imaging combined with the convolutional neural network. Appl Sci 8(2):212 2. Fabiyi SD et al (2020) Varietal classification of rice seeds using RGB and hyperspectral images. IEEE Access 8:22493–22505. https://doi.org/10.1109/ACCESS.2020.2969847 3. Cinar I, Koklu M (2022) Identification of rice varieties using machine learning algorithms. J Agric Sci 9–9 4. Cinar I, Koklu M (2019) Classification of rice varieties using artificial intelligence methods. Int J Intell Syst Appl Eng 7(3):188–194 5. Ma X, Deng X, Qi L, Jiang Y, Li H, Wang Y, Xing X (2019) Fully convolutional network for rice seedling and weed image segmentation at the seedling stage in paddy fields. PLoS ONE 14(4):e0215676 6. Durai S, Mahesh C (2021) Research on varietal classification and germination evaluation system for rice seed using handheld devices. Acta Agricul Scand Section B Soil Plant Sci 71(9):939–955 7. Liu Z, Cheng F, Ying Y et al (2005) Identification of rice seed varieties using neural network. J Zheijang Univ Sci B 6:1095–1100. https://doi.org/10.1631/jzus.2005.B1095 8. Thu Hong PT, Thanh Hai TT, Lan LT, Hoang VT, Hai V, Nguyen TT (2015) Comparative study on vision based rice seed varieties identification. In: 2015 Seventh international conference on knowledge and systems engineering (KSE), pp 377–382. https://doi.org/10.1109/KSE.2015.46 9. Padmanabhuni SS, Gera P (2022) Synthetic data augmentation of Tomato plant leaf using meta intelligent generative adversarial network: Milgan. Int J Adv Comput Sci Appl (IJACSA) 13(6). https://doi.org/10.14569/IJACSA.2022.0130628 10. Jin B, Zhang C, Jia L, Tang Q, Gao L, Zhao G, Qi H (2022) Identification of rice seed varieties based on near-infrared hyperspectral imaging technology combined with deep learning. ACS Omega 7(6):4735–4749 11. Govathoti S, Reddy AM, Kamidi D, Balakrishna G (2022) Data augmentation techniques on chilly plants to classify healthy and bacterial blight disease leaves. Int J Adv Comput Sci Appli (IJACSA) 13(6). https://doi.org/10.14569/IJACSA.2022.0130618
Chapter 14
Cauliflower Plant Disease Prediction Using Deep Learning Techniques M. Meenalochini and P. Amudha
1 Introduction
Recent growth has led to an increase in agricultural and industrial operations and in fossil fuel combustion, which has resulted in higher gas emissions; all these factors have changed the chemical composition of the atmosphere [1]. There are many factors to consider when studying the symptoms and pathology of plant disease [2]. Some of the most common pathogens that affect plants are bacteria, fungi, and parasites such as nematodes and parasitic insects. Climate change has made it easier for organisms to travel around, which has led to the emergence of new illnesses that might become uncontrolled outbreaks and threaten food security [3]. Host proteins involved in pathogen detection have been discovered in the last five to ten years, and the intricacy of the plant signalling system extends well beyond the sensing molecules. Experts have employed molecular biology and genetic analysis to find the pathogen proteins that weaken the host's defences and initiate infection [4]. Temperature and CO2 stimulate the development and reproduction of fungal spores, and climate change is anticipated to worsen crop disease, presenting a danger to global food security [5]. It has always been difficult to determine the extent of illness in rice plants. Visual analysis by the naked eye was historically the only means of determining rice disease; someone with expertise in this area must closely monitor disease levels, and visual examination is costly, time-consuming, and difficult when applied to broad areas of plants [6]. Infections cause symptoms and illness in plants because plants are vulnerable to the detrimental effects of pathogens. Most pathogens perform critical functions in nature, rely on the host for nutrition, and have symbiotic or non-symbiotic partnerships with other plants.
M. Meenalochini (B) · P. Amudha Department of CSE, Avinashilingam Institute for Home Science and Higher Education for Women, Coimbatore, Tamil Nadu, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_14
A plant's external symptomatology on fruits and leaves must be recognized before being investigated in the lab [7]. Caulimovirus particles are 50 nm in diameter and contain an approximately 8 kbp circular double-stranded DNA genome. Numerous caulimovirus genomes, including that of the cauliflower mosaic virus, have been sequenced. Detecting agricultural illnesses early enough to prevent crop loss is the goal of several researchers who have published their findings [8]. Precision crop production and plant phenotyping necessitate an accurate evaluation of plant disease occurrence and severity and of the negative repercussions on yield quality and quantity. A fast and accurate assessment of plant disease incidence is crucial for planning targeted plant protection actions in the field or greenhouse and for forecasting disease advances in temporal and geographic dispersion [9]. Virus-host associations evolve and change across time and space, leading to the creation of novel viruses in plants. Bottlenecks, founder events, and drift may be separated from recombination, mutation, gene flow, and selection in population genomics [10]. Irrigation efficiency is crucial for high-quality food, textiles, and other goods, increasing awareness of the economic and environmental effects of water consumption [11]. Plant breeding relies on genetic resources as a source of genetic variation. Using genetically varied gene banks as genetic resources for breeding cauliflower is possible with genomic prediction and phenotyping of highly heritable characteristics [12]. Cauliflower is a member of the Brassicaceae family and is one of the most significant Brassica crops. Regarded as one of the healthiest foods globally, cauliflower has several nutritional and health benefits and may be used in various dishes because of its mild flavour and distinctive texture [13]. One investigation targets cauliflower mosaic virus (CaMV) to see whether the natural plant product carvacrol might be used as an antiphytoviral agent, with a postulated mechanism of action [14]. Improving the early identification of cauliflower curd deformation, the primary cause of later physiological abnormalities in mature cauliflower, is another goal; for the non-invasive categorization of cauliflower phenotypes, deep learning approaches were used to analyze tomographic pictures acquired with magnetic resonance imaging (MRI) [15]. Plants growing in the field show how difficult such measurement can be in the real world. Plants treated with diverse fertilization and irrigation treatments had varying leaf area projections based on generated photos, suggesting the method's ability to predict future plant development under different environmental circumstances [16]. Final findings demonstrated that bioactive chemicals with antioxidant properties, including GLS, phenolics, and flavonoids, are abundant in the aerial cauliflower organs [17].
2 Study on Plant Disease Prediction
Detecting plant diseases directly is difficult since it requires a large amount of work and a long period of time; deep learning techniques are among the best approaches for solving this problem. Various studies have been carried out by different researchers, as listed below.
INC-VGGN for image-based plant disease identification was proposed by Chen et al. [18]. Plant diseases can severely impact the quality and quantity of agricultural products and have a devastating effect on the safety of food production. Their strategy uses a VGGNet pretrained on ImageNet together with Inception modules: instead of randomly initializing the weights, the weights obtained by pretraining on the massive labelled ImageNet dataset are used to initialize training. The suggested method has a significant performance advantage over current state-of-the-art techniques. A novel two-stage neural network architecture for plant disease detection was introduced by Arsenovic et al. [19]. Plant diseases are a major source of production losses in agriculture, and a comprehensive and reliable plant disease detection tool has become possible because of the recent growth of deep learning algorithms. A unique two-stage neural network design was developed for plant disease classification, with varied weather conditions, different perspectives, and lighting conditions imitating real-world scenarios. Six existing classification algorithms for plant disease prediction are examined by Morgan et al. [20]. Plant disease classification and prediction have previously been studied using datasets of photographs; the major goal of this work is to propose classification algorithms that can be applied to datasets containing raw measurements rather than images. Despite many discrepancies between the fungal and plant datasets, the results of this work might be used in future research for disease prediction and categorization on a range of raw measurement datasets. Sharmila et al. [21] showed that cauliflower mosaic virus transmission can be predicted using molecular dynamics simulations. A protein-protein binding relationship between the aphid stylet receptor cuticular protein and viral proteins appears to be the most likely mechanism by which this viral transmission occurs, and these proteins aid the spread of CaMV by aphids; the binding of CaMV HCP was studied through a protein-protein docking study and molecular dynamics simulation. A clustered regularly interspaced palindromic repeat, CRISPR-associated (CRISPR-Cas9) editing system for resistance to cauliflower mosaic virus was described by Liu et al. [22]. Crop output losses throughout the globe are largely due to viral infections, and the CRISPR-Cas9 technique can be used for viral control: combinatorial Cas9-mediated retargeting of the viral coat protein sequence may successfully block CaMV. CP sequences from infected transgenic plants were identical to those found in wild-type plants, suggesting that the altered viral genomes were packaged in wild-type coat proteins. Cauliflower mosaic virus infection prediction based on Arabidopsis thaliana was proposed by Bergès et al. [23]. When abiotic factors impact plant physiology, the pathogenicity of a plant virus is likely to change. Their findings show that CaMV infection reduced both the vegetative performance and the reproductive success of the plants.
CaMV tolerance rankings were not affected by water deficiency despite heterogeneity in behaviour across accessions, and early blooming and accession tolerance to CaMV infection were positively associated in their study. The cauliflower mushroom Sparassis crispa and its benefits were explained by Kiyama et al. [24]. Phylogenetic research on the S. crispa genome, compared with 25 other fungal species, found that S. crispa split from the brown-rot fungus Postia placenta 94 million years ago. The extract and chemicals of S. crispa were evaluated using two methods: (1) characterization of carbohydrate-active enzyme (CAZyme) genes and glucan synthase genes, and (2) detection of estrogenic activity in its mycelial extract, which was found in the extract. A better understanding of the genome and genes of S. crispa will allow this organism to be used for medical and pharmacological applications. The cauliflower mosaic virus protein P6-TAV significantly impacts the aphid vector's ability to feed on the plant, as discussed by Chesnais et al. [25]. Transgenic Arabidopsis plants with wild-type and mutant forms of P6-TAV were used to assess aphid feeding and reproduction in order to determine the involvement of the CaMV protein P6-TAV. The fecundity of aphids was not affected by viral infection in any of the transgenic lines, indicating that virulence factors other than infection are responsible. P6-CM plants, but not those expressing P6-JI, showed altered aphid feeding behaviour, and plants with N-terminal alterations in P6 also showed altered feeding habits. Cauliflower mosaic virus across diverse Arabidopsis thaliana accessions was studied by Bergès et al. [26]. CaMV accumulation was positively linked with aphid transmission of CaMV, but this correlation was reversed when water was scarce; thus, substantial CaMV accumulation did not promote horizontal transmission in the absence of water. There was no evidence of any additional connections between the virus characteristics studied, although temperature and virus characteristics were significantly linked, and additional study was performed. When modelling plant virus epidemiology under climate change scenarios, interactions between abiotic stressors and epidemiological features must be considered. An organ and branching system development model and a gene regulatory network (GRN) model were used to simulate the relationship between the GRN explored by Azpeitia et al. [27] and the development of plant architecture. Cauliflower's characteristic architecture, with its preserved dynamics of axis elongation, can only be produced by a shift in the genetic network and the resulting organ expansion or suppression; the GRN and growth interact, and the plant's architecture emerges as a result. A wide range of mutant phenotypes associated with genes implicated in the cauliflower GRN may be reproduced owing to this method. IoT network-based plant health detection systems for plant leaves, in which a variety of patterns can be spotted easily, were illustrated by Panhwar et al. [28]. Every nation is working hard to keep its agriculture healthy while preserving the environment's natural beauty. The diagnosis of plant diseases is a difficult task that substantially influences environmental and industrial growth. A CNN model was used to examine and construct an IoT network system that can identify microscopic objects in plants with a 95% accuracy rate.
The authors employed an IoT network system and the CNN approach to develop a model for detecting illnesses in leaves. A machine learning approach to predict quality parameters for a bacterial consortium was demonstrated by Rashid et al. [29]. Comparative research on plant development indicated the following ranking: cauliflower comes first, followed by radish, then wheat, rice, and finally hot pepper. Cauliflower, radish, wheat, and rice were found to be the plants most adaptable to the treated hospital wastewater, whereas hot pepper was the most vulnerable. Cauliflower mosaic virus systemic infection was examined by Alers-Velazquez et al. [30]. In two different plant hosts, lower temperatures delayed the onset of systemic symptoms caused by the cauliflower mosaic virus (CaMV), and infected turnip leaves had higher quantities of CaMV nucleic acid when exposed to lower temperatures. P6, a major CaMV inclusion body (IB) protein, was shown to generate aggregates at a lower temperature, and the actin scaffolding was finally reorganized to its final position at a lower temperature. A strategy for calibrating the LEACHM and EU-Rotate N models to simulate water and nitrogen processes in cauliflower crops was developed by Lidón et al. [31]. An approach relying on time-dependent generalized sensitivity indices was used to identify the most important model parameters, to minimize the number of calibrations required and prevent over-parameterization. Soil water and nitrate content data were used in an optimization procedure to establish the ideal values and produce calibrated models incorporating the most influential factors. Cauliflower is a popular vegetable found in various preparations, including fried or boiled dishes, soup, curry, and pickles, as described by Rakshita [32]. It contains anti-cancer glucosinolates, which give the plant its distinctive aroma, bitter flavour, and spicy taste, as well as the antioxidants selenium and ascorbic acid. In that study, researchers found that temperature significantly influenced curd initiation, harvesting time, and other features of the cauliflowers tested. Ensiling of vegetable wastes, such as cauliflower leaves, was tested under various temperature simulations to see how temperature affected the ensiling process [33]. A cylindrical plastic silo with an initial bonding strength of ethanol was employed for both the inoculated and the control cauliflower leaves. The additive significantly improved silage quality, and the increased microbial variety raised the ability to produce lactic acid and lower the pH.
3 Merits and Demerits of Existing Methods
The merits and demerits of the existing methods are given in Table 1.
Title
Using deep transfer learning for image-based plant disease identification
Solving current limitations of deep learning-based approaches for plant disease detection
Plant disease prediction using classification algorithms
Molecular dynamics investigations for the prediction of molecular interaction of cauliflower mosaic virus transmission helper component protein complex with Myzuspersicaestylet’s cuticular protein and its docking studies with annosquamosin. An encapsulated in nano-porous silica
Reference no.
Chen et al. [18]
Arsenovic et al. [19]
Morgan et al. [20]
Sharmila et al. [21]
Table 1 Merits and demerits of an existing method Demerits
A protein–protein binding connection between the aphid stylet receptor cuticular protein and viral proteins appears to be the most likely mechanism by which this viral transmission is achieved
(continued)
Removing the cuticle components and doing biochemical experiments on live insects have prevented the 3D structure of ASR from being established
The classification algorithms can be In the soybean dataset, we have proven that applied in datasets that include raw ANN and KNN are the top classifiers in measurements rather than images terms of accuracy. Still, ANN is likely the preferable option as KNN classification was never frequently used for plant datasets
Use a controlled environment to Multi-disease or multi-occurrence detection evaluate the impact of training in is not possible with the methods currently plant village dataset and use it in in the works real-life circumstances to properly identify plant diseases in a complicated background, including identifying several diseases in a single leaf
There is a significant gain in This method can only be used in a small performance compared with other number of places and cannot be widely current approaches; it reaches applied validation accuracy when tested with (www.plantvillage.org) dataset
Merits
168 M. Meenalochini and P. Amudha
Title
CRISPR/Cas9-mediated resistance to cauliflower mosaic virus
Cauliflower mosaic virus infection prediction based on Arabidopsis thaliana
Genome sequence of the cauliflower mushroom Sparassiscrispa (Hanabiratake) and its association with beneficial usage
Cauliflower mosaic virus protein P6-TAV plays a major role in alteration of aphid vector feeding behaviour but not performance on infected Arabidopsis
Reference no.
Liu et al. [22]
Bergès et al. [23]
RyoitiKiyama et al. [24]
Chesnais et al. [25]
Table 1 (continued)
CaMV P6-TAV was involved in aphid eating and reproduction in transgenic Arabidopsis plants with wild-type and modified iterations of P6-TAV
In addition to identifying CAZyme and glucan synthase genes, the extract was also determined to have estrogenic action in its mycelial form
CaMV infection lowered both plant’s vegetative and reproductive performance. Despite the reality that accessions varied widely in their behaviour, the CaMV tolerance rankings were unaffected by water shortages
As a result, Cas9-mediated combinatorial retargeting of the virally encoded protein sequence may help prevent CaMV, which has a two-stranded genome
Merits
(continued)
Table 1 (continued)

Demerits continued from the rows on the preceding page:
- Liu et al. [22]: Resistance mediated by R genes is limited by the availability of R genes in genetic resources and, in many situations, by strain specificity. Moreover, as Cas9 is not present in these siRNAs, the siRNAs are not accountable for the inhibition of CaMV infection
- Bergès et al. [23]: Even though these plants displayed obvious viral symptoms, CaMV infection had no substantial impact on the dry mass production of accessions from groups I and II
- Kiyama et al. [24]: According to phylogenetic comparison with other fungal chromosomes and particular gene functions, there is little evidence that the stated gene functions are as unique as originally supposed
- Chesnais et al. [25]: Although lower fecundity is a proxy for aphid fitness, it does not necessarily mean that aphids would depart CaMV-infected plants more quickly than healthy plants, so it is unclear whether this holds for CaMV on Arabidopsis. Phloem composition may have been altered in infected plants, but not in transgenic plants

Reference no.: Bergès et al. [26]
Title: Water deficit changes the relationships between epidemiological traits of cauliflower mosaic virus across diverse Arabidopsis thaliana accessions
Merits: Under climatic variability, abiotic stresses and epidemiological aspects must be considered when predicting the spread of plant viruses
Demerits: CaMV accumulation was favourably linked with CaMV transmission by aphids. This association was reversed when a water shortage occurred, and CaMV accumulation did not promote horizontal transmission in a water-deficient environment

Reference no.: Azpeitia et al. [27]
Title: Cauliflower fractal forms arise from perturbations of floral gene networks
Merits: This method reproduces a wide variety of mutant phenotypes linked to genes implicated in the cauliflower GRN
Demerits: Even though the qualitatively identical finite-n scenario calls for additional parameter values, no closed-form solutions are found. There is no auxin signalling route, and F may represent various flowering-inducing mechanisms in this paradigm; both auxin and F are phenomenologically active

Reference no.: Panhwar et al. [28]
Title: The plant health detection system for plant leaves based on an IoT network
Merits: Using a CNN model to evaluate and develop an IoT network system can identify minute items in plants with a 95% accuracy rate
Demerits: All baseline methods treated the unknown categorization of plant health detection as an additional challenge, and all available plant health monitoring techniques have relied on static mechanisms that can only detect at a predetermined time

Reference no.: Rashid et al. [29]
Title: Machine learning approach to predict quality parameters for bacterial consortium-treated hospital wastewater and phytotoxicity assessment on radish, cauliflower, hot pepper, rice and wheat crops
Merits: When exposed to hospital wastewater treatment, the most resilient plants were found to be cauliflower, radish, wheat, and rice, whereas the most sensitive plants were hot peppers
Demerits: Due to a lack of enforcement of these restrictions, untreated hospital wastewater has been discharged with physicochemical properties that fall beyond the acceptable range

Reference no.: Alers-Velazquez et al. [30]
Title: Lower temperature influences cauliflower mosaic virus systemic infection
Merits: The method is used to reduce temperature exposure and increase the quantities of CaMV nucleic acid in infected turnip leaves
Demerits: Chinese violet cress infected with CM1841 was grown in two greenhouses with differing average temperatures

Reference no.: Lidón et al. [31]
Title: Sensitivity analysis and parameterization of agricultural models in cauliflower crops
Merits: It was utilized to determine the most relevant model parameters to reduce the number of calibrations and avoid over-parameterization
Demerits: In terms of computing costs, global sensitivity approaches are substantially more costly than local sensitivity methods

Reference no.: Rakshita et al. [32]
Title: Understanding population structure and detection of QTLs for curding-related traits in Indian cauliflower by genotyping by sequencing analysis
Merits: Investigates the eight traits associated with curding behaviour across a variety of Indian cauliflower germplasm
Demerits: The cauliflower's hypersensitivity to temperature fluctuations is not grower-friendly, since a lack of attention to cultivar selection might result in a significant financial loss

Reference no.: Ren et al. [33]
Title: Effects of Lactobacillus plantarum additive and temperature on the ensiling quality and microbial community dynamics of cauliflower leaf silages
Merits: Adding exogenous L. plantarum to silage fermentation boosted the process significantly; microorganism diversity decreased as a result of this enrichment of Lactobacillus and Weissella
Demerits: This investigation did not find either butyric acid or propionic acid, two potentially harmful by-products of the ensiling fermentation, suggesting that the silages were well-preserved
4 Analysis and Performance Comparisons

The surveyed work is analysed for effective prediction of cauliflower plant disease using IoT and deep learning techniques. The results are compared with existing methods such as PCA, LDA, DT, RF, DNN, and CNN, as shown in Fig. 1. Cauliflower disease is identified using IoT devices placed on the farmland, and the condition is predicted using a GNN; the results show higher accuracy than the previous models. The proposed model's accuracy may be improved further by enlarging and refining the plant data with better photographs. The precision of the proposed GNN-PDP is evaluated and compared with the previously available models, and the combined precision results for the existing methods and the proposed GNN-PDP are depicted in Fig. 2. Precision reflects the proportion of the model's positive predictions that are correct.
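For reference, accuracy and precision values of the kind compared in Figs. 1 and 2 can be computed as in the minimal scikit-learn sketch below; the label arrays here are hypothetical placeholders, not results from the surveyed studies.

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical ground-truth and predicted labels (1 = diseased, 0 = healthy).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Accuracy: fraction of all predictions that match the ground truth.
print("accuracy :", accuracy_score(y_true, y_pred))
# Precision: fraction of samples predicted "diseased" that are truly diseased.
print("precision:", precision_score(y_true, y_pred))
```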
Fig. 1 Accuracy evaluation and comparisons
Fig. 2 Precision evaluation and comparison
5 Conclusion

Cauliflower is a significant winter crop, both in terms of area cultivated and output generated. If cauliflower plants are not properly cared for, serious diseases can damage them, resulting in decreased yields and lower quality. Manual monitoring of plant disease is a serious concern because it is time-consuming and labour-intensive, so the use of computer vision for disease detection is on the rise. Plant disease prediction techniques (PDPT) are essential to bring disease prevention and treatment information to farmers, and methods based on deep learning are the most effective for resolving this issue. Several deep learning approaches are explored, and the survey compares them with existing methods such as PCA, LDA, DT, RF, DNN, and CNN. As a result of this investigation, the numerous diseases that affect cauliflower plants, as well as the ways in which disease prediction might be improved, are discussed.
References 1. Gullino ML, Pugliese M, Gilardi G, Garibaldi A (2018) Effect of increased CO2 and temperature on plant diseases: a critical appraisal of results obtained in studies carried out under controlled environment facilities. J Plant Pathol 100(3):371–389 2. Das S, Pattanayak S, Bammidi M (2020) A real time surveillance on disease and pest monitoring, characterization and conventional management strategy of major cultivated crops in tropical savanna climatic region of Srikakulam Andhra Pradesh. IJCS 8(3):958–971 3. Osuna-Cruz CM, Paytuvi-Gallart A, Di Donato A, Sundesha V, Andolfo G, Aiese Cigliano R, Ercolano R (2018) PRGdb 3.0: a comprehensive platform for prediction and analysis of plant disease resistance genes. Nucleic Acids Res 46(D1):D1197–D1201 4. Nobuta K, Meyers BC (2015) Pseudomonas versus Arabidopsis: models for genomic research into plant disease resistance. Bioscience 55(8):679–686
5. Siciliano I, Berta F, Bosio P, Gullino ML, Garibaldi A (2017) Effect of different temperatures and CO2 levels on Alternaria toxins produced on cultivated rocket, cabbage and cauliflower. World Mycotoxin J 10(1):63–71 6. Sethy PK, Barpanda NK, Rath AK, Behera SK (2020) Image processing techniques for diagnosing rice plant disease: a survey. Procedia Comput Sci 167:516–530 7. Golhani K, Balasundram SK, Vadamalai G, Pradhan B (2018) A review of neural networks in plant disease detection using hyperspectral data. Inf Process Agric 5(3):354–371 8. Rathore NPS, Prasad L (2021) A comprehensive review of deep learning models for plant disease identification and prediction. Int J Eng Syst Modell Simul 12(2–3):165–179 9. Thomas S, Kuska MT, Bohnenkamp D, Brugger A, Alisaac E, Wahabzada M, Mahlein AK (2018) Benefits of hyperspectral imaging for plant disease detection and plant protection: a technical perspective. J Plant Dis Prot 125(1):5–20 10. McLeish MJ, Fraile A, García-Arenal F (2021) Population genomics of plant viruses: the ecology and evolution of virus emergence. Phytopathology 111(1):32–39. https://doi.org/10. 1094/PHYTO-08-20-0355-FI 11. Café-Filho AC, Lopes CA, Rossato M (2019) Management of plant disease epidemics with irrigation practices. Irrigation Agroecosyst 123. https://doi.org/10.5772/intechopen.78253 12. Thorwarth P, Yousef EA, Schmid KJ (2018) Genomic prediction and association mapping of curd-related traits in gene bank accessions of cauliflower. G3: Genes, Genomes, Genetics 8(2):707–718 13. Tan H, Wang X, Fei Z, Li H, Tadmor Y, Mazourek M, Li L (2020) Genetic mapping of green curd gene Gr in cauliflower. Theor Appl Genet 133(1):353–364 14. Bansal A, Jan I, Sharma NR (2020) Anti-phytoviral activity of carvacrolvis-a-vis cauliflower mosaic virus (CaMV). In: Proceedings of the national academy of sciences, India section B: biological sciences, vol 90, no 5, pp. 981–988 15. Zhou Y, Maître R, Hupel M, Trotoux G, Penguilly D, Mariette F, Parisey N (2021) An automatic non-invasive classification for plant phenotyping by MRI images: an application for quality control on cauliflower at primary meristem stage. Comput Electron Agric 187:106303 16. Drabiska N, Je˙z M, Nogueira M (2021) Variation in the accumulation of phytochemicals and their bioactive properties among the aerial parts of Cauliflower. Antioxidants 10(10):1597 17. Drees L, Junker-Frohn LV, Kierdorf J, Roscher R (2021) Temporal prediction and evaluation of Brassica growth in the field using conditional generative adversarial networks. Comput Electron Agric 190:106415 18. Chen J, Chen J, Zhang D, Sun Y, Nanehkaran YA (2020) Using deep transfer learning for image-based plant disease identification. Comput Electron Agric 173:105393 19. Arsenovic M, Karanovic M, Sladojevic S, Anderla A, Stefanovic D (2019) Solving current limitations of deep learning based approaches for plant disease detection. Symmetry 11(7):939 20. Morgan M, Blank C, Seetan R (2021) Plant disease prediction using classification algorithms. IAES Int J Artif Intell 10(1):257 21. Sharmila D, Blessy JJ, Rapheal VS, Subramanian KS (2019) Molecular dynamics investigations for the prediction of molecular interaction of cauliflower mosaic virus transmission helper component protein complex with Myzuspersicaestylet’scuticular protein and its docking studies with annosquamosin-A encapsulated in nano-porous Silica. VirusDisease 30(3):413–425 22. 
Liu H, Soyars CL, Li J, Fei Q, He G, Peterson BA, Wang X (2018) CRISPR/Cas9-mediated resistance to cauliflower mosaic virus. Plant Direct 2(3):e00047 23. Berges SE, Vasseur F, Bediee A, Rolland G, Masclef D, Dauzat M, Vile D (2020) Natural variation of Arabidopsis thaliana responses to Cauliflower mosaic virus infection upon water deficit. PLoS Pathog 16(5):e1008557 24. Kiyama R, Furutani Y, Kawaguchi K, Nakanishi T (2018) Genome sequence of the cauliflower mushroom Sparassiscrispa (Hanabiratake) and its association with beneficial usage. Sci Rep 8(1):1–11 25. Chesnais Q, Verdier M, Burckbuchler M, Brault V, Pooggin M, Drucker M (2021) Cauliflower mosaic virus protein P6-TAV plays a major role in alteration of aphid vector feeding behaviour but not performance on infected Arabidopsis. Mol Plant Pathol 22(8):911–920
26. Bergès SE, Vile D, Yvon M, Masclef D, Dauzat M, van Munster M (2021) Water deficit changes the relationships between epidemiological traits of Cauliflower mosaic virus across diverse Arabidopsis thaliana accessions. Sci Rep 11(1):1–11 27. Azpeitia E, Tichtinsky G, Le Masson M, Serrano-Mislata A, Lucas J, Gregis V, Parcy F (2021) Cauliflower fractal forms arise from perturbations of floral gene networks. Science 373(6551):192–197 28. Panhwar AO, Sathio AA, Lakhan A, Umer M, Mithiani RM, Khan S (2022) Plant health detection enabled CNN scheme in IoT network. Int J Comput Digit Syst 11(1) 29. Rashid A, Mirza SA, Keating C, Ijaz UZ, Ali S, Campos LC (2022) Machine learning approach to predict quality parameters for bacterial consortium-treated hospital wastewater and phytotoxicity assessment on radish, cauliflower, hot pepper, rice and wheat crops. Water 14(1):116 30. Alers-Velazquez R, Khandekar S, Muller C, Boldt J, Leisner S (2021) Lower temperature influences Cauliflower mosaic virus systemic infection. J Gen Plant Pathol 87(4):242–248 31. Lidón A, Ginestar D, Carlos S, Sánchez-De-Oleo C, Jaramillo C, Ramos C (2019) Sensitivity analysis and parameterization of two agricultural models in cauliflower crops. Span J Agric Res 17(4):e1106 32. Rakshita KN, Singh S, Verma VK, Sharma BB, Saini N, Iquebal MA, Behera TK (2021) Understanding population structure and detection of QTLs for curding-related traits in Indian cauliflower by genotyping by sequencing analysis. Funct Integr Genomics 21(5):679–693 33. Ren H, Feng Y, Pei J, Li J, Wang Z, Fu S, Peng Z (2020) Effects of Lactobacillus plantarum additive and temperature on the ensiling quality and microbial community dynamics of cauliflower leaf silages. Biores Technol 307:123238
Chapter 15
Disease Detection and Prediction in Plants Through Leaves Using Convolutional Neural Networks Satyam R. D. Dwivedi, N. Sai Gruheeth, P. Bhargav Narayanan, Ritwik Shivam, and K. Vinodha
1 Introduction Agriculture is a crucial sector in our country, contributing to 20.2% of our GDP and employing 66 percent of the population [1]. However, the increasing population and industrialization have led to a decline in land for cultivation, with a net area sown decreasing by about 15 thousand hectares annually in the last decade [2]. Furthermore, the quality of land and water for irrigation has deteriorated due to the continuous production of non-biodegradable waste by industries, and the emission of harmful gases has had an adverse effect on plant health. These factors have caused crop production failures, leading to an estimated global economic loss of 20 billion dollars per year due to crop plant disease. Even with good initial cultivation, crucial traits determining a crop’s health may go unnoticed by farmers. Thus, there is a pressing need to address these issues and increase food and raw material production to meet the demands of our growing population while preserving our land and resources. India has a rich history of agriculture dating back to the Indus Valley Civilization, where it was a leading producer of wheat and grain, our agricultural output has had an impact on the economies of other parts of the world. This practice can be traced back to the Middle Ages [3]. Despite this, the country is experiencing economic losses due to crop damage caused by disease. Identifying these diseases is essential to ensure food security for India‘s rapidly growing population. To address this issue, a deep learning model will be developed that can analyze photos of plant leaves to determine N. Sai Gruheeth, P. Bhargav Narayanan, Ritwik Shivam, K. Vinodha are contributed equally to this work. S. R. D. Dwivedi (B) · N. Sai Gruheeth · P. Bhargav Narayanan · R. Shivam · K. Vinodha Department of CSE, PES University, Bengaluru, India e-mail: [email protected] K. Vinodha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_15
if a plant is affected. Dark spot patterns on leaves indicate disease symptoms, and convolutional neural networks will be used to create a system that can identify and classify diseases. The goal is to create a user-friendly mobile application that can capture photographs and provide quick results. With the widespread adoption of mobile phone technology and increased access to smartphones and mobile data, mobile scientific tools like this application can help identify and treat diseased plants early on, ultimately improving agricultural output and food security in India. We intend to design a system that detects and identifies plant disease from leaf images using the deep learning concept of convolutional neural networks. The main knowledge gap resides in preparing the model for prediction: training the model on a generalized platform such as Google Colab usually consumes considerable time per epoch because of the number of training parameters. Most cited papers have used very deep architectures, which is not necessary for obtaining the best results and may lead to overfitting of the data, and very minimal effort has been made to control this issue in other research. Considering these factors, we intend to address these issues. The most difficult part is dealing with large datasets, which, although required to train the model, also cause issues such as overfitting. Integration of the model and application components to ensure proper operation is one of our top priorities, as the application is a by-product containing a novelty feature. The decision to use a CNN for this purpose was made in order to obtain spatial and temporal information about an object via feature maps. A CNN uses fixed-dimensional kernels as filters, and the resulting feature maps are then fed into the classifier via a feed-forward neural network. Because kernels require significantly less space than feed-forward layers to process an image of the same size, CNNs have simplified and improved a wide range of applications, including image processing. Multiple CNN architectures are in use, with different stride and padding settings, counts of convolutional and pooling layers, and layer sequences. Because of residual connections in more recent designs, these models can have greater depth without suffering the negative effects of vanishing gradients. The Flutter framework was used to develop the application, using the Dart programming language and built-in packages to create widgets. Flutter's features, such as hot-reload, make the development process smoother, and it also offers good state management. The mobile application is designed to be user-friendly and simple, allowing any user to upload an image from the gallery or camera and obtain the results. Flask, a Python package, is used to integrate the frontend and backend in this work. The backend is hosted on AWS EC2, a cloud platform, and AWS S3 is used for storing and formatting the data. The Plant Village dataset, which has 54,305 images classified into 38 classes with a uniform background, is used in this work. The data is anonymized, and no user information is stored because the file name formatting changes when the image is picked from the application.
2 Literature Survey The subsequent paragraphs review some of the earlier approaches supporting disease detection using machine learning. A thorough review of the literature reveals that the focus has largely been on specific disease classes affecting a vast majority of plants or on a wide array of diseases that affect a single plant belonging to various geographical locations. Hirani et al. [4] present a customized convolutional neural network featuring three convolutional layers, ’relu’ activation, MaxPooling2D layers, and a single fully connected layer with ’relu’ activation. The model employs batch normalization and dropout functions to enhance learning efficiency and reduce overfitting. With an input image size of 256 × 256, the model utilizes 38 neurons in the final layer with ‘softmax’ activation, comprising 28,903,164 trainable and 64 non-trainable parameters. Marzougui et al. [5] propose a neural network inspired by notable architectures, particularly ’ResNet’, combined with data augmentation for early plant disease identification and overfitting reduction. ResNet’s shortcut connections perform identity mapping without adding parameters or complexity, leading to a 3.57% error rate. The success of ResNet is attributed to its “residual connections”, achieving the accuracy of more complex networks. Bedi and Gole [6] introduce a hybrid model that merges convolutional autoencoders (CAE) and convolutional neural networks (CNN). The hybrid model features 5751 trainable and 2776 non-trainable parameters, while the CAE network contains 4163 trainable parameters, totaling 9914 trainable parameters. The model is trained using 3342 peach leaf images and tested on 1115 leaf images. Guan [7] proposes a deep learning-based disease detection method by combining four CNN models-Inception, ResNet, Inception-ResNet, and DenseNet-using stacking, an ensemble learning technique. This approach improved prediction accuracy by 3–4% through fine-tuned models, batch normalization, and processing images in a “fine-tuning step”. The final prediction is based on the highest probability class after stacking learners together, increasing training time to over 90 h. Hassan et al. [8] compare AlexNet and VGGNet, highlighting the lower computational efficiency and parameter count of the Inception model. They employ depthwise separable convolution in the inception block and choose MobileNetV2 for its cost-effective convolutional layer. The residual network focuses on retaining useful learned aspects while minimizing computations and parameters, achieving a more efficient decision-making process. Mohanty et al. [9] evaluate the applicability of deep convolutional neural networks, specifically AlexNet and GoogLeNet, for classification problems using the PlantVillage dataset. Both architectures are assessed through 60 experimental configurations with two training mechanisms (training from scratch and transfer learning), different dataset types (Color, Grayscale, Leaf Segmented), and varying trainingtesting set distributions. Each experiment runs for 30 epochs with standardized hyperparameters. Sharma et al. [10] explain an accurate AI solution for detecting and classifying plant leaf diseases using convolutional neural networks. The model utilizes a dataset
of over 20,000 images across 19 classes and can be improved by using larger datasets, tuning hyperparameters, and including remedies. It can be deployed on Android and iOS platforms to assist farmers. Omkar [11] discusses the implementation of a deep convolutional neural network in several phases for crop disease detection and classification using the PlantVillage dataset. Phases include collecting the dataset, pre-processing images, training the CNN model to identify crop types and detect diseases, and validating the model with obtained results. Transfer learning with pre-trained models, such as InceptionV3 and MobileNet, is used to build the deep learning model. Karlekar et al. [12] compare SoyNet, their ML model with three hand-crafted feature-based methods and six popular CNN models, including VGG19, GoogleLeNet, Dense121, XceptionNet, LeNet, and ResNet50. All experiments are performed on the PDDB database, which has 16 classes. The results show that SoyNet outperforms the nine state-of-the-art methods/models in average accuracy, average precision, average recall, and average F1-score. Sharma et al. [13] demonstrate the feasibility of training a CNN model using segmented and annotated images instead of full images, resulting in improved performance on independent data (42.3–98.6%). The self-classification confidence also significantly increases. Pre-processing images before model training can prove invaluable for achieving high real-world performance as better datasets become available. Mohameth et al. [14] implement deep feature extraction and deep learning techniques on the Plant Village dataset to detect plant diseases, testing VGG16, Google Net, and ResNet 50. Results favored feature extraction over transfer learning in terms of accuracy and execution time. Future work aims to collect data in sub-equatorial zones, studying plant behavior and comparing it with the Plant Village dataset to determine optimal growth environments. Singh et al. [15] survey plant leaf disease classification techniques and introduce an image segmentation algorithm for automatic detection and classification. Tested on ten species, including bananas, beans, and tomatoes, the proposed algorithm efficiently recognizes and classifies leaf diseases with minimal computational effort. It also enables early stage disease identification. To improve classification, Artificial Neural Networks, Bayes classifiers, Fuzzy Logic, and hybrid algorithms can be employed. Qin et al. [16] explore lesion image segmentation, feature extraction, normalization, and selection for alfalfa leaf disease recognition. Among the twelve segmentation methods, the best results were obtained using K-median clustering and linear discriminant analysis. A total of 129 features were extracted, with the ReliefF method selecting the top 45 for the SVM model, achieving 97.64% training set and 94.74% testing set accuracies. The study provides a feasible solution for diagnosing and identifying alfalfa leaf diseases. Tiwari et al. [17] utilize transfer learning to develop an automated system for diagnosing and classifying potato leaf diseases, such as early blight, late blight, and healthy leaves. The novel solution achieves a 97.8% classification accuracy, improving by 5.8 and 2.8% compared to previous studies. This technique can assist farmers in early disease detection and improve crop yields.
The reader will clearly understand how the novel feature is introduced and how the work is proposed in the following sections.
3 Proposed Methodology

Some of the various techniques being practiced worldwide are discussed below, along with the amendments made to them. Keeping the earlier approaches from the literature survey in consideration, we move ahead with an amalgamation of the ideas encountered so far. These include the three types of image representation, namely black-and-white, RGB, and grayscale images; from the cumulative results we compare and pick the best methods. With the addition of batch normalization to each set of layers in the sequential model, the model limits overfitting, allows higher learning rates, and ensures better results, and with the extra aid of data augmentation we refine our control over overfitting, which is a general characteristic of deep learning architectures. Batch normalization is the innovative element included in our approach to prevent overfitting and broaden the learning curve of the model. Although batch normalization is used in many research studies, one of the novel aspects of this research is the introduction of batch-normalization layers at specific locations to reduce overfitting and produce better results. This approach, bundled with the rich catalog of optimizers and regularizers, for instance Adam, RMSprop, etc., enables us to simulate various permutations and combinations, trading off the time needed to evaluate metrics against finding a better combination. The capabilities of the model, when combined with the application, elevate the system to new realms of usability and practicality. The app communicates with the backend via RESTful API calls and receives the response by the same method; this seamless integration results in an outstanding user experience. Figure 1 provides a general idea of how the complete system will be integrated, and a more detailed description of the system is provided in the implementation section.
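To make the layer arrangement concrete, the following is a minimal Keras sketch of the repeating set of layers with batch normalization placed inside each block rather than only at the end; the filter counts, pool size, dropout rates, and number of repetitions are illustrative assumptions, not the exact configuration reported in the experiments section.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(filters):
    # One repeating set: Conv2D -> LeakyReLU -> BatchNormalization -> MaxPooling2D -> Dropout.
    return [
        layers.Conv2D(filters, (3, 3), padding="same"),
        layers.LeakyReLU(),
        layers.BatchNormalization(),           # normalization inside every block, not only at the end
        layers.MaxPooling2D(pool_size=(3, 3)),
        layers.Dropout(0.25),
    ]

model = models.Sequential([layers.Rescaling(1.0 / 255, input_shape=(256, 256, 3))])
for filters in (32, 64, 128):                  # illustrative; the full model repeats the block with more filter sizes
    for layer in conv_block(filters):
        model.add(layer)
model.add(layers.Flatten())
model.add(layers.Dense(1024))
model.add(layers.LeakyReLU())
model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(38, activation="softmax"))   # 38 Plant Village classes

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```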
Fig. 1 Represents the three components of the system when viewed from a higher level. It shows the way the components communicate with each other. However, this diagram is explained in detail in the upcoming section
This research utilizes data from a source dataset, as presented in Table 1, which provides information on the content and characteristics of the input data used in the study.
4 Implementation The implementation process is split into 3 parts for a better overview and understanding as shown in Fig. 1. The frontend, backend, and model. The frontend is an application built on the Flutter framework using the Dart programming language. The application is very user-friendly and scripted using the basic state management provided by the material package. The application uses some imported functions to make RESTful API calls (HTTP calls for upload-POST and predict-GET). To do this, the Dart programming language offers the ‘http’ package via which POST and GET requests are made. To upload the image via the application, a feature to select the image from either capturing the image using an in-built camera as well as select the image from a gallery is provided. This is done using a package named ‘image-picker’. This brings out a feature in our application in order to keep the uploaded images anonymous by keeping the user’s metadata private by renaming the file to a standard format while picking the image. Leading the image selection comes the POST request. This is done using a Multi-Part request which is used to upload files over to a server. It sends/uploads files over to the backend/server in a single request. Once the image is uploaded onto the backend, the image is processed, and the prediction function is initiated which returns a JavaScript Object Notation (JSON) object from the backend as shown in Fig. 1. This JSON object is parsed using the ‘dart:convert’ package by decoding it from a string to an object. The necessary details such as the sample label, its status (Healthy/Diseased), and the confidence metric of prediction in percentage are packed in this JSON which is parsed and displayed as the result on the screen. This concludes the view of the frontend from the user’s end. The backend is scripted using the Flask framework using the Python programming language. It basically handles image retrieval and the prediction of the input image. To get into depth, this can be split into two functions: upload and predict. Before visiting the upload and predict functionality, the backend loads a saved model (.h5 file) to perform the prediction. This saved model comes from the model part of the system where the model is trained, evaluated, and saved. The upload function takes care of retrieving the image from the multi-part request made at the front- end. This image is uploaded to AWS S3 storage to access it as a file. The S3 access is done using a package called ‘boto3’. The image is saved in an S3 bucket. The predict function is associated with a GET request at the frontend. The image file name is passed as a parameter in the URL. This parameter helps in tracking the file name in S3. A GET request is performed using the ‘requests’ package to get the file from the S3 bucket. This image is then converted to an array and its dimensions (shape of the input array) are expanded in order to pass it through
Table 1 Plant Village dataset's classes

Class # | Plant disease class
1 Apple scab
2 Apple black rot
3 Apple cedar apple rust
4 Apple healthy
5 Blueberry healthy
6 Cherry healthy
7 Cherry powdery mildew
8 Corn cercospora gray leaf spot
9 Corn common rust
10 Corn healthy
11 Corn northern leaf blight
12 Grape black rot
13 Grape esca (black measles)
14 Grape healthy
15 Grape blight (Isariopsis)
16 Orange citrus greening
17 Peach bacterial spot
18 Peach healthy
19 Pepper bell bacterial spot
20 Pepper bell healthy
21 Potato early blight
22 Potato healthy
23 Potato late blight
24 Raspberry healthy
25 Soybean healthy
26 Squash powdery mildew
27 Strawberry healthy
28 Strawberry leaf scorch
29 Tomato bacterial spot
30 Tomato early blight
31 Tomato healthy
32 Tomato late blight
33 Tomato leaf mold
34 Tomato septoria leaf spot
35 Tomato two-spotted spider mite
36 Tomato target spot
37 Tomato mosaic virus
38 Tomato yellow leaf curl virus

Source: The original dataset was taken down from plantvillage.org. This dataset is sourced from a GitHub repository containing the original data of 38 classes and 54,305 images. All images have a uniform background.
the model for prediction. This is a measure of pre-processing the input image for prediction. Following this, the prediction is done using model.predict() which results in an array that produces confidence values of all the classes present in the dataset. The element having the largest value in this array is the label of the input image. Therefore, the class name at that index is retrieved and the confidence of that class name is returned with the filename enclosed in a dictionary with the key and its values. A JSON object is created using jsonify(dictionary) converting the dictionary into a JSON object. This JSON object is then returned to the GET request made from the frontend. The model is built using the TensorFlow and Keras packages of Python language. The initial step is to include necessary packages such as NumPy, TensorFlow, and Keras. In our method of preparing the data to train the model, instead of using the SKLearn package’s train-test-split, we have manually split the data into train, test, and validation data with the ratios 0.8, 0.1, and 0.1. This was done in order to test the model manually with the images from the test dataset as input to the application once the system is integrated. The train split consists of 43429 images and the validation split consists of 5417 images. The model constructed is a custom-built sequential model consisting of Conv2D, LeakyRelu, BatchNormalization, MaxPooling2D, Dropout, and some Flatten and Dense layers. A particular sequence is maintained in arranging the set of layers. The set begins with a convolutional(Conv2D) layer followed by an activation layer (LeakyRelu). To normalize the learning, we add the batch-normalization layer. Following this is the Max-Pooling layer which reduces the number of dimensions. A dropout layer is added to this in order to maintain the balance in training weights. These layers are arranged in a specific set and in a repetitive manner with different parameters. Finally, toward the end, a flatten layer is added to bring down the number of parameters without affecting the training accuracy. To reduce losses from the model, the Adam optimizer is used which controls loss over categorical cross-entropy based on the accuracy metric. This model is then trained over 10 epochs with a training and validation dataset. To check the model’s performance on unseen data, we evaluate the model with the test data split. This model is saved for prediction which happens at the backend with initiation at the frontend.
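The backend behaviour described above can be summarised in the following sketch of a Flask prediction endpoint. This is a simplified, hypothetical version: the model file name, bucket URL, and class-name list are placeholders, and the S3 retrieval is reduced to a plain HTTP GET as described in the text.

```python
import io

import numpy as np
import requests
import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("plant_disease_model.h5")   # saved model produced by the training step
CLASS_NAMES = ["Apple scab", "Apple black rot", "..."]          # global list of the 38 class names

@app.route("/predict", methods=["GET"])
def predict():
    filename = request.args.get("file")                         # file name passed as a URL parameter
    url = f"https://example-bucket.s3.amazonaws.com/{filename}"  # placeholder S3 object URL
    image_bytes = requests.get(url).content

    # Pre-process: decode, resize to the model's input size, and add a batch dimension.
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB").resize((256, 256))
    batch = np.expand_dims(np.asarray(image, dtype=np.float32), axis=0)

    scores = model.predict(batch)[0]                             # per-class confidence values
    idx = int(np.argmax(scores))
    return jsonify({
        "sample": filename,
        "label": CLASS_NAMES[idx],
        "confidence": float(scores[idx]) * 100.0,                # reported as a percentage
    })
```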
5 Experiments and Results

To run and evaluate our methods, we used the following hardware: an AMD Ryzen 5 3600 processor (CPU, 6 cores at 4.2 GHz) with 16 GB of RAM and an NVIDIA GTX 1660 Super graphics processing unit (GPU) with 6 GB of V-RAM, under the 64-bit Windows 11 operating system. This work started by preparing a basic model able to predict the disease classes without any load. The first model prepared was a sequential model consisting of 4 convolutional (Conv2D) layers and 4 max-pooling (MaxPooling2D) layers, appended with 2 dense layers and a flatten layer. There were approximately 8 million trainable
parameters, and the model was run on 5 epochs. This gave an accuracy of 86% with a high loss in features of trained images. To avoid the issue of wrong classification, an increase in the number of epochs was made. On training the model with 10 epochs, the model was over-fitted. To control the learning rate, an addition of a batch-normalization layer at the end (traditional method) and dropout layers was made. This resulted in an accuracy of 91%. The prediction array gave inaccurate results with high confidence(probability). This happened due to the mismatched arrangement of prediction classes and input data. This problem was overcome by defining a global list containing the class names. To make the model better, more convolution layers were added with max-pooling layers. The set of these layers was increased from 4 to 8. In addition to this, an addition in activation function layers was made. We chose to go with the LeakyRelu Activation function as it manages to avoid the problem of exploding and vanishing gradients which results in a loss in feature capturing during training. With approximately 42 million trainable parameters the model was giving an accuracy of 96% with an evaluation accuracy of 92% on unseen data. To keep the model away from overfitting, we introduced a dropout feature to every set of Conv2D and Max-Pooling layers, but batch-normalization plays a better role than the dropout feature although it doesn’t bring any change in the number of parameters. Hence the dropout layer was replaced with a batch-normalization layer by avoiding the traditional method of keeping a single batch-normalization layer at the end. At this phase, the model had 54 million parameters to train and had to be optimized. This model gave an accuracy of 92% with a test accuracy of 89%. The optimization of the model was done by increasing the pooling layer size from 2 × 2 to 3 × 3 which reduced the parameters from 54 million to 7 million. The same model previously giving an accuracy of 92% gave an accuracy of 97% with validation accuracy of 92% when trained on 10 epochs. An observation made here was when the prediction array was generated for the input image, there was the existence of negative values in the prediction array. These negative values belonged to the class to which the input image did not belong. To retrieve the confidence(probability) of the prediction, an addition of the Softmax activation function was made. The highest value in the probability array is the predicted class of the input image. Softmax also has this feature where it converts this probability as a percentile system, where the class with the highest probability is given 100% and with respect to that other classes are assigned a percentile. So the confidence metric for the rightly classified element would display a 100%. Given the system, there is an accuracy of 92% on unseen data and 96–97% accuracy on trained data. Max-pooling is used to preserve the spatial information and also reduce the number of parameters in the model which reduces the run time and storage space occupied by the model in the backend and optimizes the overall efficiency of the product. Leaky relu allows the negative values as well as positive values and avoids all the negative values to clustering around zero and adds balance and restraints loss of data. 
Max-pooling is used alongside a dropout layer, which randomly sets input units to zero at a given rate at each training step; this helps prevent overfitting and improves accuracy on unseen data of a wider variety.
Table 2 CNN architecture-model summary and parameter description

Layer (type) | Output shape | Param #
Rescaling | (None, 256, 256, 3) | 0
Conv2D | (None, 256, 256, 32) | 896
LeakyReLU | (None, 256, 256, 32) | 0
BatchNormalization | (None, 256, 256, 32) | 128
MaxPooling2D | (None, 85, 85, 32) | 0
Dropout | (None, 85, 85, 32) | 0
Conv2D | (None, 85, 85, 64) | 18,496
LeakyReLU | (None, 85, 85, 64) | 0
BatchNormalization | (None, 85, 85, 64) | 256
... | ... | ...
Flatten | (None, 4608) | 0
Dense | (None, 1024) | 4,719,616
LeakyReLU | (None, 1024) | 0
BatchNormalization | (None, 1024) | 4,096
Dropout | (None, 1024) | 0
Dense | (None, 38) | 38,950
Activation | (None, 38) | 0

Total params: 7,111,590
Trainable params: 7,106,662
Non-trainable params: 4,928
As our prime focus is producing results in terms of probability and the values must lie between 0 and 1, this can be achieved by using the softmax activation function at the output of the last layer which devices a percentile system of ranking results and produces probability for the input that was fed into the neural network for prediction. The first set of layers shown in Table 2 are repeated with different parameters to train the model. To be exact it is repeated five times with the number of kernels increasing in multiples of 2 ranging from 32 to 512. This summary shows useful information comprising the name of the layer, its output shape, and the number of parameters it constitutes to. Considering the number of parameters, the model is trained on 10 epochs giving us a training accuracy of 97.71% with a test accuracy of 91.06% and a validation accuracy of 94.18%. The saved model is referred to at the time of prediction where the image is uploaded with the help of an application on a mobile device with the help of API calls. Once the image is uploaded, the prediction function is initiated at the backend and returns a JSON attribute containing the sample, its predicted label, and the confidence of the prediction. This JSON is parsed and the results are displayed on the screen. From the literature survey and past works done in this field, it can be concluded that the accuracy depends crucially on the type of model chosen and the input provided for the model to train. The important factor to be considered here is, that state-of-the-
Fig. 2 Evaluating model performance based on accuracy and loss metrics for training and validation datasets. In the above image, two graphs are present. Training accuracy versus validation accuracy and training loss versus validation loss
art models use a high number of parameters which increases the size of the model or system and takes a lot of time to train a decent accuracy. There is also a risk of overfitting. The sequential model discussed in this paper has a significantly lesser number of parameters and the probability of overfitting is also very low. Since the scope of input is limited to the images of leaves, this model is accurate and efficient. The training accuracy plot as shown in Fig. 2, increases exponentially as the model is trained for more epochs. The same, however, can’t be said for the validation accuracy plot which appears to zig-zag, as it increases along the graph. This is due to the noise generated from batch normalization. The randomness of the noise in our input data coupled with the noise generated due to batch normalization causes sharp fluctuations in validation accuracy during initial epochs. The same effect can be observed in the plots for validation loss which become more stable as training continues. As soon as validation loss shows no further improvement and starts to increase, the training is stopped. This is to prevent the model from overfitting.
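The stopping rule described above (halt training once the validation loss stops improving) is typically implemented with a Keras callback, as in the short sketch below; the patience value is an illustrative choice rather than one taken from this chapter.

```python
import tensorflow as tf

# Stop when validation loss has not improved for two consecutive epochs and
# restore the weights from the best epoch seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True,
)

# history = model.fit(train_ds, validation_data=val_ds, epochs=10, callbacks=[early_stop])
```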
6 Conclusion Considering the main impact feature, i.e., overfitting the data by training, we have addressed the issue by introducing the concepts of batch normalization and optimizers to the model in a unique retrospective manner. The concept of controlling the learning rate spontaneously during the training helps us bring in the novel feature along with the integration of the application with the model. With respect to the system component integration, the saved model is loaded at the backend when the prediction function is initiated and the model predicts the incoming image data. This incoming data image is uploaded to the backend via APIs. Once the prediction is done, a JSON object containing the result is returned. Currently, the application is compatible with Android only, but the application can be extended over platforms like iOS. The vernacular aspect (Multi-Lingual support) can be integrated with the application. As the model is custom-built, it can be refurbished in a better way in order to increase accuracy and produce better results.
References 1. The World Bank Group (2022) The world bank. Available at: https://data.worldbank.org/ country/IN. Accessed 4 May 2022 2. Sinha DK, Ahmad N, Singh KM (2017) Shrinking net sown area and changing land use pattern in Bihar: an economic analysis. J AgriSearch 3:238–243. https://doi.org/10.21921/jas.v3i4. 6709 3. Wikipedia contributors (2022) History of agriculture in the Indian subcontinent. Available at: https://en.wikipedia.org/wiki/History_of_agriculture_in_the_Indian_subcontinent. Accessed 4 May 2022 4. Hirani E, Magotra V, Jain J, Bide P (2021) Plant disease detection using deep learning. In: 2021 6th international conference for convergence in technology (I2CT) Pune, India, pp 1-4, https://doi.org/10.1109/I2CT51068.2021.9417910 5. Marzougui F, Elleuch M, kherallah M (2020) A deep CNN approach for plant disease detection. In: 21st international arab conference on information technology(ACIT), pp 1–6, https://doi. org/10.1109/ACIT50332.2020.9300072 6. Bedi P, Gole P (2021) Plant disease detection using hybrid model based on convolutional autoencoder and convolutional neural network. Artif Intell Agric 5:90–101. https://doi.org/10. 1016/j.aiia.2021.05.002 7. Guan X (2021) A novel method of plant leaf disease detection based on deep learning and convolutional neural network. In: 2021 6th International conference on intelligent computing and signal processing (ICSP), pp 816–819. https://doi.org/10.1109/ICSP51882.2021.9408806 8. Hassan SM, Maji AK, Jasi´nski M, Leonowicz Z, Jasi´nska E (2021) Identification of plant-leaf diseases using CNN and transfer-learning approach. Electronics 10(12):1388 9. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7. https://doi.org/10.3389/fpls.2016.01419 10. Sharma P, Hans P, Gupta SC (2020) Classification of plant leaf diseases using machine learning and image preprocessing techniques. In: International conference on cloud computing data science and engineering (confluence), pp 480–484. https://doi.org/10.1109/confluence47617. 2020.9057889
11. Omkar K (2018) Crop disease detection using deep learning. In: 2018 4th International conference on computing communication control and automation (ICCUBEA), vol 6, no 13, pp 1–4, https://doi.org/10.1109/ICCUBEA.2018.8697390 12. Karlekar A, Seal A (2020) SoyNet: soybean leaf diseases classification. Comput Electron 172:105342. https://doi.org/10.1016/j.compag.2020.105342 13. Sharma P, Berwal YPS, Ghai W (2019) Performance analysis of deep learning CNN models for disease detection in plants using image segmentation. Inform Proc 1–9 14. Mohameth F, Bingcai C, Sada KA (2020) Plant disease detection with deep learning and feature extraction using Plant Village. J Comp Commun 8(6):10–22 15. Singh V, Misra AK (2017) Detection of plant leaf diseases using image segmentation and soft computing techniques. Inform Proc Agric 4:41–49 16. Qin F, Liu D, Sun B, Ruan L, Ma Z, Wang H (2016) Identification of alfalfa leaf diseases using image recognition technology. PLoS One 11(12):1–26 17. Tiwari D, Ashish M, Gangwar N, Sharma A, Patel S, Bhardwaj S (2020) Potato leaf diseases detection using deep learning. 2020 4th International conference on intelligent computing and control systems (ICICCS). IEEE, Madurai, India, pp 461–466
Chapter 16
Classification of Breast Cancer Using Machine Learning: An In-Depth Analysis Shweta Saraswat , Bright Keswani, and Vrishit Saraswat
1 Introduction Experts agree that there has been a concerning rise in the number of women who have passed away as a result of breast cancer. As reported by the World Health Organization, there were more than 627,000 female deaths in 2018 (WHO). In addition to this, they forecast that in the year 2030, the global total might have reached 2.7 million [1]. This disease’s poor prognosis is mostly attributable to a number of factors, the most significant of which are its tendency to be diagnosed belatedly and its notoriously difficult treatment. Both the difficulties of treating breast cancer and the risk of metastasis, which is the spreading of cancer to other regions of the body, underscore the need for early detection. Late detection and treatment will always result in the deadly progression of the illness. This is due to the fact that cancer starts with the creation of aberrant cells caused by a mutation in the DNA of the cells themselves. There is a subtype of breast cancer known as invasive breast cancer. The former may lead to cancer, which is a dangerous condition that is malignant and can extend to certain other areas of the body. Due to the fact that it does not penetrate healthy tissue, the latter is not causing any damage to the body and does not spread to other places. Breast cancer first begins in the breasts and then travels to the lymph glands and milk ducts, after which it may spread to other organs, occasionally via the bloodstream [2]. Several diagnostic procedures, such as ultrasound sonography, thermography, biopsy, magnetic resonance imaging (MRI), digital mammography, and traditional mammography may be used to locate breast cancer (DMG). In computed tomography (CT), a kind of computerized x-ray imaging, the scanner is rotated while a S. Saraswat (B) · B. Keswani Suresh Gyan Vihar University, Jaipur, Rajasthan 302017, India e-mail: [email protected] V. Saraswat Medanta Hospital, Gurugram, Haryana 122001, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_16
Fig. 1 Mammograms illustrating the types of breast cancer (a Malignant; b Benign) [9]
concentrated x-ray beam is focused on the patient. This methodology involves the generation of signals, which, when combined with computers, result in the production of cross-sectional pictures. The name “tomography” alludes to the wealth of data that can be extracted from tomographic pictures, which cannot be obtained from regular x-rays. This information is not accessible in any other way. The latter is a frequent method that is used to locate the margins of the tumor from a variety of different viewpoints [3]. Digital mammography has the capability of detecting breast cancers of all types, even those that are benign (Fig. 1). Through the examination of the pictures, a radiologist may obtain a better understanding of the tumor and the extent of its propensity for metastasis. The ultimate determination is made by the radiologist after intensive manual exams and consultations with several additional specialists. It’s a lengthy process, the outcomes are heavily reliant on the competence of the group, and it is not always simple to get in contact with subject matter experts in the field. Therefore, researchers came up with a computer-aided diagnosis (CAD) system that can categorize tumors in a way that is both rapid and accurate, and it does so without the need for radiologists or any other kind of professional [4]. When it comes to analyzing medical photos and making the appropriate decisions, it has been suggested that machine learning (ML) algorithms may serve as a suitable substitute for the human eye and expertise [5]. Putting machine learning ideas into action often entails going through six distinct steps. The primary purpose of the first three approaches is to get rid of any distracting background noise and zero in on the tumor itself. The size of an image’s file may be shrunk by a technique known as “feature extraction,” which involves identifying the aspects of a picture that is most important in order to keep the image’s original meaning, while simultaneously reducing its file size. Learning algorithms are being applied to data in order to categorize it and arrive at conclusions. With the introduction of artificial intelligence (AI) and machine learning approaches, current research has focused on deep learning-based solutions for breast cancer screening [6]. The most popular deep learning method is CNN, which allows for automated feature learning, classification, and mass identification with smaller training datasets and no human involvement [7]. Deep learning techniques often demand a lot of data to train the model and increase performance. Thus, the paucity of data is a significant obstacle
to apply algorithms based on deep learning for medical diagnosis [8]. This method may perform better with larger datasets since it requires no time-consuming initial preprocessing or feature extraction. This research presents a comprehensive assessment of the basic phases of the whole categorization process and an in-depth examination of the most current approaches that are utilized in this area. Both of these aspects can be found in the introduction section of the study. In the second part of this study, the primary approaches to classify and clean data are shown and discussed. In Sect. 3, we will cover machine learning approaches, and then in Sect. 4, we will give an extensive literature review on breast cancer. The discussion and concluding remarks are offered in Sect. 5.
2 Several Levels of Classification Computer-aided diagnostic (CAD) technologies, which were made by both universities and private companies, make it possible to tell right away whether a breast tumor is benign or cancerous. Radiologists may be better able to tell the difference between cancerous and noncancerous changes in tissue by using these methods. To choose the right algorithms for the CAD system, the information in the cancer images needs to be carefully looked at. In Fig. 2, we see how a very advanced computeraided diagnosis (CAD) system has been integrated into the diagnostic procedure. In a number of medical diagnoses, this CAD method has been used. There are four distinct procedures that must be performed before determining if a cancer is benign or malignant. The next paragraphs provide further context for these measures as well as detailed instructions on how to implement them effectively.
Fig. 2 Organizational framework of the CAD system for the detection of breast cancer
2.1 Data Acquisition: Techniques for the Accumulation of Information

At this point, a dataset is chosen to train a model, and that model is then evaluated. The Wisconsin Breast Cancer dataset (WBCD), the Digital Database for Screening Mammography (DDSM), the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and the Mammographic Image Analysis Society (MIAS) dataset are the breast cancer tumor datasets used most often. Each dataset has enough information to do the necessary filtering and modeling. The WDBC dataset, for example, records a unique patient identifier (ID), digital image features of the breast mass (n = 10), and the patient diagnosis (positive or negative) [6]. There are a total of 469 patients, of whom 307 have diagnoses considered harmless, while 162 have diagnoses considered dangerous. These attributes have been used as a foundation for further study, which has led to the calculation of means and standard deviations for each feature, including perimeter, symmetry, area, smoothness, fractal dimension, radius, texture, compactness, concavity, and concave points.
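As an illustration of this kind of tabular data, the WDBC features can be loaded from scikit-learn's bundled copy of the dataset; this sketch assumes that copy rather than the exact files used in the cited studies.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the WDBC features (mean, standard error and "worst" value of each
# characteristic such as radius, texture, perimeter, area, smoothness, ...).
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["diagnosis"] = data.target            # 0 = malignant, 1 = benign in scikit-learn's encoding

print(df.shape)                          # number of patients x number of features
print(df["diagnosis"].value_counts())    # class balance of the two diagnoses
```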
2.2 Data Preprocessing or Image Preprocessing

The next step in improving ML algorithms is data preprocessing, which involves removing duplicate and unnecessary data. During this stage (also known as phase 9), the following operations are carried out:
Operation 1: Cleaning the dataset of duplicate records.
Operation 2: Inserting a suitable value into a cell when one is missing from the dataset.
Operation 3: Converting any string attribute to a number, because ML algorithms cannot process strings.
Operation 4: Normalizing values to the range 0-1 or -1 to 1, since ML algorithms perform better with smaller numbers.
Operation 5: Dividing the dataset in two (i.e., for training and testing), for example with an (80-20%) or (50-50%) split.
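Operations 1-5 can be expressed in a few lines of pandas and scikit-learn, as in the sketch below; the file name, the "diagnosis" column, and the 80-20 split are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("breast_cancer.csv")                    # placeholder file name

df = df.drop_duplicates()                                # Operation 1: remove duplicate records
df = df.fillna(df.median(numeric_only=True))             # Operation 2: fill missing numeric values
df["diagnosis"] = df["diagnosis"].map({"B": 0, "M": 1})  # Operation 3: encode string labels as numbers

X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]
X = MinMaxScaler().fit_transform(X)                      # Operation 4: normalize features to the 0-1 range

# Operation 5: 80-20 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```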
2.3 Segmentation As it was said at the beginning of the paragraph, radiologists use a wide array of imaging techniques. Mammograms and ultrasound imaging are the two diagnostic tools that are used the most often in the classification of breast cancer. The more traditional imaging approach has poor contrast and fuzzy edges, both of which make automatic segmentation more difficult. On the other hand, the next imaging method uses low-energy x-rays with good resolution to help find cellular abnormalities in tissue [8]. Using segmentation, cancer lesions, pectoral muscles, and other parts of a mammogram image that have nothing to do with the breast can be taken out [10]. Segmentation is a technique that is used to remove these items. A photograph may be segmented in a variety of different ways, depending on the purpose of the segmentation. Utilizing a threshold is one of the more prevalent approaches, while others depend on sections, and yet others concentrate on sharp edges, e.g., the study’s researchers devised and published a novel method for determining the pectoral muscle boundary. In this study, an estimation of the value of the intensity function was produced by making use of a differentiation operator to locate the boundaries of the areas of interest. Convexity is produced when one attempts to determine the final location of the breasts on a body. The computation of a convex hull function is ultimately what leads to the creation of a topographic map.
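As a small illustration of the threshold-based segmentation mentioned above, the following sketch applies Otsu thresholding with OpenCV to isolate the bright breast region of a mammogram; the file names are placeholders, and this is a generic example rather than the specific pectoral-muscle method discussed.

```python
import cv2

# Read a mammogram as a grayscale image (placeholder file name).
image = cv2.imread("mammogram.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method picks a global threshold automatically, separating the
# brighter breast/lesion region from the darker background.
_, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Keep only the region of interest and discard everything outside the mask.
segmented = cv2.bitwise_and(image, image, mask=mask)
cv2.imwrite("segmented.png", segmented)
```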
3 Features Extraction and Selection

For feature extraction and selection, a CNN was also employed. Although this is a fairly wide subject, the basic CNN design for image classification has two major parts: a "feature extractor", based on convolutional layers, and a "classifier", often based on fully connected layers [11]. Feature extraction is the process of reducing the number of features by creating a new set of attributes that carry the same information as the original features. When dealing with a huge dataset that contains hundreds or thousands of features, performance issues may occur in a machine learning model if the features that best represent the real observations about the given variables are not identified. Using this strategy therefore boosts the effectiveness of the machine learning model while simultaneously lowering the likelihood of overfitting. During feature extraction, the aim is not to keep the original features of the dataset but to replace them with new ones that better reflect the core of the original characteristics. This strategy not only makes training more efficient, it also makes it more precise and makes visualization possible [12]. Principal component analysis (PCA), linear discriminant analysis (LDA), and t-distributed stochastic neighbor embedding (t-SNE) are all examples of feature
extraction approaches. The first technique, principal component analysis, is a kind of data analysis that may help to streamline a dataset without changing the data's fundamental characteristics. Using this strategy, large datasets are examined in order to derive from them a collection of independent features referred to as "principal components". Principal component analysis (PCA) is a strong tool that may be used in a variety of contexts [13]. The second approach, linear discriminant analysis (LDA), is very comparable with PCA in terms of reducing the amount of data and concentrating on the vectors that highlight the most significant differences in the data. In addition to the first axes that it finds, LDA looks for further axes that show how groups differ. PCA is called unsupervised because the class label is not taken into account when the order of variance is calculated; LDA, on the other hand, is a supervised method because it uses categories that have already been set up to decide how to divide the data [14]. In contrast to PCA, which was initially developed in 1933, the t-SNE method was introduced for the first time in 2008, which makes it a relatively recent method. t-SNE gives a more accurate representation of the data than PCA can [15] because it can deal with the nonlinearities contained in the data and maintain reasonable pair-wise distances. Once an area of interest within a breast tumor has been segmented, the features within the tumor are measured based on how the tumor is built. Radiologists have found that a benign tumor is round and has a clear edge, whereas malignant tumors are often rough, unclear, and spiculated [16]. The authors of [17] used a co-occurrence matrix and an operated matrix to extract two kinds of texture features, structural and statistical. The gray-level co-occurrence matrix (GLCM) was used to derive statistical parameters such as the median, mean, variance, and standard deviation [18]. The GLCM method, a second-order technique, provides statistical information on the image's textural qualities; it captures the relationship that exists between a reference pixel and the pixels in its immediate vicinity, and it can be used to zero in on certain locations within an image by isolating the constituents of a picture's texture [19]. Still, these systems have limits that differ depending on the strategy or method used to extract features. Deep learning is therefore suggested as a good option because it does not need any preparation phases and still produces great results [20], which is one of the reasons why it has been so successful in recent years. Compared with traditional ML methods, it makes it easier to work with raw photos, requires less expert knowledge, makes it easier to change important attributes, and takes less time to finish. The fact that a deep learning technique can improve its accuracy with more input data is one of its most significant advantages [21].
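Both dimensionality-reduction techniques discussed above are available in scikit-learn; the sketch below applies PCA and LDA to the WDBC features, with the number of components chosen purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)          # PCA is sensitive to feature scale

# Unsupervised: project onto the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: LDA uses the class labels, so with two classes at most one axis is produced.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```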
3.1 Feature Selection
A method for selecting features is important for a classification system to work well. Feature selection reduces the number of characteristics and also ranks them from most important to least important. From a statistical point of view, this reduction may bring several benefits, such as better accuracy, less risk of overfitting, shorter training times, and easier data visualization [22]. Feature selection can be combined with an algorithm in different ways, such as the filter method, the wrapper method, and the embedded method. The first uses filtering techniques such as Pearson correlation to choose a subset of the dataset that contains only the attributes that are important for the study. The second is more accurate than the first because it uses machine learning models to evaluate attributes, but it is more computationally expensive [23]. Based upon how well the model performs, new features are introduced or existing ones are deleted.
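The sketch below illustrates the filter, wrapper, and embedded strategies mentioned above with scikit-learn; the dataset, the choice of k = 10 retained features, and the underlying estimators are assumptions made only for demonstration.

```python
# Hedged illustration of filter, wrapper, and embedded feature selection.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features with a statistic only, no model involved.
X_filter = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Wrapper: repeatedly refit a model while eliminating the weakest features.
X_wrapper = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit_transform(X, y)

# Embedded: importances learned during training decide which features stay.
X_embedded = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0)).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```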
3.2 Classification Using Machine Learning Algorithms
Supervised learning (SL) and unsupervised learning (UL) are the two basic types of machine learning algorithms. The first must be trained with a set of inputs and outputs that have been labeled. The process includes a training phase and a testing phase: during the training phase of SL, the model is formed using data that has been manually labeled through human interaction, and during the testing phase the model is evaluated on data it has never seen before [24]. In contrast, unsupervised learning, the second kind of machine learning approach, does not require labeled training data. This approach can be used in the absence of labels since it groups samples based on their common properties. There are other types of machine learning between these two extremes; semi-supervised learning (SSL), for example, needs only a limited number of labeled samples in order to identify unlabeled data. Classification and regression problems may be solved with SL algorithms. Classification organizes discrete data into categories, whereas regression deals with real-valued variables such as temperature or time [25]. In UL, clustering is used to identify groups of samples with similar qualities. Researchers and programmers rely on a wide array of algorithms to quickly and automatically address sample classification and identification problems. Support vector machine (SVM), K-NN, Naive Bayes (NB), and C-means are the most well-known algorithms for the identification of breast cancer tumors [26].
3.2.1 SVM (Support Vector Machine)
This method has been used to solve a wide range of classification and regression problems. As the number of features rises, a coordinate is created for each feature inside an n-dimensional space. The method then searches this space for the hyperplane that separates the data-point classes with the largest margin [27, 28]. In a number of studies [29, 30], these algorithms have shown promise in the categorization of breast cancer tumors. Various algorithmic methods were used in the aforementioned studies (i.e., SVM, K-NN, C4.5, NB, K-means, EM, PAM, and fuzzy c-means), and in terms of accuracy the SVM algorithm surpassed the competition.
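A minimal sketch of SVM classification on the Wisconsin breast-cancer data shipped with scikit-learn is shown below; the RBF kernel, the C value, and the train/test split are illustrative defaults rather than the settings used in [29, 30].

```python
# Minimal SVM sketch on the scikit-learn breast-cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a maximum-margin classifier on standardized features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("SVM test accuracy:", clf.score(X_test, y_test))
```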
3.2.2 K-NN (K-Nearest Neighbor)
This method has been applied in a wide variety of fields, including medicine, finance, the visual arts, and even handwriting recognition. The model is first trained with labeled data from the different categories and then evaluated on new data. To classify a new point, the algorithm finds the data points closest to it using one of several distance measures, including the Manhattan distance, the Hamming distance, the Minkowski distance, and the Euclidean distance, and assigns the point to the class that fits it best [31]. Using K-NN in tandem with support vector machines (SVM), as the authors of two studies [32, 33] have argued, may increase the scheme's efficiency.
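The following sketch shows how the distance metric is selected in a K-NN classifier with scikit-learn; k = 5 and the metrics compared are illustrative choices, not values reported in the cited studies.

```python
# Compare K-NN accuracy under different distance metrics (5-fold cross-validation).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

for metric in ("euclidean", "manhattan", "minkowski"):
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5, metric=metric))
    scores = cross_val_score(knn, X, y, cv=5)   # cross-validated accuracy per metric
    print(metric, scores.mean().round(3))
```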
3.2.3 The Random Forest (RF)
This algorithm is a quick and effective way to analyze a large amount of data. Numerous researchers have adopted this technique and applied it to a wide range of real-world problems [34]. The method uses the idea of "ensemble learning" [35] to combine the results of many classifiers: on the same dataset, a single classifier usually performs worse than a combination of many weak classifiers. Among the many ensemble approaches available are boosting, bagging, and, more recently, random forest. The boosting technique [36] begins by assigning equal weights to each instance and then iteratively increases the weights of instances that were incorrectly classified while decreasing the weights of correctly classified ones. Bagging [37] splits the dataset into several training subsets that are supplied in parallel to classifiers whose outputs are combined by majority vote. The random forest, on the other hand, is an ensemble technique that classifies new cases by majority voting over numerous decision trees. In this approach, the features are partitioned into several smaller groups, each of which feeds a randomly generated decision tree. Compared with bagging and boosting, random forest is quicker and more resilient against noise [38].
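A hedged comparison of the three ensemble strategies described above (bagging, boosting, and random forest) is sketched below with scikit-learn; the base learners, numbers of estimators, and dataset are assumptions for illustration only.

```python
# Compare bagging, boosting, and random forest with 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=100, random_state=0),          # re-weights misclassified samples each round
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0), # bagging plus random feature subsets per tree
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```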
4 Literature Review
When it comes to providing safe and high-quality care, artificial intelligence (AI) has been a game-changer in the healthcare industry. AI methods such as machine learning and deep learning are used in particular to detect and classify breast and brain tumors [39]. Based on the Wisconsin Breast Cancer (original) dataset with 11 attributes and 699 instances, the author of the study [29] used SVM together with three other classifiers (C4.5, NB, and k-NN) for the classification of breast cancer. In comparison with the other classifiers, the results showed that the SVM achieved the highest accuracy, 97.13%. In light of these results, other research, such as that presented in [27], has examined how linear, polynomial, and RBF kernel functions can be combined with SVM and with ensemble features such as bagging and boosting. Two datasets were used to evaluate these parameters: one with 11 attributes and 699 instances, and another with 117 attributes and 102,294 instances. The research concluded that for a small dataset the best options are the linear-kernel SVM with the bagging feature and the RBF-kernel SVM with the boosting feature; furthermore, the latter outperformed competing classifiers when applied to sizable datasets. Likewise, Y. Khourdifi and M. Bahaj used the WEKA tool to implement the machine learning algorithms random forest, Naive Bayes, SVM, and K-NN across two separate works in 2018 [32, 33]. The experiments used a dataset of 699 instances and 30 attributes to evaluate the algorithms, and a high level of accuracy (up to 97.9%) was again found for the SVM model compared with the others. An alternative study [30] contrasted the classification algorithms SVM and C5.0 with the clustering algorithms K-means, expectation maximization, partitioning around medoids (PAM), and fuzzy c-means; SVM and C5.0 were found to be more accurate than the clustering models, at 81%. In a recent study [7], researchers compared and evaluated nine machine learning algorithms and found differing results. The algorithms considered were logistic regression, Gaussian Naive Bayes, linear support vector machines, RBF support vector machines, decision trees, random forests, XGBoost, gradient boosting, and K-nearest neighbors, evaluated on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The study compared SL with semi-supervised SSL, and the results showed that the k-NN and logistic regression algorithms achieved the higher accuracies, (SL = 98% and SSL = 97%) and (SL = 97% and SSL = 98%), respectively. Moreover, the research reported a good accuracy of 97% for linear SVM. Other research [26] suggested that the WBCD dataset could be used to classify breast cancer tumors by means of ensemble learning. The research in [27] coupled a boosting artificial neural network (BANN) with two SVMs; the authors stated that they obtained extremely high accuracy, reaching 100%. The research in [28] suggested integrating three classifiers: SVM learning with stochastic gradient descent optimization, simple logistic regression learning, and multilayer perceptron networks, applied as an ensemble classification employing a voting mechanism.
The study's accuracy was approximately 99.42%. Similarly, the authors of the work [29] suggested that the multi-verse optimizer (MVO) and the gradient boosting decision tree (GBDT) could be combined into an ensemble learning method, with the former responsible for tuning the settings of the latter and optimizing the selection of characteristics. The research employed two datasets, Wisconsin Diagnostic Breast Cancer and Wisconsin Breast Cancer, to examine the suggested technique, which exhibited better accuracy and less variation than other models recommended in prior research. According to the authors' findings in their paper [30], which relates to ensemble learning, the scheme not only improves the base learner but also decreases the bias or variance. Even though research [31] suggested that using a boosting feature would increase accuracy in ensemble learning, this did not prove to be the case. Deep learning, on the other hand, has attracted growing research interest in recent years. This strategy does not require the preparation of features; instead, it can automatically extract the characteristics from medical images without human interaction. In order to categorize images of breast cancer, the researchers in the aforementioned study [32] used a deep learning strategy, namely the convolutional neural network (CNN) method. The research used the DDSM, INbreast, and BCDR datasets to assess the approach, finding accuracies of 97.35%, 95.50%, and 96.67%, respectively. When the CNN technique was applied to a larger dataset (about 5699 instances), as in another work [33], the accuracy increased to 98.62%. That dataset was gathered at two hospitals in China: the Sun Yat-sen University Cancer Center and the Nanhai Affiliated Hospital of Southern Medical University. Other studies [34] reported lower accuracy, around 87%. Two studies [30, 31] gathered extremely large datasets: the first was collected from 2010 to 2016 at five imaging facilities connected with the New York University School of Medicine, with over a million pictures from more than 140,000 patients, while the second used a dataset of about 12,000 instances. Each investigation used CNN, with accuracy reported as the area under the curve (AUC) in both cases. The authors of the research [32] gathered 67,520 photos privately and obtained high accuracy on this large dataset (approximately 95%).
5 Conclusion and Discussion Tumors from people with breast cancer can be grouped using either traditional machine learning techniques or newer advances in deep learning. The former made use of many techniques, but the support vector machine approach was the most efficient. The accuracy of SVM was shown to be as high as 97% in studies [29, 30] that compared it with other algorithms like K-NN, C4.5, NB, K-means, EM, PAM, and fuzzy c-means. So, research [24–26, 29, 30] used different SVM functions and added new features like bagging and boosting to learn more about these algorithms. Based on a huge set of data with about 102,294 records, the research found that it
was accurate about 95% of the time. In the other experiments [32, 33], which used a mix of SVM and other methods (such as random forests, Naive Bayes, and K-NN), the accuracy was about 98%. For standard machine learning to work, time and computing resources are needed for preprocessing and feature selection. Recent studies have therefore turned to the deep learning approach because it can automatically extract the traits that are most relevant to the problem at hand. CNNs have been used in multiple studies to classify breast cancers; for example, [32] looked at three datasets of different sizes and found accuracies around 97%. In a second study that used the same dataset and traditional ML techniques, the accuracy was also very high, at more than 98%. The deep learning method led to very accurate results, but it required considerable time and resources. In summary, this paper surveys the results of current studies on breast cancer classification. One stream of work relied on traditional ML methods, using a variety of algorithms, with support vector machines trained on manually labeled data being the most accurate. The authors showed that combining this method with others, such as random forest, Naive Bayes, and K-NN, can make the results more accurate. The accuracy of deep learning went up to 98% when a convolutional neural network (CNN) was used. More research into deep learning algorithms and larger medical breast imaging databases could help reduce the number of errors in future work. Machine learning and computer vision are constantly improving, and new approaches are continually being tried and refined. Although it is natural that a paper's authors would like to avoid the computational burden of deep learning, it is also vital to recognize that certain techniques that may be deemed antiquated are still highly relevant and effective in specific settings. Ensemble learning is one such technique, since it combines different models to obtain better results. Several machine learning applications, including computer vision, have found success using ensemble learning, which is flexible since it may be used with both classical ML methods and deep learning models. Convolutional neural networks (CNNs) are another crucial technique in computer vision: state-of-the-art performance in several applications, such as object identification, image segmentation, and image classification, has been achieved using CNNs. While CNNs need more processing power than other methods, they have excelled in many contexts. Reinforcement learning, in which models are taught to make decisions based on rewards or punishments, and transfer learning, in which models that have already been trained are reused to improve performance on a new task, are also options. Although it is important to prioritize techniques that use less processing power, it is also necessary to recognize the importance of other methods such as ensemble learning, convolutional neural networks (CNNs), transfer learning, and reinforcement learning in computer vision. Researchers can choose the optimal method for their project by considering a number of viable options.
References 1. World Health Organization (WHO) (2023) World Health Organization (WHO). https://www. who.int 2. Priyanka, Sanjeev K (2021) A review paper on breast cancer detection using deep learning. IOP Conf Ser Sci Eng 1022(1) 3. Mahmood T, Li J, Pei, Y, Akhtar F, Imran A, Ur Rehman K (2020) A brief survey on breast cancer diagnostic with deep learning schemes using multiple image modalities. IEEE Access 8:165779–165809 4. Battineni, Chintalapudi N, Amenta F (2020) Performance analysis of different machine learning algorithms in predicting breast cancer. EAI Endorsed Trans Pervasive Heal Technol 6(23):1–7 5. Guirguis MS, Adrada B, Santiago L et al (2021) 12, 53 6. Rautela K, Kumar D, Kumar V (2022) A systematic review on breast cancer detection using deep learning techniques. Arch Comput Methods Eng 29(7):4599–4629. https://doi.org/10. 1007/s11831-022-09744-5 7. Oza P, Sharma P, Patel S, Kumar P (2023) Computer-aided breast cancer diagnosis: comparative analysis of breast imaging modalities and mammogram repositories. Curr Med Imaging Formerly Curr Med Imaging Rev 19(5):456–468. https://doi.org/10.2174/157340561866622 0621123156 8. Nasser M, Yusof UK (2023) Deep learning based methods for breast cancer diagnosis: a systematic review and future direction. Diagnostics 13(1):161. https://doi.org/10.3390/diagno stics13010161 9. Nave P, Elbaz M (2021) Artificial immune system features added to breast cancer clinical data for machine learning (ML) applications. BioSystems 202(April) 10. Al-Azzam, Shatnawi I (2021) Comparing supervised and semi-supervised machine learning models on diagnosing breast cancer. Ann Med Surg 62(December):53–64 11. Khorshid F, Abdulazeez AM (2021) Breast cancer diagnosis based on k-nearest neighbors: a review. PalArch’s J Archaeol Egypt/Egyptology 18(4):1927–1951 12. Nassif AB, Talib MA, Nasir Q, Afadar Y, Elgendy O (2022) Breast cancer detection using artificial intelligence techniques: a systematic literature review. Artif Intell Med 127:102276. https://doi.org/10.1016/j.artmed.2022.102276 13. Mateen J, Wen J, Nasrullah, Song S, Huang Z (2019) Fundus image classification using VGG-19 architecture with PCA and SVD. Symmetry (Basel) 11(1) 14. Raschka, “Linear discriminant analysis,” (2014). [Online]. Available: https://sebastianraschka. com/Articles/2014_python_lda.html. Accessed 23 Jan 2021 15. Violante, “An Introduction to t-SNE with Python Example,” (2018). [Online]. Available: https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a29 3108d1. Accessed 23 Jul 2021 16. R, D JEL, Mudigonda NR (2000) Gradient and texture analysis for the classification of mammographic masses. EEE Trans Med Imaging 1032–1043 17. Bhargava, Vyas S, Bansal A (2020) Comparative analysis of classification techniques for brain magnetic resonance imaging images. Adv Comput Tech Biomed Image Anal 133–144 18. Khan A, Jue W, Mushtaq M, Mushtaq MU (2020) Brain tumor classification in MRI image using convolutional neural network. Math Biosci Eng 17(5):6203–6216 19. Ippolito, “Feature Selection Techniques,” (2019). [Online]. Available: https://towardsdatascie nce.com/feature-extraction-techniques-d619b56e31be. Accessed 28 Jun 2021 20. Fatima, Pasha M (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 09(01):1–16 21. Abdulqader M, Abdulazeez AM, Zeebaree DQ (2020) Machine learning supervised algorithms of gene selection: a review. Technol Rep Kansai Univ 62(3):233–244 22. 
Huang W, Chen CW, Lin WC, Ke SW, Tsai CF (2017) SVM and SVM ensembles in breast cancer prediction. PLoS ONE 12(1):1–14 23. Asri H, Mousannif H, Al Moatassime, Noel T (2016) Using machine learning algorithms for breast cancer risk prediction and diagnosis. Procedia Comput Sci 83(Fams):1064–1069
24. Rawal (2020) Breast cancer prediction using machine learning. J Emerg Technol Innov Res 7(5) 25. Cherif (2018) Optimization of K-NN algorithm by clustering and reliability coefficients: application to breast-cancer diagnosis. Procedia Comput Sci 127:293–299 26. Khourdifi Y, Bahaj M (2018) Feature selection with fast correlation-based filter for breast cancerprediction and classification using machine learning algorithms. In: 2018 International symposium on advanced electrical and communication technologies (ISAECT), pp 1–6 27. Nguyen C, Wang Y, Nguyen HN (2013) Random forest classifier combined with feature selection for breast cancer diagnosis and prognostic. J Biomed Sci Eng 06(05):551–560 28. Richman, Wüthrich MV (2020) Bagging predictors. Risks 8(3):1–26 29. Pavlov L (2019) Random forests. Random For 1–122 30. Assiri S, Nazir S, Velastin SA (2020) Breast tumor classification using an ensemble machine learning method. J Imaging 6(6):39 31. Abdar, Makarenkov V (2019) CWV-BANN-SVM ensemble learning classifier for an accurate diagnosis of breast cancer. Meas J Int Meas Confed 146(May):557–570 32. Tabrizchi, Tabrizchi M, Tabrizchi H (2020) Breast cancer diagnosis using a multi-verse optimizer-based gradient boosting decision tree. SN Appl Sci 2(4):1–19 33. G, Lee S, Amgad M, Masoud M, Subramanian R (2019) An ensemble-based active learning for breast cancer classification. In: IEEE international conference on bioinformatics and biomedicine (BIBM), pp 2549–2553 34. Osman H, Aljahdali HMA (2020) An effective of ensemble boosting learning method for breast cancer virtual screening using neural network model. IEEE Access 8:39165–39174 35. Chougrad H, Zouaki H (2018) Deep convolutional neural networks for breast cancer screening. Comput Methods Prog Biomed 157:19–30 36. Oza P, Sharma P, Patel S, Adedoyin F, Bruno A (2022) Image augmentation techniques for mammogram analysis. J Imaging 8(5):141. https://doi.org/10.3390/jimaging8050141 37. Mahmood T, Li J, Pei Y, Akhtar F (2021) An automated in-depth feature learning algorithm for breast abnormality prognosis and robust characterization from mammography images using deep transfer learning. Biology 10(9):859. https://doi.org/10.3390/biology10090859 38. Oza P, Sharma P, Patel S (2022) A drive through computer-aided diagnosis of breast cancer: a comprehensive study of clinical and technical aspects. In: Lecture notes in electrical engineering, pp 233–249. https://doi.org/10.1007/978-981-16-8248-3_19 39. Oza P, Sharma P, Patel S, Kumar P (2022) Deep convolutional neural networks for computeraided breast cancer diagnostic: a survey. Neural Comput Appl 34(3):1815–1836. https://doi. org/10.1007/s00521-021-06804-y
Chapter 17
Prediction of Age, Gender, and Ethnicity Using Haar Cascade Algorithm in Convolutional Neural Networks D. Lakshmi, R. Janaki, V. Subashini, K. Senthil Kumar, C. A. Catherine Aurelia, and S. T. Ananya
D. Lakshmi (B) · C. A. Catherine Aurelia · S. T. Ananya, St. Joseph's College of Engineering, OMR, Chennai, India, e-mail: [email protected]; R. Janaki · V. Subashini, Sri Sairam Institute of Technology, Chennai, India; K. Senthil Kumar, Central Polytechnic College, Chennai, India. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_17

1 Introduction
Deep learning, a subset of Machine Learning (ML) that trains the machine to think and analyze situations like humans, involves frameworks called Artificial Neural Networks (ANNs). ANNs are computational systems inspired by the biological neural networks present in human brains. ANNs are made of perceptrons (artificial neurons), thus replicating the design and processes of neural networks in humans. A Convolutional Neural Network (CNN) is a class of ANN predominantly used in the processing of multimedia information. CNNs find wide application in image/video processing, classification, regression, and segmentation. Computer Vision (CV) is a field in Artificial Intelligence that focuses on mimicking the functionality of the human visual system by allowing machines to identify and analyze objects in images and videos. CV algorithms work to replicate the mechanisms used by the human eye to process what it visualizes. Initially, CV was a developing field with very little practical capability. Facial analysis is one significant application of CV, and it has led to the evolution of novel algorithms and techniques to recognize and detect faces. Among the various attributes used in facial analysis, age, gender, and ethnicity are features that distinguish people from one another, and they may be fairly predicted from unfiltered facial images. The estimation of such features from an image plays a vital role in
intelligent applications like surveillance, law enforcement, marketing, access control, forensic analysis, social platforms, etc. These attributes also find wide use in different security-based applications.
2 Method We propose an algorithm to perform real-time prediction of the following characteristics from facial images: (i) the age group of people in the ranges of 0–9, 10–19, 20–39, 40–59, and above 60 years of age, (ii) the gender, either male or female, and (iii) the ethnicity (Indian, Asian, Caucasian Latin, or Black).
2.1 VGG Model The Visual Geometry Group (VGG) face recognition model was trained using a large dataset called the ImageNet. The models trained using ImageNet are further trained using face recognition datasets in a method known as Transfer Learning. This pretrained model is found to be optimum in generating generalized features from faces. Further, to improve accuracy, fine tuning is done to make the Euclidean distance minimum for the same identity vectors, and maximum for different identity vectors. When Transfer Learning is used to train the new dataset, the fully connected layers (also known as classification layers) are replaced with customized layers. The VGG face model architecture contains 22 layers and uses 224 × 224 RGB images as input. Figure 1 displays the architecture of the VGG face model architecture referred from Serengil [1]. ReLu activation is applied to every convolution layer. Max pooling layers between the convolution layer blocks down sample the images with a 2 × 2 pool and a stride of (2,2). Following the feature extraction layers are the classification layers consisting of a set of 3 fully connected layers and a soft max activation layer. While using Transfer Learning, either the whole set of classification layers, or the final fully connected layer alone is replaced by a set of custom layers according to the dataset
Fig. 1 VGG face model architecture
used. Models using Transfer Learning are proven to perform better in classification functions, leading to their wide application in recent times.
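The sketch below outlines the transfer-learning recipe described above in tf.keras. Because the VGG-Face weights are not bundled with Keras, the ImageNet-trained VGG16 from keras.applications stands in for them here, and the single gender head on top of a 512-unit dense layer is an illustrative stand-in for the custom classification layers; both substitutions are assumptions, not the authors' exact setup.

```python
# Transfer-learning sketch: frozen VGG backbone + replaced classification layers.
import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False                                  # keep the convolutional feature extractor fixed

x = tf.keras.layers.Flatten()(base.output)
x = tf.keras.layers.Dense(512, activation="relu")(x)   # custom classification layers
gender = tf.keras.layers.Dense(2, activation="softmax", name="gender")(x)

model = tf.keras.Model(base.input, gender)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```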
2.2 CNN Model
The deep learning-based Convolutional Neural Network uses multiple perceptron layers to extract features from media files, and classification can then be made according to the requirement. It has one input layer, n hidden layers, and one output layer. The intermediate layers are hidden because their inputs and outputs are masked by the activation function and the convolution operation. These hidden layers generally perform the convolution function, forming the essence of a CNN, while the multilayer perceptrons give rise to a fully connected network. The general architecture of a CNN consists of the interconnection of multiple layers; Fig. 2 shows the face recognition architecture of a CNN model. The input image of the required size is fed into a convolution layer and activated using an activation function such as ReLU. The convolution layer convolves the input image with weighted filters, producing feature maps. The max pooling layer down-samples the feature maps from the convolution layers to reduce computational complexity. A couple of convolution layers are stacked before the feature maps are fed into the fully connected classification layer, which assigns each image to its respective class using the class probabilities obtained for that image.
Fig. 2 CNN face recognition architecture
3 Proposed System The proposed system is a customized real-time age, gender, and ethnicity predictor using a Convolutional Neural Network model. Figure 3 depicts the flow diagram of the proposed system. Image acquisition is done by obtaining a real-time screen capture using a webcam. The captured image is then detected for faces using Haar Cascades. This separates out the faces from the background and returns the face alone. The next step is resizing the face image according to our model input requirements. Following this, the image will be converted to grayscale to reduce computational complexity. Contrast Limited Adaptive Histogram Equalization (CLAHE) is then applied to the grayscale image. The images are then normalized and split into a training set and a testing set by a method called Stratified K-Fold Cross-Validation that takes into account the imbalances in the amount of data in each class and helps to balance the split up of various classes in the training and test sets. The model is trained and validated using the Fair Face dataset. Three CNN models with different numbers of convolution layers are trained and validated. Starting from 3 layers of convolution, the training was performed for models up to 5 convolution layers. Features extracted from the convolution layers are fed into the classification layers. The classification layers split into three different categories, namely gender, age, and ethnicity. The same feature vector obtained as a result of feature extraction is passed into each of these classifiers. Thus, we propose a custom CNN model to perform the gender, age, and ethnicity prediction of the input image.
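A minimal OpenCV sketch of the face-detection front end described above is given below; the webcam index, the bundled frontal-face cascade, and the 224 × 224 resize target are assumptions chosen to match the description rather than code from the paper.

```python
# Haar cascade face detection on a single webcam frame.
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)          # default webcam
ok, frame = cap.read()
cap.release()

if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))  # crop and resize each detected face
        print("face found at", (x, y, w, h))
```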
3.1 Implementation The proposed system puts forth a customized real-time age, gender, and ethnicity prediction, where a person’s image is captured using a webcam, face is detected, and the obtained facial image is resized and preprocessed for prediction. Table 1 describes the specifications of the work environment and the dataset used.
Fig. 3 Flow diagram of proposed system
Table 1 System specifications

Attributes | Value/description
Dataset used | Fair Face dataset
Number of images in training dataset | 8764
Number of images in testing dataset | 2190
Image size | 224 × 224 × 3 (RGB images)
System Specifications
Table 1 gives the specifications of the system used. Among the few datasets with ethnicity classification used by Agbo-Ajala and Viriri [2], Micheal and Shankar [3], Greco and Percannella [4], and Karkkainen and Joo [5], the Fair Face dataset has a balanced number of images across all 7 different classes of ethnicity.

Gender Annotation
The two classes of gender in the dataset are male and female, with a fair distribution maintained between them. Table 2 gives the gender distribution of the Fair Face dataset.

Table 2 Gender annotation
Gender category | Number of images
Male | 5792
Female | 5162

Age Annotation
The different categories of age available in the dataset are 0–2, 3–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69, and more than 70 years. These nine categories have been scaled down to five categories: 0–9, 10–19, 20–39, 40–59, and more than 60 years of age. The age distribution of images is given in Table 3.

Table 3 Age annotation
Age category | Number of images
0–9 | 1555
10–19 | 1181
20–39 | 5630
40–59 | 2149
More than 60 | 439

Ethnicity Annotation
The various categories of ethnicity are consolidated into: Indian; Asian, including East Asian and Southeast Asian; Caucasian Latin, including White, Middle Eastern, and Latino Hispanic; and Black. The ethnicity distribution of the dataset is given in Table 4.

Table 4 Ethnicity annotation
Ethnicity category | Number of images
Indian | 1516
Asian | 2965
Caucasian Latin | 4917
Black | 1556
3.2 Preprocessing The images are preprocessed before they are fed into the CNN model. A few images from the dataset before preprocessing are depicted in Fig 4. RGB to Gray Scale Conversion The RGB images are converted to grayscale images that have a single channel of pixel values ranging from 0 to 255. For face recognition and classification purposes, grayscale images prove sufficient, as RGB images do not necessarily add any important information. Figure 5 shows the grayscale converted images. Contrast Limited Adaptive Histogram Equalization (CLAHE) CLAHE works on small regions of the image, called as tiles, rather than the entire image. The neighboring tiles are then combined using bilinear interpolation to remove any artificial boundaries. CLAHE is applied on the grayscale image to enhance the contrast, since the images in the dataset have various ranges of illumination. Figure 6 shows the images after the application of CLAHE.
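The following OpenCV sketch reproduces the grayscale conversion and CLAHE step described above; the file name, clip limit, and tile size are placeholder values, not parameters reported in the paper.

```python
# Grayscale conversion followed by CLAHE contrast enhancement.
import cv2

img = cv2.imread("sample_face.jpg")                    # placeholder file name
if img is not None:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # single channel, values 0-255
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)                       # equalize contrast tile by tile
```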
Fig. 4 Sample images from dataset
Fig. 5 Grayscale converted images
Fig. 6 CLAHE output images
Normalization and Stratified K-Fold Cross-Validation When image pixels of intensity 0–255 are processed using Deep Neural Networks, it leads to complexity in computation due to large numeric values. To overcome this, it is essential to normalize the values to the range 0–1. Normalization has been carried out by dividing all the values by 255. To split the dataset into a training set and validation set, we use the Stratified K-Fold Cross-Validation technique. Crossvalidation is a resampling method that uses different portions of the data to test and train a model on different iterations. K-fold cross-validation splits the data into ‘k’ parts. In each of the ‘k’ iterations, one portion is used for testing, while the remaining portions are used for training. K-fold cross-validation implemented using stratified sampling ensures that the proportion of the features of interest is the same across the original data, training set and the test set. This ensures that no value is over or under-represented in the training and test sets and gives a more accurate estimate of performance and error. This method is said to improve the accuracy of a model due to its balanced distribution of data.
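A short sketch of the normalization and stratified K-fold split described above is given below; the random image array and labels are stand-ins for the Fair Face data, and five folds is an assumed setting.

```python
# Normalize pixel intensities and split with stratified K-fold cross-validation.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.randint(0, 256, size=(100, 224, 224), dtype=np.uint8)  # stand-in grayscale images
y = np.random.randint(0, 2, size=100)                                # stand-in class labels

X = X.astype("float32") / 255.0            # scale pixel values to the range [0, 1]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]   # class proportions preserved in every fold
    y_train, y_test = y[train_idx], y[test_idx]
```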
3.3 System Architecture The proposed CNN models have a varying number of convolution layers as compared in Venkat and Srivastava [6]. Table 5 shows the architecture of the proposed CNN models. The first convolution layer uses a 7 × 7 kernel, with 96 filters and a stride of (4,4). The second convolution layer has a 5 × 5 kernel, with 256 filters. The third convolution layer uses a 3 × 3 kernel, with 384 filters. These three layers are common for all the three variations of the model. The 4- and 5-layer CNNs have an additional convolution layer of a 3 × 3 kernel with 384 filters. The 5-layer CNN has one more convolution layer of a 3 × 3 kernel with 512 filters. All these convolution layers are activated by the ReLU activation function, which introduces non-linearity in a network, and followed by a Batch Normalization layer as used in Ahmed et al. [7], which speeds up the training by using a higher learning rate. A max pooling layer of 3 × 3 pool is added for the purpose of down sampling the image size. Dropouts are a regularization technique used to prevent overfitting in the model. Dropouts are added to randomly switch off some percentage of neurons of the
Table 5 CNN architectures

3-layer CNN | 4-layer CNN | 5-layer CNN
(224 × 224 × 1)—input | (224 × 224 × 1)—input | (224 × 224 × 1)—input
Conv_1 − (7 × 7), 96 filters | Conv_1 − (7 × 7), 96 filters | Conv_1 − (7 × 7), 96 filters
Conv_2 − (5 × 5), 256 filters | Conv_2 − (5 × 5), 256 filters | Conv_2 − (5 × 5), 256 filters
Conv_3 − (3 × 3), 384 filters | Conv_3 − (3 × 3), 384 filters | Conv_3 − (3 × 3), 384 filters
– | Conv_4 − (3 × 3), 384 filters | Conv_4 − (3 × 3), 384 filters
– | – | Conv_5 − (3 × 3), 512 filters
Dense layer—1 (512 units) | Dense layer—1 (512 units) | Dense layer—1 (512 units)
Batch normalization + 30% dropout | Batch normalization + 30% dropout | Batch normalization + 30% dropout
Dense layer—2 (64 units) | Dense layer—2 (64 units) | Dense layer—2 (64 units)
Batch normalization + 40% dropout | Batch normalization + 40% dropout | Batch normalization + 40% dropout
network. When the neurons are switched off, their incoming and outgoing connections are also switched off. This is done to improve the learning of the model. It is generally preferred to switch off no more than 50% of the neurons; if more than 50% are switched off, the model may learn poorly and its predictions will suffer. Hence, a spatial dropout layer with a 25% dropout rate is used after the max pooling layer. The above layers form the feature extraction layers, and a feature vector is obtained at the end of the convolution layers. The three classification layers, similar to Patil et al. [8], begin with a flatten layer to flatten the feature vector to one dimension. Following this, a dense layer (fully connected layer) with 512 units and another dense layer with 64 units are present, each followed by a Batch Normalization layer and a dropout layer. After these fully connected layers comes the final output layer, with as many nodes as there are classes: 2 nodes for gender, 5 nodes for age, and 4 nodes for ethnicity. These output layers are activated with a softmax activation function. In each convolution layer and each dense layer, an L2 kernel regularizer is applied to prevent overfitting of the CNN model. Table 6 shows the output layers for the gender, age, and ethnicity classifications.

Table 6 Output layers
Gender output | Age output | Ethnicity output
Dense layer—2 units | Dense layer—5 units | Dense layer—4 units
Softmax activation | Softmax activation | Softmax activation
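A hedged tf.keras sketch of the shared-backbone, three-head classifier summarized in Tables 5 and 6 is given below; the filter sizes follow the four-layer column of Table 5, while the pooling placement, optimizer, and loss choices are assumptions made to keep the example self-contained.

```python
# Multi-output CNN: shared convolutional feature extractor, three softmax heads.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(224, 224, 1))
x = layers.Conv2D(96, 7, strides=4, activation="relu")(inputs)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(256, 5, activation="relu")(x)
x = layers.Conv2D(384, 3, activation="relu")(x)
x = layers.Conv2D(384, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Flatten()(x)
x = layers.Dense(512, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.4)(x)

gender = layers.Dense(2, activation="softmax", name="gender")(x)     # 2 gender classes
age = layers.Dense(5, activation="softmax", name="age")(x)           # 5 age groups
ethnicity = layers.Dense(4, activation="softmax", name="ethnicity")(x)  # 4 ethnicity groups

model = tf.keras.Model(inputs, [gender, age, ethnicity])
model.compile(optimizer="adam",
              loss={"gender": "sparse_categorical_crossentropy",
                    "age": "sparse_categorical_crossentropy",
                    "ethnicity": "sparse_categorical_crossentropy"})
model.summary()
```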
4 Results and Discussion
4.1 Transfer Learning Using VGG Face Model
The VGG face model uses RGB images. The original VGG model was trained on an Nvidia Titan GPU for 2–3 weeks; in addition, fine-tuning on the Fair Face dataset took 40 min for 25 epochs. The model trains on 140,559,179 parameters, of which 125,844,491 are trainable and 14,714,688 are non-trainable. This makes the VGG face model computationally expensive and resource consuming. Hamdi and Moussaoui [9], Sheoran et al. [10], and Shanthi [11] use this model as a benchmark against which to compare their methods, similar to how it is used here in comparison with a custom CNN. It is observed that, due to the higher number of classes in the age category, the age task yields a slightly lower accuracy. The gender classification obtains a good accuracy of 81.51% on the validation set. Since this model has been pre-trained on the large ImageNet dataset, both the training accuracy and the validation accuracy are high when Transfer Learning is used.
4.2 Three Convolution Layer Model This model takes 15 min and 30 s to train. This model proves to be very time effective with its small training time. The accuracy graphs for the three classification tasks are shown in Fig. 7.
4.3 Four Convolution Layer Model The four convolution layer model uses grayscale images as input. The time taken to train this model is 20 min and 53 s over 115 epochs. This model proves to be more time efficient than the VGG face model, but due to the increase in the number of convolution layers, it takes more time than the three convolution layer model. Figure 8 depicts the accuracy graphs of the three classification tasks.
4.4 Five Convolution Layer Model The five convolution layer model uses grayscale images as input. The time taken to train this model is 23 min and 5 s over 100 epochs. This model proves to be more time efficient than the VGG face model but takes more time than the three and four
Fig. 7 a Gender b age c ethnicity accuracy graph with three CNN Layers
convolution layer models due to the addition of another convolution layer. Figure 9 shows the accuracy graphs (training and testing) of the three classification tasks. Table 7 shows the accuracy results for gender, age, and ethnicity for all the four models.
4.5 Output Images and Classification Among the three custom CNN models, the four convolution layer model performs better based on the validation accuracies obtained. The four convolution layer model was tested with images to predict the gender, age, and ethnicity. Figure 10 depicts a sample output with the original input image, grayscale converted image, CLAHE applied image, normalized image and the final gender, ethnicity, and age classification output. These images belong to the validation set of the Fair Face dataset. After validation of the model on the Fair Face dataset, real-time validation of the model was performed by capturing live images using a webcam. Haar Cascades algorithm was used in the detection of faces from the captured image. The obtained facial image was then preprocessed before being evaluated by the custom CNN model. Image (a)
Fig. 8 a Gender b age c ethnicity accuracy graph with four CNN Layer
shows the image captured by the webcam. This image is detected for faces using Haar Cascades, resized to 224 × 224, and is then converted to grayscale. The next image (c) depicts the image after the application of the CLAHE algorithm. The final image (d) shows the normalized image after dividing the pixels of the image by 255. This image is passed on to the custom CNN model to give the predicted outputs of gender, ethnicity, and age. Figure 11 shows a real-time prediction result.
Fig. 9 a Gender b age c ethnicity accuracy graph with five CNN Layer
Table 7 Results

Model name | Gender accuracy (%) | Age accuracy (%) | Ethnicity accuracy (%)
VGG face model | 81.51 | 65.97 | 72.22
3 Conv layer model | 80.14 | 65.25 | 71.69
4 Conv layer model | 80.73 | 65.57 | 72.19
5 Conv layer model | 79.86 | 65.30 | 72.01
Fig. 10 Sample output
Fig. 11 Real-time prediction sample output
5 Conclusion
On comparing the accuracies obtained from the VGG face model and the custom three, four, and five convolution layer models, we see that comparable accuracy is achievable even with a smaller number of convolution layers. The training time of the VGG face model is almost double the training time required by the custom models. The custom models train on about 3,000,000 parameters each, while the VGG face model trains on about 140,000,000 parameters, many times more than the custom models, which explains the extensive training time of the pre-trained model. An advantage of the proposed system is that it uses unfiltered, grayscale images in the prediction process; this speeds up training and reduces the complexity of the designed neural network model. It was found that this method produced reasonable results on real-time images as well and was not biased toward the dataset used to train the model, which demonstrates the robustness of the designed model beyond the training data. As in Kamencay [12], we see that CNNs outperform traditional Machine Learning methods. The accuracy can possibly be improved by training the model with more images; with grayscale images, increasing the number of images does not significantly increase the load on the system's resources compared with color images. Slight modifications to the CNN model, such as changes in the kernel size, the number of kernels, and the regularization techniques, might lead to an improvement in accuracy and reduce the overfitting of the model on the dataset. All these modifications may be considered for further work in this area.
References 1. Serengil S (2023) Deep face recognition with VGG-face in Keras. https://sefiks.com/2018/08/ 06/deep-face-recognition-with-keras/. Accessed 14 Mar 2023 2. Olatunbosun A, Viriri S (2020) Deeply learned classifiers for age and gender predictions of unfiltered faces. Sci World J 12, Article ID 1289408. https://doi.org/10.1155/2020/1289408 3. Micheal AA, Shankar R (2021) Automatic age and gender estimation using deep learning and extreme learning machine. Turk J Comput Math Educ 12(14):63–73 4. Greco A, Percannella G, Vento M et al (2020) Bench marking deep network architectures for ethnicity recognition using a new large face dataset. Mach Vis Appl 31:67 5. Karkkainen K, Joo J (2021) Fair face: face attribute data set for balanced race, gender, and age for bias measurement and mitigation. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1548–1558 6. Venkat N, Srivastava S (2018) Ethnicity detection using deep convolutional neural networks 7. Ahmed MA, Choudhury RD, Kashyap K (2020) Race estimation with deep networks. J King Saud Univ Comput Inf Sci. ISSN1319-1578 8. Patil, Thombare R, deo Y, Kharche R, Tagad N (2021) Age and gender detection using CNN. Int J Sci Res Sci Technol 29–33. https://doi.org/10.32628/IJSRST21835 9. Hamdi S, Moussaoui A (2020) Comparative study between machine and deep learning methods for age, gender and ethnicity identification. In: 4th International symposium on informatics and its applications (ISIA) 1–6 10. Sheoran V, Joshi S, Bhayani TR (2021) Age and gender prediction using deep CNNs and transfer learning. Communications in computer and information science, pp 293–304. https:// doi.org/10.1007/978-981-16-1092-9_25
11. Shanthi N et al (2022) Gender and age detection using deep convolutional neural networks. In: 2022 4th international conference on smart systems and inventive technology (ICSSIT). https://doi.org/10.1109/icssit53264.2022.9716377 12. Kamencay P et al (2017) A new method for face recognition using convolutional neural network. Adv Electr Electron Eng 15:663–672
Chapter 18
Augmentation of Green and Clean Environment by Employing Automated Solar Lawn Mower for Exquisite Garden Design T. Mrunalini, D. Geethanjali, E. Anuja, and R. Madhavan
1 Introduction The very first garden was created for functional purposes. People cultivated veggies or plants. But as mankind advanced in civilization, upper-class people with free time started to favor just ornamental gardens. Additionally, they had slaves or servants who took care of their gardening. Rich individuals in ancient Egypt preferred to relax under the shade of trees due to the hot, arid atmosphere. They built walledin gardens with rows of trees planted inside. The Egyptians occasionally cultivated contrasting species. Sycamores, date palms, fig trees, nut trees, and pomegranate trees were among the many types of trees they grew. Greek gardeners didn’t have a great deal of expertise. To give shade around temples and other public spaces, they would occasionally install trees, but pleasure gardens were uncommon. Although they did so in pots, the Greeks did grow flowers. Despite the admiration of Greek travelers for eastern gardens, gardens in Greece were mainly created for functional purposes. Greeks cultivated orchards, vineyards, and vegetable gardens. Greece and Rome’s ideas were revived in the sixteenth century. Gardening ideas evolved, influenced by classical ideals. In the sixteenth and seventeenth centuries, symmetry, proportion, and balance started to be important concepts. The main axis of most garden designs fell from the house, and a number of cross axes formed a grid pattern. Hedges split the garden into sections. Flowerbeds were frequently arranged in squares with gravel paths dividing them in the sixteenth and seventeenth centuries. Several people rebelled against traditional gardens at the beginning of the eighteenth century, desiring a more “natural” design. Yet gardens in the eighteenth century frequently T. Mrunalini · D. Geethanjali (B) · E. Anuja · R. Madhavan Department of Electronics and Instrumentation Engineering, Kongu Engineering College, Erode, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_18
Fig. 1 First lawn mower
included shrubbery, grottoes, pavilions, bridges, and follies like fake temples. During the late nineteenth and early twentieth centuries, the arts and crafts movement had an impact on some gardeners. The industrial revolution drove away its supporters, and mass production led to a lack of taste; they missed the handcrafted quality of a bygone period. Some gardeners who were influenced by the movement had a romanticized vision of traditional cottage gardens, and they designed gardens that featured flower trellises, spotless hedges, and traditional English flowers. Edwin Budding invented the first lawn mower in 1830 in Thrupp, Gloucestershire, England. His mower was intended to be preferable to a scythe for cutting grass on sports fields and in large gardens. The scythe was the first tool invented for cutting grass to a suitable height; it has a straightforward design, with a long wooden handle and a curved blade affixed perpendicularly at the end. The concept of the first lawn mower is shown in Fig. 1. Until the nineteenth century, the only way to cut grass was with a scythe, which proved to be a time-consuming and laborious task. When considering previous work done by others in relation to this project, the approach employed here differs from the many methodologies used in the preceding research. The lawn mower proposed in the design and construction of this project work is aimed at enhancing field-cutting efficiency when compared with the traditional fossil-fuel-driven lawn mower. Figure 2 depicts a gas-powered lawn mower. Nowadays a major issue worldwide is pollution; gas-powered lawn mowers contribute to it through their exhaust emissions, and fuel costs are also increasing, so it is not efficient to use gas-powered lawn mowers. Hence, solar-powered lawn cutters are introduced. Gardeners and agricultural workers have significantly increased their use of traditional grass cutters in recent years. In addition to consuming a lot of energy, manual grass cutters also produce air pollution that damages workers' health. Traditional lawn mowers also generate a lot of noise and vibration, which can have a serious negative impact on health, e.g., carpal tunnel syndrome, finger blanching, reduced dexterity, weaker grip strength, and other problems. To overcome these problems, a new grass cutter machine design has been suggested. This equipment, dubbed a smart solar grass cutter, is powered and operated by solar energy. Three main systems are used to operate this device, namely the smart control system, the solar system, and the grass cutter. Using solar energy from the sun, a solar lawn mower's blade
Fig. 2 Gas-powered lawn mower
is propelled by an electric motor. Solar energy is a renewable energy source. Mowing grass by hand is challenging for a human. Due to their loud engines and the air pollution from engine combustion, both riding and push lawn mowers also produce noise pollution [1]. Even though some mowers are environmentally friendly, they can be inconvenient to use. Corded electric lawn mowers, for example, are risky and cannot be used by everyone as readily as powered lawn mowers, and mowing becomes risky and difficult when the mower is tethered to a cable [2]. The self-propelled electric lawn mower with remote control has the capacity to regulate the machine's motion. This prototype is safer to use, cost efficient, user friendly, and environment friendly, and it can also cut labor costs [3].
2 Literature Survey
A survey was made to review the art of gardening as well as to gather knowledge on the tools which aid gardening. The following are the manuscripts from which the basic idea of the proposed system is derived. In 2013, Basil and Okafor [4] suggested a model to develop a lawn mower based on a simple self-powered design. To change the cut height, a lift mechanism that runs on a 12 V alternator is included. This is achieved with a system of v-belt pulleys with less slip effect and foldable blades to lessen wear. The design is special in that no engine is involved, thanks to the use of foldable blades and the inclusion of an alternator for recharging the battery. In 2017, Okokpujie and Olasevi [5] elaborated the design and construction of an automatically operating cylindrical lawn mower. Using an internal gear system, the mower transfers torque to the blade. The cutting efficiency was found to be 91% with a human effort of 0.24 kN. In 2014, Mabesh [1] evaluated different types of lawn mower, such as solar, electric, and gasoline mowers, and concluded that the cutting efficiency of a solar lawn mower is over 90%; it produces no noise and causes no air pollution when compared with internal combustion engine lawn mowers. It was also concluded that charging of the system depends on how the sunlight hits the solar panel.
In December 2015, Srishti Jain et al. [6] evaluated a self-sufficient and sustainable robotic lawn mower powered by solar energy. They claim that an automatic mower will enable the worker to work more efficiently. This type of robot not only stays on the lawn but also detects objects, using an IR sensor: if it detects an object to its right, it helps the robot move in the right direction, and if to its left, in the left direction. They use a 12 V, 310 mA solar panel whose cells contribute 0.5 V each. A disadvantage of the solar-powered robotic lawn mower is that the response of the system can sometimes be very slow, so high-end DSP processors are recommended for real-time use. In February 2017, Yadav et al. [7] evaluated an automated solar grass cutter. This paper discusses a robot capable of cutting the lawn in day-to-day life. It works on automation and obstacle detection with a battery, and a solar panel connected on top of the system helps to reduce the power problem. The system uses 12 V batteries to power the motor, and external charging is not essential because of the solar panel. The vehicle motors and grass cutter are interfaced with a microcontroller of the 8051 family, which controls the working of all the motors. The motors are also interfaced with an ultrasonic sensor for the detection of objects; if any object is detected, the microcontroller stops the grass-cutting motor to avoid damage to the object or harm to an animal. Akinyemi and Ddamilare [2] discuss the design of a solar lawn mower which is generally environmentally friendly. It supports environmental sustainability because it produces no greenhouse gases, and it is cost effective when compared with a conventional gasoline-operated lawn mower. Olawale et al. [8] discuss the development of a lawn mower able to cut the grass in lawns without the need to build a boundary wire as a fence around the area to be mowed. The noise level of this mower is lower than that of others, and it does not pose any harm to the operator. Compared with other mowers, this type is cost effective. During testing of the device, it was found that where the grass is denser, the cutting in that region is less fine; however, this can be corrected by adjusting the cutting blade. Hengtao Liu et al. [9] demonstrate the use of voice recognition via Bluetooth technology to reduce human effort so that elderly users and people with disabilities can complete their tasks on their own, the operation of a grass-cutting robot system with just a click on a smartphone running the Android operating system, and the use of solar power to do away with the need for gas, oil, and an engine. The user-friendly, affordable, secure, and ecologically beneficial characteristics of this prototype ensure that the grass cutter robot stays inside the boundaries of the lawn. The controller allows the user to manage the lawn mower, and the absence of mains supply lines also increases the operating range. Grass can be trimmed to a variety of lengths because the cutter is fully adjustable. Vaikundaselvan et al. [10] discuss the design and implementation of an autonomous lawn mower. The prototype is not only cord-free but also automatic and powered by a rechargeable battery. This cordless electric lawn mower with remote control costs less than a robotic lawn mower with sensor capability. Use of this robot lawn mower is secure.
Because the user can have enjoyable control over
the lawn mower with the controller, thanks to its remote-control feature, the lawn mower stays inside the lines of the lawn. Additionally, this prototype is environmentally beneficial. This device is powered by electricity, thus there is no need for gas, oil, or an engine to operate it. Modern lawn mowers are practical pieces of equipment that use a rotating blade to trim a lawn at an even, smooth length. There have been many different lawn mower designs developed since 1830. In October 2019, Dost Muhammad et al. [11] demonstrate “Solar Powered Automatic Pattern Design Grass Cutting Robot System Using Arduino”. This project aims to build a programmable, autonomous, pattern-design lawn cutting robot that runs on solar power, eliminates the need for labor-intensive human cutting of grass, and can be remotely controlled by an Android smartphone from a safe distance using Bluetooth. The cutting blade can also be altered to maintain various lawn lengths. The primary objective was to create a prototype that needed little to no direct user input. To fulfill the work, a DC battery, a solar panel, a DC battery, an Arduino microcontroller, DC geared motors, an IR obstacle detection sensor, a motor shield, a relay module, a relay, a motor, and a Bluetooth module are utilized. To cut the grass in a precise pattern or directly, the user can control the robotic lawn mower from a distance. When the user presses the button for the chosen pattern via the mobile application, the system will start cutting grass in the defined pattern, such as a circle, spiral, rectangle, or continuous pattern. An automated obstacle detection system is also installed employing sensors at the front of the vehicle to enhance safety measures and remove hazards. Almali et al. [12] discuss about the wireless remote control of a mobile robot. In this work, a mobile robot meant to assist humans in hazardous and confined spaces is developed. The mobile robot consists of a movable platform and a gripper-equipped 4-dof robot arm. Either a computer-based interface program or an independently operated microprocessor-controlled module can be used to control this robotic system. Through the use of the created interface program, data is wirelessly transmitted to the computer’s USB port to enable communication between the user and mobile robot. Mandloi et al. [13] design, development, and testing of a machine for low capital and operating cost shrub cutting is being done with the goal of resolving one of the annoying issues that this institute has been dealing with on an annual basis. Around 200 acres of non-fertile soil surround this institution, which in the post-rainy season is heavily overgrown with bushes and shrubs. Due of this unwelcome expansion, the institution loses its lush appearance and pours a lot of money to get rid of it every year. The bushes cutting equipment consists of a primary body with two wheels and a handle for field mobility.
3 Methodology
3.1 Introduction
In general, lawn care services like mowing and grass trimming are not highly valued by property owners. Moreover, given the wide range of options, finding mowing services might be difficult. However, maintaining the lawn is a vital and inevitable activity that must be arranged on a regular basis, so the proposed methodology is developed and designed to keep the environment clean and presentable. The components used in the suggested model are an ultrasonic sensor, an Arduino Uno, a DC motor, a battery, a solar panel, and a cutting blade. The solar-powered lawn mower uses solar electricity to recharge batteries, which are then used to power an electric motor, which in turn actuates the blade as the mower is moved [14]. Each kilowatt-hour (kWh) of solar energy produced will significantly reduce CO2 emissions as well as other harmful pollutants such as sulfur oxides, nitrogen oxides, and particulate matter; solar also minimizes water withdrawal and use. The user does not have to worry about refueling at the pump or spending an excessive amount on oil to maintain the extra parts that gas-powered lawn mowers require. The block diagram in Fig. 3 represents the flow of the process. Initially, the battery is charged through the solar panel, which avoids external charging [6], and the instructions are given to the Arduino through the ultrasonic sensor. According to the instructions given by the ultrasonic sensor, the Arduino acts on the grass-cutting motor and the driving motors. The entire process flow is shown in Fig. 4.
Fig. 3 Block diagram (solar panel, 12 V battery, and ultrasonic sensor feeding the Arduino Uno, which drives the grass-cutting motor and a motor driver controlling Motor 1 and Motor 2)
Fig. 4 Flow diagram
4 Components Used
The components that will be utilized in the project, their locations, the construction of the main body, the benefits and drawbacks of the design, and safety considerations are just a few of the variables that must be taken into account while creating a smart solar grass cutter. The smart solar grass cutter has the option of being autonomous or not. Beyond that, efficiency is a key consideration [15]: to increase efficiency, it is essential to choose the right materials and components as well as the right placements. This smart solar grass cutter has a straightforward design that maximizes the use of resources. The size of the solar panel determines the total dimensions. Three motors are employed, for the blade and the two rear tires, with one motor installed for each rear tire. The height of the battery affects how high the roof is. The front tires are free-rotating rubber wheels because they automatically change direction in response to the rear tires. The design is affordable and in line with the primary goals. Using SolidWorks software, the prototype is modeled in multiple dimensions starting from a hand sketch; the design's dimensions must be precise and correct in order to increase the safety factor. For future enhancement of the lawn mower, a NodeMCU or Wi-Fi module will be incorporated, and additional sensors can be included.
Ultrasonic sensor. Ultrasonic sensors are excellent for detecting objects and measuring distance without making direct contact with them [7]. They are employed in a variety of tasks, including liquid level measurement, proximity detection, and, more prominently, assisting self-parking and anti-collision systems in automobiles.
Table 1 Ultrasonic specifications
Operating voltage: 5 V DC
Operating current: 15 mA
Operating frequency: 40 kHz
Min range: 2 cm/1 inch
Max range: 400 cm/13 feet
Accuracy: 3 mm
Measuring angle: < 15°
Dimension: 45 × 20 × 15 mm

Table 2 Arduino specifications
Microcontroller: Atmega328
Crystal oscillator: 16 MHz
Operating voltage: 5 V
Input voltage: 5–12 V
Digital I/O pins: 14 (D0 to D13)
Analog I/O pins: 6 (A0 to A5)
All the specifications of the ultrasonic sensor are given in Table 1.
Arduino Uno. The Atmega328-based Arduino Uno microcontroller board was created by Arduino.cc and is referred to as the original Arduino board (UNO means "one" in Italian). While the input voltage ranges from 7 to 12 V, the operating voltage is 5 V. The load should not exceed the Arduino Uno's maximum current rating of 40 mA per I/O pin, because doing so could damage the board. It has a 16 MHz crystal oscillator, which is the frequency at which it operates. Table 2 lists the specifications of the Arduino Uno.
DC motor. A direct current (DC) motor is an electrical device that transforms electrical energy into mechanical energy: direct current is used as the electrical energy source and is transformed into mechanical rotation.
Cutting blade. The cutting blade and the cutter deck housing are the most significant components of a rotary mower. Some of the elements affecting cut quality are the cutting blade's speed, angle, and sail. Lawn mower batteries typically have a lifespan of 3–5 years, although they may fail after as little as 1 year or last as long as 8 years [8]; this is because the battery's capacity to hold a charge degrades over time.
Table 3 Battery specifications
Brand: Amptek
Country of origin: India
Battery capacity: 2.3 Ah
Battery voltage: 12 V
Dimensions: 43 × 97 × 57 mm
Weight: 550 g
battery’s capacity to hold a charge degrades over time. Battery specifications are given in the Table 3. Calculations The difference between the object and the solar grass cutter were obtained through the following formulas, Distance = time taken xspeed of sound/2;
(1)
Distance (cm) = echo pulse width (uS)/58;
(2)
Distance (inch) = echo pulse width (uS)/148.
(3)
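As a quick check of these formulas, the short Python sketch below converts measured echo round-trip times into distances; the speed-of-sound value and the example times are assumptions chosen only to reproduce the kind of figures reported later in Table 4.

SPEED_OF_SOUND_CM_PER_S = 34000  # assumed speed of sound in air (approx. 340 m/s), in cm/s

def distance_from_round_trip(time_s):
    """Distance in cm from the total echo round-trip time, Eq. (1)."""
    return time_s * SPEED_OF_SOUND_CM_PER_S / 2

def distance_from_pulse_width(pulse_us):
    """Distance in cm from the echo pulse width in microseconds, Eq. (2)."""
    return pulse_us / 58

for t in (0.001, 0.002, 0.005, 0.006):  # example round-trip times in seconds
    print(f"{t:.3f} s -> {distance_from_round_trip(t):.0f} cm")  # prints 17, 34, 85, 102 cm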
5 Results and Discussion
5.1 Implementation Process
The mechanical aspect is made up of numerous pieces that are connected to create a frame for the solar panel, a seat for the battery, and the blade design. The rectangular top of the frame was designed using wood with a thickness of 1 cm [16]. The frame and supports were designed to support the weight of the solar panel while the mower is being driven. The battery seat was likewise made of 1 cm thick wood and was fastened to the body of the deck using screws. The body of the deck area was made of wood, making welding operations impossible, so all deck joining was done with fasteners. The electric motor and blade were housed in a compartment beneath the deck. The blade was made of 4 mm thick mild steel and was attached to the electric motor. The blade was designed to create polarity when cutting grasses of varying heights, as well as to keep the blade from wobbling. Figures 5 and 6 depict the fabrication model and the side view of the proposed model.
Fig. 5 Fabrication model of automatic solar lawn mower
Fig. 6 Side view of solar lawn mower
6 Results
The solar-powered lawn mower was constructed and developed. Solar energy is captured by the solar panel configuration, converted into electrical energy, and stored in a battery. The battery is connected to the DC motor and the rotating blade. The grass is chopped to an even height with the aid of this mower [9]. The solar lawn mower was built and tested. During machine operation, the blade converted electrical energy from the battery into mechanical energy to produce the cutting motion. The electric circuit guaranteed that power was transmitted from the battery to run the DC motor, while the solar panel charged the battery continually throughout operation. The DC motor supplied power to the blade at a speed of 3000 rpm. When the switch is activated, the battery's electrical energy powers the motor, which then turns the blade. To compensate for battery discharge, the solar panel generates current that is used to recharge the battery. Cutting of the grass is done by the blade, which has a diameter of 15 cm. Cutting blades are incorporated in the prototype for effective and uniform trimming, leaving grass at a uniform length of 5 cm [4]. As the mower moves, the revolving blade cuts the grass continuously. It was convenient to cut grass at various heights throughout the operation by employing an adjustable lever mechanism linked to the machine's deck area [10].
Table 4 Time and distance analysis
Time taken (s)   Distance (cm)   Action
0.001            17              No action
0.002            34              No action
0.005            85              Action
0.006            102             Action
From Table 4, it can be inferred that the ultrasonic sensor continuously produces high-frequency sound waves that travel through the air; when the waves hit an object, they are reflected back to the receiver. The sensor has a range of up to 2 m and emits waves at a frequency of around 40 kHz. The Arduino receives a signal from the ultrasonic sensor; if the distance is less than or equal to 50 cm, the Arduino acts based on the instructions provided by the ultrasonic sensor [11].
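A minimal Python sketch of this decision rule is given below; the 50 cm threshold comes from the text, while the function names and motor callbacks are hypothetical placeholders for the actual Arduino firmware.

OBSTACLE_THRESHOLD_CM = 50  # threshold stated in the text

def handle_measurement(distance_cm, stop_cutting_motor, continue_mowing):
    """Act on one ultrasonic reading: stop the cutting motor for close obstacles."""
    if distance_cm <= OBSTACLE_THRESHOLD_CM:
        stop_cutting_motor()   # avoid damaging the detected object or animal
    else:
        continue_mowing()      # no obstacle nearby, keep cutting

# Example usage: handle_measurement(34, lambda: print("stop"), lambda: print("go"))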
6.1 Comparative Study
The following factors were observed when comparing the solar lawn mower with a conventional gasoline lawn mower. These findings illustrate in detail how parameters such as obstructions, grass of various lengths, and even distribution affect the outcomes [5, 12, 13]. The result analysis of the proposed model is given in Tables 5 and 6.

Table 5 Result analysis
Proposed lawn mower                           Existing lawn mower
More efficient than existing systems (75%)    Comparatively low efficiency (60%)
Even grass cutting                            Uneven grass cutting
Pollution free                                Some mowers create pollution
No fuel is required                           Some mowers consume fuel
No need for external charging                 Needs to be charged externally
Table 6 Comparative study
Sample grass    Height before mowing (cm)   Height after mowing (cm)   Expected output (cm)   Accuracy (%)
Spare grass     100                         60                         70                     85
Carpet grass    70                          56                         50                     78.4
7 Conclusion and Recommendation
7.1 Conclusion
The design characteristics of the solar lawn mower were investigated, and stress analyses were performed on several machine components. The solar lawn mower was developed, examined, and shown to be successful at cutting all sorts of grass at various heights. A comparison study was conducted between the solar lawn mower and traditional gasoline-powered mowers. When compared with a traditional lawn mower, this project offers several advantages, such as no fuel costs, no emissions, and no fuel residue, and it can be controlled remotely. This technology minimizes the amount of work that humans must do, and the system can charge the batteries in the sun during the day. This project will undoubtedly benefit many families because the grass can be trimmed at a low cost and in a short period of time. Finally, this project may encourage and inspire those who can modify and extend it.
7.2 Future Scope
Based on the design and results, it is proposed that a solar panel with a larger current output be used to charge the battery fully in a shorter period, allowing the machine to run for longer periods of time. A battery with a larger ampere-hour rating should also be utilized to extend the machine's working hours. The electric motor's speed is sufficient to cut all varieties of grass, resulting in optimal efficiency. Furthermore, the blade should be designed in such a way that it enhances the angle of cut. Finally, an obstacle sensor should be added to the machine to identify obstructions ahead while the machine is moving forward, preventing wear on the surface of the blade and cracking of the solar panel. This would increase efficiency even further.
References
1. Mabesh P (2014) Design and fabrication of grass cutter. Int J Res Appl Sci Eng Technol
2. Yadav RA, Chayan NV, Patil MB, Mane VA (2017) Automated solar grass cutter. Int J Sci Develop Res (IJSDR) 2
3. The Old Lawnmower Club (2011) Mower history. The Old Lawnmower Club. Retrieved 23 Apr 2011
4. Okafor B (2013) Simple design of self-powered lawn mower 3(10)
5. Okokpujie I, Olasevi OK (2017) Design, construction of a cylinder lawn mower. J Eng Appl Sci
6. Jain S, Khalore A, Patil S. Self-efficient and sustainable solar powered robotic lawn
7. mower. Int J Trend Res Develop (IJTRD) 2(6)
8. Akinyemi AO, Damilare AS (2020) Design and fabrication of a solar-operated lawnmower. Int Sci Res Technol 5(10). ISSN: 2456-2165
9. Ajibola OO, Olajide S (2021) Design and construction of automated lawn mower. In: Proceedings of the international multiconference of engineers and computer scientists 2021 (IMECS 2021), Oct 20–22, Hong Kong
10. Khan DM, Mumtaz Z, Saleem M, Ilyas Z, Ma Q, Ghaffar S (2019) Solar powered automatic pattern design grass cutting robot system using Arduino
11. Kumar R, Ushapreethi P, Kubade PR, Kulkarni HB (2016) Android phone controlled Bluetooth robot. Int Res J Eng Technol 3:104–114
12. Almali NM, Gürçam K, Bayram A (2015) Wireless remote control of a mobile robot. Int J Sci Res Inf Syst Eng
13. Yadav RA, Chayan NV, Patil MB, Mane VA (2017) Automated solar grass cutter. Int J Sci Dev Res
14. Ibe, Anwara G (2017) Design of a hand-held grass mower. Int J Eng Res
15. Raikwar S, Oswal P, Upganlawar J (2019) Bluetooth-controlled lawn mower. Int J Sci Res Rev
16. Rao KP, Rambabu V, Rao KS, Rao DV (2014) Mobile operated lawnmower. Int J Mech Eng Rob Res 3:106–117
Chapter 19
A Lightweight Solution to Intrusion Detection and Non-intrusive Data Encryption Mahnaz Jarin, Mehedi Hasan Mishu, Abu Jafar Md Rejwanul Hoque Dipu, and A. S. M. Mostafizur Rahaman
1 Introduction
In November 2022, prior to Bahrain's legislative and local elections, hackers launched DDoS attacks against the country's government websites [1]. This attack, like many other data leaks and data theft incidents, has a common denominator: an intruder or an undesirable entity violates the system's security standards and causes harm. We call this an intrusion in general. The Internet of Things (IoT) is revolutionizing industry and life with smart gadgets including smart homes, smart grids, and more. By 2025, 41.6 billion IoT devices are projected to be connected, which presents numerous difficulties for IoT's actual implementation [2], and this industry is one of the prime attack fields of intruders. Data integrity and confidentiality concerns are present, particularly in big IoT networks. There are more security issues now, like targeted zero-day attacks on Internet users. People and organizations of all strata require adaptable and long-lasting solutions to stop these threats and offer adequate data protection. The development of data security has depended on the ideal application of encryption algorithms and cryptographic procedures. An intrusion detection system (IDS) is an important tool that has long been used to find these intrusions and keep a system's security intact. Based on how a network works and what kind of network it is, intrusion detection is divided into two types: (1) network-based IDS, which views the contents of individual packets to find malicious activity in network traffic, and (2) host-based IDS, which views the information
in log files such as file systems, sensors, disk resources, software and system logs, and others on each individual system. Techniques for analyzing and grouping network traffic data include stateful protocol analysis, misuse detection, and anomaly detection [3]. Some of the deficiencies that remain with IDSs are a fair number of false alarms, a large volume of alerts for low-risk situations, and high cost, which makes them not ideal for small IoT devices. Detection of intruders alone is not enough: the pieces of information that we store on a secured computer or hard drive can also be stolen, and only an effective encryption technique can protect the data even if it is stolen. Existing ECC encryption approaches generally revolve around cloud data protection or secure data transmission and do not particularly focus on personal data encryption or usage on a smaller scale; the encryption in this paper is focused on that ideology. Present-day intrusion detection systems can detect attacks but are mostly not integrated with data protection facilities. Again, as mentioned earlier, the high-performing deep learning-based IDSs are not ideal for individual nodal use and do not offer data protection services. Keeping this in mind, the proposed system makes the following contributions:
1. The suggested system is lightweight, which makes it ideal for individual use.
2. The proposed system is cost-effective as it uses only one classifier, making it affordable for small enterprises and organizations.
3. The system is capable of integration with IoT devices, as it can be used at key points of networks, and it also offers a simple yet effective encryption system with ECC.
4. The encryption system does not involve any third-party key assignment and enables the data to be saved locally or in the cloud.
Vinayakumar et al. [3], Roy and Cheung [4], Min et al. [5], and Yuan et al. [6] present different approaches to IDS with deep learning techniques. In [4], a model with a recurrent neural network (RNN) was presented to detect misuse-based intrusion; a three-layer RNN model was built on NSL-KDD with 99.81% accuracy for binary classification and 99.53% for multiclass classification. The work in [5] compares an RNN and a convolutional neural network (CNN) using the ISCX2012 dataset and concludes that the CNN is finer for binary classification, with 94.26% accuracy. The work in [6] uses various RNN variants such as long short-term memory (LSTM), 3LSTM, and the gated recurrent unit (GRU). The authors of [7] proposed a model called TR-IDS that combines the best features of random forest and CNN; the TR-IDS accuracy is 99.13%, and the false alarm rate (FAR) is 1.18%. Chen et al. [8] described a model called hybrid particle swarm optimization with spiral-shaped mechanism (HPSO-SSM), a wrapper approach that uses a modified PSO with the spiral-shaped mechanism for choosing the necessary attribute subsets; the accuracy of HPSO-SSM was 92.08%. In [9], a feature selection model on NSL-KDD was presented that utilized embedded methods, referring to the process of selecting features via RF; the accuracy was 99.32%.
Researchers have used a wide range of machine learning classifiers to build IDSs. The works in [10–15] applied the LightGBM classifier in their approaches. The work in [10] uses the UNSW-NB15 dataset and applies semi-supervised learning with the tri-training method with LightGBM, achieving an accuracy of more than 95%. The work in [11] follows an autoencoder (AE) approach to extract features and uses LightGBM for detection on the NSL-KDD and KDD Cup99 datasets for the smart distribution network; it had accuracies of 99.70% and 99.90%, respectively. The authors of [12] took three benchmark datasets and used an oversampling technique with LightGBM. The work in [13] adopted the AE method for detection and LightGBM for choosing the features on the NSL-KDD dataset. The work in [14], on the CICIDS-2018 dataset, used LightGBM with under-sampling embedded feature selection and achieved an accuracy of 98.37%. In [16–18], researchers used RF in their models: in [16], RF was used with SVM in a stacking model, while [17] uses RF with particle swarm optimization (PSO) for detecting intrusion. Li et al. [19] presented a hybrid approach with modified AES and ECC: the ECC encryption approach is applied to encrypt the advanced encryption standard (AES) key, while the enhanced AES method is utilized to encrypt the plaintext. For the safe sharing of medical data over the cloud, Hema et al. [20] proposed an approach utilizing the ECC encryption technique. The Health Cloud (HC), Cloud Users (CUs), and the Health Cloud Trusted Third Party-Cryptographic Server (TTP-CS) are the three main components of the suggested technique. The TTP-CS creates a public key and a private key, with the private key being on the CU side and the public key on the TTP-CS side, to assure integrity. For instance, it took 32.10 and 38.22 s to upload and download a 500 MB file, respectively. Vengala et al. [21] applied ECC, the SHA-512 algorithm, and Cued Click Points for a three-factor authentication scheme. Users' compressed data is then securely sent to the cloud server once Modified-ECC (MECC) has encrypted it; their proposed MECC provides 96% security when tested with data of 5–25 MB in size. Hafsa et al. [22] proposed a model for image encryption with AES and ECC; their work uses an efficient hybrid approach that mixes the advantages of symmetric AES, to speed up data encryption, with asymmetric ECC, to secure the exchange of a symmetric session key. A model was proposed by Prabhakaran and Kulandasamy [23] that detects non-intrusive text data and flags the intrusive data. Later, the non-intrusive data is encrypted using ECC with a Modified Flower Pollination Algorithm (MFPA); this is a double encryption system. The suggested RCNN model has an accuracy of 99.67%, specificity of 98.98%, and sensitivity of 99.78%. However, they did not clearly mention how the replaced data was generated.
2 Methodology
2.1 Proposed System
We propose a complete system for intrusion detection and data encryption. This system consists of two main components: the IDS and the encryption system. The IDS is the first component; it detects any anomaly present in the network, and if one is found, the intrusive data cannot pass through the system. In the second phase, the non-intrusive data can be encrypted with the ECC algorithm at the wish of the user. The ECC generates points like the traditional ECC algorithm according to the parameters given by the user, while the encryption–decryption process is unique to the proposed system. Figure 1 shows the system flow of the proposed system as per the description above.
2.2 Used Dataset and Preprocessing
For this paper, among the three different partitions of the NSL-KDD dataset, the KDDTrain+ dataset was taken. It has 42 features, of which 38 are numeric and 4 are non-numeric, and it lists 23 different attacks in its label feature. First, the dataset columns were given appropriate names and the difficulty column was dropped. The 23 types of attack were categorized into four attack categories: Denial of Service (DoS), Probe, Remote to Local (R2L), and User to Root (U2R); another type was normal. Table 1 shows the dataset and the total instances of the different attacks grouped into the four categories. After categorizing, the numeric attribute columns were taken from the dataset and the data were normalized. The three non-numeric attributes flag, protocol_type, and service were taken; a data frame of these categorical attributes was created and they were represented through one-hot encoding. For binary classification, the attack labels were changed into "normal" and "abnormal".
Fig. 1 Intrusion detection system with encryption
Table 1 KDD Train+ dataset
Attack category    Dataset NSL-KDD Train
Normal             67,343
DoS                45,927
Probe              11,656
R2L                995
U2R                52
Total              125,973
Fig. 2 Attack distribution for the binary dataset, multiclass dataset
Then a dataframe with the binary classification labels "normal" and "abnormal" was created, giving a dataset with binary labels and label-encoded columns. Again, a data frame with multiclass labels (DoS, Probe, R2L, U2R, normal) was created; label encoding (0, 1, 2, 3, 4) of the multiclass labels (DoS, normal, Probe, R2L, U2R) was done, and one-hot encoding of the attack label was also done to see the distribution of the multiclass classification. Attack distributions were checked in both the binary and multiclass datasets, as illustrated in Fig. 2a, b, which shows that the normal data is 53.46% and the abnormal data 46.54% in both; in the multiclass dataset, the 46.54% is divided among the four attack classes.
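A compact pandas sketch of the preprocessing just described is shown below; the column names, the partial attack-to-category mapping, and the helper name are illustrative assumptions rather than the authors' exact code.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Only a few of the 23 attack names are shown in this illustrative mapping
ATTACK_MAP = {"neptune": "DoS", "smurf": "DoS", "ipsweep": "Probe",
              "guess_passwd": "R2L", "buffer_overflow": "U2R", "normal": "normal"}

def preprocess(df):
    """df is assumed to already carry the NSL-KDD column names, including
    'attack_label', 'protocol_type', 'service', 'flag' and 'difficulty'."""
    df = df.drop(columns=["difficulty"], errors="ignore")
    df["category"] = df["attack_label"].map(ATTACK_MAP)
    # Binary label: everything that is not 'normal' becomes 'abnormal'
    df["binary_label"] = df["category"].where(df["category"] == "normal", "abnormal")
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])          # normalize numerics
    one_hot = pd.get_dummies(df[["protocol_type", "service", "flag"]])  # one-hot encode
    return pd.concat([df, one_hot], axis=1)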
2.3 Feature Extraction
In the feature extraction phase, a data frame with the numeric features of the binary-class dataset and the encoded label attribute was created. The attributes having more than 0.5 correlation with the encoded attack label were identified using Pearson correlation; a total of 9 attributes were found to have more than 0.5 correlation. These attributes were selected and joined with the one-hot encoded categorical dataframe. Then the encoded and original attack labels and the one-hot encoded attributes were joined, and the dataset was saved to disk as the final dataset for binary classification. The process was repeated to make the final dataset for multiclass classification, which was also saved to disk. The dataset was then ready to fit on classifiers.
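The Pearson-correlation step can be sketched as follows; the dataframe layout and the label column name are assumptions.

def select_correlated_features(numeric_df, label_col="attack_encoded", threshold=0.5):
    """Return the attributes whose absolute Pearson correlation with the
    encoded attack label exceeds the threshold (0.5 in the text)."""
    corr = numeric_df.corr(numeric_only=True)[label_col].abs()
    keep = corr[(corr > threshold) & (corr.index != label_col)]
    return keep.index.tolist()   # expected to yield about 9 attributes here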
2.4 Model Training
For building the IDS, we fit the dataset on eight classifiers to compare the best results. The classifiers were the support vector machine (SVM) with RBF and polynomial kernels, K-nearest neighbor (KNN), random forest (RF), logistic regression (LR), decision tree (DT), gradient boosting (XGB), and LightGBM. Keeping a 0.75:0.25 split for training and testing, respectively, the models were trained with these eight classifiers for both binary and multiclass classification. KNN used n = 5 neighbors, and RF used 500 estimators. Each model was cross-validated for accuracy using tenfold cross-validation.
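The training setup can be sketched with scikit-learn and the lightgbm package as below; only three of the eight classifiers are shown, and the random seed is an assumption.

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from lightgbm import LGBMClassifier

def evaluate_classifiers(X, y):
    """0.75:0.25 train/test split and tenfold cross-validation, as in the text."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                        random_state=42)
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=5),        # n = 5 neighbors
        "RF": RandomForestClassifier(n_estimators=500),    # 500 estimators
        "LightGBM": LGBMClassifier(),
    }
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
        results[name] = scores.mean()
    return results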
2.4.1 Evaluation Metrics
The most important evaluation metrics for our IDS evaluation are accuracy and recall. Achieving greater accuracy means correctly identifying intrusions among all the samples, and the higher the recall rate, the lower the false intrusion rate. Confusion matrices were generated for each model and for both classifications, along with the precision and the F1 score:

Accuracy = ((TP + TN)/(TP + TN + FP + FN)) × 100    (1)
Recall (R) = TP/(TP + FN)    (2)
Precision (P) = TP/(TP + FP)    (3)
F1 score = (2 × R × P)/(R + P)    (4)
Fig. 3 Three phases of ECC
2.5 ECC Encryption
2.5.1 ECC
An elliptic curve E over a field K consists of the coordinates (x, y) satisfying the cubic equation below, together with the element O known as "the point at infinity":

y² = x³ + ax + b    (5)
It is significant to note that Eq. (5) applies only to fields whose characteristic is not equal to 2 or 3 and is written in the condensed (short) Weierstrass form. The non-singularity of the curve is guaranteed by the condition that the discriminant Δ = −16(4a³ + 27b²) is non-zero, so that the equation x³ + ax + b = 0 has distinct roots [24]. Figure 3 shows the three phases of ECC: the first is the point generation phase, which is the traditional ECC part; the second is the ECC encryption process; and the third is the decryption process proposed by this paper. In the point generation phase, the curve parameters are set manually by the user; they are the number of primes and the coefficients of the x and y variables, from which the base points are generated. The user gives the input for the initial curve point by giving the index values of the x and y coordinates of the base point, and depending on that, the system generates the initial curve point. Secondly, the system asks the user for the private key. When giving the key, it has to be ensured that the key is smaller than the number of primes given in the system; if the secret key is larger than the number of primes, the system shows a message asking for a smaller key. The message needs the private key to be decrypted. With the private key, the system generates the user's public key by multiplying the private key with the initial curve point. Encryption phase: the main goal is to encrypt the text chosen by the user. For the encryption, the encryption key k is the same as the private key d. Two points are generated first. The first point is called the encryption point in our system and is calculated by multiplying the previously generated public key with the encryption key (the private key). The other point is called the decryption point, which is the same as the public key. From these points, the encryption point KP is represented in Unicode by using chr(). The x coordinate value is taken from the encryption point as the KKP point; this value is converted into an eight-bit binary number, and this is our key. After this, the input text is given by the user. The text is converted into its corresponding integer values, and each integer is converted into an 8-bit binary number; this is our data.
The integer values of the KKP point and of the text are saved in two arrays. The arrays are combined through a bitwise XOR operation, the result of which is converted into 8-bit binary values and then back into Unicode characters. The final encryption result is achieved by joining the header with the encrypted plaintext using the # character; the header comes from the Unicode of point KP. The final encryption result sequence is KP1#KP2#xor-result, where KP1 and KP2 are the x and y coordinates of point KP in Unicode. Decryption phase: to decrypt a message encrypted with the system, the header first needs to be separated from the encrypted ciphertext. As the encrypted ciphertext comes from the XOR of the KKP point and the input text, this point is changed back to its corresponding integer value and then to its 8-bit binary value. The KP point, which is the same point as the decryption point, is calculated by multiplying the encryption key k with the integer values of (KP1, KP2); for this, the decoder must know k and the Unicode value of KP. The x coordinate of the KP point is taken as the KKP point and converted into binary. The integer values of the ciphertext and of the KKP point are stored in two arrays, and a bitwise XOR operation is performed between them. The result is converted into its binary value and finally into its character form, thus recovering the original text.
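The XOR step at the heart of this scheme can be sketched in a few lines of Python; the 8-bit key reduction and the sample key value are simplifying assumptions, and the point-generation and header-handling details described above are omitted.

def xor_with_key(text, key_x):
    """XOR every character of `text` with an 8-bit key derived from the
    x coordinate of the encryption point (the KKP value in the text)."""
    key = key_x % 256                       # reduce the coordinate to 8 bits (assumption)
    return "".join(chr(ord(ch) ^ key) for ch in text)

# XOR is its own inverse, so the same call encrypts and decrypts
cipher = xor_with_key("Hi", key_x=123)      # 123 stands in for the real KKP x coordinate
plain = xor_with_key(cipher, key_x=123)     # recovers "Hi"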
3 Result Analysis
For the eight classifiers, the cross-validation accuracy was taken for both binary and multiclass classification. First, we look for higher accuracy; then we consider the recall rate, because the higher the recall rate, the lower the false intrusion rate. For better understanding, the confusion matrices were also considered. Considering binary classification first, Fig. 4 shows the cross-validation accuracy, in which the highest detection rate in terms of accuracy was achieved by the LightGBM classifier. Although random forest (RF) had a nearly similar accuracy of 98.70%, to make a better choice we compared their precision (P), recall (R), and F1 score. The classifier with the lower false negative rate (FNR) is preferred so that no attack goes unnoticed, and recall helped to determine which classifier had the lowest FNR. Figure 5 shows the comparison of RF and LightGBM for precision, recall, and F1 score. LightGBM had an FNR of 0.0124 and RF an FNR of 0.0129, with a reasonable false alarm rate in both cases. Figure 7 shows the confusion matrix of RF for binary classification, in which, among 16,774 attacks, RF falsely classified 229, while Fig. 6, the confusion matrix of LightGBM for binary classification, shows that LightGBM falsely classified 246. However, with almost similar results, LightGBM trained faster, making it our primary choice for binary classification. For multiclass classification, the evaluation was done considering the accuracy rate after cross-validation. Figure 8 shows the multiclass classification accuracy.
Fig. 4 Classification accuracy of different classifiers for binary classification
Fig. 5 Metrics comparison
Fig. 6 Confusion matrix (LightGBM)
Fig. 7 Confusion matrix (RF)
Fig. 8 Classification accuracy of eight classifiers for multiclass classification
Figure 8 also shows the comparison of the eight classifiers in terms of accuracy. Since in multiclass classification there were five categories (DoS, Probe, U2R, R2L, and normal), each of them had its own precision, recall, and F1 score for each classifier. Since finding low-level (U2R, R2L) attacks is hard, we chose to prefer the classifier that detects the low-level attacks more efficiently. In Fig. 8, the accuracy of RF is the highest among the eight, at 98.51%. Figure 9 shows the confusion matrix of RF for multiclass classification: among the 16,674 samples, it detected 16,601 normal non-intrusions. The precision, recall, and F1 score for detecting the attacks with RF are shown in Table 2. Among the 15 U2R attacks in the sample, it could detect only three; but this was still the highest among the classifiers, showing the difficulty of detecting low-level attacks. DT showed the closest result with 98.33% accuracy, while in the detection rate for U2R attacks it lagged behind with a precision of 0.25 and a recall of 0.13. When compared with other approaches using either LightGBM or RF, our single-classifier model showed decent results; Table 3 shows the comparisons. Those approaches used various techniques, whereas our IDS uses only one classifier. For analyzing the performance of ECC, we tested the word limit up to which it could encrypt the text. Also, keeping the curve parameters the same, we changed the length of the text and the given password and recorded the time needed to encrypt and decrypt the text.
Fig. 9 Confusion matrix (RF)
Table 2 Results of RF for multiclass classification
Class     Precision   Recall   F1-score
DoS       0.99        0.99     0.99
Probe     0.97        0.97     0.97
R2L       0.92        0.85     0.88
U2R       0.38        0.20     0.26
Normal    0.99        0.99     0.99
Table 3 Comparison with other state-of-the-art approaches
Author               Dataset        Method                                     Accuracy (%)
Zhang et al. [10]    UNSW-NB15      Tri-LightGBM + semi-supervised learning    > 95
Yao et al. [11]      NSL-KDD        AE + LightGBM                              99.70
Liu et al. [12]      NSL-KDD        Resampling + LightGBM                      92.57
Tang et al. [13]     NSL-KDD        LightGBM feature selection + AE            89.92
Hua [14]             CICIDS-2018    Embedded feature selection + LightGBM      98.37
Our model            KDD Train+     LightGBM                                   98.71
Chand et al. [16]    NSL-KDD        SVM + RF stacking                          97.50
Cleetus et al. [17]  NSL-KDD        PSO + RF                                   91.71
Faysal et al. [18]   N-BaIoT        XGB-RF                                     99.9
Our model            KDD Train+     RF                                         98.51
The program fails when the text input exceeds the range of the chr() function in Python, so the text input has to be given in small chunks at a time. How much text can be encrypted also depends on the curve size and on the selected number of primes and initial curve point. Tables 4 and 5 show the start time, end time, and elapsed time to encrypt and decrypt three different sizes of input for different curve parameters. For the first configuration, with curve equation y² = x³ + x + 2, the results are shown in Table 4: for the smallest input, it took 0.178 s to do the job, while a larger input took nearly the same time; this largely depends on the size of the initial point chosen. Again, when the curve parameters are a = 1, b = 2, number of primes = 1001, and private key = 990 (so the curve equation is still y² = x³ + x + 2), the time needed for encrypting and decrypting different sizes of text is shown in Table 5. It takes 3.76 s to encrypt the same text when the key size and the number of primes increase; here, too, there is very little time difference between encrypting the 13- and 18-character inputs. In this way, encryption with many combinations of curve parameters and keys is possible as long as the chr() limit is not exceeded. All these experiments were run on a Core i5 processor with 8 GB RAM and no GPU.
Table 4 Encryption–decryption time for different inputs
Time (s)   Hi      Himynameismah   Himynameismahnazza
Start      1.506   3.98            6.66
End        1.68    4.42            7.12
Elapsed    0.178   0.435           0.463
Table 5 Encryption–decryption time for different inputs
Time (s)   Hi      Himynameismah   Himynameismahnazza
Start      21.92   26.39           30.89
End        25.68   30.68           35.006
Elapsed    3.76    4.23            4.109
4 Conclusion
We proposed a system with an IDS using a single classifier and an ECC-based encryption solution for the text data of the system. It fulfilled our objective of building a lightweight and cost-effective system that would be affordable for small business owners, organizations with fewer resources, and IoT devices. Leveraging the low key space–high security advantage of ECC, we used it to encrypt text data of the user's choice in seconds. This gives the encrypted text ample security, and it can be stored on that computer, in the cloud, or on any other device. We envision using multiple datasets to increase the U2R attack detection rate and evaluating the model in a real-time network. This ECC system lacks user verification; we will be working on introducing that in future versions, along with unlimited text encryption, and applying image encryption is also our goal. Also, for better performance, optimizers will be applied in the system both for the IDS and for ECC.
References
1. Significant cyber incidents. https://www.csis.org/programs/strategic-technologies-program/significant-cyber-incidents. Accessed 27 Dec 2022
2. Abbas, Khan M, Ajaz M (2021) A new ensemble-based intrusion detection system for internet of things. Arab J Sci Eng 47:1805–1819
3. Vinayakumar R, Soman KP, Poornachandran P (2017) Applying convolutional neural network for network intrusion detection. In: 2017 international conference on advances in computing, communications and informatics (ICACCI). IEEE, India, pp 1222–1228
4. Roy B, Cheung H (2018) A deep learning approach for intrusion detection in internet of things using bi-directional long short-term memory recurrent neural network. In: 2018
28th international telecommunication networks and applications conference (ITNAC). IEEE, Australia
5. Min E, Long J, Liu Q, Cui J (2018) Comparative study of CNN and RNN for deep learning based intrusion detection system. In: Sun X, Pan Z, Bertino E (eds) Cloud computing and security. ICCCS 2018. Lecture notes in computer science, vol 11067. Springer, Cham, pp 159–170
6. Yuan X, Li C, Li X (2017) DeepDefense: identifying DDoS attack via deep learning. In: 2017 IEEE international conference on smart computing (SMARTCOMP). IEEE, China, pp 1–8
7. Min E, Long J, Liu Q, Cui J, Chen W (2018) TR-IDS: anomaly-based intrusion detection through text-convolutional neural network and random forest. Secur Commun Netw 2018:1–9
8. Chen K, Zhou F-Y, Yuan X-F (2019) Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection. Expert Syst Appl 128:140–156
9. Kunhare N, Tiwari R, Dhar J (2020) Particle swarm optimization and feature selection for intrusion detection system. Sadhana 1(45)
10. Zhang H, Li J (2020) A new network intrusion detection based on semi-supervised dimensionality reduction and tri-LightGBM. In: 2020 international conference on pervasive artificial intelligence (ICPAI). IEEE, Taiwan, pp 35–40
11. Yao R, Wang N, Liu Z, Chen P, Ma D, Sheng X (2021) Intrusion detection system in the smart distribution network: a feature engineering based AE-LightGBM approach. Energy Rep 7:353–361
12. Liu J, Gao Y, Hu F (2021) A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM. Comput Secur 106(C):102289
13. Tang C, Luktarhan N, Zhao Y (2020) An efficient intrusion detection method based on LightGBM and autoencoder. Symmetry 12(9):1458, 1–16
14. Hua Y (2020) An efficient traffic classification scheme using embedded feature selection and LightGBM. In: 2020 information communication technologies conference. IEEE Xplore, China, pp 125–130
15. Islam MK, Hridi P, Hossain MS, Narman HS (2020) Network anomaly detection using LightGBM: a gradient boosting classifier. In: 2020 30th international telecommunication networks and applications conference (ITNAC). IEEE, Australia, pp 1–7
16. Chand N, Mishra P, Krishna CR, Pilli ES, Govil MC (2016) A comparative analysis of SVM and its stacking with other classification algorithm for intrusion detection. In: 2016 international conference on advances in computing, communication, & automation (ICACCA). IEEE, India, pp 1–6
17. Cleetus N, Dhanya KA (2014) Multi-objective functions in particle swarm optimization for intrusion detection. In: 2014 international conference on advances in computing, communications and informatics (ICACCI). IEEE, India, pp 387–392
18. Faysal JA, Mostafa ST, Tamanna JS, Mumenin KM, Arifin MM, Awal MA, Shome A, Mostafa SS (2022) XGB-RF: a hybrid machine learning approach for IoT intrusion detection. Telecom 3(1):52–69
19. Li X, Chen J, Qin D, Wan W (2010) Research and realization based on hybrid encryption algorithm of improved AES and ECC. In: 2010 international conference on audio, language and image processing. IEEE, China, pp 396–400
20. Sri Vigna Hema V, Kesavan R (2019) ECC based secure sharing of healthcare data in the health cloud environment. Wirel Pers Commun 2(108):1021–1035
21. Vengala DV, Kavitha D, Kumar AP (2021) Three factor authentication system with modified ECC based secured data transfer: untrusted cloud environment. Complex Intell Syst
22. Hafsa, Sghaier A, Malek J, Machhout M (2021) Image encryption method based on improved ECC and modified AES algorithm. Multimedia Tools Appl 13(80):19769–19801
23. Prabhakaran V, Kulandasamy A (2020) Integration of recurrent convolutional neural network and optimal encryption scheme for intrusion detection with secure data storage in the cloud. Comput Intell 1(37):344–370
24. The University of Chicago Mathematics REU 2020. https://math.uchicago.edu/~may/REU2020/. Accessed 27 Dec 2022
Chapter 20
Efficiency of Cellular Automata Filters for Noise Reduction in Digital Images Imran Qadir and V. Devendran
1 Introduction
A digital image is a representation of a real image as a collection of basic blocks called pixels that can be stored and manipulated by a digital device; each pixel of the 2D matrix records a value for the respective element at a given point, taking discrete values according to properties such as intensity (brightness) or color [1]. The need for processing digital images is to enhance their appearance in order to make certain details more evident [2]. Different methods have been developed for different image features like brightness, sharpness, contrast, intensity, noise, etc. [2]. One defect, noise, may be added to the image during acquisition or transmission through a network; it can originate from the sensor or the circuitry of the capturing device. Noise types have been differentiated according to their features, and accordingly, methods for the removal of noise have been developed [3]. Digital images are used in every field of life, which may be due to the expansion of digital services that provide the means to turn numbers into images [4]. Digital images get degraded by noise at different image processing stages, such as image acquisition due to inappropriate sensor functioning, or during transmission through the network; hence it becomes a necessity to remove the unwanted element from the image details, because noise can be quite misleading to the user and may lead to wrong decision making [5]. Digital images are ubiquitous and have found use in almost every field of human life, ranging from entertainment to medicine. Digital images have become essential in the medical field as most of
the diagnoses are being done with the help of images; hence, early and accurate diagnosis of disease for saving human lives is very important. MRI, CT, USG, etc., have become the backbone of the medical field, and we need to make sure that the images used for medical diagnosis are of good quality and noiseless. Image degradation may come from pixel corruption due to the working of electronic circuitry, while blurring is an image bandwidth reduction caused by relative motion between the camera and the source [6, 7]. Image restoration requires noise detection and then noise removal by applying a noise-specific filter, which is highly required in most fields, including the medical field [8]. Noise removal is an uphill task without prior awareness of the noise characteristics [9]. The selection of a denoising filter depends upon the behavior of the noise in the corrupted image [10], and the process of removing noise from a digital image is called denoising [11, 12]. The study of noise models is highly desirable for noise reduction and for obtaining accurate results [13, 14]. Binary, color, and grayscale are the most commonly used image types [15].
2 Cellular Automata
The term CA was first coined by Stan Ulam and J. von Neumann in the nineteen forties, when they were working on the Manhattan Project at the well-known Los Alamos National Laboratory [16, 17]. A set of cells with finite states, together with transition rules over a neighborhood, is called an automaton [18]. CA are treated as one of the best parallel computational tools for recent natural computing requirements [19–21]; because of their local interactions, design simplicity, and inherent parallelism, many hardware implementations of CA have been constructed. CA happen to be a natural tool for image processing because of their spatial topology, local relations of cells, and efficient parallel computational patterns. CA can be used to reduce various types of noise in digital images, and a simple rule can exhibit complex behavior [22]. In Figs. 1 and 2, the dark gray pixel is the central pixel under examination; the light gray pixels in Fig. 1 represent the Von Neumann neighborhood, and those in Fig. 2 the Moore neighborhood. The state of the cell under execution, along with the cells in its adjacency (Von Neumann or Moore), is taken at t = 0, and the cell state at t = 1 is chosen based on a specific cellular automata rule.
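The two neighborhood types can be illustrated with a short NumPy sketch, assuming a grayscale image stored as a 2D array; clipping at the image border is an implementation choice of this sketch, not something prescribed in the text.

import numpy as np

def von_neumann_neighbors(img, r, c, radius=1):
    """Von Neumann neighborhood: cells within Manhattan distance `radius` of (r, c)."""
    h, w = img.shape
    return [img[i, j]
            for i in range(max(0, r - radius), min(h, r + radius + 1))
            for j in range(max(0, c - radius), min(w, c + radius + 1))
            if abs(i - r) + abs(j - c) <= radius and (i, j) != (r, c)]

def moore_neighbors(img, r, c, radius=1):
    """Moore neighborhood: the full (2*radius+1) x (2*radius+1) window, centre excluded."""
    h, w = img.shape
    return [img[i, j]
            for i in range(max(0, r - radius), min(h, r + radius + 1))
            for j in range(max(0, c - radius), min(w, c + radius + 1))
            if (i, j) != (r, c)]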
Fig. 1 Von Neumann defined neighborhood (rad = 1, rad = 2, rad = 3)
Fig. 2 Moore defined neighborhood (rad = 1, rad = 2, rad = 3)

3 Noise and Its Types
A digital image becomes noisy during acquisition, transmission, coding, storage, and processing. The unwanted signal, noise, has ample capacity to degrade the image quality and appears in the form of pixel distortion or blurred parts [23]. The types of noise that affect digital images are usually additive, multiplicative, or impulsive, as shown in Fig. 3.
Impulsive noise may be static or dynamic and has a tendency to change pixel values at random [24]. Mathematically, additive noise may be represented as

v(x, y) = u(x, y) + n(x, y)    (1)

Fig. 3 Noise nature in digital images (additive, multiplicative, and impulsive; impulsive noise may be static or dynamic)
Fig. 4 Original image, (a) Gaussian noise, (b) salt and pepper noise, (c) Poisson noise, and (d) speckle noise
Fig. 5 Original image, (a) Gaussian noise, (b) salt and pepper noise, (c) Poisson noise, and (d) speckle noise
The multiplicative noise can be mathematically represented as

v(x, y) = u(x, y) × n(x, y)    (2)

where u(x, y) is the original image, n(x, y) is the noise added to or multiplied with the image to produce the corrupted image v(x, y), and (x, y) is the pixel location. Figures 4 and 5 show the original images of the brain tumor dataset and the Lena image dataset, respectively. Figures 4a and 5a are the images corrupted by Gaussian noise, Figs. 4b and 5b show the images corrupted by S&P noise, Figs. 4c and 5c show the images corrupted by Poisson noise, and Figs. 4d and 5d show the images corrupted by speckle noise.
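The noise models of Eqs. (1) and (2) can be illustrated with the NumPy sketch below, assuming image intensities normalized to [0, 1]; the variance and density values are arbitrary examples.

import numpy as np

def add_gaussian_noise(u, sigma=0.05):
    """Additive model of Eq. (1): v = u + n with zero-mean Gaussian n."""
    n = np.random.normal(0.0, sigma, u.shape)
    return np.clip(u + n, 0.0, 1.0)

def add_speckle_noise(u, sigma=0.05):
    """Multiplicative model of Eq. (2): the noise term scales with the image itself."""
    n = np.random.normal(0.0, sigma, u.shape)
    return np.clip(u + u * n, 0.0, 1.0)

def add_salt_and_pepper(u, density=0.05):
    """Impulsive noise: a random fraction of pixels is forced to 0 (pepper) or 1 (salt)."""
    v = u.copy()
    mask = np.random.rand(*u.shape)
    v[mask < density / 2] = 0.0
    v[mask > 1 - density / 2] = 1.0
    return v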
4 Noise Reduction Techniques
4.1 Linear and Nonlinear Filters
A linear filter determines the new pixel intensity for the pixel under consideration by a linear combination of the neighboring pixels, while a nonlinear filter finds it by a nonlinear combination of the neighboring pixels. Table 1 summarizes different linear and nonlinear filters. A Kriging interpolation filter has been proposed by Firaz [25] to replace noisy pixels in the grayscale image. Digital images are subjected to a distillation process for the extraction of information, and the use of a CNN has been exhibited for
denoising [26]. Linear filtering techniques fail badly in denoising a corrupted image when the noise is non-additive [27]. One of the well-known nonlinear filters for removing impulsive noise, with good denoising capacity and good time complexity, fails to keep the detail of the image at edges when the noise is above 50% [28]. The standard median filter works across all image pixels irrespective of whether a pixel is degraded or not; hence it results in information loss and proves very expensive when some important information of the image is lost due to the denoising filter [29]. The adaptive median filter may be utilized to differentiate between corrupted and true pixels and to apply the filtering technique to distorted pixels only [30]. The WMF and CWMF exhibit better denoising performance compared to the mean filter by allocating some weight to selected pixels around the centrally located pixel [31, 32]. All of the above filters exhibit poor performance because of their inability to adapt to varying noise densities in the image, and they also perform poorly as the noise level of the image rises. Hence, the requirement has emerged of developing filters that denoise corrupted images while keeping the basic and important details of the images intact by employing new technologies. Our attempt in this study is to present some CA-based filters for noise filtration that efficiently utilize the cellular automata rule space to achieve better performance. The authors of [33] presented that the Cascade Decision-Based Filtering (CDBF) algorithm performs better than the Standard Median Filter (SMF), Progressive Switching Median Filter (PSMF), and Adaptive Median Filter (AMF) for removing S&P noise from digital images. The authors of [34] showed that independent component analysis (ICA) performs better than the wavelet transform, Wiener filter (WF), and bilateral filter when removing S&P noise from grayscale and RGB images, whereas the wavelet transform shows good results compared to ICA, WF, and the bilateral filter when removing Brownian and speckle noise from grayscale and RGB images, as shown in Fig. 6a and b, respectively. The authors of [35] proposed that the spectral subtraction method performs better than the Markov random field and wavelet-based anisotropic diffusion in denoising a digital image; the method is performed by first acquiring the image from each RF coil separately within a small acquisition time. The authors of [36] presented that spatially adaptive nonlocal means (SANLM) performs better than nonlocal means, PCA, and bilateral filters in terms of good SSIM and PSNR, which helps in clinical diagnosis of the brain. The SANLM technique enhances the distinction of the image; it works with spatially varying noise levels inside the MR image, and local estimation is done by processing data with a static or spatially varying noise field in a fully automatic manner, as shown in Fig. 7a and b, respectively. The authors of [10] presented that the application of a filter for noise reduction depends upon the noise present in the image, meaning that one must know which filter to apply for which noise; linear and nonlinear filters alone are not sufficient for getting optimal results, hence some hybrid technologies are required for efficient and effective image restoration from noise.
Table 1 Linear and nonlinear filters for image denoising as researched by various authors
Pattnaik et al. [33], 2012. Technique/filter: cascade decision-based filtering (CDBF) algorithm. Metrics/data: mean absolute error (MAE), RMS, PSNR, image enhancement factor (IEF); set of three image datasets. Result: CDBF performs better than SMF, AMF, decision-based algorithm (DBA), modified decision-based algorithm (MDBA), and progressive switching median filter (PSMF). Remarks: there is provision to find an approximation algorithm for noise levels above 70 dB.
Saxena and Kourav [37], 2014. Technique/filter: linear smoothing, median filtering, wavelet transform. Metrics/data: PSNR, SSIM; Lena image dataset. Result: the median filter (MF) is better for S&P noise and the wavelet transform for Gaussian noise. Remarks: experimentation for comparison with different methods is required.
Deepa and Sumitra [34], 2015. Technique/filter: WF, ICA, bilateral filter (BF), median filter (MF). Metrics/data: universal quality index; MR brain image dataset. Result: ICA performs better. Remarks: the method shows poor performance above 30 dB noise level.
Vaishali et al. [35], 2015. Technique/filter: Markov random field, wavelet-based anisotropic diffusion, SSD. Metrics/data: PSNR; medical image dataset. Result: spectral subtraction performs better. Remarks: the fine details of the image are not preserved.
Saladi and Amutha Prabha [36], 2017. Technique/filter: PCA, NLM, BF, principal component analysis nonlocal means (PCANLM), SANLM. Metrics/data: PSNR, SSIM; MR brain image dataset. Result: SANLM performs better as compared to the others. Remarks: results have been shown only for low noise levels.
Owotogbe et al. [10], 2019. Technique/filter: linear and nonlinear filtering techniques. Metrics/data: PSNR, SSIM; Lena image dataset. Result: denoising depends upon the type of noise. Remarks: some hybrid techniques may be developed.
4.2 Cellular Automata-Based Noise Reduction Filters
CA have been utilized for noise reduction in medical images; numerous cellular automata-based filters have been proposed for impulsive noise reduction, and researchers have shown highly improved results compared to what is achieved by linear and nonlinear filters for the purpose of noise suppression. Cellular automata can be utilized to remove different types of noise signals from digital images.
Fig. 6 Original image, (a) image with 5% noise, and (b) denoised image by ICA
Fig. 7 Original image, (a) image with noise, and (b) denoised image by SANLM
Table 2 demonstrates various CA-based filters along with their advantages and scope for improvement, for identifying the research gap. The authors of [38] proposed an efficient denoising algorithm using CA for the removal of impulsive noise. They evaluate the minimum, maximum, and median values of the neighbors of the pixel under investigation. If the value of the cell under investigation lies between the minimum and the maximum, the value is left unchanged at the next time state. If the median value is between 0 and 255, the value is changed to the median; otherwise, the pixel under examination is replaced with the highest of the local differences in the neighborhood [38]. The authors of [39] presented a CA-based approach for the removal of S&P noise from grayscale images. They used three threshold values to determine whether a cell is noisy or noiseless in a 2D Moore neighborhood. The value for a noisy cell is evaluated as the mean of the neighbor cells after excluding all the minimum- and maximum-valued neighbor cells from the evaluation of the mean. The use of noiseless cells for evaluation of the new value at state t + 1 reduces the time and storage requirements. The method has been tested for noise levels of 10–90% [39]. The authors presented a proficient method for high-density noise reduction in digital images. In this paper, the authors present an algorithm for reducing impulsive noise in a digital image using the Moore neighborhood, with the specialty that the pixel under assessment is first evaluated for impulse noise, and only then are examinations performed for its replacement; otherwise it is left unchanged.
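A simplified sketch of the kind of rule described for [38] is given below; it keeps pixels that lie between the local minimum and maximum and otherwise moves them toward the local median, leaving border pixels untouched. The fallback to the highest local difference and the exact stopping criterion are omitted, so this is an illustration rather than the authors' algorithm.

import numpy as np

def ca_denoise_step(img):
    """One CA-style update over a 3x3 Moore neighborhood of a grayscale image."""
    out = img.copy()
    h, w = img.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = img[r - 1:r + 2, c - 1:c + 2]
            neighbors = np.delete(window.flatten(), 4)   # drop the centre pixel
            lo, hi = neighbors.min(), neighbors.max()
            if lo < img[r, c] < hi:
                continue                                 # pixel judged noise-free
            out[r, c] = np.median(neighbors)             # replace suspected impulse
    return out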
Table 2 Cellular automata-based filters for image denoising as researched by various authors

Author/s | Year | Technique/filter used | Evaluation metrics; Data set | Result | Remarks
Qadir et al. [38] | 2012 | Cellular automata-based filter | PSNR; Lena image dataset | Performs better than WF, average filter (AF), and MF | The filter performance is poor at higher levels of noise
Tourtounis et al. [39] | 2018 | Proposed filter based on cellular automata | PSNR, SSIM; different image data sets | Performs better than the state-of-the-art methods | A variation of the rule set can be experimented with for better performance in terms of image restoration
Qadir and Shoosha [40] | 2018 | Cellular automata primary and majority rule | PSNR, MSE; Lena image dataset | Performs better than AF, MF, WMF, SF, and CA-based methods | A stopping condition for the number of iterations must be developed for efficiency
Bhardwaj et al. [41] | 2019 | Cellular automata-based filter | PSNR, MSSI, SSIM; medical image dataset | The proposed method performs better than MF, WF, GF, and AF | Provision for improving the performance of the filter is possible
Kumar et al. [42] | 2020 | MBC-CA | SSIM; different image datasets | The proposed method performs better as compared to AMF, SMF, and the fuzzy-based decision algorithm (FBDA) | Some criteria should be determined for defining/setting the threshold
Jeelani and Qadir [43] | 2022 | Cellular automata primary rules | PSNR, SSIM; 35 different image data sets | The proposed method performs better while comparing with traditional and CA-based algorithms | Extended Moore and Von Neumann neighborhoods with varying values of r can be tested for restoration of more noisy images
Pawar et al. [44] | 2022 | Deep learning | Time complexity; mural image data sets | The PCA shows 99.25% accuracy against YOLO v3, RCNN, CNN | Tested for a single noise type but requires identification of multiple noises that may coexist
Angulo et al. [45] | 2022 | DK | SSIM, PSNR; different image dataset | DK performs better compared to mean and median filters | The method works for salt-and-pepper noise and a variation may be developed for other types of noise
The other specialty of the proposed method is that it can be applied to grayscale, RGB, and CMYK images. The cell under examination is replaced using the primary rule with r = 1 if that cell is not noisy; otherwise, the majority rule of the extended Moore neighborhood with r = 2 is used. The proposed method is iterative in nature and, according to the simulation outputs, requires only one iteration for noise up to the 30% level; otherwise it utilizes more iterations to obtain optimal results [40]. The authors of [41] proposed a CA method for removing speckle noise in ultrasonic images. They exhibited a cellular automata-based approach that uses the Moore neighborhood to determine the new value of a noisy pixel while excluding the minimum and maximum pixel values among the neighbor cells. The results have been found to be better when compared with traditional and CA-based algorithms in practice, using the evaluation metrics PSNR, Mean Structural Similarity Index (MSSI), and SSIM. The proposed method has been tested on a medical image data set [41]. The authors of [42] presented a CA-based approach for S&P noise reduction in grayscale digital images. The grayscale image is translated into a set of binary images using multithreshold binary conversion (MBC-CA), and after the application of CA the images are regenerated using a recombination algorithm. The proposed method performs better even at higher levels of noise; the results have been evaluated using SSIM [42]. The authors of [43] presented an efficient approach for S&P noise reduction using CA. They presented five S&P noise filters based on variations of OTCA with an adaptive neighborhood. OTCA excludes the cell under examination from the calculation of the new value for the noisy cell, hence making the filters simple and computationally efficient. The adaptive neighborhood helps achieve efficient noise filtration for noise of varying densities. The first filter uses primary rules (1 to 12) to replace the noisy pixel according to the local transition function. The second filter uses the same primary rules, but the pixels are processed from right to left and bottom to top. The third filter uses only the first and fourth rules from the primary rule set. The fourth
filter uses primary rules 1-24 to replace the noisy pixel under examination. The fifth filter uses primary rules 1-12 in rotational order as per the transition function [43]. Pawar et al. [44] proposed a method for detecting the noise type in ancient images using deep learning, which allows multiple images to be processed in order to find objects, with good decision-making abilities and without any intervention. A CNN model is utilized to find the noise types in an image; this research can help in applying the different filters meant for the various noises in the image. The authors of [45] presented the modeling and numerical validation of a cellular automata-based algorithm for image enhancement in grayscale images. They presented the DK (dynamic) algorithm for evaluating the substitute for a noisy pixel in the image. The algorithm uses the Moore neighborhood for image restoration. It replaces a noisy pixel with the average of the neighborhood pixel intensities, based on the neighborhood rules, when fewer than five neighbors have value 0 or 255. Otherwise, if the neighborhood contains more noisy pixels, the algorithm extends to a 5 × 5 neighborhood to obtain better results. The average of the values excluding 0 and 255 replaces the noisy pixel in the next state [45].
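As a purely illustrative aid, the sketch below implements one CA time step of the kind of min/max/median Moore-neighborhood rule described for [38]. It is not the authors' implementation, and the fallback applied when the median is out of range is only one possible reading of the description.

```python
import numpy as np

def ca_impulse_filter_step(img, low=0, high=255):
    """One CA time step of a median-style impulse filter on a grayscale image.

    Interior pixels whose value lies strictly between the neighborhood minimum
    and maximum are assumed noise-free and kept; suspected impulses are replaced
    by the neighborhood median when it is a valid gray level.
    """
    img = img.astype(np.int32)
    out = img.copy()
    rows, cols = img.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            block = img[i - 1:i + 2, j - 1:j + 2]
            nbrs = np.delete(block.flatten(), 4)            # the 8 Moore neighbors
            lo, hi, med = nbrs.min(), nbrs.max(), int(np.median(nbrs))
            if lo < img[i, j] < hi:
                continue                                    # keep the pixel unchanged
            if low < med < high:
                out[i, j] = med                             # replace with the median
            else:
                # assumed fallback: the neighbor with the largest local difference
                out[i, j] = nbrs[np.argmax(np.abs(nbrs - img[i, j]))]
    return out.astype(np.uint8)
```

In practice the step would be iterated until no noisy pixels remain, mirroring the iterative behavior of the CA filters discussed above.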
5 Methodology for Output Image Analysis The performance of a denoised digital image is evaluated using well-known statistical parameters. The performance evaluation parameters are mean pixel intensity, average pixel intensity, standard deviation, MSE, RMSE, mean absolute error, SNR, PSNR, SSIM, and entropy [46]. SNR is defined as the ratio of signal power to noise power. MSE is simply the average of the squared error between two images. PSNR is the ratio between the maximum possible signal power and the power of the distorting noise. SSIM is a measure of similarity between two images. The image restoration is considered better the higher the value of PSNR.
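The experiments in this chapter are run in MATLAB, but as a language-neutral illustration, the sketch below shows how these metrics can be computed. It assumes 8-bit grayscale images stored as NumPy arrays and the scikit-image package for SSIM; it is not part of the authors' workflow.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def image_quality_metrics(original, denoised, max_val=255.0):
    """Compute MSE, RMSE, PSNR (dB), and SSIM for two same-sized grayscale images."""
    original = original.astype(np.float64)
    denoised = denoised.astype(np.float64)
    mse = np.mean((original - denoised) ** 2)        # mean squared error
    rmse = np.sqrt(mse)
    # PSNR = 10 * log10(MAX^2 / MSE); it is infinite when the images are identical
    psnr = float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
    s = ssim(original, denoised, data_range=max_val)  # structural similarity index
    return {"MSE": mse, "RMSE": rmse, "PSNR": psnr, "SSIM": s}
```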
6 Result and Discussion Traditional filters such as AF, MF, AMF, WMF, etc., have the capacity to restore corrupted images affected by at most about 30% noise. The performance of these filters degrades drastically with an increase in the level of noise. The images used in this study have been taken from the Kaggle website, together with the Lena image, and the experiments were executed in MATLAB R2016a. The CA-based filters have shown promising results even up to the 85% noise level, which apparently seems an uphill task, based on the evaluation parameters SSIM, PSNR, RMS, RMSE, etc. CA has the capacity to restore an image when only 15% of the true image is available, meaning that 85% is corrupted by noise, even though the capabilities of CA have not been fully exhausted. The fundamental idea behind denoising is to first identify the noisy pixel and then find the replacement for the pixel under consideration from the immediate or
extended neighborhood of non-noisy pixels in N_M or N_VN (the Moore or Von Neumann neighborhood). The filters have taken due cognizance of vertical or horizontal noise propagation. The literature review has established that denoising is carried out for only one noise type at a particular point of time. Pawar et al. [44] have proposed a method for the identification of different noise types, which may be useful in future for removing multiple noise types from an image by applying the different noise filters meant for the different noise removals. The parallel computational capacity of CA has shown better results using comparison and assignment operators only, hence providing better time complexity compared to the conventional filters.
7 Conclusion Filters for noise reduction in medical images are essential for accurate diagnosis by medical experts. CA-based algorithms exhibit better performance, demonstrating better values of the PSNR, SSIM, RMSE, MSE, IEF, etc., parameters as compared to a variety of conventional methods available in the literature, at noise densities of around 10-90%, for image restoration/extraction without degrading the fine details of the image, owing to the efficiency and robustness of CA. There is considerable scope for developing efficient CA-based filters with variations in the CA rule space and the use of different subsets of the Von Neumann and Moore neighborhoods at different values of r, for image restoration from the different types of noise present in the original digital image. The present work reveals that the CA-based filters developed for noise reduction in digital images are specific to noise type and noise density, that is, specific filters must be applied for specific noises. The efficiency of CA-based filters for noise reduction exhibits better performance in the restoration of digital images based on different performance parameters.
References 1. Baxes G (1994) Digital image processing: principles and applications. Wiley NJ, USA 2. Daniel M (2007) Optica Tradicional y Moderna; Colección “Ciencia para todos”; Fondo de Cultura. Económica: Mexico City, Mexico 3. Petrou M, Petrou C (2010) Image processing: the fundamentals. Wiley, USA 4. Thepade S, Das R, Ghosh S (2017) Decision fusion-based approach for content based image classification. Int J Intell Comput Cybern 10(3):310–331 5. Gonzalez RC, Woods RE (2008) Digital image processing, 3rd edn. Prentice Hall, Englewood Cliffs 6. Castleman Kenneth R (1979) Digital image processing. Prentice Hall, New Jersey 7. Lagendijk RL, Biemond J (1991) Iterative identification and restoration of images. Kulwer Academic, Boston 8. Maru M, Parikh MC (2017) Image restoration techniques: a survey. Int J Comput Appl 160:15– 19 9. Boyat AK, Joshi BK (2015) A review paper: noise models in digital image processing. arXiv: 1505.03489
10. Owotogbe JS, Ibiyemi TS, Adu BA (2019) A comprehensive review on various types of noise in image processing. Int J Sci Eng Res 10:388–393 11. Kahdum AI (2017) Image steganalysis using image quality metrics (structural contents metric). IBN AL Haitham J Pure Appl Sci 12. Hongqiao L (2009) A new image denoising method using wavelet transform. IEEE 1:111–114 13. Kaur S (2015) Noise types and various removal techniques. Int J Adv Res Electron Commun Eng (IJARECE) 4:226–230 14. Halse MM, Puranik SV (2018) A review paper: study of various types of noises in digital images. Int J Eng Trends Technol (IJETT) 15. Umbaugh SE (1998) Computer vision and image processing. Prentice Hall PTR, New Jersey 16. Marinescu D (2017) Nature-inspired algorithms and systems. Elsevier, Boston, MA, USA, pp 33–63 17. Gong Y (2017) A survey on the modeling and applications of cellular automata theory. IOP Conf Ser Mater Sci Eng 242:012106 18. Das D (2011) A survey on cellular automata and its applications, vol 269. Springer, Berlin/ Heidelberg, Germany 19. Kolnoochenko A, Menshutina N (2015) CUDA-optimized cellular automata for diffusion limited processes. Comput Aided Chem Eng 37:551–556 20. Mahata K, Sarkar A, Das R, Das S (2017) Fuzzy evaluated quantum cellular automata approach for watershed image analysis. In: Quantum inspired computational intelligence; Morgan Kaufmann, Boston, MA, USA, pp 259–284 21. Tourtounis D, Mitianoudis N, Sirakoulis GC. Salt-n-pepper noise filtering using cellular automata. CoRR abs/1708.05019. arXiv:1708.05019 22. Rosin PL (2010) Image processing using 3-state cellular automata. Comput Vis Image Underst 114(7):790–802. https://doi.org/10.1016/j.cviu.2010.02.005 23. Koli M, Balaji S (2013) Literature survey on impulse noise reduction. Signal Image Process 4(5):75–95 24. Li Y, Sun J, Luo H (2014) A neuro-fuzzy network based impulse noise filtering for gray scale images. Neurocomputing 127:190–199 25. Firaz AJ (2013) Kriging Interpolation filter to reduce high density salt and pepper noise. World Comput Sci Inf Technol J 3(1):8–14 26. Lendave V (2021) A guide to different types of noise and image denoising methods. Developers corner 27. Srinivasan KS, Ebenezer D (2007) A new fast and efficient decision-based algorithm for removal of high-density impulse noises. IEEE Signal Process Lett 14:189–192 28. Nodes TA, Gallagher NC (1984) The output distribution of median type filters. IEEE Trans Commun 32(5):532–541 29. Tukey J (1974) Nonlinear (nonsuperposable) methods for smoothing data. In: Electronic and aerospace systems conference, pp 673–681 30. Hwang H, Haddad RA (1995) Adaptive median filters: new algorithms and results. IEEE Trans Image Process 4(4):499–502 31. Brownrigg DRK (1984) The weighted median filter. Commun ACM 27(8):807–818. https:// doi.org/10.1145/358198.358222 32. Ko S-J, Lee Y (1991) Center weighted median filters and their applications to image enhancement. IEEE Trans Circuits Syst 38(9):984–993. https://doi.org/10.1109/31.83870 33. Pattnaika A, Agarwala S, Chanda S (2012) A new and efficient method for removal of high density salt and pepper noise through cascade decision based filtering algorithm. ICCCS-2012, vol 6, pp 108–117 34. Deepa B, Sumitra MG (2015) Comparative analysis of noise removal techniques in MRI brain images. In: IEEE International conference on computational intelligence and computing research (ICCIC). https://doi.org/10.1109/ICCIC.2015.7435737 35. Vaishali S, Kishan Rao K, Subba Rao GV (2015) A review on noise reduction methods for brain MRI images. 
pp 363-365
36. Saladi S, Amutha Prabha N (2017) Analysis of denoising filters on MRI brain images. Int J Imaging Syst Technol 27:201–208 37. Saxena C, Kourav D (2014) Noises and image denoising techniques: a brief survey. IJETAE 4(3):878–885 38. Qadir F, Peer MA, Khan KA (2012) An effective image noise filtering algorithm using cellular automata. In: 2012 International conference on computer communication and informatics. IEEE, pp 1–5 39. Tourtounis D, Mitianoudis N, Sirakoulis GC (2018) Salt-n-pepper noise filtering using cellular automata. J Cellular Automata 13(1–2) 40. Qadir F, Shoosha IQ (2018) Cellular automata-based efficient method for removal of high density noise from digital images. Int J Inf Technol 10:529–536 41. Bhardwaj A, Kaur S, Shukla AP, Shukla MK (2019) An enhanced cellular automata based filter for despeckling of ultrasound images. In: 2019 6th International conference on signal processing and integrated networks, SPIN 2019 42. Kumar P, Ansari MH, Sharma A (2020) MBC-CA: multithreshold binary conversion based salt-and-pepper noise removal using cellular automata. In: Communications in computer and information science, 1147 CCIS. https://doi.org/10.1007/978-981-15-4015-8_17 43. Jeelani Z, Qadir F (2022) Cellular automata-based approach for salt-and-pepper noise filtration. J King Saud Univ Comput Inf Sci 34(2):365–374. https://doi.org/10.1016/j.jksuci.2018.12.006 44. Pawar P, Ainapure B, Rashid M, Ahmad N, Alotaibi A, Alshamrani SS (2022) Deep learning approach for the detection of noise type in ancient images. Sustainability 14(18):11786 45. Angulo KV, Gil DG, Espitia HE (2022) Modeling and numerical validation for an algorithm based on cellular automata to reduce noise in digital images. Computers 11(3):46 46. Hussain M, Rathore S, Aksam I (2014) Robust brain MRI denoising and segmentation using enhanced non-local means algorithm. Int J Imaging Syst Technol 24(1):52–66
Chapter 21
Effective Mutants’ Classification for Mutation Testing of Smart Contracts R. Sujeetha and K. Akila
1 Introduction Mutation testing challenges a developer to construct tests that can detect mutant programs, i.e., tests that can tell the mutants apart from the original program. Strong evidence supports the notion that true faults are tied to mutants, and that the detection of mutants is highly associated with the detection of actual defects; this association is stronger than for commonly used code coverage metrics such as statement and branch coverage. Mutation testing is costly because software programs have a high potential for producing mutants, as noted by Kaufman et al. [1]. The vast number of mutants and the fact that the majority of them are useless as test goals are recognized as key obstacles to adoption in recent studies on the utility of using mutants as test goals. Additionally, the existence of redundant and equivalent mutants makes it difficult for a developer to gauge their progress toward mutation adequacy. Equivalent mutants are undetectable by any test since they behave exactly like the original software, and since redundant mutants are equivalent to other mutants, tests that kill those other mutants will always kill them as well. To make mutation testing practicable for practitioners, only a small subset of the countless mutants created by present mutation systems needs to be chosen. Since many mutants are useless, such a selection approach must prefer helpful mutants. The requirements for industrial mutation testing systems do not include obtaining or
R. Sujeetha (B) · K. Akila Department of Computer Science and Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Vadapalani Campus, No 1 Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India e-mail: [email protected] K. Akila e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_21
monitoring mutant adequacy; instead, they concentrate on iteratively presenting one or a few mutants as test targets to gradually improve test quality over time. Smart contracts, which are computer programs that mimic the operations outlined in physical/traditional contracts, were first proposed by Nick Szabo in 1994 [2]. Szabo later outlined the goals of a smart contract: observability, verifiability, privity, and enforceability. Bitcoin and its scripting language demonstrated that blockchain is a good platform for the implementation of smart contracts thanks to its read-append nature. A smart contract may be easily viewed, verified, and self-enforced on a blockchain. Depending on the blockchain's access level, privity can also be attained. Bitcoin can be viewed as the first instance of a smart contract on a blockchain, even if its script language only permits checking and validating transactions. Later, in 2015, Buterin [3] developed Ethereum, a blockchain platform with a decentralized payment system and a Turing-complete language, enabling the creation of a wide range of smart contracts on a blockchain. Higher transparency is possible with smart contracts without the involvement of trusted third parties, and there are a variety of real-world applications for smart contracts. However, because of security weaknesses, smart contracts are susceptible to attacks; the underlying reasons include blockchain features, code difficulties, and other factors. Because smart contracts use cryptocurrencies as their balances, intruders may exploit these security flaws, which could lead to significant losses. Andesta et al. [4] discuss that it is critical for smart contract developers to thoroughly test and validate their code before putting it on the blockchain. A mutation-based testing technique for smart contracts written in the Solidity programming language was introduced in [4]. The authors examined a thorough list of known problems in Solidity smart contracts and created ten classes of mutation operators based on real-world flaws. The main contribution of this paper is to provide a solution for mutant reduction by using machine learning techniques so that only the effective mutants are used in mutation testing for smart contracts.
1.1 Organization of Paper The rest of this article is structured as follows. Section 2 provides background information on smart contract testing and mutation testing. Section 3 states the problem, Sect. 4 presents the proposed methodology and implementation details, and the results obtained are discussed in Sect. 5. Finally, Sect. 6 summarizes the entire work and discusses future directions.
2 Background 2.1 Smart Contract Features The Gas Network: Ethereum miners run smart contracts on their own equipment. The developers and users of smart contracts distribute a specific quantity of Ether to miners as payment for their contribution of processing power. The amount of Ether that miners are paid is calculated as gas cost times gas price. Gas prices are provided by the individuals who created the transaction and depend on the amount of processing power used. When a user sends a transaction, the maximum cost of gas is set by a limit (the "Gas Limit"). The execution is halted with an exception referred to as an "out-of-gas error" if the gas cost exceeds the Gas Limit. Data Location: Data can be kept in storage, memory, or calldata in smart contracts. Storage is a permanent memory space; the EVM gives each storage variable a slot ID to identify it. Compared to reading from the other two locations, writing and reading storage variables is the most expensive operation. Memory is the second memory space; its data is released after the memory variables' life cycle is complete. Memory-based writing and reading are less expensive than storage. Only the parameters of external contract functions are eligible for calldata. Compared to memory or storage, reading data from calldata is substantially less expensive.
2.2 Software Testing and Mutation Testing Software engineering uses quality control procedures to ensure the caliber of the generated software, just as the majority of technical disciplines do. Software testing is the most often practiced activity for software quality control. Finding flaws in the software that has been produced, and inspiring faith in its correctness, are among software testing's primary objectives. Among other things, software testing entails the design of tests (a test suite), running the prepared tests against the built software, and assessing the program behavior following the execution of the tests. The construction (design) of test suites is an important component of software testing, and the quality of the test suites developed determines the quality of software testing. Many studies recommend employing notions of coverage for testing, such as statement coverage. These coverage guidelines act as test acceptance criteria (TAC), as suggested by Chekam et al. [5]; they are used to evaluate the efficacy of test suites and provide guidance to testers when developing new tests focusing on uncovered code or test goals. The major steps taken while using TACs for software testing are shown
in Fig. 1. First, a test is constructed to put the selected software component to the test. The code is then subjected to the tests, after which the TAC coverage is determined. If the desired degree of coverage is not reached, the process loops back to the test creation stage to create new tests addressing the test goals that were not met; this is repeated until the predetermined coverage threshold is attained. When the predetermined threshold is achieved, the software is tested for functionality; if the tests find errors ("not pass"), the errors are corrected and the procedure so far is repeated until the tests find no defect. The process is complete, and the software being evaluated is deemed tested, whenever the testing does not uncover any flaws or when the allotted time budget has passed. We point out that this method is the norm for utilizing TACs to enhance software test suites. TAC test objectives have recently also been used, for example, to aid developers in understanding code, as demonstrated by Petrovic et al. [6], who also explain other ways in which TACs are employed. The traditional application of TACs is the main topic of this research. The majority of TACs are based on the program's structure; statement, branch, block, function, and path coverage are a few examples. However, Richard Lipton first put forward mutation, a variant TAC based on manufactured flaws, in 1971. Industrial studies demonstrate that software practitioners employ mutation to detect software flaws [6]. Software testing that uses mutation as a TAC is known as mutation testing. Mutation introduces fake flaws into the program being tested in an effort to mimic genuine errors [6]. These flaws are straightforward grammatical changes to the program's syntax that are generated from a specified set of rules known as mutation operators. A mutation operator can, for instance, change an arithmetic operation into another arithmetic operation (in a program, prod = x * y becomes prod = x/y). The program under test, usually known as the original program, becomes
Fig. 1 Test acceptance criterion-based software testing
a new program, called a mutant, when a mutation operator is applied to a compatible code element. The test subjects for the mutation TAC are mutants. The code of the software under test is subjected to a predetermined set of mutation operators, resulting in a variety of mutants that are then used to test the code. The task of producing mutants is carried out automatically by a mutant generation tool, which is software that accepts a program as input and employs a set of mutation operators to generate mutants of the input program. When numerous fundamental syntactic changes occur at the same time, the mutation is referred to as a higher-order mutation. In this work [6], first-order mutants are the main topic, and the terms mutant and mutation denote a first-order mutant and a first-order mutation, respectively. When a test's results differ between the original program and a mutant program, we say the test "kills the mutant"; if not, the mutant is deemed to have survived the test. Similarly, the mutant is said to have infected the program execution under that test if the execution following the test resulted in the mutant's state at the modified statement differing from that of the original program. There are several limitations that prevent the practical use of mutation testing. The first is the large number of mutants generated; for example, 1000 mutants can easily be generated for 50 to 70 lines of code, and executing tests to kill these mutants becomes tedious and expensive. Among this large number of mutants, only some of the non-equivalent mutants are useful in testing, so reduction of the generated mutants is a challenging task in mutation testing. Among the surviving mutants, equivalent mutants, which are not killed by the test cases due to their functional equivalence, affect the mutation score; detection of equivalent mutants is another challenging task in mutation testing. Since the beginning of mutation testing, researchers have developed numerous methods for reducing the number of mutants, including random mutant selection [7, 8] and selective mutation [9] for choosing useful mutations. But even with those mutant reduction strategies, there is still a significant loss in fault revelation. This is partially a result of those methodologies' inability to identify valuable mutants using a mutant quality indicator (also known as the "usefulness" or value metric) and thus to choose mutants appropriately. A technique is needed that retains acceptable defect revelation while reducing the number of mutants; this paper addresses this problem.
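The following is a toy, hedged illustration of the kill relation just described, using the prod = x * y example as a first-order mutant; the function names and the tiny test suites are illustrative, not from the chapter.

```python
def original(x, y):
    return x * y              # program under test

def mutant(x, y):
    return x / y              # first-order mutant: '*' replaced by '/'

def kills(test_inputs):
    """A test suite kills the mutant if, on any input, the outputs differ."""
    return any(original(x, y) != mutant(x, y) for x, y in test_inputs)

print(kills([(1, 1)]))          # False: 1*1 == 1/1, so this weak test lets the mutant survive
print(kills([(1, 1), (3, 2)]))  # True: 3*2 != 3/2, so the stronger suite kills the mutant
```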
2.3 Smart Contract Testing It is impossible to overstate how important it is to guarantee code dependability for smart contracts. According to a recent survey discussed by Zou et al. [10], deploying accurate and secure code is the main issue for business experts working on blockchain projects. At the moment, there are no commonly established methods or instruments for determining the adequacy of smart contract test suites.
Table 1 Mutation testing tools for smart contracts

Tool | Source
ContractMut [11] | 2020
Deviant [12] | 2019
MuSC [13] | 2019
RegularMutator [14] | 2020
SuMo [15] | 2021
UniversalMutator [16] | 2020
Vertigo [17] | 2019
Zou et al. [10] note that test engineers often compute code coverage, execute static analyses, and even undertake manual test reviews to improve test quality. Because smart contracts are marketed, the research community has lately begun to pay more attention to better test suite adequacy assessment methodologies, such as mutation testing. Table 1 lists the tools for generating and evaluating Solidity mutants along with their year of publication; an overview of these tools is given in a later section. Mutation testing is one of the finest ways to improve the quality of a test suite, although it is rarely employed in practice due to its high computing cost.
2.4 Prioritizing Mutants in Mutation Testing To make mutation testing practicable for practitioners, it is necessary to address the vast number of mutants produced by current mutation systems and the reality that the majority of these mutants are not advantageous, as suggested by Kaufman et al. [1]. That work uses the notion that a mutant is useful as a test target in mutation testing only to the extent that it results in a useful test, i.e., one that improves test completeness, and it introduces a novel test completeness advancement probability (TCAP) measure of mutant utility. The study also demonstrates that a mutant's static program context can predict its TCAP, and that choosing mutants by TCAP can effectively guide mutation testing. The large number of mutants used in mutation testing has long been recognized as a hurdle to the method's practical applicability [5]. Unfortunately, despite major community-wide efforts, the mutant reduction problem is still unresolved. Chekam et al. [5] presented a novel approach to the problem, fault revealing mutant selection, to address this issue. They claim that the mutants most prone to reveal real defects are the most valuable mutants and hypothesize that selecting them can be aided by common machine learning techniques. They demonstrate that a number of fundamental static program properties capture the essential traits of the fault-revealing mutants, disclosing considerably more flaws (6%-34%) than would be obtained by randomly selecting mutants.
A mutation operator is a code-modifying rule that produces a mutant (i.e., a code variant) of a given program depending on the existence of a particular syntactic element [5]. For example, as in Table 2, if a program contains the expression "(value, msg.sender)", a mutation operator that deletes a parameter generates a mutant in which "(,msg.sender)" replaces that expression. The mutation is the change in syntax brought about by a mutation operator; the mutation operator will produce a different mutant for each instance. A group of related mutation operators is known as a mutation operator group; for example, the AOR mutation operator group is made up of all mutation operators that substitute an arithmetic operator. A mutant could perform exactly like the source program on all inputs. The mutation operators are categorized, and sufficient mutants can be identified using [18], which gives methods for identifying them in C programs that are applicable to Solidity as well. Table 3 lists the related tools that are available for the mutation testing of smart contracts [19]. The available tools like MuSC [13] and others use random sampling to reduce the number of mutants, and certain tools do not take into consideration the compilation-error-causing mutants, which may be effective mutants affecting the mutation score. Hartel and Schumi [11] evaluate the efficacy of smart contract mutation testing at a large scale: they choose the most promising (smart contract specific) mutation operators, assess their killability, and highlight significant weaknesses that can be injected with the mutations. Furthermore, they improve on previous mutation approaches by introducing a new kill condition capable of detecting a difference in gas consumption, i.e., the financial value necessary to carry out transactions. Bahel et al. [20] provide various classification models such as logistic regression, Naive Bayes classifier, K-nearest neighbors, decision tree, and random forest classifiers; their work presents a comparative study of various binary classifiers and boosting algorithms, and their summarized arguments for the optimal performance of the presented classification models help this research to carry out the proposed methodology. Using data mining and machine learning technologies, the study by Guillaume [21] proposes a method for lowering the cost of mutation testing by minimizing the number of mutant operators run.

Table 2 Mutated code

Original source code: function set(uint256 _value) public { value = _value; emit NewValueSet(value, msg.sender); }
Mutated code: function set(uint256 _value) public { value = _value; emit NewValueSet(,msg.sender); }
Table 3 Existing mutation testing tools for smart contracts

S. No. | Tool | Description | Remarks
1 | MuSc (2019) | MuSc offers a collection of mutation operators and the outcome aids in exposing a smart contract's flaws [13] | Mutants that cause compilation errors are avoided while the test is running. There are several missing operators, which poses a significant risk to the contract
2 | SuMo (2021) | Offers a large collection of mutation operators and uses the AST for mutant generation [15] | Whether the live mutants can also reveal significant problems requires more investigation. The tool was unable to analyze any of the proposed innovative operators
3 | Deviant (2019) | The user-selected mutation operators are applied to the source code. Every mutant produced by the mutation operators goes through the same procedure again [12] | The branch and line coverage checks are passed by the majority of non-equivalent mutants, despite the fact that they are not all killed
4 | RegularMutator (2021) | The user-selected mutation operators are applied once the source code has been transformed to a regular expression [14] | When calculating the mutation score, the mutations that cause compilation errors are avoided
Current mutation testing research has revealed that mutation test prioritization and reduction are achievable without materially altering the mutation score. The number of mutants generated is high, and all of the mutants are executed by the tools, which limits the usage of mutation testing in industry. This motivates this research to propose a machine learning approach for the prediction of effective mutants, which helps in reducing the number of mutants executed in the testing process.
3 Problem Statement Mutation testing involves a huge number of mutants, which has long been recognized as a hurdle to its practical applicability. The mutants that are capable of revealing defects, i.e., mutants that result in test cases exposing existing but unidentified problems, are what we are trying to choose from the large pool of mutants. Given that only 2% of the killable mutants (based on our data) are fault revealing, achieving this aim is difficult. Unlike "traditional" mutant reduction techniques, which aim to reduce the number of mutants, fault revealing mutant selection has a different goal.
Fig. 2 Fault revealing mutants
Strategies for reducing the number of mutants concentrate on choosing a small group of mutants that is representative of the larger group; this means that every test suite that kills the mutants of the smaller set also kills the mutants of the full set. Figure 2 depicts our goal in comparison with the "conventional" mutant reduction challenge; the targeted output for the fault revealing mutant selection problem is represented by the blue (and smallest) rectangle in the illustration. The study by Chekam et al. [5] has demonstrated that most mutants are not relevant to the sought defects, even in the best situation. This means that testers waste time and effort by having to examine a huge number of mutants before they can locate those that are genuinely beneficial (the ones that reveal flaws). The findings in [5] show that only 17% of the minimal mutants are fault revealing, which further suggests that the bulk of mutants are "irrelevant" to the desired flaws, even in the best-case scenario. Determining which mutants should be used for mutation testing, i.e., those most likely to expose flaws, is essential to ensuring that the results are accurate and that the procedure can be applied with the greatest possible diligence.
4 Proposed Methodology The following section describes the implementation of the mutant classification used to predict the effective mutants that are useful in mutation testing for smart contracts. The approach uses machine learning algorithms for classification; classification algorithms such as random forest, AdaBoost, and gradient boosting are considered here for effective mutant classification. Figure 3 shows the proposed steps, and a brief description of each step follows.
Fig. 3 Prediction of effective mutants
4.1 Mutant Classification This work proposes a machine learning approach for addressing the stated problem. Machine learning algorithms can be used for prediction as well as classification, as shown by Naeem et al. [22]. Classifiers such as random forest and linear models can be used for this mutant classification-based prioritization. Here, we consider machine learning models, including random forests and other classifiers, that forecast a mutant quality indicator from the mutant's static program context, in order to assess the advantages of utilizing the quality indicator to prioritize mutants.
4.2 Dataset To create a dataset, the mutants generated from the buggy smart contracts available in the smart-bugs repository are considered. The feature attributes are obtained from the mutation and the program context. A few of the smart contracts were chosen from the repository, and mutants were generated using the mutation testing tools that are available for smart contracts, as listed in Table 1. Prior to any mutant execution, the technique for selecting mutants should be able to identify killable (non-equivalent) mutants that expose flaws. Any mutant that results in test scenarios that can show the defects of the code being tested is regarded as fault revealing. We contend that these mutants are particular to a program and that they can be identified using a collection of static software attributes. In order to effectively capture the code and mutant semantics, attributes that are both powerful and general are required. The aim is to select the mutants that reveal faults and measure the effectiveness of test cases, similar to the selection done by Kaufman et al. [1]. The strategy is to provide a small but capable set of mutants that are killable and fault revealing. The model built captures the properties that make the mutants valuable. From the flawed smart contract code, the utilities of the mutants are computed.
Table 4 Sample dataset

TEC | TTC | TMO | TMS | CC | DNB | COM | CIM | INS | TL | MA | CA | Label
1 | 1 | 7 | 3 | 1 | 0 | 2 | 0 | 0 | 10 | 1 | 3 | Killed
1 | 1 | 7 | 3 | 1 | 0 | 2 | 0 | 0 | 10 | 1 | 3 | Killed
1 | 1 | 7 | 3 | 1 | 0 | 2 | 0 | 0 | 10 | 1 | 3 | Killed
1 | 1 | 7 | 3 | 1 | 0 | 2 | 0 | 0 | 10 | 1 | 3 | Killed
6 | 4 | 7 | 3 | 1 | 0 | 0 | 1 | 1 | 26 | 6 | 6 | Killed
6 | 4 | 7 | 3 | 1 | 0 | 0 | 1 | 1 | 26 | 6 | 6 | Killed
3 | 3 | 7 | 2 | 1 | 0 | 0 | 1 | 1 | 26 | 3 | 6 | Killed
3 | 3 | 7 | 2 | 1 | 0 | 0 | 1 | 1 | 26 | 3 | 6 | Killed
3 | 3 | 7 | 2 | 1 | 0 | 0 | 1 | 1 | 26 | 3 | 6 | Killed
3 | 3 | 7 | 2 | 1 | 0 | 0 | 1 | 1 | 26 | 3 | 6 | Killed
3 | 3 | 17 | 2 | 1 | 0 | 0 | 1 | 1 | 26 | 3 | 6 | Killed
7 | 1 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 52 | 1 | 4 | Killed
7 | 1 | 3 | 2 | 2 | 1 | 0 | 0 | 0 | 52 | 1 | 4 | Killed
Their features are retrieved, and the utilities are used as the predicted output of the training data. The utility of a mutant for flaw discovery and killability forecasting corresponds to its fault discovery and killability information, respectively. A sample of the dataset is provided in Table 4, which contains the features of the static Solidity code together with the label.
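As a minimal sketch of how such records could be assembled, the snippet below appends one mutant's feature vector and label to a CSV file. The file name is hypothetical, the columns follow Table 4, and the example values are those of the first row of Table 4 (a header row is assumed to have been written separately).

```python
import csv

# Columns follow Table 4 of this chapter; the file name is an assumption.
COLUMNS = ["TEC", "TTC", "TMO", "TMS", "CC", "DNB", "COM", "CIM",
           "INS", "TL", "MA", "CA", "Label"]

def append_mutant_record(path, features, label):
    """Append one mutant's feature values and its Killed/Survived label to the dataset CSV."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([features[c] for c in COLUMNS[:-1]] + [label])

append_mutant_record("mutant_features.csv",
                     {"TEC": 1, "TTC": 1, "TMO": 7, "TMS": 3, "CC": 1, "DNB": 0,
                      "COM": 2, "CIM": 0, "INS": 0, "TL": 10, "MA": 1, "CA": 3},
                     "Killed")
```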
4.3 Machine Learning-Based Evaluation of the Utility of Mutants In this model, the selection procedure is focused on developing a predictor for estimating the likelihood that a mutant will expose flaws. To that end, we investigate the potential of a number of features that are intended to reflect particular code characteristics that may distinguish a valuable mutant from another. Consider a mutant M_G connected to a statement of code M_S to which the mutation was applied. The complexity of the mutated statement, the location of the mutated code in the AST or control-flow graph, the dependencies with other mutants, and the structure of the code block where M_S is found can all be used to characterize this mutant.
4.4 Features Selection for Characterizing Mutants According to the PIE theory [19, 20], a test case kills the mutant only if it satisfies the following conditions: (i) Execution: the test case executes the mutated source code; (ii) Infection: after the test is executed, the state of the program changes; and (iii) Propagation: the test output of the mutant is different from that of the original source code.
Table 5 Features considered for prediction model

S. No. | Name of the feature | Description | Category
1 | TEC | Total number of times the mutated code executed against the test suite | Execution condition
2 | TTC | Total number of tests that cover the mutated lines of code | Execution condition
3 | TMS | Refers to the mutated line of code's type, for example assignment, conditional, return, method invocation, etc. | Infection condition
4 | TMO | Refers to the mutation operator's type, which can be found from the mutation testing tools | Infection condition
5 | CC | Cyclomatic complexity of the methods being mutated | Propagation condition
The model forecasts that a mutant is killed by checking whether it satisfies the above conditions. The main aim here is to select features based on how readily test cases or assertions can exercise them, similar to the feature selection proposed by Wang et al. [23]. Feature selection should capture the connection between mutants and test cases. Based on these conditions, the features to be considered for extraction from the source code are tabulated in Table 5 and categorized accordingly.
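As a purely illustrative baseline, and not the chapter's trained model, the PIE conditions can be turned into a naive rule over the Table 5 features; the thresholds below are assumptions.

```python
def pie_kill_heuristic(row):
    """Naive PIE-style baseline: predict 'Killed' only when the feature values
    suggest that execution, infection, and propagation are all plausible.
    The thresholds are illustrative assumptions, not taken from the chapter."""
    executed   = row["TEC"] > 0 and row["TTC"] > 0   # execution condition
    infectable = row["TMO"] > 0 and row["TMS"] > 0   # infection condition
    propagates = row["CC"] > 0                        # crude propagation proxy
    return "Killed" if (executed and infectable and propagates) else "Survived"
```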
5 Results and Discussion 5.1 Feature Importance The feature importance indicated by the ML models is valuable for better understanding the logic of the model, not only to verify its correctness but also to improve the model by focusing on the effective features. The feature importance is calculated for random forest and the other classifier algorithms over the features listed in Table 5. Table 5 provides a sample list of features; a total of 15 features are considered. The graph in Fig. 4 helps to visualize the features that are important in producing the effective mutants for testing using the random forest algorithm. Features such as total assertions, inheritance, cyclomatic complexity, total lines of code, total mutated statements, and total statements executed play a major role in producing the effective mutants, as depicted in Fig. 4.
Fig. 4 Feature importance using random forest
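A minimal sketch of how such a feature ranking could be produced with scikit-learn is shown below; the CSV file name is hypothetical and the columns follow Table 4, so this is not the authors' exact script.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("mutant_features.csv")      # assumed file with Table 4-style columns
X = data.drop(columns=["Label"])
y = data["Label"]                               # "Killed" / "Survived"

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Rank features by their mean impurity-based importance across the trees.
importance = pd.Series(rf.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
```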
5.2 Machine Learning Approach The mutation testing tool is used to generate mutants, and the features are collected using feature extraction tools. Using the machine learning framework, the classification model is built. The mutation testing tool being developed is used for mutant generation; more than 20 mutation operators have been used, including Solidity-specific operators. Feature extraction scripts are used for collecting the features listed in Table 5. The infection-related features are collected from the mutation testing tool, and the AST is used to collect the static-metric-related features. The prediction function can be written as in (1):

$$f(y) = R_{\text{whole}} + \sum_{l=1}^{L} \text{contribute}(y, l), \qquad (1)$$
where L is the number of features, R_whole is the value at the root of the node, and contribute(y, l) is the contribution from the l-th feature in the feature vector y. The contribution of each feature to the decision tree is not a single fixed number, but is decided by the remainder of the feature vector, which determines the decision path that traverses the tree and hence the guards/contributions that are passed along the way. The average of the predictions of its trees is given in (2):

$$F(y) = \frac{1}{H} \sum_{h=1}^{H} f_h(y), \qquad (2)$$
Fig. 5 Accuracy
where H is the number of trees in the forest. From this, it is easy to see that for a forest the prediction is simply the average of the bias terms plus the average contribution of each feature, as given by Eq. (3):

$$F(y) = \frac{1}{H} \sum_{h=1}^{H} R_{h,\text{whole}} + \sum_{l=1}^{L} \left( \frac{1}{H} \sum_{h=1}^{H} \text{contribute}_h(y, l) \right). \qquad (3)$$
Every prediction can thus be expressed as a simple sum of feature contributions, demonstrating how the features lead to a specific forecast. The bytecode is used for collecting the oracle-category features. Using these collected features, the dataset is generated as a CSV file. The classification models used for predicting whether a generated mutant will be killed or survive are the support vector machine, random forest, gradient boosting, and AdaBoost classifiers. The complete dataset is divided into training and test data in an 80:20 ratio. The model is trained for the label prediction, i.e., mutant killed or survived, and the classifiers' predicted class labels are validated using the test data set. Among the classifiers used, random forest gives more accurate results than the others. Figure 5 provides the accuracy obtained for predicting effective mutants. The absolute prediction error for the various algorithms is calculated and compared in Table 6.
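A minimal sketch of the training and evaluation step described above is given below; it assumes the dataset CSV produced earlier (file name illustrative) and scikit-learn with default hyper-parameters, which are not necessarily those used in the chapter.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

data = pd.read_csv("mutant_features.csv")              # assumed file name
X, y = data.drop(columns=["Label"]), data["Label"]

# 80:20 train/test split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "RF classifier": RandomForestClassifier(random_state=42),
    "SV classifier": SVC(),
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")
```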
5.3 Classification Report The classification report is another way to evaluate classification model performance, as published by Kaufman et al. [1]. It displays the precision, recall, F1, and support scores for the model. The classification report shown in Fig. 6 was obtained for the machine learning algorithms considered in this paper.
Table 6 Absolute prediction error

Algorithm | Actual mutation score | Predicted mutation score | Error
RF classifier | 0.453125 | 0.34375 | 0.109375
SV classifier | 0.453125 | 0.421875 | 0.03125
NB | 0.453125 | 0.515625 | 0.0625
NN | 0.453125 | 0.54375 | 0.09375
LR | 0.453125 | 0.35365 | 0.37556
RF Classifier
Precision
Recall
Survived
Killed
Survived
Killed
Survived
SV Classifier Killed
Fig. 6 Classification report
F1-Score
NB NN LR Classifier
machine learning algorithms considered in this paper. The mutants classified as killed are considered as effective. From Fig. 6, we can interpret that random forest algorithm provides the effective mutants with high accuracy compared with other algorithms. The effective mutants generated can be used for the mutation analysis of test suite’s quality assessment for smart contracts. The evaluation of the effectiveness of mutants will be assessed with the mutation tool in further research work.
6 Conclusion Mutation testing for smart contracts involves a high number of mutants, which has long been recognized as a hurdle to the method’s practical applicability. To address this, presented a new viewpoint on the selection of effective mutants using machine learning algorithms, asserting that valuable mutants are those that are most likely to expose actual flaws. Have shown that several basic “static” program features effectively capture the key traits of the effective mutants, representing a starting step in using machine learning to address the defect revelation mutant selection problem for smart contracts. Further, the effective mutants generated using random forest classification will be used and evaluated with mutation testing tool being developed. Further, studies will expand upon and enhance our findings by developing more complex methodologies, enhancing and optimizing the feature set, utilizing various and potentially superior classifiers, and focusing on particular fault types. The upcoming effort contributes to the mutation analysis of smart contracts.
278
R. Sujeetha and K. Akila
References 1. Kaufman SJ, Featherman R, Alvin J, Kurtz B, Ammann P, Just R (2022) Prioritizing mutants to guide mutation testing. In: 44th international conference on software engineering (ICSE ’22). Pittsburgh, PA, USA. ACM, New York, NY, USA, p 12. https://doi.org/10.1145/3510003.351 0187 2. Nakamoto S (2008) Bitcoin: a peer-to-peer electronic cash system. Decentralized business review, p 21260 3. Buterin V (2013) Ethereum white paper. https://github.com/ethereum/wiki/wiki/White-Paper 4. Andesta E, Faghih F, Fooladgar M (2020) Testing smart contracts gets smarter. In: 2020 10th International conference on computer and knowledge engineering (ICCKE). IEEE, pp 405–412 5. Titcheu Chekam T, Papadakis M, Bissyandé TF et al (2020) Selecting fault revealing mutants. Empirical Softw Eng 25:434–487. https://doi.org/10.1007/s10664-019-09778-7 6. Petrovic G, Ivankovic M (2018) State of mutation testing at google. In: 40th IEEE/ACM international conference on software engineering: software engineering in practice track, ICSESEIP 2018, Gothenburg, Sweden 7. Zhang L, Hou S, Hu J, Xie T, Mei H (2010) Is operator-based mutant selection superior to random mutant selection? In: Proceedings of the 32nd ACM/IEEE international conference on software engineering—Volume 1, ICSE 2010. Cape Town, pp 435–444 8. Acree TA, Budd AT, Demillo R, Lipton JR, Sayward GF (1979) Mutation analysis. Technical Report GIT-ICS-79/08, p 92 9. Offutt AJ, Rothermel G, Zapf C (1993) An experimental evaluation of selective mutation. In: Proceedings of the 15th international conference on software engineering, ICSE ’93. IEEE Computer Society Press, Los Alamitos, pp 100–107 10. Zou W, Lo D, Kochhar PS, Le X-BD, Xia X, Feng Y, Chen Z, Xu B (2019) Smart contract development: challenges and opportunities. IEEE Trans Softw Eng 11. Hartel P, Schumi R (2020) Mutation testing of smart contracts at scale. In: Tests and proofs— 14th international conference, TAP@STAF, LNCS, vol 12165. Springer, pp 23–42 12. Chapman P, Xu D, Deng L, Xiong Y (2019) Deviant: a mutation testing tool for solidity smart contracts. In: IEEE International conference on blockchain (blockchain). IEEE, pp 319–324 13. Li Z, Wu H, Xu J, Wang X, Zhang L, Chen Z (2019) Musc: a tool for mutation testing of ethereum smart contract. In: 34th IEEE/ACM international conference on automated software engineering (ASE). IEEE, pp 1198–1201 14. Ivanova Y, Khritankov A (2020). RegularMutator: a mutation testing tool for solidity smart contracts. Proc Comput Sci 178:75–83 15. Barboni M, Morichetta A, Polini A (2021) SuMo: a mutation testing strategy for solidity smart contracts. In: 2021 IEEE/ACM international conference on automation of software test (AST). IEEE, pp 50–59 16. Groce A, Holmes J, Marinov D, Shi A, Zhang L (2018) An extensible, regular-expression-based tool for multi-language mutant generation. In: 2018 IEEE/ACM 40th international conference on software engineering: companion (ICSE-companion). Gothenburg, Sweden, pp 25–28 17. Honig JJ, Everts MH, Huisman M (2019) Practical mutation testing for smart contracts. In: Data privacy management, cryptocurrencies and blockchain technology. LNCS, vol 11737. Springer, pp 289–303 18. Namin AS, Andrews JH (2006) Finding sufficient mutation operators via variable reduction. In: Second workshop on mutation analysis (Mutation 2006). Raleigh, NC 19. Sujeetha R, DeivaPreetha CAS (2022) Analysis on mutation testing tools for smart contracts. Int J Eng Trends Technol 70(9):280–289 20. 
Bahel V, Pillai S, Malhotra M (2020) A comparative study on various binary classification algorithms and their improved variant for optimal performance. In: 2020 IEEE Region 10 symposium (TENSYMP), pp 495–498 21. Guillaume SJ (2015) Mutant selection using machine learning techniques. In: Machine learning: theory and applications, p 24
21 Effective Mutants’ Classification for Mutation Testing of Smart Contracts
279
22. Naeem MR, Lin T, Naeem H, Liu H (2019) A machine learning approach for classification of equivalent mutants. J Softw Evol Process. https://doi.org/10.1002/smr.2238 23. Wang S, Tang J, Liu H (2016) Feature selection. In: Sammut C, Webb G (eds) Encyclopedia of machine learning and data mining. Springer, Boston, MA 24. Mathur A (1991) Performance, effectiveness, and reliability issues in software testing. In: Proceedings of the fifteenth annual international computer software and applications conference. Tokyo, Japan, pp 604–605 25. Namin A, Xue X, Rosas O, Sharma P (2015) MuRanker: a mutant ranking tool. Softw Test Verification Reliab 25:5–7, 572–604
Chapter 22
Scheming of Silver Nickel Magnopsor for Magneto-Plasmonic (MP) Activity Shruti Taksali, Sonam Maurya, and Amruta Lipare
1 Introduction Magnopsor is optoelectromagnetic device designed especially for magnetoplasmonic (MP) applications. Their working principle is based on a response to incident light or magnetic field which is dictated by the plasmons (collective oscillation of electrons). This in turn aid in controlling light with subwavelength structures which is not possible with comparable wavelength RF antennas [1]. Moreover, magnopsors can enhance optical intensity by several orders of magnitude, in addition to controlling light as that of traditional antenna. This effect is possible because of a strong local field confinement near its metal surface, which in turn boost the extremely small nonlinear optical response of nanomaterials upto a level achievable with optical fibres and other macroscopic nonlinear crystals. These are not only guided by light but also by external magnetic fields, which in turn supports in controlling and manipulating light at nanosale. In some situations non-optical ferromagnetic nano devices may operate as magneto-plasmonic (MP) magnopsors. Nevertheless, this research field is immense and expansive due to the requirement of development of cutting-edge optical nanomaterial with earlier mysterious functionality, superior performance, tiny imprint and broad application range as correlated with the prevailing nanostructures [2]. These are modernistic nanodevices that hold the promise to fulfil the above expectations. Contrary to the stereotyped (non-magnetic) magnopsors, the fundamental materials of MP are ferromagnetic materials such as cobalt, nickel, iron and their alloys. These can be developed from a non-magnetic metal combined with a magnetic material, which can be either insulating or conducting. External magnetic fields are required in order to enable the magneto-optical activity [3]. They may also S. Taksali (B) · S. Maurya · A. Lipare Indian Institute of Information Technology Pune, Pune, Maharashtra, India e-mail: [email protected] A. Lipare e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_22
281
282
S. Taksali et al.
support plasmon excitations, magneto-plasmon resonances, in a similar way to their non-magnetic counterparts. Eventually, the enhancement of the magneto-optical effects is due to the increased strength of the local electromagnetic field associated with the MP resonance. Therefore, it promotes the development of miniaturised and integrated active magneto-optical devices, which can be promising candidates for various segments such as telecommunications, bio-sensing, and imaging [4-7]. In general, magneto-optical effects are distinguished according to the relative orientations of the light wave vector k and the magnetic field H; in other words, light can travel either parallel or perpendicular to the field direction. When the polarisation state is changed, the well-known MP effects, the Faraday and Kerr effects, come into play. The former is detected when the light is transmitted through a magnetised structure, whereas the latter is noticed when the light is reflected from it. This behaviour emerges from the influence of the magnetic field on the dielectric permittivity tensor of the medium, which becomes non-symmetric, resulting in either a reflection change or a polarisation rotation [8, 9].
2 Design Logic of the Device
For modelling and designing the MP nickel–graphene magnopsor, the CST Microwave Studio tool is used as the simulation software. Initially, a glass substrate with dimensions 200 × 200 × 100 nm is used as the base. Then, a silver and nickel (ferromagnetic material) ring with an outer radius of 80 nm and an inner radius of 60 nm is selected. For the MP effects, the graphene cone is placed above the sphere, with its bottom radius the same as that of the sphere. This particular geometrical shape corresponds to a typical hut structure, which can boost the non-reciprocal activity in a magnificent manner. The plasmon response of nickel is strongly affected by the broad interband transition background in the spectral range used for nanoplasmonics. In routine non-magnetic nanoplasmonics, these features are treated as parasitic because they increase the light absorption loss. The interaction between interband transitions and plasmons in nickel can be considered as two coupled harmonic oscillators. The combination of silver, nickel and graphene is selected so as to decrease the absorption losses commonly found in MP materials. The background boundary condition is set to twice the radius of the silver cone. For the Faraday and Kerr effects, a plane wave is selected as the incident source along with a waveguide port. The incident wavelength is chosen as 200 nm to 900 nm because the spectrally localised interband transitions of nickel lie in this range. Consequently, the frequency range reaches the optical regime, which is in the terahertz range. Hence, the magnopsor is modelled while considering both nanoplasmonics and nanomagnetism for non-reciprocal activity.
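The coupled-oscillator picture mentioned above can be written down explicitly. The following is a generic sketch of that model (a standard form, not the authors' fitted parameters), with x_1 the plasmon coordinate driven by the incident field and x_2 the interband excitation:

\[
\ddot{x}_1 + \gamma_1\dot{x}_1 + \omega_1^{2}x_1 + \kappa x_2 = \frac{F(t)}{m_1},
\qquad
\ddot{x}_2 + \gamma_2\dot{x}_2 + \omega_2^{2}x_2 + \kappa x_1 = 0 ,
\]

where ω1, ω2 are the resonance frequencies, γ1, γ2 the damping rates and κ the coupling constant; a non-zero κ redistributes absorption between the two modes, which is why the interband background reshapes the plasmon resonance of nickel.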
3 Results and Discussion
For the MP effects, an external magnetic field is also applied, and the designed structure is shown in Fig. 1. The yellow ring corresponds to the silver material, the green one to the nickel ring, and above them is the graphene cone. The structure is designed so as to maximise the MP effects. Initially, the material combination is verified through the metal loss per material, as illustrated in Figs. 2, 3 and 4. It is clear from the graphs that silver has the higher loss (7.5 nW), followed by nickel (0.8 nW). Figure 4 shows the overall metal loss of the structure, which is drastically reduced by the use of the graphene cone on the top. Next, the power stimulated and the power accepted are shown in Figs. 5 and 6, respectively; the difference indicates that some of the power is lost due to the metal loss of the structure, as discussed above. Figures 7 and 8 correspond to the directivity pattern as a 3D plot and a polar plot of the normalised farfield, i.e. with distances normalised to nanometres. The directivity of 141.3 dB is a good indication of a well-performing nanodevice. Figure 9 demonstrates the Cartesian plot of the gain for the designed magnopsor, which comes out to be −6.16
Fig. 1 The 3D structure of the design
Fig. 2 Metal loss in silver in accordance with the frequency
Fig. 3 Metal loss in nickel in accordance with the frequency
Fig. 4 Overall metal loss in accordance with the frequency
dB. But at the nanoscale, the gain of the device does not specify its working operation, as the conventional theories do not apply here. For the magneto-plasmonic (MP) [10] effects, the electric field and magnetic field patterns are illustrated in Figs. 10 and 11, respectively. The electric field pattern depicts a maximum electric field intensity of 159.1 dBV/m, which is enough to create optoelectronic effects in the device. The magnetic field pattern in Fig. 11 shows a maximum magnetic field intensity of 107.6 dBA/m, which in conjunction with the electric field creates the tremendous magneto-plasmonic effects. All farfield distances in metres are normalised to nanometres for better results at the nanoscale. According to traditional antenna theory, the VSWR should lie between 1 and 2 with its smallest value at the resonance point, whereas S11 must be less than or equal to −10 so that maximum energy can be transmitted. Figure 12 indicates a value of S11 of −9.4 over the whole frequency range except the initial portion. On the other hand, Fig. 13 shows a VSWR of 2.01 for all frequencies except the starting range. All the results exemplify that the material combination of silver, nickel and graphene in the form of a hut is a superlative magnopsor for the magneto-plasmonic effects.
Fig. 5 Power stimulated to the magnopsor during the simulation cycle
Fig. 6 Power accepted by the magnopsor during the simulation cycle
Fig. 7 Normalised 3D directivity of the magnopsor
Fig. 8 Polar plot of Normalised 3D directivity of the magnopsor
Fig. 9 The Cartesian plot of gain for the designed magnopsor
4 Conclusion
The paper particularly emphasised the magneto-plasmonic effects by designing a state-of-the-art magnopsor. This is a multipurpose device which can find extensive use in applications such as on-chip non-reciprocal photonic devices, ultrasensitive biosensors, magneto-plasmonic rulers, one-way waveguides, and all-optical data storage and processing at the nanoscale. As the results verified the enhanced electric and magnetic fields, which together provided the magnified MP effects, the material combination of silver, nickel and graphene proved to be an optimum choice for it. Also, the designed shape holds promise for achieving the expected targets. Eventually,
Fig. 10 The 2D electric field pattern for the magneto-plasmonic effects
Fig. 11 The 2D magnetic field pattern for the magneto-plasmonic effects
it promotes the development of integrated and miniaturised active magneto-optical devices, which can be revolutionary candidates for diversified segments such as bio-sensing, telecommunications, and imaging.
Fig. 12 The S11 plot for magnopsor
Fig. 13 The VSWR for the magnopsor
5 Competing Interests
The authors declare no competing interests.
Acknowledgements The authors would like to thank the Director, IIIT Pune, and the entire faculty team for their support and guidance throughout the project.
References 1. De la Chapelle ML, Pucci A (2013) Nanoantenna: plasmon-enhanced spectroscopies for biotechnological applications. Pan Stanford, Singapore 2. Temnov V (2012) Ultrafast acousto-magneto-plasmonics. Nat Photonics 1(6) 3. Maksymov I (2015) Magneto-plasmonics and resonant interaction of light with dynamic magnetisation in metallic and all-magneto-dielectric nanostructures. Nanomaterials 2(5) 4. Inoue M, Levy M, Baryshev AV (2013) Magnetophotonics: from theory to applications. Springer, Berlin, Germany
5. Armelles G, Dmitriev A (2014) Focus on magnetoplasmonics. New J Phys 16:045012 6. Maksymov IS (2015) Magneto-plasmonics and resonant interaction of light with dynamic magnetisation in metallic and all-magneto-dielectric nanostructures. Nanomaterials 5:577–613 7. Lukaszew RA (2016) Handbook of nanomagnetism: applications and tools. Taylor and Francis Group, Boca Raton 8. Pirzadeh Z, Pakizeh T (2014) Plasmon-interband coupling in nickel nanoantennas. ACS Photonics 1(3) 9. Bennett JM, Bennett HE (1978) Polarization. In: Driscoll WG, Vaughan W (eds) Handbook of optics, Chapter 10. McGraw-Hill, New York 10. Zvezdin AK, Kotov VA (1997) Modern magneto-optics and magneto-optical materials. IOP Publishing, Bristol
Chapter 23
Heart Stroke Prediction Using Different Machine Learning Algorithms Tarun Madduri, Vimala Kumari Jonnalagadda, Jaswitha Sai Ayinapuru, Nivas Kodali, and Vamsi Mohan Prattipati
1 Introduction
The human heart is one of the body's most important organs. Essentially, it regulates the flow of blood throughout the human body, and any abnormality inside the heart can lead to discomfort in other regions of the body. Cardiovascular disease (CVD) can be defined as any disturbance in the normal functioning of the heart. Nowadays, CVDs are one of the main causes of death. Heart disease is caused by an unhealthy lifestyle, smoking, alcohol, and a high fat intake, which lead to hypertension. According to the World Health Organization, more than 18 million people worldwide die annually on account of heart disease [1]. Out of all CVDs, heart stroke and heart attack cause the major damage; according to the WHO, over 75% of those 18 million deaths are due to these two diseases. Among the two, heart stroke is considered the more dangerous because it is directly connected to the brain [1]. As part of the central nervous system, the brain is the organ that controls vision, memory, touch, thought, emotion, breathing, motor skills, hunger, and all other functions that govern our body. This shows how important the brain and its functions are to us, and it is what makes heart stroke so dangerous [2]. During a stroke, the blood flow to different areas of the brain is significantly reduced, which stops the functioning of cells: they do not receive the oxygen they need and perish. A stroke is an emergency that needs to be treated right away to halt additional harm to the damaged parts of the brain [3]. Heart function is impacted by heart disease. The World Health Organization conducted a survey and came to the conclusion that 10 million people had heart
T. Madduri (B) · V. K. Jonnalagadda · J. S. Ayinapuru · N. Kodali · V. M. Prattipati
Electrical and Electronics Engineering, V. R. Siddhartha Engineering College Vijayawada, Vijayawada, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_23
disease and had died as a result. Early disease prediction, once a person is affected, is the issue currently facing the healthcare sector. The size of medical history records and data makes them potentially incomplete and inconsistent in the real world. In earlier days, it may not have been possible to treat every patient at the starting stage of the disease and to predict it correctly [4]. Strokes are of two types: haemorrhagic and ischemic. In a haemorrhagic stroke a weakened blood vessel ruptures and bleeds into the brain, whereas in an ischemic stroke the flow is blocked by clots. A healthy and balanced lifestyle can reduce the risk of stroke by eliminating bad habits like taking alcohol and smoking cigarettes, and by maintaining normal BMI and blood glucose levels, as well as kidney and heart function. Predicting a stroke is crucial for prompt treatment to prevent fatalities or long-term damage. In order to predict stroke, this study took into account BMI, average blood glucose level, hypertension, etc. Additionally, machine learning can be extremely useful in the proposed system's decision-making processes [5]. Many academics have employed machine learning (ML) algorithms to predict strokes. ML and mining classifiers were used by Govindarajan et al. [6] to categorize the stroke disorders of 507 people. They examined several ML techniques, among them artificial neural networks (ANNs), for training purposes and found that the proposed algorithm delivered the highest value, over 94%. Research on the identification of ischemic stroke was presented by Cheng et al. [7]. They used two ANN models on the data of around 80 ischemic stroke patients and reported values of 95 and 79% in their study. To ascertain the efficacy of automated early ischemic stroke detection, Chin et al. [8] developed a Convolutional Neural Network (CNN); to develop and run tests, the authors collected 256 pictures for the CNN model. In order to enlarge the data available for preparing their system's output, they used a data augmentation technique. Finally, the authors reported that the CNN method could achieve an accuracy rate of 90%. Two real-time healthcare data sets, chronic kidney disease (CKD) and life expectancy (LE), are contrasted in [9], where data imputation techniques are explored by analysing missing values. Congenital heart disease health records cannot be segmented correctly with several methods proposed in the research; a CNN-based strategy was 97% accurate, 98.13% precise, and reached a 98.36% F-score, and it appears that it can distinguish between healthy and diseased hearts [10]. Motivated by the potential outcomes achieved by the above-mentioned works, this paper aims to provide an ensemble solution based on ML algorithms to forecast the likelihood of developing CVDs in the future. The investigation was started by putting different ML algorithms into practice: Naïve Bayes, Decision Tree (DT), Random Forest (RF), and K-nearest neighbours (KNN). We then built a model ensemble by mixing all of the above models, which produced the finest outcomes, achieving over 95% recall and over 90% precision on test data. Additionally, a thorough exploratory data analysis was carried out to discover the relation between various factors and CVDs.
2 Methodology
2.1 Proposed Method for Prediction
Figure 1 illustrates the prediction using machine learning algorithms, where the data set is given to the different algorithms. After pre-processing, the model is trained. Whenever data is taken from a patient, the model compares it with the trained model and predicts whether the patient has a risk of stroke or not. The data set is taken from the Kaggle website [11]. The figure gives a straightforward visual depiction of the workflow, from the data set to the predictive analytics development. The machine learning phase starts with data pre-processing, then feature engineering relying on entropy in the DT, classification modelling, and evaluation of the outcomes. The feature selection and modelling procedure is repeated for other attribute combinations, and the ML approach and its performance are tracked at each iteration. The goal is to develop a system that can predict cardiac illness using dependable machine learning approaches, such as the RF, DT, KNN, and Naïve Bayes techniques. A .CSV file is provided as input. Following a successful run, the outcome is predicted, and a warning is given to the patient as to whether a stroke is likely to occur or not.
Fig. 1 Simple steps for prediction
Table 1 Parameter description
Parameter | Description
Id | Patient's unique id
Sex | Gender (0 = female; 1 = male)
Age | In years
HT | Hypertension (0 = not having, 1 = having)
HD | Any CVD (0 = not having, 1 = having)
Married or not | 0 = not married, 1 = married
WT | Type of work of patient (0 = private, 1 = self-employed, 2 = others)
Residence type | 0 = urban, 1 = rural
Avg GL | Blood's average level of glucose
BMI | Body mass index
2.2 Input Data
Information from 304 patients is considered, covering various health risks, including the incidence of stroke sickness. The information was gathered from the data set taken from the Kaggle website [11].
2.3 Data Pre-processing
Pre-processing for cardiovascular disease is done after acquiring a large number of records. The data set includes 304 patient records that are utilized in pre-processing. For the attributes of the provided data set, the sub-categorization parameters and classification methods are described. The class variable is used to assess whether or not there is a cardiac condition: if the patient suffers from heart disease, the value is set to one, otherwise it is set to zero, signifying that the patient does not have heart disease. Pre-processed data is obtained by converting health records into characteristic values. Data pre-processing of the 304 patient records showed that 135 have a value of one, indicating the presence of heart disease, while the other 162 have a value of zero, indicating that heart disease is not present [5–11]. Table 1 provides the description of each parameter [11].
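A minimal sketch of this encoding step in Python/pandas is given below; the file and column names follow the public Kaggle stroke data set and are assumptions, not names quoted by the authors.

import pandas as pd

# Load the Kaggle stroke data set (file name assumed)
df = pd.read_csv("healthcare-dataset-stroke-data.csv")

# Keep the two gender categories covered by Table 1 before encoding
df = df[df["gender"].isin(["Female", "Male"])]

# Map categorical attributes to the numeric codes of Table 1
df["gender"] = df["gender"].map({"Female": 0, "Male": 1})
df["ever_married"] = df["ever_married"].map({"No": 0, "Yes": 1})
df["Residence_type"] = df["Residence_type"].map({"Urban": 0, "Rural": 1})
df["work_type"] = df["work_type"].map(lambda w: {"Private": 0, "Self-employed": 1}.get(w, 2))

# Fill missing BMI values and drop the identifier column
df["bmi"] = df["bmi"].fillna(df["bmi"].mean())
df = df.drop(columns=["id"])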
2.4 Split Data
Splitting a data set involves dividing it into training and testing data. In this study, this split approach is employed for training and assessment.
2.5 Train Data
Training data in ML is the user information utilized to train an ML model. Analysing or processing the training data set requires some human input; how much participation there is from people varies depending on the machine learning algorithms used and the kind of issue they are supposed to solve [12].
2.6 Test Data
Unseen data is needed to evaluate the machine learning algorithm after it has been built using the training data. This data is known as testing data and is used to evaluate the efficiency and progress of the algorithms' training and to adjust or upgrade them for better results [12].
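A short sketch of the split, train and test steps with scikit-learn follows; the 80/20 ratio and the stratified split are assumptions, since the paper does not state the exact proportions.

from sklearn.model_selection import train_test_split

X = df.drop(columns=["stroke"])   # predictor attributes from Table 1
y = df["stroke"]                  # target label: 1 = stroke, 0 = no stroke

# Hold out a test set; stratify so both parts keep a similar class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)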
2.7 Algorithms
Machine learning (ML) is the study of statistical models and algorithms that computers use to perform a specific task without being explicitly programmed. Numerous commonly used applications make use of learning algorithms; a learning algorithm that has perfected the art of ranking is one of the reasons why Internet search engines like Yahoo and Google work so well every time they are used to search web pages. These algorithms are utilized for a wide range of activities, including predictive analytics, data mining, and image processing. The fundamental advantage of ML is that an algorithm may complete its work autonomously once it understands how to apply data [13]. Some of the efficient and prominent ML algorithms are presented here.
Naïve Bayes (NB). Naïve Bayes is a straightforward technique based mainly on the Bayes theorem. It assumes strong (naive) attribute independence: it is a formula used to determine probabilities, the predictors are assumed to be unrelated and not correlated with one another, and each feature individually contributes to the likelihood. It employs the Naïve Bayes template but not full Bayesian techniques. Naïve Bayes classification techniques are employed in a variety of challenging real-world circumstances. In Eq. (1), F(A/Q) is the posterior probability, F(A) the class prior probability, F(Q/A) the likelihood (the probability of the predictor given the class), and F(Q) the prior probability of the predictor. It is a straightforward, user-friendly, and effective classifier even for complicated, nonlinear data; however, because it depends on the assumption of conditional class independence, there can be a decrease in accuracy [14]. With the ten most important predictors, Naïve Bayes attained an accuracy of 85.97%; while using all 14 attributes, an accuracy of 85.97% was also obtained.
296
T. Madduri et al.
F(A/Q) = (F(Q/A) × F(A)) / F(Q)    (1)
Decision Tree (DT). A decision tree is a classification algorithm that uses both numerical and categorical information. DT is used to build descriptive tree models, and it is a simple and popular approach for working with medical data sets: building and analysing data displayed in a tree-like graph is straightforward. The decision tree model analyses data using three kinds of nodes. The root node is the significant node on which all other nodes depend; interior nodes handle the various properties; and the leaf nodes display each test's result. The algorithm separates the information into equivalent sets based on the most important determinants: the entropy of each attribute is calculated, and the data is then split on the predictor variables with the greatest information gain (lowest entropy). The result is straightforward to read and comprehend [15]. This algorithm analyses the data set in a tree-like graph, which makes it more interpretable than other algorithms; however, the information might be over-classified, and only one attribute is checked for decision-making at a time. The accuracy obtained is 95.57%.
Random Forest (RF). The well-known ML algorithm Random Forest belongs to the supervised learning approach. It can be applied to machine learning problems including analysis and categorization. It is founded on the notion of ensemble techniques, a way of merging various classifiers to deal with complicated issues and improve model performance [16]. To improve the predictive accuracy on the data set, Random Forest builds several DTs on subsets of the input data; a greater number of trees in the forest yields higher accuracy and prevents overfitting. Here the accuracy is 97.69%.
KNN. The K-nearest neighbour algorithm is a supervised classification technique. It classifies objects depending on their nearest neighbours and is an instance-based method of learning. When determining how far away a sample is from its neighbours, the Euclidean distance is utilized. It makes use of a group of labelled points and uses them to label another point. KNN can be applied to fill missing values in the data after the data are clustered based on similarities; once the missing values are filled, different forecasting methodologies are used with the given data set. By combining these algorithms in different ways, it is feasible to improve accuracy. The KNN algorithm is easy to use without generating a model or making other presumptions [17]. Its accuracy is 90.54%.
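The four classifiers described above can be trained and compared in a few lines of scikit-learn. This is only a hedged sketch: the hyper-parameters shown are library defaults or common choices, not the settings used by the authors, and X_train/X_test are the splits from Sects. 2.5–2.6.

from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_train, y_train)          # train on the training split
    y_pred = model.predict(X_test)       # evaluate on the unseen testing split
    print(name, "accuracy:", round(accuracy_score(y_test, y_pred), 4))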
3 Result Analysis
This section presents the outcomes of using Random Forest, Decision Tree, KNN, and Naïve Bayes. The metrics used to carry out the performance analysis of the algorithms are the accuracy score, recall, precision, and F-measure.
The confusion matrix is used to derive the performance indicators described previously; it describes the model's performance. Table 2 displays the confusion matrices generated by the proposed model for the various methods, and Table 3 gives the accuracy rating for the classification methods of Random Forest, Decision Tree, KNN, and Naïve Bayes. It is necessary to know the algorithms' accuracy, recall, precision, and F-measure in order to assess their performance, and the confusion matrix provides those parameters. A confusion matrix is a table that shows how effectively a classification algorithm performs; it displays and summarizes the results of a classification model [18, 19].
• True Positive (TP): values that the forecast correctly identified as positive.
• False Positive (FP): values incorrectly predicted as positive; the actual value is negative.
• False Negative (FN): values incorrectly predicted as negative; the actual value is positive.
• True Negative (TN): values that the forecast correctly identified as negative.
A confusion matrix aids in visualizing the outcomes of a classification problem by providing a table layout of the various findings and predictions; it generates a table containing all of the predicted and actual values from the classifier.
Recall = TP/(TP + FN)
Precision = TP/(TP + FP)
F-Measure = (P × R × 2)/(P + R), where R is recall and P is precision
Table 2 Confusion matrix values using different algorithms
Algorithm | True positive | False positive | False negative | True negative
KNN | 835 | 136 | 3 | 496
Random forest | 936 | 33 | 1 | 505
Decision tree | 952 | 41 | 26 | 495
Naïve Bayes | 890 | 99 | 30 | 542
Table 3 Analysis of different ML algorithms
Algorithm | Precision | Recall | F-measure | Accuracy
KNN | 0.85 | 0.99 | 0.91 | 90.54
RF | 0.96 | 0.99 | 0.97 | 97.69
DT | 0.95 | 0.97 | 0.95 | 95.57
Naïve Bayes | 0.89 | 0.96 | 0.92 | 85.97
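The entries of Table 3 follow directly from the confusion-matrix counts through the formulas above. The short sketch below recomputes them from the Table 2 values; small deviations from the published figures can come from rounding or from the exact test split used by the authors.

def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

# (TP, FP, FN, TN) counts taken from Table 2
table2 = {
    "KNN": (835, 136, 3, 496),
    "Random Forest": (936, 33, 1, 505),
    "Decision Tree": (952, 41, 26, 495),
    "Naive Bayes": (890, 99, 30, 542),
}
for name, counts in table2.items():
    p, r, f, a = metrics(*counts)
    print(f"{name}: P={p:.2f} R={r:.2f} F={f:.2f} Acc={a:.2%}")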
Fig. 2 Analysis of precision
3.1 Precision
Precision reflects the calibre of the positive values predicted by the model and is one measure of model performance: it is the number of genuine positives divided by the total number of positive forecasts [19, 20]. From Fig. 2, one can observe that Random Forest has the best precision value, while KNN has the lowest precision among all the algorithms used in the above model.
3.2 Recall
Recall is calculated as the percentage of actual positive samples that are correctly identified as positive. Recall measures the model's ability to recognize positive samples: the more positive samples that are detected, the higher the recall [19, 20]. Here all the algorithms have good recall values; moreover, the best recall is given by the KNN and Random Forest algorithms, as shown in Fig. 3.
Fig. 3 Analysis of recall
Fig. 4 Analysis of F-measure chart
3.3 F-Measure
The F-measure is calculated from recall and precision, giving each variable the same weight. This enables model comparison and performance description, accounting for both recall and precision in a single score: F-Measure = (P × R × 2)/(P + R), where R is recall and P is precision. Figure 4 illustrates the F-measure of all the algorithms. The F-measure balances the precision value and the recall value; here Random Forest has the highest F-measure and KNN the lowest among all algorithms.
3.4 Accuracy
A machine learning model's accuracy, computed against the input or training data, is the quantifier used to determine which model is more adept at spotting trends and connections between the different variables within a data set [19–21]. Figure 5 shows the accuracy of all the algorithms.
Fig. 5 Accuracy analysis of algorithms
According to the above findings and analysis, the Random Forest algorithm produced the best results, while the Naïve Bayes algorithm produced the worst. Moreover, when it comes to the application, the best result is displayed in the user app.
4 Conclusion
Given the increase in deaths from heart attacks, it has become critical to build a system that can anticipate heart attacks effectively and reliably. The study's goal was to identify the best machine learning (ML) system for heart attack detection. This study examines the accuracy ratings of the different models using the Kaggle data set for forecasting cardiac stroke. According to the findings of this study, one of the best algorithms for predicting heart strokes is Random Forest, with an accuracy score of 97.69%. The work can be enhanced in the future by developing a web application based on these four algorithms and by using a larger data set than the one used in this analysis. This would help to produce better results and help health professionals forecast cardiac disease successfully and efficiently.
References 1. Agrawal H, Chandiwala J, Agrawal S, Goyal Y (2021) Heart failure prediction using machine learning with exploratory data analysis. In: 2021 International conference on intelligent technologies (CONIT), IEEE, pp 1–6 2. Lassen NA, Ingvar DH, Skinhøj E (1978) Brain function and blood flow. Sci Am 239(4):62–71 3. Tazin T, Alam MN, Dola NN, Bari MS, Bourouis S, Monirujjaman Khan M (2021) Stroke disease detection and prediction using robust learning approaches. J Healthcare Eng 4. Raja MS, Anurag M, Reddy CP, Sirisala NR (2021) Machine learning based heart disease prediction system. In: 2021 International conference on computer communication and informatics (ICCCI), January, IEEE, pp 1–5 5. Emon MU, Keya MS, Meghla TI, Rahman MM, Al Mamun MS, Kaiser MS (2020) Performance analysis of machine learning approaches in stroke prediction. In: 2020 4th international conference on electronics, communication and aerospace technology (ICECA), November, IEEE, pp 1464–1469 6. Govindarajan P, Soundarapandian RK, Gandomi AH, Patan R, Jayaraman P, Manikandan R (2020) Classification of stroke disease using machine learning algorithms. Neural Comput Appl 32:817–828 7. Cheng CA, Lin YC, Chiu HW (2014) Prediction of the prognosis of ischemic stroke patients after intravenous thrombolysis using artificial neural networks. In: ICIMTH, January, pp 115– 118 8. Chin CL, Lin BJ, Wu GR, Weng TC, Yang CS, Su RC, Pan YJ (2017) An automated early ischemic stroke detection system using CNN deep learning algorithm. In: 2017 IEEE 8th International conference on awareness science and technology (iCAST), November, IEEE, pp 368–372 9. Sankepally SR, Kosaraju N, Rao KM (2022) Data imputation techniques: an empirical study using chronic kidney disease and life expectancy datasets. In: 2022 International conference on innovative trends in information technology (ICITIIT), February, IEEE, pp 1–7
10. Sinha D, Sharma A, Sharma S (2022) Automated detection of coronary artery disease comparing arterial fat accumulation using CNN. J Electron Imaging 31(5):051405–051405 11. “Stroke prediction dataset [Online] Available: https://www.kaggle.com/datasets/fedesoriano/ stroke-prediction-dataset 12. Witten IH, Frank E, Hall MA, Pal CJ, Data M (2005) Practical machine learning tools and techniques. In: Data mining, vol 2(4) 13. Mahesh B (2020) Machine learning algorithms-a review. Int J Sci Res (IJSR) [Internet] 9:381– 386 14. Vembandasamy K, Sasipriya R, Deepa E (2015) Heart diseases detection using Naive Bayes algorithm. Int J Innov Sci Eng Technol 2(9):441–444 15. Navada A, Ansari AN, Patil S, Sonkamble BA (2011) Overview of use of decision tree algorithms in machine learning. In: 2011 IEEE Control and system graduate research colloquium, June, IEEE, pp 37–42 16. Liu Y, Wang Y, Zhang J (2012) New machine learning algorithm: random forest. In: Information computing and applications: Third international conference, ICICA 2012, Chengde, China, September 14–16, Proceedings, vol 3. Springer, Berlin, Heidelberg, pp 246–252 17. Bijalwan V, Kumar V, Kumari P, Pascual J (2014) KNN based machine learning approach for text and document mining. Int J Database Theory Appl 7(1):61–70 18. Haghighi S, Jasemi M, Hessabi S, Zolanvari A (2018) PyCM: Multiclass confusion matrix library in Python. J Open Source Softw 3(25):729 19. Shareefunnisa S, Malluvalasa SL, Rajesh TR, Bhargavi M (2022) Heart stroke prediction using machine learning. J Pharmaceut Negative Results 2551–2558 20. Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on Machine learning, June, pp 233–240 21. Fatima M, Pasha M (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 9(01):1
Chapter 24
Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm Yamini Gujjidi , Amogh Katti , and Rashmi Agarwal
1 Introduction
Fraud is any deceptive or criminal conduct carried out for financial or personal gain. A system typically uses two processes—prevention of fraud and detection of fraud—to safeguard against the financial loss that fraud could bring on. Prevention is the best line of defence, stopping frauds before they occur. Credit card fraud occurs when criminals use stolen card information to make unauthorized transactions. The fraudster learns the user's password or other sensitive information during a credit card purchase and then uses that information to fraudulently charge a large sum of money to the victim's card without the victim ever realizing what occurred. The modern technological world relies on credit cards, contributing to an increase in credit card transactions daily and to the growth of the e-commerce sector. Each year, the volume of credit card transactions increases. While more people profit from technological advancements, more credit card theft occurs. In terms of global impact, it is undoubtedly a significant challenge today. Since the perpetrators of credit card fraud are often able to conceal information about themselves online, such as their identity and whereabouts, the issue has far-reaching significance for the financial sector. The fraud detection process is depicted in Fig. 1. The terminal point validates certain conditions, such as sufficient balance and a valid Personal Identification Number (PIN), and filters transactions based on those conditions. The prediction model uses predefined criteria to determine if a given transaction is legitimate.
Y. Gujjidi · A. Katti (B)
Department of Computer Science and Engineering, GITAM University, Hyderabad, India
e-mail: [email protected]
R. Agarwal
REVA Academy for Corporate Excellence, REVA University, Bangalore, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_24
Fig. 1 Credit card fraud detection process [1]
Following up on each suspicious alert, investigators submit feedback to the prediction model to improve the algorithm's accuracy. This concerns only the prediction model; fraud detection systems are more complicated than they appear to be. In practice, the practitioner must determine which classification technique to apply (decision trees, logistic regression, …) as well as how to cope with the problem of class imbalance (suspicious cases are exceedingly few in contrast to valid ones). The difficulty of fraud detection is not due only to this class disparity: because of a lack of transaction data, many machine learning algorithms fail in the classification job owing to the overlap between the genuine and fraudulent classes. An actual fraud detection scenario involves a model that uses artificial intelligence to identify suspicious transactions and sends an alert to the appropriate authorities so that each flagged transaction can be determined to be either authentic or fraudulent. The fraud detection system is improved by investigators who investigate and feed their findings back to the system. Furthermore, with this method, investigators can only certify a small fraction of transactions in a timely manner. Predictive models typically perform worse when fewer data points are used to refine the model. Because financial companies infrequently release client data owing to privacy issues, it is challenging to obtain genuine financial datasets; overcoming this obstacle is an important problem in fraud detection systems. In this work, specific machine learning algorithms are utilized to recognize illegal financial transactions. Due to the imbalanced dataset problem in the banking area, this work provides a hybrid model for fraud detection based on a feature selection approach, the honey bee algorithm, and a random forest ensemble classifier (HBRF).
2 Problem Statement
Credit card fraud is a crucial obstacle to the expansion of the financial services industry; because of these scams, a large number of companies have lost money. However, privacy concerns mean that only some of the existing studies analyse data collected from real-world transactions to identify patterns of fraudulent behaviour. In this work, specific machine learning (ML) algorithms are utilized to identify potentially fraudulent financial dealings. Because of the imbalanced dataset problem in the financial sector, this paper proposes a hybrid model for fraud detection based on a feature selection approach, the honey bee algorithm, and a random forest ensemble classifier (HBRF).
3 Related Work
Methods such as RF, ANN, SVM, K-nearest neighbours, and others, with hybrid [2] and privacy-preserving approaches to data privacy, have been recognized as useful for detecting credit card fraud. Table 1 summarizes the literature review [3], listing the author, a description of the method, and its demerits.
Table 1 Literature reviews
Author | Description | Demerit
Altaher et al. [4] | A novel cost-conscious decision tree method for detecting fraud. To evaluate its efficacy, it partitions attributes at non-terminal nodes by minimizing the cost of misclassification and compares the resulting model to a standard one using a dataset of actual credit card transactions. The cost of misclassification is demonstrated in several scenarios. A cost-sensitive algorithm was tested, and the findings demonstrate significant increases in performance relative to established approaches in terms of accuracy and positive rate metrics, while also defining a cost-sensitive metric specific to credit fraud detection. By taking this strategy, economic losses due to fraud can be prevented | Inaccuracy in fraud detection
Kurshan et al. [5] | Submitted a manuscript focusing on the challenges of fraud detection in credit card transactions and providing a survey of solutions based on natural and machine learning approaches. The advantages and disadvantages of various ML approaches are analysed. Misuse detection (supervised) and anomaly detection (unsupervised) are two of the mentioned methods. The ability to deal with numerical and categorical datasets provides a second classification | Accuracy is too low
Malik et al. [6] | A Hidden Markov Model (HMM) is proposed to model the process flow of credit card transactions and is subsequently used to detect fraudulent activity. The user's typical patterns of conduct are tracked; if a trained HMM rejects a credit card transaction as likely fraudulent, the transaction is cancelled. However, it is crucial to protect normal credit card transactions so that they are not accidentally rejected | Accuracy is only around 80% for a wide range of input data
Hossain et al. [7] | Their study focuses on employing AI to detect fraud in real time. A Self-Organizing Map is used to interpret, filter, and analyse customer behaviour patterns to identify potential fraud. This premium approach is utilized to identify fraud red flags in personal finances | More effective models allow for greater scope for advancement in accuracy
Varun Kumar et al. [8] | Credit card fraud has been detected using categorization and clustering methods, indicating that the likelihood of a fraudulent transaction is low but not zero. Therefore, their study aims to evaluate categorization methods and classifiers. With its attention on preventing fraud, this system will not falsely flag legitimate purchases | The model's accuracy is poor compared to other approaches
Ramyashree et al. [9] | The performance of seven hybrid machine learning models was evaluated on a real-world dataset to detect fraudulent actions. The generated hybrid models had two stages: first, state-of-the-art machine learning techniques were employed to detect credit card fraud; then, the hybrid approach was implemented | Oversampling concept not explained
Harwani et al. [10] | Using classification machine learning algorithms, statistics, calculus (differentiation, chain rule, etc.), and linear algebra, they design and maintain complex machine learning models for prediction and understanding of the dataset, with a focus on predicting fraud and fraud-free transactions with respect to time and amount | Oversampling concept not explained
Leberi et al. [11] | In their study, the best data mining algorithm available at the time was developed specifically for the task of identifying instances of credit card fraud; hence, it was one of the first models used in this context. The bank's dataset is based on the real world; thus, it is taken and analysed. Support vector machines (SVM), K-nearest neighbours (KNN), fuzzy logic, and decision trees are just a few of the methods that have been used for fraud detection over the years. All these methods have shown some success, but a hybrid learning approach is required to further enhance accuracy when uncovering fraud | Oversampling concept not explained
Credit card fraud is just one type of financial Oversampling concept crime that has risen in frequency due to the not explained rise of online shopping and payment systems. Because of this, it is essential to set up systems that can identify instances of credit card theft. This paper provides a genetic algorithm (GA)-based machine learning (ML)-based credit card fraud detection (CCFD) engine
4 Material and Methods This approach is based on a number of basic ML techniques to identify possible fraudulent credit card transactions. To create the hybrid algorithm, combine the best features of other algorithms, such as the RF and the HB. As a result, the system’s efficacy is owed to the hybridization algorithm. To use ML algorithms for timing and monetary transaction detecting fraud on credit cards.
308
Y. Gujjidi et al.
Fig. 2 Decision tree schematic layout [13]
4.1 Decision Tree Algorithm The decision tree method (depicted in Fig. 2) can be applied to both classification and regression issues as it shown in Fig. 2. The process is the same, however the corresponding equations may differ. Decision trees for a classification issue are constructed using entropy and information gain. Entropy and information gain both measures the measure to which data is random and how much we can learn from it. The Gini index and Gini coefficient are used to create a regression decision tree. The root node in a classification problem is chosen based on the information gain principle, where the node with the most information is preferred over those with the most entropy. The feature with the smallest Gini value is used as the root node in regression situations. By optimizing hyper parameters, we can calculate the tree’s depth with the grid search cv algorithm.
4.2 Random Forest Algorithm Random forests, Fig. 3, sample rows at random, choose features at random (which are the independent variables), and the number of DT can be optimized for utilizing system parameters. When presented as a classification issue, a random forest’s output is the maximum of the DT model’s responses. Many models have successfully implemented this well-known ML algorithm. This algorithm accomplishes the goals outlined in most Kaggle computing challenges.
24 Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm
309
Fig. 3 Random forest algorithm dataset layout [14]
4.3 Honey Bee Algorithm Honey bees are known for their hunting behaviour, which inspired the bees algorithm. The bee’s algorithm, depicted in Fig. 4, begins with an initiation phase and continues with a main search cycle that repeats for a fixed number of times, T, or until a solution with sufficient fitness is identified. This algorithm is used for optimization and by using this algorithm, the results were optimal in nature.
Fig. 4 Honey bee algorithm systemic layout [15]
310
Y. Gujjidi et al.
Fig. 5 SMOTE balancing data [16]
4.4 Synthetic Minority Oversampling Technique (SMOTE) An effective statistical method for resolving imbalanced data is the SMOTE, shown in Fig. 5. In order to achieve a more equitable distribution of data, it is necessary to artificially generate new minority cases while randomly increasing the number of existing minority instances. Also, it assists in minimizing the overfitting issue that comes with using a large sample size.
4.5 Confusion Matrix and Related Metrics The model’s accuracy may be very high yet misleading if it incorrectly categorized all the incidents as normal transactions based on extremely skewed fraud data. Due to this, accuracy is not a reliable indicator of performance when dealing with fraud data. This analysis measures accuracy in terms of precision and recall, which are derived from the confusion matrix. Confusion matrices reveal how various inputs were distributed across classes. Based on the confusion matrix, shown in Table 2, the performance metrics can be determined: True Positive (TP) = Estimated Number of Unauthorized charges. True Negative (TN) = number of commercial transactions considered to be authentic. False Positive (FP) = number of legal transactions predicted as fraud. False Negatives (FN) = number of fraud transactions predicted as legal. These values are applied to compute the following. Table 2 Confusion matrix
Actual/predicated
Fraud
Not fraud
Fraud
True positive
False negative
Not fraud
False positive
True negative
24 Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm
Accuracy =
TP +TN T P + FP + T N + FN
Recall =
TP T P + FN
Precision = F1 =
TP T P + FP
TP T P + 1/2(F P + F N )
311
(1) (2) (3) (4)
5 Proposed Model 5.1 Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm The combination of RF algorithm and HB algorithm with SMOTE is the hybrid approach. It helps in predicting output with high accuracy and is optimal in nature. This algorithm detects fraud and decreases fraud transactions and controls misuse of data. Algorithm Input: No. of transactions, fraud transactions based on threshold value rate for input data, combined transactional data. Output: Prediction of optimized accuracy. Explore the initial text of user transactions data. Arrange in N-dimensional array with different combinations. // Classification of Data Presentation Based on Random Forest Based on the pre-processing rate of each transaction, performed on n-dimensional re-production with different notations. For i = 1 to n-dimensions. Select randomly appeared transactions. Calculate the threshold rate for each transaction and destroy the non-matched transactions. Commit the relations of each transaction. End for
312
Y. Gujjidi et al.
//Optimization Process Based on threshold matched values, select optimal rates for fraud. For each i = 1 to n-dimensions. Select an optimized solution for each transaction. Generate and store solutions for each transaction. Save optimized transactions. End for Update accuracy, classification parameters Return best-optimized solution.
6 Results and Discussion The data used in this study is available for free on Kaggle. There were 284,807 transactions over two days, and only a small percentage were flagged as potentially fraudulent. In total, 28 characteristics were transformed from the dataset. In addition to time and amount, the dataset comprises 30 more characteristics (V 1, …, V 28). There are no non-numerical attributes in this dataset. The final column indicates the category (transaction type), with 1 indicating a fraudulent transaction and 0 indicating a legitimate one. In order to protect the privacy and integrity of the data, the features V 1–V 28 will not be provided. The SMOTE was implemented during the data pre-processing step of the proposed architecture depicted in Fig. 1 to address the class imbalance issue. In the SMOTE method, a point is randomly selected on a line between close samples in the feature space to create a new member of the oppressed group. In this case, the proposed model significantly improved above random forest. In contrast, it is easy to see that our dataset has a severe “class imbalance” issue. Over 99% of all transactions are legitimate (i.e. not fraudulent), while only 0.17% are fraudulent. Suppose we train our model with such a distribution without addressing the imbalance issues. In that case, it will assign more weight to legitimate transactions (because there are more data about them) and produce more accurate predictions. Class differences can be addressed using a variety of methods. One such label is “oversampling”. Random selection of the minority group is one method for redressing skewed datasets. The fastest process requires just generating minority class examples without changing the model. Instead, we can create new examples by synthesizing them from existing models. The “Synthetic Minority Oversampling Technique” (or “SMOTE” for short) is a method of data augmentation that focuses on underserved populations—implementing an oversampling technique to RF and DT.
24 Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm
313
The methods we took to arrive at our prediction included. 1. Reading the problem statement, 2. Analysing the data through statistical analysis and visualization, and 3. Analysing the distribution of the data. Due to its uneven distribution, this dataset underwent balancing via oversampling, standardized scaling via standardization and normalization, and finally was put through a battery of ML algorithm evaluations. For data science projects, NumPy, numeric python, and pandas are essential, as are matplotlib and seaborn, which improve on matplotlib. Figure 6 shows genuine transactions and fraud transactions of our dataset. Here, we were comparing genuine transactions and fraud transactions, and from this, we have 284,315 genuine transactions and 492 fraud transactions in our dataset. In a nutshell, 0.1727% of fraud transactions are in our dataset. Figure 7 represents the confusion matrix of the decision tree, random forest, and the hybrid approach. False negative value is lesser for decision trees which means fraud transactions are predicted as legal. The confusion matrix for random forest shows a false positive value lesser which means legal transactions are predicted as fraud. The confusion matrix for the hybrid approach after oversampling shows a false negative value lesser, which means fraud transactions are predicted as legal. Table 3 compares the effectiveness of the three approaches using the metrics accuracy, precision, recall, and F1-score. The proposed hybrid model outperforms the other two approaches where RF and HB approaches performed well. And SMOTE helps hybrid model to perform accurately.
Fig. 6 Visualize the “Labels” column in our dataset
314
Y. Gujjidi et al.
Decision tree
Random forest
Hybrid (proposed approach) Fig. 7 Confusion matrix of methods
Table 3 Mode performance for unbalanced data Methods/metric
Accuracy
Precision
Recall
F1-score
Decision tree
0.99925
0.74324
0.80882
0.77465
Random forest
0.99961
0.94017
0.80882
0.86957
Hybrid model
0.99989
0.99978
1.00000
0.99989
7 Conclusions and Future Work The proposed solution uses an approach from machine learning algorithms to prevent credit card fraud. However, none of the current identity verification technologies can reliably identify all frauds in progress. However, they typically detect it after the occurrence. It is because only a small proportion of all transactions are actually fraud. The random forest algorithm improves with additional training data, but the need slows its speed for more time to experiment and put the data into practice. In addition, a hybrid procedure might be put into action with this data. For a hybrid system to be effective, it must combine high-priced training techniques that yield extremely precise results with an improvement strategy that can decrease the overall
24 Credit Card Fraud Detection Using Hybrid Machine Learning Algorithm
315
cost of the system and speed up the machine’s learning process. How and where a fraud sensing device is implemented affects which hybrid approaches are used. The proposed hybrid model consists of RF and HB, where RF is used to find results accurately within large datasets, and HB is used for optimization. In addition, SMOTE helps prevent the hybrid model from oversampling data. This model gives an impact result. For future work, this paper proposes a framework for implementing models of online training. The other training models can be evaluated as well. Cases of potential fraud can be carried along more quickly with online training models. This method of detection can prevent credit card fraud before it begins. Losses are thereby reduced in the financial sector.
References 1. Micro Medium (2022). https://miro.medium.com/max/1104/1*3pI3d_3ec0fHk1VKN840ww. png. Accessed 20th Dec 2022 2. Adiga P et al. (2017) Credit card fraud detection-a hybrid approach using simple genetic and Apriori algorithms. Int J Recent Sci Res 8(4):16308–16313. https://doi.org/10.24327/ijrsr. 2017.0804.0125, https://tse1.mm.bing.net/th?id=OIP.eov5qjYvSSXxfDOahjWbsAHaDC& pid=Api&P=0, https://tse4.mm.bing.net/th?id=OIP.do6O3PSpGz7-UnBs-XLWnAHaG3& pid=Api&P=0 3. Quora (2022). https://www.quora.com/ Accessed 20th Dec 2022 4. Taha AA, Malebary SJ (2022) An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine. IEEE Access 5. Kurshan E, Shen H (2020) Graph computing for financial crime and fraud detection: trends, challenges and outlook. Int J Semant Comput 14:565–658 6. Malik EF, Khaw KW, Belaton B, Wong WP, Chew X (2022) Credit card fraud detection using a new hybrid machine learning architecture. Mathematics 10:1480. https://doi.org/10.3390/mat h10091480 7. Hossain MA, Islam SMS, Quinn JMW, Huq F, Moni MA (2019) Machine learning and bioinformatics models to identify gene expression patterns of ovarian cancer associated with disease progression and mortality. J Biomed Inform 100:103313 8. Varun Kumar KS, Vijaya Kumar VG, Vijay Shankar A, Pratibha K (2020) Credit card fraud detection using machine learning algorithms. Int J Eng Res Technol (IJERT) 9(07) ISSN: 2278–0181 9. Ramyashree K, Janaki K, Keerthana S, Harshitha BV, YV (2019) A hybrid method for credit card fraud detection using machine learning algorithm. Int J Recent Technol Eng (IJRTE) 7(6S4) ISSN: 2277–3878 10. Harwani H, Jain J, Jadhav C, Hodavdekar M (2020) Credit card fraud detection technique using hybrid approach: an amalgamation of self organizing maps and neural networks. Int Res J Eng Technol (IRJET) 07(07) e-ISSN: 2395–0056 11. Leberi EI, Sun Y, Wang Z (2022) A machine learning based credit card fraud detection using the GA algorithm for feature selection Ileberi et al. J Big Data 9:24. https://doi.org/10.1186/ s40537-022-00573 12. Faraji Z (2022) A review of machine learning applications for credit card fraud detection with a case study. SEISENSE J Managem 5(1):49–59. https://doi.org/10.33215/sjom.v5i1.770 13. Lucidspark (2022). https://lucidspark.com/blog/how-to-make-a-decision-tree. Accessed 20th Dec 2022
316
Y. Gujjidi et al.
14. Git Hub, Anas Bital (2022). https://anasbrital98.github.io/blog/2021/Random-Forest/, https:// anasbrital98.github.io/assets/img/20/random-forest.jpg. Accessed 20th Dec 2022 15. Afshar A, Bozorg Haddad O, Mariño M, Adams B (2007) Honey-bee mating optimization (HBMO) algorithm for optimal reservoir operation. J Franklin Instit 344(5):452–462. https:// tse3.mm.bing.net/th?id=OIP.nSIK5yl0CATEzx70kzHIhwHaFe&pid=Api&P=0 16. Sethi N, Gera A (2014) A reviewed survey of various credit card fraud detection techniques. Int J Comput Sci Mobile Comput 3(4):780–791
Chapter 25
Smart Air Pollution Monitoring System for Hospital Environment Using Wireless Sensor and LabVIEW P. Sathya
1 Introduction
With the progress of urbanization and modernization in industrial cities, the problem of air pollution in hospitals is becoming a major concern for the health of the population. Emissions from fossil fuel-based power generation, transportation, household heating, industry, and other chemical disposal are the major causes of air pollution in capital cities and in hospitals located in the most populous areas. In most cities where hospitals and outdoor air pollution control systems do not meet the World Health Organization (WHO) air quality guidelines, air pollution becomes a major health problem [1]. Even though the central pollution control board has taken several measures to reduce air pollution, the increased usage of vehicles and the expansion of numerous industries in cities continue to deteriorate the quality of air. Health effects such as stroke, heart disease, lung cancer, and respiratory illnesses such as asthma are caused by air pollution in the hospital environment. Uncontrolled air quality has a serious negative impact on people's health, including the health of vulnerable people such as children, asthma patients, pregnant women, and the elderly.
P. Sathya (B) School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu 632014, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_25
2 Literature Review In this section, different technologies employed in the past for monitoring air pollution are discussed. An air pollution monitoring system can provide a convenient and straightforward monitoring method that incorporates several sensors into a single hardware architecture by combining sensor and embedded system design with LabVIEW methodologies. One such ZigBee-based air quality monitoring system was developed in which the pollutants were monitored remotely by installing sensor nodes at different points [2]. The system was designed only to monitor and display the values of air pollutants in a graphical user interface, and the analysis was carried out manually. An array of sensors with GPRS was designed and implemented along with a mobile data acquisition unit for air pollution monitoring by the authors at Sharjah, UAE [3]. They collected the air pollutant levels using a mobile DAQ unit, packed the data in a frame with the GPS location, and transmitted it to the pollution server over the public mobile network. The system was integrated with Google Maps so that it provided the real-time pollution level at any location. Due to recent advancements in the development of low-cost portable smart sensors with high accuracy in real time, the paradigm of air pollution monitoring is also changing rapidly [4]. Air pollution monitoring differs between urban and rural environments due to differences in population density and lifestyle requirements. To monitor air pollution in rural environments with poor accessibility, an Unmanned Aerial Vehicle (UAV) equipped with sensors was designed [5]. The system was controlled using a pollution-driven control algorithm developed on the basis of a local particle swarm optimization strategy. The authors were able to spot the areas with higher pollution content within the flight time limit of the UAV. Most existing air pollution monitoring systems have a fixed infrastructure, require frequent maintenance, face sensing issues, and are infeasible to reconfigure. They are also costly and spatially restricted, with limited scope for expanding their sensing capabilities. To address these issues, a Modular Sensor System (MSS) adopting a universal sensor interface and a modular design of the sensor node was proposed [6]. The authors developed an MSS sensor node with options for expanding plug-and-play sensors and compatibility with multiple wireless sensor networks. This system can be deployed in areas with varying pollution concentration levels. Many subsequent air pollution monitoring systems were developed using the Internet of Things (IoT) and applied in different environments [7–9]. An IoT-based air pollution monitoring system was developed in which the environmental pollution was monitored in real time and the data logged to a remote server [10]. The authors analyzed the data using Excel, displayed the result in the designed hardware interface, and provided access from the cloud to any mobile device. An IoT-based wireless sensor network with low-cost sensors was used for collecting air pollutant data, which was transmitted to the cloud using ThingSpeak [11]. To monitor air pollution in the ambient environment while traveling, a method was proposed to deploy WSN sensor nodes in public transport buses [12]. Using this method, the air pollution data could be collected using those sensors and stored in
the cloud server using IoT. Another real-time air pollution monitoring system was deployed on public transport vehicles in Sweden, utilizing IoT and WSN for pollution monitoring as part of the Green IoT initiative [13]. Similarly, to determine the emission rate of poisonous gases from vehicles that are responsible for environmental air pollution, a smart in-vehicle monitoring system was developed [14]. Recently, in 2020, the authors discussed the air pollution monitoring systems and air quality index prevailing in Ukraine, which signified the importance of efficient pollution monitoring systems for recovering to normal conditions [15]. The booming trend in air pollution monitoring is employing artificial intelligence and deep learning. One such technology combining IoT and an artificial neural network was developed in 2022 [16]. The authors developed an Automated Environmental Toxicology model for air pollution monitoring using an ANN (ETAPM-AIT model). This model was trained to send the air quality level to the cloud in real time and raise an alarm in case of hazardous pollutant levels in the air. A similar technology was developed and used for personalized air pollution monitoring and health management in Hong Kong [17]. From the literature review, it is observed that there is still a need to develop a smart pollution monitoring system that is not bounded by spatial constraints or sensing limitations, is reconfigurable, and requires less maintenance. In this paper, we propose a cost-efficient smart air pollution monitoring system that is portable, can be deployed easily in hospital environments, and serves as a first indicator of air pollution with remote access.
3 Materials and Methods The main concern addressed in this paper is remotely monitoring the quality of air in the hospital environment and identifying the types of pollutants in the air. This is accomplished by using smart sensors that detect the different types of pollutants in the air. The sensing device, integrated with a processing unit, is placed in the hospital premises, and the collected data are displayed in the graphical user interface of LabVIEW. If the level of pollutants at the site exceeds the prescribed normal limit, a corrective measure can be taken to reduce the air pollution based on the knowledge of the pollutants and their intensity prevailing at that place.
3.1 Proposed Air Pollution Monitoring System The block diagram of the proposed methodology is shown in Fig. 1. The system comprises three gas sensors (MQ135, MQ4, and MQ7), an ATmega16 microcontroller, a MAX232, a ZigBee pair, an LCD, an RS cable, a power supply, and LabVIEW on a host computer to detect the components of air pollutants. MQ135 is used to detect a wide range of gases including NH3, NOx, alcohol, benzene, smoke, and CO2. MQ7 detects the presence of CO at concentrations from 10 to 10,000 ppm, whereas MQ4 detects
CH4 and natural gas. The gas sensors interfaced with the microcontroller are used to sense the pollutants in the air, and an analog output is generated corresponding to their concentration level. The analog data are acquired, converted into digital data, and processed using the ATmega16 microcontroller. The processed data are displayed on the LCD for user reference. The ZigBee protocol is used here to enable wireless serial communication between the microcontroller and the graphical programming tool installed on the computer. The VISA driver software is installed with LabVIEW on the host computer, which supports serial communication with the ZigBee pair. The list of components along with their specifications is shown in Table 1.
Fig. 1 Block diagram of the proposed air pollution monitoring system
Table 1 List of components used and their specifications

| S. No. | Component | Specifications |
|--------|-----------|----------------|
| 1 | Microcontroller | ATmega 16 |
| 2 | ZigBee pair | IEEE 802.15.4 |
| 3 | Gas sensors | MQ4, MQ7, MQ135 |
| 4 | LCD | 16 × 2 |
| 5 | Communicator | MAX 232 |
| 6 | Step-down transformer | 12 V DC |
| 7 | Voltage regulator | IC 7805 |
| 8 | Capacitor | 1000 µF |
| 9 | Cable | RS 232, USB |
3.2 Working Principle of the System The working principle of the smart air pollution monitoring system is presented as a flowchart in Fig. 2. The system is portable and can be installed anywhere easily. To determine the performance of the system in real time, the prototype was installed at the outdoor site of our lab. The prototype was powered on, and the components of the air pollutants were sensed by the different gas sensors. The sensed analog data were converted into digital data by the ADC embedded inside the microcontroller. The acquired digital signals were processed by the microcontroller based on the program already loaded into it. The digital data were compared with the standard reference values of air pollutants stored in the program. If the acquired data were well above the reference value, such as high or critical, the LCD was triggered to display the information. The instantly acquired signals were processed and communicated to the remote monitoring system using ZigBee technology. After displaying the data, the system was made to wait for a few minutes and then started sensing the air pollution again. If the sensed data were below the reference value, the system repeated its sensing process after a time delay without updating the data in the display. The remote monitoring system comprises a personal computer installed with LabVIEW.
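As a rough illustration of this decision logic, the following Python sketch simulates the sensing–compare–display cycle described above. It is only a host-side simulation under stated assumptions: the actual implementation runs on the ATmega16 firmware and LabVIEW, the read_sensors function is a placeholder for the digitized MQ-sensor readings, and the reference limits are taken from the PEL values listed later in Table 2.

```python
import time
import random

# Reference limits in ppm, taken from the PEL values in Table 2 of this chapter.
REFERENCE_PPM = {"CO": 55, "CO2": 10_000, "CH4": 1000}

def read_sensors():
    # Placeholder for the MQ7/MQ135/MQ4 readings digitized by the microcontroller ADC.
    return {gas: random.uniform(0, 2 * limit) for gas, limit in REFERENCE_PPM.items()}

def monitoring_cycle(delay_s=60):
    while True:
        readings = read_sensors()
        exceeded = {g: round(v, 1) for g, v in readings.items() if v > REFERENCE_PPM[g]}
        if exceeded:
            # In the prototype this step updates the LCD and transmits over ZigBee.
            print("ALERT - pollutant above reference:", exceeded)
        # Otherwise the display is left unchanged and sensing repeats
        # after a time delay, as in the flowchart of Fig. 2.
        time.sleep(delay_s)
```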
3.3 Graphical Programming in LabVIEW Laboratory Virtual Instrument Engineering Workbench (LabVIEW) is a software tool developed by National Instruments. It serves as a system design platform and development environment for a visual programming language. A graphical program was created with a graphical user interface (GUI) consisting of a front panel (user interface) and a block diagram panel. The block diagram panel programmed for collecting the air pollution data and displaying the components of the different air pollutants is shown in Fig. 3. The program was developed using case structures for collecting and displaying the digital value corresponding to each air pollutant with the help of waveform graphs. The front panel of the programming window is shown in Fig. 4. It contains information on the date and time of measurement, the serial communication port number, and the waveform graphs for each sensor output. Each waveform graph keeps a history of the data from previous updates. The read buffer contains the serial data, which form a frame of 15 bytes.
Fig. 2 Flowchart representing the working principle of air pollution monitoring system
4 Results and Discussion The experimental set up was installed in the outdoor environment of the lab near the health center of our institute. Continuous monitoring of the pollution level was performed for 10 days, from April 24 to May 5, 2022. The sensed analog data are converted into digital data and then processed using the ATmega16 microcontroller. The concentrations of different air pollutants such as CO, CO2, CH4, natural gas, and so on are displayed directly on the LCD interfaced with the prototype. For remote access, the ZigBee pair is used to receive the digital data and transmit them to LabVIEW installed on the host computer. This is achieved using an RS232-to-USB serial cable converter that redirects the serial console to the host computer.
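On the host side, the paper uses LabVIEW with the VISA driver to read the ZigBee serial stream; the Python/pySerial sketch below is only an illustrative equivalent of that step. The port name, baud rate, and byte layout of the frame are assumptions, with the frame size of 15 bytes taken from Sect. 3.3.

```python
import serial  # pySerial

FRAME_SIZE = 15  # the read buffer holds a 15-byte frame (Sect. 3.3)

def read_pollutant_frame(port="COM3", baudrate=9600):
    """Read one raw frame from the ZigBee receiver attached to the host PC."""
    with serial.Serial(port, baudrate, timeout=2) as link:
        frame = link.read(FRAME_SIZE)
    if len(frame) < FRAME_SIZE:
        raise IOError("incomplete frame received")
    # How the bytes map to the CO2/CO/CH4 values depends on the firmware;
    # here the raw bytes are simply returned for further parsing and plotting.
    return frame
```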
Fig. 3 Block diagram panel of the graphical program in LabVIEW
Fig. 4 Graphical user interface front panel of the program in LabVIEW
Table 2 Permissible exposure limit (PEL) of air pollutants in ppm

| S. No. | Pollutant | PEL in ppm |
|--------|-----------|------------|
| 1 | Carbon monoxide (CO) | 55 |
| 2 | Carbon dioxide (CO2) | 10,000 |
| 3 | Methane (CH4) | 1000 |
| 4 | Ammonia (NH3) | 50 |
| 5 | Benzene (C6H6) | 1 |
| 6 | Natural gas | 10 |
The VISA driver is installed with LabVIEW on the host computer; it supports serial communication with the ZigBee pair to receive the data, and the extracted data are plotted on a graph. To know the safe range of pollutants in air and to set the critical level in the program, the standard recommendation is considered here. The Occupational Safety and Health Administration (OSHA) has set the permissible exposure limit (PEL) of pollutants in air. The values are calculated for a time-weighted average (TWA) of 8 h in parts per million (ppm). The permissible level of different components of air pollution is shown in Table 2. The experimental set up displaying the output on May 3, 2022 is shown as a reference in Fig. 5. The monitored levels of air pollutants, displayed in three different categories, are shown on the LCD. The displayed values corresponding to the three different sensors at that instant are MQ4 (CO2) 40 ppm, MQ7 (CO) 17 ppm, and MQ135 (CH4) 18 ppm. The instantaneous plot of the concentration levels of the three different sensors displayed in the front panel of the LabVIEW program is shown in Fig. 6. The sample shown here was taken on May 3, 2022 at 12:34:59 PM. From the plot, it was observed that the concentration of carbon dioxide varies in the range of 37–38 ppm, carbon monoxide in the range of 17–18 ppm, and methane in the range of 14–15 ppm. All the measured values are well within the safe operating limit for human beings. The developed system is capable of sensing the air pollutants continuously, ranging from low levels to high levels. In order to alert the user at the instant a high pollution level is sensed, the system is programmed to display pollution levels exceeding the defined threshold on the LCD. This distinct feature avoids the manual analysis of pollution levels from continuously monitored data, which is followed in other works. The highest level of pollution above the threshold sensed at any instant is updated in the display, which is required for any control action to proceed further. The average concentration of different air pollutants measured for a period of 10 days is plotted in Fig. 7. From the graph, it was observed that the concentration of carbon dioxide was around 39 ppm, carbon monoxide about 16 ppm, and methane/natural gas about 18 ppm. This prototype can be used as a first indicator of air pollution at health centers with remote access. Moreover, the device can be constructed at a low cost (Rs. 2700) compared with other conventional existing devices, which cost around Rs. 8000 to Rs. 60,000.
Fig. 5 Experimental set up displaying the output in LCD
Fig. 6 Real-time display of LabVIEW showing the concentration of air pollutants
Fig. 7 Average concentration of air pollutants measured for a period of 10 days
5 Conclusion In this paper, a low-cost smart air pollution monitoring system has been presented for hospital environments. Since the prototype is portable and compact, it can be installed at any location for pollution monitoring. The device can be used as a prime indicator for pollution monitoring with remote access, but it can be improved further; the following points form our future scope. Improving the robustness of the system while maintaining the compact size and device dimensions, ensuring the accuracy of the data produced by the sensors, and enhancing the wireless network functions are the key features which need to be addressed in the near future. Power supervisory circuits can be added to the system to protect the device when a sudden power failure or low battery is detected. Also, to protect the device against overvoltage, a transient voltage suppressor can be included in the system.
References 1. Julius A, Jian-Min Z (2017) IoT based patient health monitoring system using lab view and wireless sensor network. Int J Sci Res (IJSR) 6(3):894–900 2. Telegam N, Kandasamy N, Nanjundan M (2017) Smart sensor network based high quality air pollution monitoring system using LabVIEW. Int J Online Eng (iJOE) 13(08):79–87 3. Al-Ali AR, Zualkernan I, Aloul F (2010) A mobile GPRS-sensors array for air pollution monitoring. IEEE Sens J 10(10):1666–1671 4. Snyder EG, Watkins TH, Solomon PA, Thoma ED, Williams RW, Hagler GS, Shelow D, Hindin DA, Kilaru VJ, Preuss PW (2013) The changing paradigm of air pollution monitoring. Environ Sci Technol 47(20):11369–11377 5. Alvear O, Zema NR, Natalizio E, Calafate CT (2017) Using UAV-based systems to monitor air pollution in areas with poor accessibility. J Adv Transport 7(1)
6. Yi WY, Leung KS, Leung Y, Meng ML, Mak T (2016) Modular sensor system (MSS) for urban air pollution monitoring. IEEE Sensors 1–3 7. Balasubramaniyan C, Manivannan D (2016) IoT enabled air quality monitoring system (AQMS) using raspberry Pi. Indian J Sci Technol 9(39):1–6 8. Anil Kumar U, Keerthi G, Sumalatha M, Sushma R (2017) IoT based noise and air pollution monitoring system using Raspberry Pi. Int J Adv Technol Eng Sci 5(3):183–187 9. Gupta H, Bhardwaj D, Agrawal H, Tikkiwal VA, Kumar A (2019) An IoT based air pollution monitoring system for smart cities. In: 2019 IEEE International conference on sustainable energy technologies and systems (ICSETS), IEEE, pp 173–177 10. Okokpujie K, Noma-Osaghae E, Modupe O, John S, Oluwatosin O (2018) A smart air pollution monitoring system. Int J Civil Eng Technol 9(9):799–809 11. Patil D, Thanuja TC, Melinamath BC (2019) Air pollution monitoring system using wireless sensor network (WSN). In: Data management, analytics and innovation, Springer, Singapore, pp 391–400 12. Saha D, Shinde M, Thadeshwar S (2017) IoT based air quality monitoring system using wireless sensors deployed in public bus services. In: Proceedings of the second international conference on internet of things, data and cloud computing, pp 1–6 13. Kaivonen S, Ngai EC (2020) Real-time air pollution monitoring with sensors on city bus. Digital Commun Netw 6(1):23–30 14. Miletiev R, Damyanov I, Iontchev E, Yordanov R (2020) Smart in-vehicle environment monitoring system. In: 2020 XXIX International scientific conference electronics (ET), IEEE, pp 1–4 15. Zaporozhets A, Babak V, Isaienko V, Babikova K (2020) Analysis of the air pollution monitoring system in Ukraine. In: Systems, decision and control in energy I, Springer, Cham, pp 85–110 16. Asha P, Natrayan LB, Geetha BT, Beulah JR, Sumathy R, Varalakshmi G, Neelakandan S (2022) IoT enabled environmental toxicology for air pollution monitoring using AI techniques. Environ Res 205(1):112574–112579 17. Li VO, Lam JC, Han Y, Chow K (2021) A big data and artificial intelligence framework for smart and personalized air pollution monitoring and health management in Hong Kong. Environ Sci Policy 124(1):441–450
Chapter 26
Mining Optimal Patterns from Transactional Data Using Jaya Algorithm Honey Sengar and Akhilesh Tiwari
1 Introduction Due to the rapid expansion of the Internet and the proliferation of linked devices, the amount of data stored has increased dramatically. Several individuals and businesses make use of this data to generate helpful information to aid decision-making. As a result, data processing has become a significant challenge, necessitating the development of new frameworks and methodologies that require less processing time and memory. The most popular data processing technique of the preceding ten years, Knowledge Discovery in Databases (KDD), seeks out interesting patterns in recorded data and typically involves three stages: preprocessing, data mining, and finally postprocessing. Data mining, which tries to extract non-trivial information from data, is the main KDD process. It comprises methods used in data processing such as classification, clustering, regression analysis, and association rules [1]. Association rule mining is the most significant data mining task. To ascertain the connection between objects in a database, association rules are used. Based on the co-occurrence of the data elements, the associations between the items are established [2]. Association rules are used to analyze consumer behavior in a variety of settings, including retail, banking, storage design, and shopping. The minimal support value has a significant impact on association rule discovery. Several different algorithms have been suggested for the generation of association rules. The abundance of rules produced by association rule mining algorithms is one of their main drawbacks. Association rule mining and swarm intelligence algorithms can be combined to improve rule mining. H. Sengar (B) Department of CSE, Madhav Institute of Technology and Science, Gwalior, India e-mail: [email protected] A. Tiwari Department of IT, Madhav Institute of Technology and Science, Gwalior, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_26
Apriori [3], FP-growth [4], and other traditional algorithms have been developed to address ARM problems. These algorithms were created to extract every relationship from the dataset. Their execution time and memory use are now limited by the amount of data that is kept in databases. On the other side, bio-inspired algorithms use computational techniques found in nature, including swarm intelligence or evolutionary computing, to explore the item set space. First, the genetic algorithm [5] has been effectively used and has shown encouraging outcomes. Well-known techniques such as particle swarm optimization [6], bees swarm optimization (BSO) [7], the bat algorithm (BA) [8], and others were employed with ARM to implement swarm intelligence. The datasets are conceptualized formally as sample search spaces, where the technique aims to maximize or minimize an objective mathematical function that evaluates the effectiveness of the selected rules using a variety of parameters. Optimal power flow [9, 10], parameter extraction of solar cells [11], knapsack problems [12], virtual machine placement [13], job shop scheduling [14, 15], the permutation flow-shop scheduling problem [16], reliability-redundancy allocation problems [17], the team formation problem [18], truss structures [19], facial emotion recognition [20], and feature selection [21] are just a few examples of the numerous optimization problems for which the Jaya algorithm has been extensively used [22, 23].
2 Association Rule Mining Association rule mining has a wide range of applications and has been influential in many fields. Market basket analysis is a key example. It examines consumer purchasing patterns by identifying relationships between the various items that consumers place in their "shopping baskets". Market basket analysis aids the management of a supermarket in making routine business decisions such as what to put on sale, how to plan the shop, and what promotional methods to take into account. Organizations can make wise business judgments by figuring out relationships between various items. In a market basket database, it is preferable to locate association rules that have high confidence and support (i.e., those that meet a user-specified minimum confidence and minimum support) [24]. An association rule can be thought of as an expression of type X→Y, where X is an item set (also known as the rule antecedent) and Y is an item set (also known as the rule consequent). In a real-world scenario, such a rule might state that customers who bought X also bought Y. The efficacy of a rule is defined by its support and confidence. The rule support is the proportion of records containing both X and Y. It represents the prior probability of X∪Y (i.e., its observed frequency) in the dataset. The rule confidence is the conditional probability of discovering Y given X. It describes the strength of the implication and is given by c(X→Y) = s(X→Y)/s(X) [25]. The confidence measure indicates the degree of a rule's strength. A rule's support indicates both its applicability and statistical
significance: a rule with very little support is not statistically significant, while a rule with high support is applicable in many transactions.
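As a small illustration of these definitions, the following Python sketch counts support and confidence for a rule X→Y over a toy transaction list; the transactions themselves are invented for the example and are not data used in the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """c(X -> Y) = s(X U Y) / s(X)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread", "butter"}, {"milk", "butter"}]
rule_support = support({"milk", "bread"}, transactions)        # 0.5
rule_confidence = confidence({"milk"}, {"bread"}, transactions)  # ~0.67
```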
2.1 Apriori Algorithm Rakesh Agrawal developed the Apriori algorithm in 1993 [24]. It is an unsupervised algorithm for mining frequent item sets and for discovering correlations between rules and frequently occurring item sets in relational databases. It proceeds by locating the frequently occurring individual items in the database and extending them to progressively larger item sets, as long as those item sets appear frequently enough in the database. The algorithm builds association rules from a given data set using a "bottom-up" methodology by gradually expanding frequent subsets until no further expansion is possible. It uses k-item sets to explore (k + 1)-item sets in a level-by-level manner [24]. The technique employs the Apriori property to reduce the huge number of candidate sets generated: if an item set is infrequent, then all of its supersets are also infrequent [26]. Due to its ease of use, the Apriori approach is very popular. However, as previously stated, the high number of database scans demands more time and resources because this method repeatedly analyzes the database for frequent item sets. Therefore, it is inefficient when dealing with enormous datasets.
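For reference, a typical way to run Apriori in a Python environment (the environment used in Sect. 5) is through the mlxtend library; the transactions, minimum support, and confidence threshold below are arbitrary example values, not the settings used in the paper's experiments.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["milk", "bread"], ["milk", "bread", "butter"],
                ["bread", "butter"], ["milk", "butter"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent item sets and rules above example thresholds.
frequent = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```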
3 Proposed Work 3.1 Jaya Optimization Algorithm Jaya is a straightforward and powerful optimization algorithm that has been applied to both unconstrained and constrained problems [27]. Built around the concept of "survival of the fittest", Jaya is a novel population-based metaheuristic algorithm that combines SI and EA traits. During the search for the best solution in SI, the swarm often follows the leader. Rao [28] presented the Jaya algorithm in 2016, and due to its exceptional properties it has caught the interest of several research communities. It is easy to use and offers straightforward concepts. It does not require any derivative information in the initial search and has no algorithm-specific parameters. It is adaptable, flexible, sound, and complete. It takes its cues from the natural behavior underlying the "survival of the fittest" theory. This means that the Jaya population prioritizes the best globally available solutions while disregarding the worst ones. In other words, the Jaya algorithm's search process seeks out the best global solutions in an effort to come closer to success and avoids the worst ones in an effort to avoid failure [29].
In the Jaya algorithm, the population size, the number of design variables, and the termination criterion are all given initial values. Typically, the termination criterion for this strategy is the maximum number of iterations. Let f(Z) be the objective function that must be minimized (or maximized). Suppose that there are a design variables (i.e., j = 1, 2, ..., a) and b candidate solutions (i.e., the population size, k = 1, 2, ..., b) available at each iteration i. Let "best" refer to the best candidate solution found for f(Z), denoted f(Z)_best, and let "worst" refer to the worst value of f(Z), denoted f(Z)_worst. If Z_{j,k,i} is the value of the jth variable for the kth candidate during the ith iteration, the value is updated in accordance with Eq. (1) [28]:

Z'_{j,k,i} = Z_{j,k,i} + r_{1,j,i} (Z_{j,best,i} − |Z_{j,k,i}|) − r_{2,j,i} (Z_{j,worst,i} − |Z_{j,k,i}|)   (1)
Z_{j,best,i} represents the value of variable j for the best candidate and Z_{j,worst,i} the value for the worst candidate. The two random numbers r_{1,j,i} and r_{2,j,i} are chosen at random from the range [0, 1] for the jth variable during the ith iteration. Z'_{j,k,i} is the updated value of Z_{j,k,i}. The term r_{1,j,i}(Z_{j,best,i} − |Z_{j,k,i}|) represents the solution's tendency to approach the best solution, and −r_{2,j,i}(Z_{j,worst,i} − |Z_{j,k,i}|) its tendency to avoid the worst solution. If Z'_{j,k,i} yields a better value of the objective function, it is accepted. The function values accepted at the end of every iteration are kept and used as the input for the subsequent iteration. The algorithm aims to prevail by finding the best solution, hence its name Jaya (a Sanskrit word meaning victory).

Jaya Algorithm
1. Assign initial values to the population size, the number of design variables, and the termination criterion.
2. Repeat steps 3 to 5 until the termination criterion is satisfied.
3. Determine the best and the worst solutions in the population.
4. Update each solution:
   Z'_{j,k,i} = Z_{j,k,i} + r_{1,j,i} (Z_{j,best,i} − |Z_{j,k,i}|) − r_{2,j,i} (Z_{j,worst,i} − |Z_{j,k,i}|)   (2)
5. If Z'_{j,k,i} yields a better objective value than Z_{j,k,i}, accept the update; otherwise, keep the previous solution.
6. Report the obtained optimal solution.
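A minimal continuous-variable implementation of the update rule in Eqs. (1)–(2) is sketched below in Python; the objective function, bounds, population size, and iteration count are placeholder choices for illustration, not the settings used in the experiments reported later.

```python
import numpy as np

def jaya(objective, bounds, pop_size=20, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    dim = low.size
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fitness = np.array([objective(z) for z in pop])

    for _ in range(iterations):
        best = pop[np.argmin(fitness)]
        worst = pop[np.argmax(fitness)]
        r1 = rng.random((pop_size, dim))
        r2 = rng.random((pop_size, dim))
        # Eq. (2): move toward the best and away from the worst solution.
        candidate = pop + r1 * (best - np.abs(pop)) - r2 * (worst - np.abs(pop))
        candidate = np.clip(candidate, low, high)
        cand_fit = np.array([objective(z) for z in candidate])
        improved = cand_fit < fitness            # greedy acceptance (minimization)
        pop[improved], fitness[improved] = candidate[improved], cand_fit[improved]

    best_idx = np.argmin(fitness)
    return pop[best_idx], fitness[best_idx]

# Example usage: minimize the sphere function in 5 dimensions.
best_z, best_f = jaya(lambda z: float(np.sum(z ** 2)), ([-5] * 5, [5] * 5))
```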
4 Methodology In this research paper, a Jaya optimization algorithm is suggested as a solution to the problem of reducing the number of association rules. The Apriori algorithm uses the transaction dataset, together with user-specified support and confidence values, to generate the association rule set. Since these rule sets contain many discrete and continuous rules, the weak rules need to be pruned, and it is crucial to optimize the result. The Jaya optimization technique is therefore suggested for the optimization of the association rules. The steps of the proposed method for generating optimal association rules are as follows:
1. Start.
2. Load the dataset from the repository.
3. Employ the Apriori method to identify frequent item sets.
4. Establish the Jaya optimization algorithm's termination conditions.
5. Use the Jaya optimization method to obtain the best association rules from the generated rules.
6. Calculate each rule's fitness value.
7. Add the rule to the output set if its fitness value satisfies the necessary criteria.
8. Repeat until the desired termination criterion is met.
The block diagram given below shows the procedure of the proposed work (Fig. 1). After generating the rules with the Apriori algorithm, the Jaya algorithm is applied to obtain the optimal rules.
5 Experimental Results The proposed work is carried out in a Python environment. The datasets IBM-STD, QUAK, CHESS, and MUSHROOM, retrieved from the UCI repository, are used for the performance analysis of the proposed work. The datasets used in our tests are described in Table 1. Results show the average of 20 executions for each algorithm. Figure 2 illustrates the outcomes of each algorithm in terms of time consumption on the four datasets of different sizes. With the majority of the datasets, the suggested approach is found to perform better than the other algorithms. In fact, the runtime alone is inadequate to evaluate an algorithm with such swarm-inspired characteristics; an important consideration in determining whether an algorithm is good or not is the quality of the solution. In light of this, we evaluate the fitness function values of our proposal in comparison with the aforementioned techniques, and the outcomes are displayed in Fig. 3. Memory consumption, as was noted in the beginning, is one of the most problematic defects in traditional algorithms for mining association rules as a result of the
Fig. 1 Flow chart for generating optimal association rules (start → input dataset → apply Apriori algorithm → frequent item sets → generate association rules → apply Jaya optimization algorithm → optimized association rules → stop)
Table 1 Description of experimental datasets

| Datasets | Transaction size | Item size | Average size |
|----------|------------------|-----------|--------------|
| IBM-STD | 1000 | 20 | 20 |
| QUAK | 2178 | 4 | 5 |
| Chess | 3196 | 37 | 37 |
| Mushroom | 8124 | 23 | 23 |

Fig. 2 Comparison of the proposed method with existing approaches w.r.t. time (second), comparing BSO-ARM, PSO-ARM, and the proposed method
Fig. 3 Comparison of the proposed method with the existing approaches w.r.t. fitness, comparing BSO-ARM, PSO-ARM, and JAYA-ARM
vast amounts of data that are being stored. The suggested method is compared against the traditional algorithms (Apriori, FP-Growth) and PSO-ARM, and Fig. 4 provides a summary of the findings. The result shows that the proposed algorithm has lower memory usage compared with the other algorithms.
Fig. 4 Comparison of the proposed method with traditional approaches w.r.t. memory usage (MB), comparing Apriori, FP-Growth, PSO-ARM, and the proposed method
Figures 2, 3 and 4 show that, when compared with existing algorithms, the suggested work is optimized in terms of time required, fitness, and memory consumption. Consequently, the suggested methodology outperforms existing methods.
6 Conclusion The Jaya optimization technique for association rule mining was introduced in this research as a solution for finding optimal patterns. The recommended method was applied to four datasets, and the outcomes were contrasted with recently developed similar algorithms. Results show that the suggested algorithm is efficient in terms of execution time, solution quality, and memory use. The study finds that, among the different associations derived from the dataset, the Jaya optimization algorithm is able to identify the globally best association rules. This study demonstrates that the proposed methodology effectively optimizes the association rules. Future research should focus on creating more reliable methods for association rule mining by integrating two or more evolutionary algorithms and developing new fitness functions, and should also involve analyzing a sizable dataset and comparing the approach with other algorithms.
References
1. Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1992) Knowledge discovery in databases: an overview. AI Mag 13(3):57
2. Dhanda M, Guglani S, Gupta G (2011) Mining efficient association rules through apriori algorithm using attributes. Proc IJCST 2:342–344
3. Agrawal R, Srikant R et al (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases (VLDB), vol 1215, pp 487–499
4. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Rec 29(2):1–12
5. Wang W, Bridges SM (2000) Genetic algorithm optimization of membership functions for mining fuzzy association rules
6. Kou Z, Xi L (2018) Binary particle swarm optimization-based association rule mining for discovering relationships between machine capabilities and product features. https://doi.org/10.1155/2018/2456010
7. Djenouri Y, Drias H, Habbas Z (2014) Bees swarm optimization using multiple strategies for association rule mining. Int J Bio-Inspired Comput 6(4):239–249
8. Heraguemi KE, Kamel N, Drias H (2015) Association rule mining based on bat algorithm. J Comput Theor Nanosci 12(7):1195–1200
9. Warid W, Hizam H, Mariun N, Abdul-Wahab NI (2016) Optimal power flow using the Jaya algorithm. Energies 9(9):678
10. Warid W (2020) Optimal power flow using the AMTPG-Jaya algorithm. Appl Soft Comput 91:106252
11. Vinh LT, Son NN (2020) Parameters extraction of solar cells using modified Jaya algorithm. Optik 203:164034
12. Wu C, He Y (2020) Solving the set-union knapsack problem by a novel hybrid Jaya algorithm. Soft Comput 24(3):1883–1902
13. Amarendhar Reddy M, Ravindranath K (2020) Virtual machine placement using Jaya optimization algorithm. Appl Artif Intell 34(1):31–46
14. Maharana D, Kotecha P (2019) Optimization of job shop scheduling problem with grey wolf optimizer and Jaya algorithm. In: Smart innovations in communication and computational sciences. Springer, New York, pp 47–58
15. Li J-Q, Deng J-W, Li C-Y, Han Y-Y, Tian J, Zhang B, Wang C-G (2020) An improved Jaya algorithm for solving the flexible job shop scheduling problem with transportation and setup times. Knowl-Based Syst 200:106032
16. Mishra A, Shrivastava D (2020) A discrete Jaya algorithm for permutation flow-shop scheduling problem. Int J Indus Eng Comput 11(3):415–428
17. Ghavidel S, Azizivahed A, Li L (2018) A hybrid Jaya algorithm for reliability-redundancy allocation problems. Eng Optim 50(4):698–715
18. El-Ashmawi WH, Ali AF, Slowik A (2020) An improved Jaya algorithm with a modified swap operator for solving team formation problem. Soft Comput 24:16627–16641
19. Degertekin SO, Lamberti L, Ugur IB (2018) Sizing, layout and topology design optimization of truss structures using the Jaya algorithm. Appl Soft Comput 70:903–928
20. Wang S-H, Phillips P, Dong Z-C, Zhang Y-D (2018) Intelligent facial emotion recognition based on stationary wavelet entropy and Jaya algorithm. Neurocomputing 272:668–676
21. Awadallah MA, Al-Betar MA, Hammouri AI, Alomari OA (2020) Binary Jaya algorithm with adaptive mutation for feature selection. Arab J Sci Eng 45:10875–10890
22. More KC, Rao RV (2020) Design optimization of plate-fin heat exchanger by using modified Jaya algorithm. In: Advanced engineering optimization through intelligent techniques. Springer, New York, pp 165–172
23. Wang L, Zhang Z, Huang C, Tsui K-L (2018) A GPU-accelerated parallel Jaya algorithm for efficiently estimating Li-ion battery model parameters. Appl Soft Comput 65:12–20
24. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD international conference on management of data, Washington, D.C., pp 207–216
25. Sethi A, Mahajan P (2012) Association rule mining: a review. Int J Comput Sci Appl
26. Han J, Kamber M (2012) Data mining: concepts and techniques. Morgan Kaufmann
27. Rao R (2016) Jaya: a simple and new optimization algorithm for solving constrained and unconstrained optimization problems. Int J Indus Eng Comput 7(1):19–34
28. Rao R, More K, Taler J, Ocłoń P (2016) Dimensional optimization of a micro-channel heat sink using Jaya algorithm. Appl Therm Eng 103:572–582
29. Zitar RA, Al-Betar MA, Awadallah MA et al (2022) An intensive and comprehensive overview of JAYA algorithm, its versions and applications. Arch Computat Methods Eng 29:763–792
Chapter 27
Accurate Diagnosis of Leaf Disease Based on Unsupervised Learning Algorithms S. Jacily Jemila, S. Mary Cynthia, and L. M. Merlin Livingston
1 Introduction Disease detection in plant leaves using visual inspection is time-consuming work, and the accuracy is not at a satisfactory level. To overcome this, many recognition algorithms have been developed using computer vision techniques. Early disease detection is essential, but it requires continuous monitoring to avoid the spreading of diseases. This is a time-consuming and costly process if the diagnosis of leaf disease is done by manual inspection. Therefore, image processing techniques are preferred to detect and classify the diseases. Rajesh et al. [1] and Wenxuan et al. [2] described the importance of preprocessing: image processing begins with the removal of noise from the image using various filtering techniques. The suitable segmentation method was selected based on its quality metrics [3]. The affected area of the leaf was segmented using different types of segmentation algorithms based on the features [4]. Patil et al. [5] proposed a content-based image retrieval (CBIR) system for retrieving diseased leaves of soybean. Kaur et al. [6] developed a modified SVM for improving plant disease detection. The clustering segmentation algorithms yield better performance metrics [7, 8]. The performance of classification can be improved by combining the colour, texture, and intensity features [9]. Reza et al. [10] used the k-nearest neighbours (KNN) algorithm for leaf disease detection. S. Jacily Jemila School of Electronics Engineering, Vellore Institute of Technology Chennai, Chennai, India S. Mary Cynthia (B) Department of Electronics and Communication Engineering, Sri Venkateswara College of Engineering, Sriperumbudur, India e-mail: [email protected] L. M. Merlin Livingston Department of Electronics and Communication Engineering, Jeppiaar Institute of Technology, Sriperumbudur, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_27
Otsu thresholding-based segmentation is done to form the binary mask in the affected region [11]. Landge et al. [12] used colour transformation and neural networks (NNs) for classification of diseases. The classification of leaf brown spot and leaf blast disease of rice plant based on morphological changes caused by disease is done [13].
2 Proposed Work In this proposed work, the lesion areas affected by diseases are segmented using different techniques, namely K-means clustering, the level set method (LSM), the fuzzy C-means level set method (FCMLSM), feature-reduction FCM (FRFCM), adaptively regularized kernel-based fuzzy C-means (ARKFCM), and Otsu thresholding. Finally, the segmented image is compared with the ground truth to identify the suitable segmentation method. Figure 1 gives the workflow of our research work. The captured input images were preprocessed to yield a better segmented result, with contrast enhancement used as the preprocessing technique to differentiate the disease-affected area. Then the enhanced image was subjected to the six segmentation methods and validated with the help of quality metrics.
Fig. 1 Block diagram of proposed work (input leaf image → preprocessing by contrast enhancement → segmentation by Otsu thresholding, K-means clustering, level set method, FCMLSM, FRFCM, and ARKFCM clustering → validation)
2.1 Image Segmentation The primary step in the process of identifying leaf disease is image segmentation. It is used to separate the required region from the input leaf image. In this work, six segmentation algorithms were applied for the diagnosis of leaf disease, and the suitable algorithms were identified with the help of performance metrics.
2.2 Unsupervised Learning Algorithms In unsupervised learning, the models are trained without a labelled dataset. It is mainly classified into two types: clustering and association rules, where clustering divides the data points into groups based on correlation and dissimilarity.
2.3 Otsu Thresholding The most commonly used, uncomplicated global thresholding method is the Otsu segmentation method, in which only one threshold is chosen based on the characteristics of the pixel and image grey-level values. This algorithm is very useful to isolate the required objects from the background of the image by setting a single threshold value T: if a pixel value is greater than or equal to T, it is considered part of an object; otherwise, it is considered background [4].
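As an illustration, Otsu's threshold can be computed in Python with scikit-image as follows; the file name is an assumption, and the input is simply any grayscale leaf image loaded as a NumPy array.

```python
from skimage import io, color
from skimage.filters import threshold_otsu

leaf = color.rgb2gray(io.imread("leaf.jpg"))   # assumed input image
T = threshold_otsu(leaf)                       # single global threshold
mask = leaf >= T                               # pixels >= T treated as object, rest background
```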
2.4 K-Means Clustering Algorithm This clustering algorithm is a kind of unsupervised algorithm used to form k groups based on the pixel properties of the image. The main aim of this algorithm is to minimize the Euclidean distance of the data points from their cluster centroid. The properties of the data within a cluster are very similar compared with those in other regions.
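A corresponding K-means sketch using scikit-learn, clustering pixels by their colour values, could look like this; the number of clusters and the input image are example choices rather than the settings used in the paper.

```python
import numpy as np
from skimage import io
from sklearn.cluster import KMeans

image = io.imread("leaf.jpg")                      # assumed RGB leaf image
pixels = image.reshape(-1, 3).astype(float)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(pixels)                # cluster index per pixel
segmented = kmeans.cluster_centers_[labels].reshape(image.shape).astype(np.uint8)
```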
2.5 Level Set Method This algorithm is used to define the desired objects from the other regions. The curves or interfaces separating these desired regions are set by zeros of level set functions.
The boundary is formed with the help of the image gradient value and edge strength. It is one of the most impressive numerical techniques. The gradient value is computed with the help of the finite difference method. Partial differential equations play a vital role in the computation of the level set algorithm.
2.6 Fuzzy C-Means Algorithm The fuzzy C-means (FCM) algorithm is one of the most powerful soft clustering methods, used to divide the image into a number of groups based on the value of a membership function. Its value is obtained by measuring the distance from each data point to the cluster centres; the smaller the distance, the higher the membership of that data point in the corresponding cluster. The fuzzy C-means algorithm is similar to K-means, the only difference being that a fuzzy logic model is used: instead of taking a hard decision, membership values between 0 and 1 give the degree to which each pixel belongs to each cluster.
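Fuzzy C-means itself can be run, for example, with the scikit-fuzzy package; below is a hedged sketch clustering grayscale intensities into three fuzzy clusters, where the fuzzifier m, tolerance, and iteration limit are arbitrary example values.

```python
import numpy as np
import skfuzzy as fuzz
from skimage import io, color

leaf = color.rgb2gray(io.imread("leaf.jpg"))       # assumed input image
data = leaf.reshape(1, -1)                         # shape (features, n_samples)

# c clusters, fuzzifier m = 2; error tolerance and max iterations are examples.
cntr, u, *_ = fuzz.cluster.cmeans(data, c=3, m=2.0, error=1e-5, maxiter=200)
labels = np.argmax(u, axis=0).reshape(leaf.shape)  # hard labels from memberships
```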
2.7 FCMLSM The comparison results show that the accuracy value of fuzzy C-means is higher than that of the K-means clustering algorithm, but the disadvantage is that the computation time is also higher for the fuzzy C-means method. In order to reduce this computation time, the fuzzy C-means algorithm is used together with the level set method, which improves the accuracy of the level set method. This can be proved with the help of the computed quality metrics.
2.8 ARKFCM In this algorithm, the clustering parameters are obtained in advance, which reduces the computational cost. Kernel functions are also used instead of the standard Euclidean distance. In this method, the clustering parameters are independent.
2.9 FRFCM It uses morphological reconstruction to preserve image details in which the pixel’s membership depends only on the spatial neighbours. In this algorithm, the distance
between the cluster centre and the neighbours of the pixels is not computed, so it is simpler and faster.
3 Segmentation Results The segmented images of different segmentation techniques are shown in Fig. 2.
4 Evaluation and Validation The segmented images were compared with the help of performance metrics. The values of these quality metrics can be determined from the following expressions and the ground truth by means of an image fusion process, where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively.

Accuracy (Acc) = (TP + TN) / (TP + TN + FP + FN)   (1)
Recall = TP / (TP + FN)   (2)
Precision (P) = TP / (TP + FP)   (3)
F1 score = (2 × Recall × Precision) / (Recall + Precision)   (4)
Correlation coefficient = (TP × TN − FP × FN) / √((TP + FP)(TN + FN)(TP + FN)(TN + FP))   (5)
Dice = (2 × TN) / ((2 × TN) + FP + FN)   (6)
Jaccard index (J) = TP / (TP + FP + FN)   (7)
Specificity (S) = TN / (TN + FP)   (8)
The above performance metrics quantify the deviation between the segmented output image and the ground truth. Table 1 gives the values of the performance metrics for the different segmentation results. The accuracy values of the segmentation algorithms applied in this work were compared and are shown in Fig. 3.
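The metrics in Eqs. (1)–(8) can be computed directly from the counts of true/false positives and negatives obtained by comparing a segmentation mask with the ground truth; a small Python helper is sketched below, with the Dice expression following the form given above.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """pred, truth: boolean masks of identical shape."""
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    acc = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    corr = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    dice = 2 * tn / (2 * tn + fp + fn)      # as defined in Eq. (6) of this chapter
    jaccard = tp / (tp + fp + fn)
    specificity = tn / (tn + fp)
    return dict(acc=acc, recall=recall, precision=precision, f1=f1,
                corr=corr, dice=dice, jaccard=jaccard, specificity=specificity)
```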
Fig. 2 a Input image, b Contrast enhanced image, c K-means clustering, d Otsu segmentation, e LSM segmentation, f ARKFCM segmentation, g FRFCM segmentation, h FCMLSM segmentation
Table 1 Performance measures of leaf image for different methods

| Method | Acc | Recall | F1 score | P | Corr-coefficient | Dice | Index J | S |
|--------|-----|--------|----------|---|------------------|------|---------|---|
| FRFCM | 0.86 | 0.82 | 0.64 | 0.56 | 0.64 | 0.58 | 0.4 | 0.67 |
| ARKFCM | 0.89 | 0.87 | 0.71 | 0.58 | 0.67 | 0.62 | 0.5 | 0.72 |
| FCMLSM | 0.98 | 0.92 | 0.77 | 0.61 | 0.72 | 0.75 | 0.6 | 0.89 |
| LSM | 0.91 | 0.89 | 0.59 | 0.41 | 0.51 | 0.57 | 0.4 | 0.76 |
| K-means | 0.93 | 0.04 | 0.05 | 0.04 | −0.12 | 0.03 | 0.01 | 0.87 |
| Otsu | 0.84 | 0.05 | 0.06 | 0.06 | −0.12 | 0.05 | 0.02 | 0.84 |
Fig. 3 Performance metric comparison (accuracy of the six segmentation methods)
Figure 4 presents several of the performance measures to illustrate the performance of the algorithms used. The experimental results show that the FCMLSM method yields better values when its quality metrics are compared with those of the other methods. The Otsu method does not produce a very good result because it is not suitable for images with low grey-level contrast between the foreground and background regions.
Fig. 4 Comparison graph of performance measures of six methods (FRFCM, ARKFCM, FCMLSM, LSM, K-means, and Otsu)
5 Conclusion Image segmentation can be used for several applications, and the selection of a suitable segmentation method depends on the type of application. Performance metrics are used to identify the best segmentation technique for a particular application. In this work, the segmented results of six segmentation methods were compared based on quality metrics. The comparison shows that FCMLSM yields higher quality metrics than the other methods. The experimental results clearly show that the combination of FCM with the level set method gives better segmented output to extract the disease-affected area from the leaf image accurately.
References
1. Rajesh MR, Mridula S, Mohanan P (2016) Speckle noise reduction in images using Wiener filtering and adaptive wavelet thresholding. In: Proceedings of the 2016 IEEE region 10 conference (TENCON), Singapore, 22–25 Nov 2016, pp 2860–2863
2. Wenxuan S, Jie L, Minyuan W (2010) An image denoising method based on multiscale wavelet thresholding and bilateral filtering. Wuhan Univ J Nat Sci 15:148–152
3. Jacily Jemila S, Brintha Therese A (2020) Selection of suitable segmentation technique based on image quality metrics. Imaging Sci J 67(8):475–480
4. Merlin Livingston LM, Mary Cynthia S, Region of interest prediction using segmentation. Int J Eng Adv Technol 9(5)
5. Patil JK, Kumar R (2017) Analysis of content based image retrieval for plant leaf diseases using color, shape and texture features. Eng Agric, Environ Food 10(2):69–78
6. Kaur R, Kang SS (2015) An enhancement in classifier support vector machine to improve plant disease detection. In: 2015 IEEE 3rd international conference on MOOCs, innovation and technology in education (MITE), IEEE
7. Jacily Jemila S, Brintha Therese A (2020) Artificial intelligence based myelinated white matter segmentation for a pediatric brain a challenging task. Recent Adv Comput Sci Commun
8. Busa S, Vangala NS, Grandhe P, Balaji V (2019) Automatic brain tumour detection using fast fuzzy C-means algorithm. Springer
9. Mary Cynthia S, Merlin Livingston LM (2019) Automatic detection and classification of brinjal leaf diseases. Int J Innov Technol Explor Eng 8(12)
10. Reza ZN, Nuzhat F, Mahsa NA, Ali H (2016) Detecting jute plant disease using image processing and machine learning. In: 3rd international conference on electrical engineering and information communication technology (ICEEICT), 2016, pp 1–6
11. Al-Hiary H et al (2011) Fast and accurate detection and classification of plant diseases. Mach Learn 14(5)
12. Landge PS et al (2013) Automatic detection and classification of plant disease through image processing. Int J Adv Res Comput Sci Softw Eng 3(7):798–801
13. Phadikar S, Sil J, Das AK (2012) Classification of rice leaf diseases based on morphological changes. Int J Inf Electron Eng 2(3):460
Chapter 28
Modified Teaching-Learning-Based Algorithm Tuned Long Short-Term Memory for Household Energy Consumption Forecasting Luka Jovanovic , Maja Kljajic , Aleksandar Petrovic , Vule Mizdrakovic , Miodrag Zivkovic , and Nebojsa Bacanin
1 Introduction The importance of reducing and containing energy consumption does not need emphasis. Even though the human race is becoming more aware of the damage it does, the ideal scenario, or, to use a term from optimization, even a sub-optimal scenario, is far from reality. The global pollution problem constantly needs to be mitigated. On the other hand, the problems arising from the current geopolitical situation in Ukraine have not yet been experienced in full. Firstly, the prices of fossil-based energy sources have spiked all over the world. Secondly, the aftermath of the battles has left the country and a considerable part of Europe without sources of energy. These arguments support the view that there is currently an energy crisis at hand. While the priority is to protect our only habitat, the shortage of energy has to be dealt with as well. Household energy forecasting has been considered one of the main techniques for energy management and for improving the decision-making process, leading to lower energy consumption [1]. L. Jovanovic · M. Kljajic · A. Petrovic · V. Mizdrakovic · M. Zivkovic · N. Bacanin (B) Singidunum University, Danijelova 32, 11000 Belgrade, Serbia e-mail: [email protected] L. Jovanovic e-mail: [email protected] M. Kljajic e-mail: [email protected] A. Petrovic e-mail: [email protected] V. Mizdrakovic e-mail: [email protected] M. Zivkovic e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_28
Researchers have been investigating ways to apply energy consumption forecasting (ECF) to real-world scenarios, which contributes to the solution of both previously mentioned problems. Consequently, ECF can be widely applied. Unfortunately, this process is not without obstacles. For this method to work, its precision, stability, efficiency, and general reliability need improvement. New ways are being explored to tackle this problem, including the use of artificial intelligence (AI), which the authors strongly believe will make a difference. The real-world use of AI-related technologies is realistic, and the list of examples is not scarce. Ramon et al. [2] successfully created a model for wind-speed prediction with the use of stacking-ensemble learning and multi-stage decomposition. Ning et al. [3] present a highly accurate forecast model built around long short-term memory (LSTM) neural networks. Hybrid techniques have become dominant in the field of machine learning, and hence this research will also test similar logic for the purpose of forecasting. The performance of ML models such as LSTM is highly dependent on a proper selection of hyperparameters. Due to the large number of parameters and options presented by modern algorithms, manual selection quickly becomes tedious and unfeasible. Therefore, novel techniques are applied to optimize the process of selection. When addressing optimization, few algorithms excel as much as swarm intelligence. This research proposes a novel modified form of the teaching-learning-based algorithm (TLB) [4] to address parameter optimization. The scientific contributions of the conducted research may be summarized as follows:
• The introduction of a novel modified TLB metaheuristic that overcomes the shortcomings of the basic algorithm.
• The utilization of the introduced metaheuristic for hyperparameter selection for an LSTM neural network.
• The application of the novel approach to forecasting individual household power consumption.
The rest of the paper is structured as follows: In Sect. 2, relevant literature is presented and the research gap is discussed. Section 3 presents the method utilized in this research. Section 4 describes the conducted experiments, followed by the attained results and discussion presented in Sect. 5. In Sect. 6, the work is concluded and future research is discussed.
2 Related Works and Background Energy can be found in many forms, and hence there is a significant number of different cases to which prediction mechanisms have been applied. The forecasting of consumption is the most researched use case regarding energy predictions. The original methods used to forecast energy consumption were created for long-term forecasting (up to a year); however, modern times require real-time forecasting [5].
Hence, contemporary methods are oriented toward short-term forecasting. For those purposes, deep learning (DL) and ML models are the most suitable. In [6], Meng et al. provide a comprehensive study for predicting energy demand in seven regions of China. The authors applied a novel grey multivariate prediction approach for this purpose. The work of Bilal et al. [7] tackles the topic of renewable energy peer-to-peer trading by applying several forecasting models. The focus is on predicting consumption as well as generation. The industrial field has a strong potential to benefit from energy forecasting models. Maciej et al. [8] explore the consumption of microgrids in manufacturing plants. The goal is efficiency improvement and making the seller of energy more competitive in the field. Song et al. [9] proposed an optimized structure-adaptive grey model that has been applied to forecasting energy demands from nuclear sources. DL and ML methods have seen success in [1] when applied to the prediction of consumption in residential buildings. The problem presented affects everyone; hence, its importance is high. The mitigation of energy waste from residential structures has a twofold goal. Firstly, it will benefit the health of the environment we live in. Secondly, there is the individual benefit of possible cost reduction. A hybrid LSTM model combined with the stationary wavelet transform (SWT) technique is applied in [5] for energy consumption prediction for individual households. Novel methods have seen great success when applied to addressing existing problems in traditional power supply grids. The use of smart grids has improved forecasting in the medium and short term [10]. Fault detection plays a crucial part in power network reliability. Researchers have proposed novel methods to handle transformer fault detection [11, 12] that help preemptively detect and address transformer failures, reducing maintenance delays and costs of operation. Renewable energy is an ever-trending topic, and the field of predictions regarding energy is particularly interesting to researchers due to irregularities in intensity. The speed of wind varies without a matching pattern; therefore, a forecasting framework is in order. Short-term wind-speed forecasting has also been tackled by researchers in [2]. Finally, when it comes to variables that help in energy consumption forecasting, aside from historical data on consumption, the authors of [13] investigated the impact of climate variables; however, additional variables have also been investigated.
2.1 LSTM Overview

DL methods are based on artificial neural networks (ANN) [14]. The inspiration for this type of network comes from the learning procedures of the human brain. Neurotransmitters are simulated, while information is exchanged between them. The potential of ANNs lies in the strong learning capability attained through the training process, which results in a method that is capable of recognizing relations between the subjects of its computation. Nonlinear problems are an example of the exploitation of such solutions.
The LSTM model is considered the most frequently chosen type of recurrent neural network applied to problems similar to the topic of this paper. A storing feature allows LSTM networks to save information within their network. Therefore, future outcomes have the advantage of previously obtained knowledge. Time-series prediction requires such features. The memory cells of traditional networks are replaced with those located in the hidden layers. Three different types of gates are used to filter the data that will be saved. They are the input, output, and forget gates. The forget gate f_t is the first for the data to pass through. Equation (1) describes the decision on which retained data is removed from the node:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f), (1)

in which f_t is the gate of range [0, 1], σ denotes the activation function, W_f and U_f represent the weight matrices, and b_f is the bias vector. The following gate is used to determine what data will be retained. Equation (2) determines the renewed values that are to be selected by the i_t gate sigmoid function:

i_t = σ(W_i x_t + U_i h_{t−1} + b_i), (2)

where the range of i_t is (0, 1); b_i, W_i, and U_i are the input gate learnable parameters; and C̃_t denotes the potential update vector calculated by Eq. (3):

C̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c), (3)

where a series of learnable parameters is likewise represented by b_c, W_c, and U_c. Afterward, the state C_t is produced by Eq. (4):

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t, (4)

where ⊙ signifies element-wise multiplication. Data marked for disposal is contained in f_t ⊙ C_{t−1}, while i_t ⊙ C̃_t gives the new data that the memory cell C_t will store. The sigmoid function allows calculation of the output gate o_t, which in turn determines the hidden state h_t value, according to Eqs. (5) and (6):

o_t = σ(W_o x_t + U_o h_{t−1} + b_o), (5)

h_t = o_t ⊙ tanh(C_t), (6)

where the range (0, 1) defines o_t, and the learnable parameters for the output gate are given as b_o, W_o, and U_o. The output value h_t is represented as the product of o_t and the tanh value of C_t by Eq. (6).
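To make the gating description above concrete, the following minimal sketch builds a small LSTM forecaster of the kind used in this work with TensorFlow/Keras. It is an illustration only, not the authors' implementation: the six-step input and three-step-ahead output mirror the experimental setup reported in Sect. 4, while the layer count, units, dropout, and learning rate are placeholder values of the type later selected by the metaheuristic.

```python
# Minimal LSTM forecaster sketch; hyperparameter values are placeholders.
import numpy as np
import tensorflow as tf

LAGS, HORIZON = 6, 3          # six input steps, three-step-ahead forecast

def build_lstm(units=100, dropout=0.1, learning_rate=0.001, layers=1):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(LAGS, 1)))
    for i in range(layers):
        # intermediate LSTM layers must return sequences for the next LSTM layer
        model.add(tf.keras.layers.LSTM(units, return_sequences=(i < layers - 1)))
        model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(HORIZON))          # one output per step ahead
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

# toy usage with random data shaped like the aggregated consumption series
X = np.random.rand(128, LAGS, 1).astype("float32")
y = np.random.rand(128, HORIZON).astype("float32")
model = build_lstm()
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```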
2.2 Metaheuristics Optimization

Metaheuristics-based algorithms have become the go-to choice for solving NP-hard problems. Swarm intelligence (SI) has been distinguished as the most successfully applied group of these algorithms. The inspiration for SI algorithms is drawn from various animals and insects [15, 16]. The mathematical models created for these algorithms mimic natural behaviors such as foraging, hunting, and mating processes. Examples of widely applied solutions from this group, later improved through further development, are the bat algorithm (BA) [17], artificial bee colony (ABC) [18], PSO [19], and firefly algorithm (FA) [20], as well as the novel chimp optimization algorithm (ChOA) [21] and reptile search algorithm (RSA) [22]. On the other hand, another metaheuristics group has emerged, inspired by processes in mathematics. Examples of such solutions are the sine cosine algorithm (SCA) [23], likely the most popular algorithm of this group, alongside the arithmetic optimization algorithm (AOA) [24]. The laws and regulations of physics have been exploited for another category of metaheuristics named physics-based algorithms (PA). Examples of such are central force optimization (CFO) [25], big bang-big crunch (BBBC) [26], and the gravitational search algorithm (GSA) [27]. As NP complexity is common in real-world problems, the applications of metaheuristics have been varied. Examples range from workflow scheduling optimization of cloud-edge systems [28, 29], neural network optimization [30, 31], lifetime maximization of wireless sensor networks and sensing node localization [32], computer-assisted MRI classification and illness detection [33, 34], to COVID-19 infection forecasts [35, 36].
3 Proposed Method

3.1 Teaching-Learning-Based Algorithm

Two distinctive phases of the teaching-learning-based algorithm (TLB) are the teacher and learner phases. The population P_G at any generation G is given as P_G = [X_{1,G}, X_{2,G}, ..., X_{Np,G}]. The vector of the ith individual is represented by X_{i,G}, while the size of the population is N_p. Every vector X_{i,G} (i = 1, 2, ..., N_p) holds D subject dimensions, X_{i,G} = [x_{1i,G}, x_{2i,G}, ..., x_{Di,G}]^T. The role of the teacher is assigned to the unit that possesses the best fitness X_{t,G} of the current generation G. The rule applied in the teacher phase states that all the learners acquire knowledge from the teacher, which is expressed through the vector V_{i,G} = [v_{1i,G}, v_{2i,G}, ..., v_{Di,G}]^T:

V_{i,G} = X_{i,G} + r_i (X_{t,G} − T_F M_G), (7)
for i = 1, 2, ..., N_p, where M_G represents the mean vector of the individuals, r_i ∈ (0, 1) is a random value, and T_F = round[1 + rand(0, 1)(2 − 1)] is the learning (teaching) factor. Upon completion of the teacher phase, the generation G is incremented, and the future population P_{G+1} consists of X_{i,G+1} (i = 1, 2, ..., N_p). The fitness values of V_{i,G} and X_{i,G} are evaluated for updating the individuals of generation G + 1:

X_{i,G+1} = X_{i,G}, if f(X_{i,G}) ≤ f(V_{i,G}); V_{i,G}, otherwise, (8)

where the fitness function is given as f(·). The difference from the teacher phase is that, in the learner phase, knowledge is transferred between the learners as well. The learner phase vector U_{i,G} = [u_{1i,G}, u_{2i,G}, ..., u_{Di,G}]^T is derived from Eq. (9):

U_{i,G} = X_{m,G} + r_m (X_{m,G} − X_{n,G}), if f(X_{m,G}) < f(X_{n,G}); X_{m,G} + r_m (X_{n,G} − X_{m,G}), otherwise, (9)

where the range of the random value r_m is (0, 1), and X_{m,G} and X_{n,G} (m ≠ n) are randomly selected units of the current population. After the learner phase finishes, the generation G is incremented, and the new population P_{G+1} is formed with the use of the individuals X_{i,G+1} (i = 1, 2, ..., N_p). The fitness values of U_{i,G} and X_{i,G} are updated at generation G + 1 by the next equation:

X_{i,G+1} = X_{i,G}, if f(X_{i,G}) ≤ f(U_{i,G}); U_{i,G}, otherwise. (10)
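The following NumPy sketch illustrates one iteration of the teacher and learner phases described by Eqs. (7)–(10). It is a simplified illustration under the assumption of a generic minimization objective f with box bounds, and is not the authors' implementation; the learner-phase rule follows the standard formulation sketched above.

```python
# Simplified TLB iteration sketch (minimization); illustrative only.
import numpy as np

def tlb_iteration(pop, fitness, f, lb, ub):
    n, d = pop.shape
    teacher = pop[np.argmin(fitness)]                 # best individual acts as teacher
    mean_vec = pop.mean(axis=0)
    tf_ = np.round(1 + np.random.rand())              # teaching factor in {1, 2}
    # teacher phase, Eq. (7): move every learner toward the teacher
    trial = pop + np.random.rand(n, d) * (teacher - tf_ * mean_vec)
    trial = np.clip(trial, lb, ub)
    trial_fit = np.apply_along_axis(f, 1, trial)
    better = trial_fit < fitness                      # greedy selection, Eq. (8)
    pop[better], fitness[better] = trial[better], trial_fit[better]
    # learner phase, Eq. (9): each learner interacts with a random partner
    for i in range(n):
        j = np.random.choice([k for k in range(n) if k != i])
        direction = pop[i] - pop[j] if fitness[i] < fitness[j] else pop[j] - pop[i]
        cand = np.clip(pop[i] + np.random.rand(d) * direction, lb, ub)
        cf = f(cand)
        if cf < fitness[i]:                           # greedy selection, Eq. (10)
            pop[i], fitness[i] = cand, cf
    return pop, fitness

# toy usage: minimize the sphere function in three dimensions
f = lambda x: float(np.sum(x ** 2))
lb, ub = np.full(3, -5.0), np.full(3, 5.0)
pop = np.random.uniform(lb, ub, size=(10, 3))
fit = np.apply_along_axis(f, 1, pop)
for _ in range(50):
    pop, fit = tlb_iteration(pop, fit, f, lb, ub)
print(pop[np.argmin(fit)], fit.min())
```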
3.2 Modified TLB Algorithm

The original TLB [4] algorithm shows generally good performance when tackling complex optimizations. However, extensive empirical evaluation with standard CEC benchmark functions has revealed that, in some runs, during early iterations of the optimization, the algorithm can dwell on less than optimal regions within the search space. Caused by a lack of a sophisticated exploration mechanism, certain runs may attain lower performance results. To address the shortcomings of the original algorithm, certain modifications are introduced, and the resulting algorithm is dubbed the modified TLB (MTLB). These modifications include the introduction of a trial attribute for each potential solution. Following an algorithm iteration, should a solution not improve in quality, its trial attribute is incremented. An additional introduced parameter is a predefined limit. Should a solution's trial value exceed the limit, it is replaced by a new quasi-reflexive solution. This rule does not, however, apply to the best-attained solution. The
value used for the limit parameter has been empirically determined to give the best results when limit = T_max / 3, where T_max represents the maximum number of iterations. The quasi-reflection-based learning (QRL) mechanism [37] is applied to augment exploration, as well as boost convergence speed. Opposite values are generated from a given solution according to Eq. (11):

x_j^o = lb_j + ub_j − x_j, (11)

where x_j^o represents a solution from the opposite region of solution x_j, and lb_j and ub_j represent the lower and upper limits of the search space, respectively. The quasi-opposite value is determined using Eq. (12):

x_j^{qo} = rnd((lb_j + ub_j) / 2, x_j^o), (12)

where the mean of the lower and upper limits is (lb_j + ub_j) / 2. Accordingly, Eq. (12) selects a random value from the range [(lb_j + ub_j) / 2, x_j^o].

Finally, the location of a quasi-reflexive component X_j^{qr} is determined according to Eq. (13):

X_j^{qr} = rnd((lb_j + ub_j) / 2, x_j), (13)

where the mean of the boundaries is determined and random values are selected from the range [(lb_j + ub_j) / 2, x_j]. By adopting this approach, quasi-reflexive solutions are generated that can be reintegrated into a population, boosting the algorithm's exploratory power and improving convergence rates. The pseudocode for the described modified algorithm is shown in Algorithm 1.
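As a small illustration of the quasi-reflection operator in Eq. (13), the snippet below generates a quasi-reflexive counterpart of a solution vector. It is a hedged sketch rather than the published implementation; the example bounds simply mirror the LSTM hyperparameter ranges listed later in Sect. 4.

```python
# Quasi-reflection-based learning (QRL) sketch following Eq. (13); illustrative only.
import numpy as np

def quasi_reflect(x, lb, ub):
    """Sample a quasi-reflexive solution between the search-space centre
    (lb + ub) / 2 and the current solution x, component-wise."""
    centre = (lb + ub) / 2.0
    low, high = np.minimum(centre, x), np.maximum(centre, x)
    return np.random.uniform(low, high)

# example: replace an exhausted solution whose trial counter exceeded the limit
lb = np.array([0.0001, 300, 0.05, 1, 50])   # assumed bounds (learning rate, epochs, ...)
ub = np.array([0.01, 600, 0.2, 2, 200])
x = np.random.uniform(lb, ub)
print(quasi_reflect(x, lb, ub))
```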
4 Experimental Setup

The following section presents the experimental setup as well as the utilized dataset. Additionally, the choice of the parameters selected for optimization is justified.
4.1 Dataset

To assess the performance of the introduced method, this research uses a dataset acquired from the London Data Store that can be found on Kaggle.1 The dataset
https://www.kaggle.com/code/rheajgurung/energy-consumption-forecast/notebook.
Algorithm 1 Modified TLB pseudocode
Set population bounds;
Generate a randomized initial population P_0;
Assign a trial parameter to every solution in P_0;
G = 0;
while G < T_max do
  Teacher phase:
    Select teacher X_{t,G} and calculate mean vector M_G;
    Implement the teacher learning law according to Eq. (7);
    Check bounds;
    Update the population according to Eq. (8);
  Learner phase:
    Randomly select two individuals X_{m,G} and X_{n,G}, where m ≠ n;
    Implement the learner learning law according to Eq. (9);
    Check bounds;
    Update the population according to Eq. (10);
  Determine the fitness of each solution;
  for each solution in P_G do
    if solution ≠ best solution then
      if solution fitness has not improved then
        Increment solution trial;
      end if
      if solution trial > limit then
        Replace solution with a quasi-reflexive solution according to Eq. (13);
      end if
    end if
  end for
  G = G + 1;
end while
return best attained solution
Fig. 1 Household energy dataset, with train, validation, test split shown
describes the electricity consumption of 5,567 London households. The data covers a period from November 2011 to February 2014. Samples have been recorded using smart power meters with a daily resolution, for 19,752 data points in total. Individual household power consumption samples have been aggregated, and the resulting 829 samples have been used as the target variable for univariate time-series forecasting. The available data has been split with 70% used for training, the next 10% utilized to validate the results, and the final 20% of the data reserved for testing. A visual representation of the dataset and this data split is presented in Fig. 1.
4.2 Experiments

The evaluation process subjected several modern metaheuristic algorithms to a comparative analysis. Alongside the proposed MTLB algorithm, several other algorithms have been tasked with selecting optimal parameters of an LSTM network. These include the original TLB [4] algorithm, the well-known ABC [18] and FA [20], as well as two novel algorithms, the ChOA [21] and RSA [22]. Each metaheuristic has been assigned a population limit of five individuals and given six iterations to improve potential solutions. Furthermore, testing has been conducted over 20 independent runs to compensate for the random nature intrinsic to these algorithms. A lower number of iterations and smaller populations were utilized due to limited computational resources. The guiding objective function used is the mean square error (MSE) shown in Eq. (15). However, additional metrics have been taken into consideration so as to provide a more thorough evaluation, including the mean absolute error (MAE) shown in Eq. (14), the root mean square error (RMSE) shown in Eq. (16), as well as the R² metric shown in Eq. (17):

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|, (14)

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)², (15)

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ), (16)

R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)². (17)
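For reference, the four evaluation metrics in Eqs. (14)–(17) can be computed in a few lines of NumPy, as in the short sketch below; standard textbook definitions of MAE, MSE, RMSE, and R² are assumed.

```python
# Regression metrics used for evaluation (Eqs. 14-17); illustrative sketch.
import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

print(evaluate([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```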
The LSTM networks have been provided with six steps of input data and tasked with casting forecasts three steps ahead. The LSTM parameters selected for optimization, with their respective ranges, include the learning rate, range [0.0001, 0.01]; the number of training epochs, range [300, 600]; the dropout rate, range [0.05, 0.2]; the number of hidden layers, range [1, 2]; and the number of neurons in the hidden layers, range [50, 200]. These parameters have been selected for optimization since empirical analyses have determined that they have the highest impact on performance. Additionally, given that some parameters present a continuous range while others possess a discrete one, this optimization can be considered a mixed NP-hard problem.
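One common way to hand such a mixed continuous–discrete search space to a metaheuristic is to let each individual be a real-valued vector that is decoded into the LSTM hyperparameters before model training. The decoding below is only an illustrative assumption (the paper does not spell out its encoding); the value ranges, however, follow the list above.

```python
# Possible decoding of a solution vector into LSTM hyperparameters; the ranges
# follow the paper, while the encoding scheme itself is an assumption.
def decode(solution):
    lr, epochs, dropout, layers, units = solution
    clamp = lambda v, lo, hi: min(max(v, lo), hi)
    return {
        "learning_rate": float(clamp(lr, 0.0001, 0.01)),
        "epochs": int(round(clamp(epochs, 300, 600))),
        "dropout": float(clamp(dropout, 0.05, 0.2)),
        "layers": int(round(clamp(layers, 1, 2))),
        "units": int(round(clamp(units, 50, 200))),
    }

print(decode([0.005, 450.3, 0.12, 1.7, 120.9]))
```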
Fig. 2 Described experimental procedure flowchart
All proposed methods have been independently implemented for this research. All implementations are done in Python using standard ML libraries, including Pandas, NumPy, scikit-learn, and TensorFlow. A flowchart for the described process is provided in Fig. 2.
5 Results and Discussion

The overall objective function (MSE) over 20 independent runs in terms of best, worst, mean, median, standard deviation, and variance is given in Table 1. Additionally, detailed metrics, both normalized and denormalized, for the best performing runs are given in Tables 2 and 3. The best obtained metrics are marked in bold. By considering the attained results, it may be deduced that the introduced MTLB-LSTM approach exhibits superior performance when considering overall metrics. It attained the best median and mean results on overall scores. Furthermore, detailed metrics indicate that the proposed metaheuristic attained the best results for two-step-ahead predictions compared to the other best performing models, while the novel LSTM-RSA attained the best results when casting one-step-ahead forecasts, and the ABC attained the best results three steps ahead. This is to be expected, as according to the no free lunch theorem of optimization [38], no single approach works best in all cases. Nevertheless, even in cases where other algorithms performed marginally better in terms of R², the introduced algorithm presents the best MSE values.
Table 1 Overall objective function metrics over 20 independent runs for 3-steps-ahead predictions

Method       Best        Worst       Mean        Median      Std         Var
LSTM-MTLB    9.93E−04    1.00E−03    9.96E−04    9.94E−04    3.39E−06    1.15E−11
LSTM-TLB     9.97E−04    1.01E−03    1.00E−03    1.00E−03    3.66E−06    1.34E−11
LSTM-ABC     9.95E−04    1.01E−03    1.00E−03    1.00E−03    4.18E−06    1.75E−11
LSTM-FA      9.97E−04    1.01E−03    1.00E−03    1.01E−03    4.80E−06    2.30E−11
LSTM-ChOA    9.94E−04    1.01E−03    1.00E−03    1.00E−03    4.27E−06    1.83E−11
LSTM-RSA     9.98E−04    1.00E−03    9.99E−04    9.99E−04    1.47E−06    2.16E−12
Table 2 Detailed R², MAE, MSE, and RMSE normalized metrics for the best generated LSTM model for three steps ahead

                    Metric   LSTM-MTLB   LSTM-TLB    LSTM-ABC    LSTM-FA     LSTM-ChOA   LSTM-RSA
One-step ahead      R²       0.740884    0.742058    0.737835    0.740074    0.739935    0.742178
                    MAE      0.024646    0.025022    0.024950    0.024975    0.024885    0.024905
                    MSE      0.001002    0.000998    0.001014    0.001006    0.001006    0.000997
                    RMSE     0.031660    0.031587    0.031846    0.031710    0.031718    0.031581
Two-steps ahead     R²       0.743717    0.741935    0.743507    0.742810    0.743278    0.739755
                    MAE      0.024434    0.025149    0.024679    0.024857    0.024710    0.024953
                    MSE      0.000991    0.000998    0.000992    0.000995    0.000993    0.001007
                    RMSE     0.031487    0.031596    0.031500    0.031543    0.031514    0.031729
Three-steps ahead   R²       0.745556    0.743120    0.746670    0.743937    0.746057    0.743922
                    MAE      0.024362    0.025112    0.024436    0.024904    0.024625    0.024909
                    MSE      0.000984    0.000994    0.000980    0.000991    0.000982    0.000991
                    RMSE     0.031374    0.031524    0.031305    0.031473    0.031343    0.031474
Overall results     R²       0.743386    0.742371    0.742671    0.742274    0.743090    0.741951
                    MAE      0.024481    0.025095    0.024688    0.024912    0.024740    0.024923
                    MSE      0.000993    0.000997    0.000995    0.000998    0.000994    0.000998
                    RMSE     0.031507    0.031570    0.031551    0.031575    0.031525    0.031595
Table 3 Detailed R², MAE, MSE, and RMSE denormalized metrics for the best generated LSTM model for three steps ahead

                    Metric   LSTM-MTLB    LSTM-TLB     LSTM-ABC     LSTM-FA      LSTM-ChOA    LSTM-RSA
One-step ahead      R²       0.741        0.742        0.738        0.740        0.740        0.742
                    MAE      740.450      751.766      749.593      750.349      747.638      748.253
                    MSE      904780.504   900682.351   915427.484   907610.001   908096.043   900264.616
                    RMSE     951.200      949.043      956.780      952.686      952.941      948.823
Two-steps ahead     R²       0.744        0.742        0.744        0.743        0.743        0.740
                    MAE      734.085      755.559      741.438      746.786      742.3905     749.690
                    MSE      894888.222   901112.631   895621.035   898056.163   896421.876   908722.734
                    RMSE     945.985      949.270      946.376      947.658      946.796      953.270
Three-steps ahead   R²       0.746        0.743        0.747        0.744        0.746        0.744
                    MAE      731.931      754.471      734.161      748.205      739.825      748.370
                    MSE      888469.104   896973.406   884579.429   894120.576   886719.449   894172.606
                    RMSE     942.5864     947.087      940.521      945.579      941.658      945.607
Overall results     R²       0.743        0.742        0.743        0.742        0.743        0.742
                    MAE      735.489      753.932      741.731      748.447      743.285      748.771
                    MSE      896045.944   899589.463   898542.650   899928.913   897079.122   901053.319
                    RMSE     946.597      948.467      947.915      948.646      947.143      949.238
To more clearly emphasize the improvements made to the convergence rates of the proposed algorithm, convergence plots for all metaheuristics have been provided in Fig. 3, followed by distribution box and violin plots for both convergence and error rates. As shown in Fig. 3, a clear improvement has been made to the convergence rates by the modified metaheuristic over the original algorithm. Additionally, the described approach demonstrates a faster increase in R² scores and a fairly tight result grouping around the projected optimum, suggesting greater reliability in dynamic tasks. Finally, the parameter selections made by each metaheuristic are given in Table 4. The predictions made by the best performing model compared to actual values are given in Fig. 4.
6 Conclusion The presented work puts forward a proposal for a novel approach to tackling an increasingly pressing issue in the energy sector. Increased demand and limited supply of energy on the global market have driven demand for a reliable mechanism for forecasting power consumption. This work proposed a novel LSTM network-based
Fig. 3 Objective function and error rate convergence and distribution plots

Table 4 Control parameters selected by the best performing model of each metaheuristic

Method       Learning rate   Epochs   Dropout     Layers   Neurons layer 1   Neurons layer 2
LSTM-MTLB    0.010000        345      0.200000    2        200               50
LSTM-TLB     0.009964        567      0.175349    1        176               138
LSTM-ABC     0.009355        600      0.092644    2        90                78
LSTM-FA      0.010000        600      0.200000    1        148               183
LSTM-ChOA    0.008615        545      0.118308    2        111               105
LSTM-RSA     0.010000        600      0.108903    1        200               134
approach for forecasting individual household power demand. Additionally, as the performance of LSTM is dependent on a proper selection of hyperparameter values, a novel metaheuristic is also introduced and tasked with optimizing performance through hyperparameter tuning. While high computational demands limited the extent of testing, the introduced technique has been compared to several contemporary metaheuristic algorithms, including the original TLB algorithm, ABC, and FA, as well as two novel metaheuristics, the ChOA and RSA. These algorithms, tackling the same task, have been evaluated, and the introduced metaheuristic attained admirable
Fig. 4 Predictions made by best performing model
results, outperforming all other tested algorithms in overall scores. When comparing competing LSTM models, it can be noticed that the overall results of the MTLB-LSTM model had a mean absolute error of 0.024481, confirming that energy consumption can be forecast with high precision. As with any research work, this paper only presents a small part of the potential of the introduced algorithm. Limited computational resources and available data are just some of the hurdles that will be addressed in future work. Priority will be given to exploring and refining the introduced approach and applying it to address other pressing real-world issues in the energy sector. Acknowledgements The paper is supported by the Ministry of Education, Science and Technological Development of Republic of Serbia, Grant No. III-44006.
References 1. Olu-Ajayi R, Alaka H, Sulaimon I, Sunmola F, Ajayi S (2022) Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J Build Eng 45:103406 2. da Silva RG, Moreno SR, Dal Molin Ribeiro MH, Larcher JHK, Mariani VC, dos Santos Coelho L (2022) Multi-step short-term wind speed forecasting based on multi-stage decomposition coupled with stacking-ensemble learning approach. Int J Electrical Power Energy Syst 143:108504 3. Jin N, Yang F, Mo Y, Zeng Y, Zhou X, Yan K, Ma X (2022) Highly accurate energy consumption forecasting model based on parallel LSTM neural networks. Adv Eng Inform 51:101442 4. Venkata Rao R, Savsani VJ, Vakharia DP (2011) Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput-Aided Des 43(3):303–315 5. Yan K, Li W, Ji Z, Qi M, Du Y (2019) A hybrid LSTM neural network for energy consumption forecasting of individual households. IEEE Access 7:157633–157642 6. Wang M, Wang W, Wu L (2022) Application of a new grey multivariate forecasting model in the forecasting of energy consumption in 7 regions of china. Energy 243:123024 7. Abu-Salih B, Wongthongtham P, Morrison G, Coutinho K, Al-Okaily M, Huneiti A (2022) Short-term renewable energy consumption and generation forecasting: A case study of Western Australia. Heliyon 8(3):e09152
8. Slowik M, Urban W (2022) Machine learning short-term energy consumption forecasting for microgrids in a manufacturing plant. Energies 15(9):3382 9. Ding S, Tao Z, Zhang H, Li Y (2022) Forecasting nuclear energy consumption in China and America: an optimized structure-adaptative grey model. Energy 239:121928 10. Hayder IA, Habib MA, Ahmad M, Mohsin SM, Khan FA, Mustafa K et al (2023) Enhanced machine-learning techniques for medium-term and short-term electric-load forecasting in smart grids. Energies 16(1):276 11. Soni R, Chaudhari K (2016) An approach to diagnose incipient faults of power transformer using dissolved gas analysis of mineral oil by ratio methods using fuzzy logic. In: 2016 International conference on signal processing, communication, power and embedded system (SCOPES). IEEE, 2016, pp 1894–1899 12. Soni R, Mehta B (2022) Graphical examination of dissolved gas analysis by ratio methods and Duval triangle method to investigate internal faults of power transformer. Mater Today: Proc 62:7098–7103 13. Xu A, Tian M-W, Firouzi B, Alattas KA, Mohammadzadeh A, Ghaderpour E (2022) A new deep learning restricted Boltzmann machine for energy consumption forecasting. Sustainability 14(16):10081 14. Yegnanarayana B (2009) Artificial neural networks. PHI Learning Pvt. Ltd. 15. Raslan AF, Ali AF, Darwish A (2020) 1—swarm intelligence algorithms and their applications in Internet of Things. In: Swarm intelligence for resource management in Internet of Things, Intelligent Data-Centric Systems. Academic Press, pp 1–19 16. Rostami M, Berahmand K, Nasiri E, Forouzandeh S (2021) Review of swarm intelligencebased feature selection methods. Eng Appl Artif Intell 100:104210 17. Yang X-S, Gandomi AH (2012) Bat algorithm: a novel approach for global engineering optimization. Eng Comput 29(5):464–483 18. Karaboga D, Basturk B (2008) On the performance of artificial bee colony (ABC) algorithm. Appl Soft Comput 8(1):687–697 19. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95— international conference on neural networks, vol 4, pp 1942–1948 20. Yang, X-S (2009) Firefly algorithms for multimodal optimization. In: Watanabe O, Zeugmann T (eds) Stochastic algorithms: foundations and applications. Springer Berlin Heidelberg, pp 169–178 21. Khishe M, Mosavi MR (2020) Chimp optimization algorithm. Expert Syst Appl 149:113338 22. Abualigah L, Elaziz MA, Sumari P, Geem ZW, Gandomi AH (2022) Reptile search algorithm (RSA): a nature-inspired meta-heuristic optimizer. Expert Syst Appl 191:116158 23. Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. KnowlBased Syst 96:120–133 24. Abualigah L, Diabat A, Mirjalili S, Elaziz MA, Gandomi AH (2021) The arithmetic optimization algorithm. Comput Methods Appl Mech Eng 376:113609 25. Formato RA (2007) Central force optimization. rog Electromagn Res 77(1):425–491 26. Erol OK, Eksin I (2006) A new optimization method: big bang-big crunch. Adv Eng Softw 37(2):106–111 27. Rashedi E, Nezamabadi-Pour H, Saryazdi S (2009) GSA: a gravitational search algorithm. Inf Sci 179(13):2232–2248 28. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. In: International conference on intelligent and fuzzy systems. Springer, pp 718–725 29. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. 
In: 2019 27th Telecommunications forum (TELFOR). IEEE, pp 1–4 30. Strumberger I, Tuba E, Bacanin N, Zivkovic M, Beko M, Tuba M (2019) Designing convolutional neural network architecture by the firefly algorithm. In: 2019 International young engineers forum (YEF-ECE). IEEE, pp 59–65
31. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2022) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In: Mobile computing and sustainable informatics. Springer, pp 397–409 32. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. In: International conference on hybrid intelligent systems. Springer, pp 328–338 33. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Glioma brain tumor grade classification from MRI using convolutional neural networks designed by modified fa. In: International conference on intelligent and fuzzy systems. Springer, pp 955–963 34. Basha J, Bacanin N, Vukobrat N, Zivkovic M, Venkatachalam K, Hubálovsk`y S, Trojovsk`y P (2021) Chaotic Harris hawks optimization with quasi-reflection-based learning: an application to enhance CNN design. Sensors 21(19):6654 35. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) Covid-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669 36. Zivkovic M, Venkatachalam K, Bacanin N, Djordjevic A, Antonijevic M, Strumberger I, Rashid TA (2020) Hybrid genetic algorithm and machine learning method for covid-19 cases prediction. In: Proceedings of international conference on sustainable expert systems: ICSES 2020, vol 176. Springer Nature, p 169 37. Rahnamayan S, Tizhoosh HR, Salama MMA (2007) Quasi-oppositional differential evolution. In: 2007 IEEE congress on evolutionary computation. IEEE, pp 2229–2236 38. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82
Chapter 29
Chaotic Quasi-oppositional Chemical Reaction Optimization for Optimal Tuning of Single Input Power System Stabilizer Sourav Paul , Sneha Sultana , and Provas Kumar Roy
1 Introduction

To automatically control the terminal voltage, a massive number of generators were fitted with voltage controllers. As noted in the literature [1], the voltage controller's increased gain had a negative impact on the feed circuit's dynamic stability. Low-magnitude, low-frequency oscillations often persist and in some scenarios even restrict the power transfer capacity. As a result, low frequency oscillations or electro-mechanical oscillations (EMO) [2] conventionally take place in large power systems and occasionally make the feeding system unstable. The small-signal analysis tools available today make it possible to characterize both the source and the nature of these oscillations. However, as power systems are highly nonlinear, machine specifications may change with loading and time. Dynamic features also vary in various ways. After a change in system operating conditions, the PSS may not provide satisfactory damping in an unstable system. Numerous approaches based upon modern control theory were applied to the design of various stabilizing structures of power systems. These involve optimal control, adaptive control, variable structure control, and intelligent control, which are developed in [3, 4]. In [5], the weighted mean of vectors (INFO) optimizer, combined with chaotic orthogonal-based learning (COBL) techniques and known as INFO-GBB, is used for the optimal tuning of the PSS model in the single machine infinite bus (SMIB) system. By comparing the results with eighteen well-known algorithms, the usefulness and efficacy of the proposed INFO-GBB were confirmed. The proposed algorithms outperform the majority of the algorithms S. Paul (B) · S. Sultana Dr. B. C. Roy Engineering College, Durgapur, India e-mail: [email protected] URL: http://www.bcrec.ac.in P. K. Roy Kalyani Government Engineering College, Kalyani, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_29
considered, with an average rank of more than 61% for the benchmark tasks. Klein et al. [6] demonstrated the influence of the PSS on inter-regional and local oscillations in the interconnected power system. Using generic and multiband (MB) PSS [7], a hybrid 9 MW wind farm paired with a mini/micro hydro power plant has been evaluated. In addition, the proposed system is linked to a 200 MVA, 25 KV utility-grid system to demonstrate the accuracy of the suggested Simulink model. Reference [8] features a smart, fuzzy-based synchronized control of the AVR and PSS, to avoid concurrent decline following unforeseen failures and to reach a relevant control point. In [9, 10], ANN techniques were employed to design PSS. The PSS parameters have been effectively tuned using a variety of well-known population-based optimization approaches, including particle swarm optimization (PSO) [11], honey bee mating optimization [12], and others, to get the best fitness functions. Many areas of science and technology have found success using the aforementioned strategies [13]. In order to apply the idea of a deterministic and probabilistic approach by coordinating numerous PSS to increase the small signal stability (SSS) of the system, Gurung et al. [14] proposed modifying the IEEE 68 bus system. The problem was then finally resolved utilizing a novel directional BAT algorithm. Panda et al. [15] used the hybrid bacteria foraging optimization–PSO (HBFOA-PSO) algorithm to study the optimal PSS and static synchronous series compensator (SSSC) parameters. The transient stability of both the SMIB system and the multi-machine system was improved by the authors using the aforementioned strategies. The experimental outcome was applied to a two-zone, four-machine sample grid to reduce low frequency fluctuations of local and inter-regional modes and thus enhance the steadiness of the system. The cuckoo search (CS) algorithm is used by Elazim et al. [16] to tune PSS parameters to their ideal values. The above stated problems open the door for modern researchers to include novel algorithms, in particular those with free input control parameters, in all sorts of optimization problems with the possibility of solving the desired targeted function. The damping of rotor oscillations in machine feeding systems equipped with PSS is the major focus of this chapter. In addition, among the classic phase compensation structures, the single-input PSS is considered. The remainder of the paper is organized as follows. Several PSS models including the SMIB have been thoroughly described in Sect. 2. Section 3 illustrates the problem formulation and the system constraints. Sections 4, 5, and 6 are devoted to the solution strategy, which explains the PSS optimization technique. The simulation results under diverse loading situations were carefully considered and expressed in Sect. 7, and the work was finally concluded in Sect. 8.
2 SMIB Model The present study incorporates a SMIB system [4] as illustrated in Fig. 1. Figure 2 represents the pictorial representation of AVR and PSS [10]. PSS’s goal is to create a damping torque component, and speed deviation is employed as a logical signal to
Fig. 1 SMIB test system
Fig. 2 AVR and PSS block diagram depiction
control generator excitation. The inclusion of the high gain thyristor exciter, synchronous generator, AVR, and PSS block in the SMIB system has been illustrated in Fig. 3 [10]. The Heffron-Phillips model's CPSS, which is a single-input conventional PSS, is used to illustrate the linearized model of the SMIB system in Fig. 1, where the input is represented by Δω_r and the output by ΔV_PSS. The CPSS primarily consists of a gain stage (K_PSS) that determines how much damping the stabilizer will apply, a washout stage that is designed to only react to speed oscillations, and a block stage that blocks dc offsets. The reason for incorporating lead-lag compensators (T_1 − T_4) at the final stage is to compensate for the phase shift often caused by the presence of the AVR and field circuit of the generators. The lead-lag parameters are adjusted to give the rotor a damping torque when speed oscillations occur. The literature survey also justifies that, in the present scenario, modern excitation systems predominantly rely on this interaction owing to the fact that these tools produce comparatively excessive gain at high frequency [20]. For light loading requirements for alternators
Fig. 3 SMIB system’s block diagram shows the thyristor high gain exciter, synchronous generator, PSS, and AVR
Fig. 4 Heffron-Phillips model
where the intrinsic mechanical damping is predominantly kept lower, this may result in stabilizer-torsional instability with an overly responsive excitation system. The four first-order linear differential equations, Eqs. (1)–(4), serve as the synchronous machine model representation. The fourth-order generator model is represented by these equations. Utilizing saturated mutual inductance values, magnetic saturation is either disregarded or taken into account based on the various levels of complexity as (Fig. 4):

dδ/dt = ω − ω_0, (1)

dω/dt = (1 / 2H)(−DΔω + T_M − T_e), (2)

dE'_q/dt = (1 / T'_{d0})(−E'_q + (X_d − X'_d) i_d + E_{fd}), (3)

dE'_d/dt = (1 / T'_{q0})(−E'_d + (X_q − X'_q) i_q), (4)

where E_{fd} represents the potential proportional to the field potential; the voltages E'_d and E'_q are proportional to the damper winding and field flux, respectively; i_d and i_q are the d–q components of the armature currents; and T'_{d0} and T'_{q0} are the axis transient time constants.
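A minimal sketch of how the fourth-order machine model in Eqs. (1)–(4) can be evaluated numerically is given below. All parameter values are placeholders (only H = 5 mirrors the value reported later), and the stator/network algebra that would supply i_d, i_q, and T_e is deliberately stubbed out, so this is an illustration of the state derivatives only.

```python
# Fourth-order synchronous machine state derivatives, Eqs. (1)-(4); illustrative
# sketch with placeholder parameters; network/stator algebra is stubbed out.
def machine_derivatives(state, inputs, params):
    delta, omega, Eq_p, Ed_p = state
    Tm, Te, i_d, i_q, Efd = inputs                      # assumed to come from the network solution
    w0, H, D = params["w0"], params["H"], params["D"]
    ddelta = omega - w0                                 # Eq. (1)
    domega = (-D * (omega - w0) + Tm - Te) / (2 * H)    # Eq. (2)
    dEq_p = (-Eq_p + (params["Xd"] - params["Xd_p"]) * i_d + Efd) / params["Td0_p"]   # Eq. (3)
    dEd_p = (-Ed_p + (params["Xq"] - params["Xq_p"]) * i_q) / params["Tq0_p"]         # Eq. (4)
    return [ddelta, domega, dEq_p, dEd_p]

# toy evaluation with placeholder per-unit values
params = {"w0": 2 * 3.14159 * 50, "H": 5.0, "D": 0.0,
          "Xd": 1.8, "Xd_p": 0.3, "Xq": 1.7, "Xq_p": 0.55,
          "Td0_p": 8.0, "Tq0_p": 0.4}
state = [0.5, params["w0"], 1.0, 0.0]
inputs = (0.8, 0.75, 0.4, 0.3, 1.2)                     # Tm, Te, i_d, i_q, Efd (placeholders)
print(machine_derivatives(state, inputs, params))
```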
3 Problem Formulation In this research work, a single objective function has been reviewed to discover the efficacy of the presented CQOCRO algorithm.
3.1 Case-I: Eigen Value Minimization

By changing the position of the eigen values, the relative stability of the system is obtained. In the case of real eigen values, the system corresponds to a non-oscillatory mode. Therefore, the system behaves unstably if the eigen value is positive, and likewise the system goes into decaying mode if the eigen value is negative. The system reaction becomes faster as the eigen values move further into the left side of the s plane. As shown in Eq. (5), the eigen values occur in conjugate pairs. Depending on whether the real component σ is negative or positive, the magnitude of the oscillations either dies out or grows toward total instability. Here, the eigen value-based objective function (OF_1) has been used, as depicted in Eq. (6):

s = σ ± jω, (5)

OF_1 = Σ_i (σ_0 − σ_i). (6)

Here σ_i is the real part (sans the imaginary value) of the ith eigen value, and n is the number of states for which optimization is performed. σ_0 determines the relative stability with respect to the damping factor margin assumed for compelling the placement of the eigen values throughout the optimization process. The value of σ_0 depends on the given problem.
3.2 Case-II: Damping Ratio Minimization The damping ratio in Eq. 7 shows how rapidly the fluctuations are damped.
ζ = σ / √(σ² + ω²). (7)

The minimum damping ratio considered is ζ = 0.3. Therefore, reduction of the objective expression will reduce the undershoot, overshoot, and settling time, as depicted in Eq. (8):

OF_2 = Σ_i (ζ_0 − ζ_i). (8)
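Both objective functions can be evaluated directly from the eigenvalues of the closed-loop state matrix. The snippet below is a hedged illustration of Eqs. (5)–(8) using NumPy: restricting the sums to modes that violate the σ_0 / ζ_0 margins, taking the absolute value of the σ penalty, and giving stable modes a positive damping sign are all assumptions on our part rather than details stated in the paper.

```python
# Hedged sketch: eigenvalue-based objectives for PSS tuning (cf. Eqs. 5-8).
import numpy as np

def pss_objectives(A, sigma0=-1.5, zeta0=0.3):
    eigs = np.linalg.eigvals(A)                        # s = sigma +/- j*omega, Eq. (5)
    sigma, omega = eigs.real, eigs.imag
    # damping ratio, Eq. (7); sign chosen so stable (left-half-plane) modes are positive
    zeta = -sigma / np.sqrt(sigma ** 2 + omega ** 2)
    # penalties restricted to modes violating the margins (assumed convention)
    of1 = float(np.sum(np.abs(sigma0 - sigma[sigma > sigma0])))   # cf. Eq. (6)
    of2 = float(np.sum(zeta0 - zeta[zeta < zeta0]))               # cf. Eq. (8)
    return of1, of2

A = np.array([[0.0, 1.0], [-25.0, -1.0]])              # toy 2x2 state matrix
print(pss_objectives(A))
```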
4 Chaotic Chemical Reaction Optimization (CRO) 4.1 Inspiration of CRO Xu et al. [17] in 2011 introduced the chemical reaction optimization (CRO) method, which is based on the chemical reaction process, in which molecules interact with one another in a succession of occurrences. The CRO has strong searching capabilities that demonstrate great intensification and diversification operations, two crucial characteristics of evolutionary algorithms. A solution to the optimization problem is represented by the atomic structure of a molecule in CRO. Two essential components for a molecule are potential energy (PE) and kinetic energy (KE). The PE energy of the molecule serves as a measure of a solution’s fitness, whereas KE is employed to regulate the adoption of new solutions that are less fit.
4.2 Mathematical Modelling of CRO

There are four different types of chemical reaction processes: synthesis, decomposition, on-wall ineffective collision, and inter-molecular ineffective collision. In the two ineffective collision reactions, the PE of the resulting molecules often resembles that of the original molecules. On the other hand, synthesis and decomposition often produce new molecular structures that are not necessarily close to the original ones.
4.2.1 On-Wall Ineffective Collision

When a molecule strikes a wall and then bounces back, it causes the on-wall ineffective collision reaction. If the following criterion is met in this reaction, a molecule ms may transform into another molecule ms_1:

E_k(ms) + E_p(ms) ≥ E_p(ms_1). (9)
4.2.2 Decomposition
The decomposition reaction is used to simulate how a molecule breaks into two or more pieces after striking a wall. Due to the violent collision, the two newly produced molecules have completely different molecular structures from the original molecule ms and from nearby molecules:

E_k(ms) + E_p(ms) ≥ E_p(ms_1) + E_p(ms_2). (10)

4.2.3 Inter-molecular Ineffective Collision
The inter-molecular ineffective collision mimics the way two molecules ms_1 and ms_2 collide and form two new molecules ms'_1 and ms'_2. Since the collision is not severe, the created molecules' structures are comparable to those of the parent molecules. An inter-molecular ineffective collision occurs if

E_k(ms_1) + E_p(ms_1) + E_k(ms_2) + E_p(ms_2) ≥ E_p(ms'_1) + E_p(ms'_2). (11)
5 Quasi-oppositional-Based Learning (Q-OBL)

OBL was first introduced by Tizhoosh [18] as a cutting-edge notion for soft computing or intelligence-based techniques to improve different optimization methods. It appears to be one of the most successful theories in computational intelligence for tackling nonlinear optimization problems, enhancing the searching capabilities of traditional population-based optimization strategies. The OBL algorithm starts by generating an initial estimate, which can be based on previous knowledge of the solution or created at random. The finest solution could come from the estimate itself or, at the very least, from the opposite direction. An opposing set of estimates is examined for convergence, which iteratively substitutes the starting estimates with a superior solution in the direction of optimality.
5.1 Opposite Number

Let the real number be denoted by P ∈ [y, z]. Its opposite number P^0 is defined by

P^0 = y + z − P. (12)
5.2 Opposite Point

Let R = (P_1, P_2, ..., P_n) be a point in n-dimensional space, where P_r ∈ [y_r, z_r], r ∈ {1, 2, ..., n}. The opposite point R^0 = (P_1^0, P_2^0, ..., P_n^0) is defined by its components:

P_r^0 = y_r + z_r − P_r. (13)
5.3 Quasi-opposite Number and Quasi-opposite Point

Let P be a real number in the range [y, z]. The quasi-opposite number is defined as

P^{q0} = rand(C, P^0), (14)

where C is the centre of the interval, given by C = (y + z)/2. Similarly, the quasi-opposite point P_r^{q0} is defined as

P_r^{q0} = rand(C_r, P_r^0), (15)

where C_r = (y_r + z_r)/2.
5.4 Chaotic CRO

Chaotic CRO (CCRO), which combines chaotic behaviour with chemical reaction optimization (CRO), was created to lessen the shortcomings of CRO. Chaos, which is stochastic and non-repeating by nature, conducts general searches at higher speeds, which is essential for quickening a metaheuristic algorithm's convergence rate. The chaotic set is composed of a total of 10 chaotic maps, each of which exhibits a different behaviour. The starting point of this set has been set to 0.7, within the range 0 to 1. The different chaotic map behaviours help to solve issues with local optima and convergence speed.
6 CQOCRO Steps for PSS Problem

The different steps of the CQOCRO for solving the PSS problem are presented below:
Step 1: Create the initial population (P) of individuals at random. Also generate the PSS control parameters within the defined solution space.
Step 2: Evaluate the fitness value and sort the population from best to worst. The best solution obtained so far is represented by PE. An initial KE is assigned to all molecules.
Step 3: Only a small number of the best answers are retained as elite solutions, based on their value.
Step 4: The non-elite solutions are once more subjected to the varied chaotic map behaviour of the CCRO in order to change the independent variables. In CCRO, the whales target the prey, and the location of the prey is regarded as the best location.
Step 5: Change the control parameters by applying on-wall ineffective collision, decomposition, inter-molecular ineffective collision, and synthesis on the non-elite molecules. Afterwards, the feasibility of the solutions is checked using the inequality constraints.
Step 6: The computation is finished and the results are displayed if the stopping requirement is met; otherwise, proceed to Step 4.
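The six steps above can be summarized in a compact loop skeleton. The sketch below is only a structural illustration under our own assumptions: the four CRO operators are collapsed into a single placeholder move, a logistic map stands in for the chaotic set, and the elite fraction is an arbitrary choice. It is not the authors' implementation.

```python
# Structural skeleton of a CQOCRO-style optimization loop; illustrative only.
import numpy as np

def cqocro(fitness, lb, ub, pop_size=20, max_iter=100, elite_frac=0.1):
    dim = len(lb)
    pop = np.random.uniform(lb, ub, size=(pop_size, dim))       # Step 1
    fit = np.apply_along_axis(fitness, 1, pop)                   # Step 2
    chaos = 0.7                                                   # chaotic map state
    for _ in range(max_iter):                                     # Step 6 loop
        order = np.argsort(fit)
        n_elite = max(1, int(elite_frac * pop_size))              # Step 3
        best = pop[order[0]]
        for i in order[n_elite:]:                                 # Steps 4-5 on non-elites
            chaos = 4.0 * chaos * (1.0 - chaos)                   # logistic map example
            # placeholder for on-wall / decomposition / inter-molecular / synthesis moves
            cand = pop[i] + chaos * np.random.rand(dim) * (best - pop[i])
            cand = np.clip(cand, lb, ub)                          # feasibility check
            cf = fitness(cand)
            if cf < fit[i]:
                pop[i], fit[i] = cand, cf
    idx = np.argmin(fit)
    return pop[idx], fit[idx]

# toy usage on a sphere function
best_x, best_f = cqocro(lambda x: float(np.sum(x ** 2)), lb=np.zeros(3), ub=np.ones(3))
print(best_x, best_f)
```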
7 Results and Discussion

The applicability and validity of the different optimization techniques have been tested on the SMIB system by considering two types of objective functions, namely eigen value minimization and damping ratio minimization. In all the cases, the synchronous machine model has been considered. All the simulations are carried out using MATLAB R2022b and implemented on a personal computer with a 1.8 GHz Dual-Core Intel Core i5 processor. 100 iterations are chosen as the terminating criterion. For the SMIB system, inertia constant H = 5, M = 2H, nominal frequency f_0 = 50 Hz, 0.995 ≤ Ē_b ≤ 1.0; the angle of Ē_b = 0; 0.2 ≤ P ≤ 1.2; −0.2 ≤ Q ≤ 1.0; 0.4752 ≤ X_e ≤ 1.08; 0.5 ≤ E_t ≤ 1.1. To examine the proposed method, the two objectives, i.e. minimization of the eigen value and minimization of the damping ratio, are measured independently to review the strength of the suggested method. The well-known optimization technique, namely CQOCRO, is analysed and compared with the results of other well-known algorithms to verify the supremacy of the said algorithm.
7.1 Case-I: Eigen Value Minimization

The Heffron-Phillips model is a component of the linearized SMIB system. To damp out the low frequency fluctuations affecting the power system, PSSs are installed at the respective locations. The eigen values of the state space matrix are obtained from

det |A − sI| = 0, (16)
Table 1 Comparative investigation of the eigen value minimization problem with light loading (P = 0.65; Q = 0.55)

Algorithm    Eigen value          Overshoot   Undershoot   Fitness value
CQOCRO       −1.5000 ± 6.1904i    7.1243      13.8764      0.0078
QOCRO        −1.5000 ± 6.2212i    7.1218      13.9442      0.0080
OCRO [19]    −1.5000 ± 6.2387i    7.1250      13.9311      0.0099
CRO [20]     −1.5000 ± 6.1432i    7.0782      14.0325      0.0106
OCOA [21]    −1.500 ± 6.1901i     2.3179      13.2901      0.0246
COA [21]     −1.5000 ± 6.2645i    4.7879      14.1270      0.0877
ODSA [22]    −1.5000 ± 6.1177i    5.6053      12.0614      0.0037
DSA [22]     −1.5000 ± 6.1002i    7.1343      13.9602      0.0240
OGSA [23]    −1.5000 ± 6.3855i    4.6761      13.0352      0.0319
GSA [23]     −1.5000 ± 6.7786i    4.8443      13.4608      0.0859
Table 2 Comparative investigation of the eigen value minimization problem with nominal loading (P = 1.25; Q = 0.50)

Algorithm    Eigen value          Overshoot   Undershoot   Fitness value
CQOCRO       −1.4999 ± 5.1346i    8.7653      15.6512      0.0009
QOCRO        −1.5000 ± 5.2546i    8.8976      16.0000      0.0010
OCRO [19]    −1.4999 ± 6.7610i    8.9697      15.7167      0.0011
CRO [20]     −1.5000 ± 6.7786i    9.0115      15.6368      0.0860
OCOA [21]    −1.5000 ± 6.7034i    6.5138      13.7391      0.0015
COA [21]     −1.5000 ± 5.4852i    8.4237      14.4600      0.0074
ODSA [22]    −1.4999 ± 6.8766i    5.7082      13.0012      0.0031
DSA [22]     −1.5000 ± 6.6214i    7.8280      14.2250      0.0044
OGSA [23]    −1.5000 ± 6.2018i    7.1298      12.2386      0.0030
GSA [23]     −1.5000 ± 4.2791i    4.6761      13.1397      0.0738
where s is the eigen value, I is the identity matrix, and A is the state matrix. The algorithms are tested over a wide variety of operating conditions: heavy (P = 1.45; Q = 0.70), nominal (P = 1.25; Q = 0.50), and light loading (P = 0.65; Q = 0.55). The comparative results of the fitness functions along with the optimal PSS parameters are depicted in Tables 1, 2, and 3. It can be clearly seen from the corresponding tables that there is an improvement in the fitness function values in addition to improved overshoot and undershoot in all three loading environments. By taking into account the suggested CQOCRO method, the overshoot (O_s), undershoot (U_s), and fitness function values are improved.
Table 3 Comparative investigation of the eigen value minimization problem with heavy loading (P = 1.45; Q = 0.70)

Algorithm    Eigen value          Overshoot   Undershoot   Fitness value
CQOCRO       −1.5000 ± 3.1265i    3.2654      9.5317       0.0010
QOCRO        −1.4999 ± 3.1397i    3.3181      9.4817       0.0011
OCRO [19]    −1.4999 ± 3.1313i    3.3183      9.4795       0.0016
CRO [20]     −1.5000 ± 3.1067i    3.3131      9.4713       0.0381
OCOA [21]    −1.5000 ± 3.1913i    2.7106      7.8137       0.0024
COA [21]     −1.5000 ± 3.1912i    3.6121      9.5125       0.0045
ODSA [22]    −1.4999 ± 3.1357i    3.3173      9.4803       0.0012
DSA [22]     −1.5000 ± 3.1366i    3.3138      9.4800       0.0029
OGSA [23]    −1.5000 ± 3.1602i    3.3131      9.4865       0.0074
GSA [23]     −1.5000 ± 3.1067i    3.3131      9.4713       0.0381
Table 4 Results of damping ratio minimization compared between various techniques for light loading condition (P = 0.65; Q = 0.55)

Algorithm    Eigen value          Overshoot   Undershoot   Fitness value
CQOCRO       −1.1692 ± 6.7680i    5.6589      12.7680      0.0037
QOCRO        −1.2692 ± 6.2189i    7.9358      14.1101      0.0048
OCRO [19]    −1.2190 ± 5.9323i    8.1975      14.2575      0.0250
CRO [20]     −0.8754 ± 4.2883i    5.7747      12.3212      0.0572
OCOA [21]    −1.2685 ± 6.2145i    7.9363      14.0000      0.0170
COA [21]     −1.2515 ± 6.3242i    7.9436      14.0000      0.0169
ODSA [22]    −1.2624 ± 6.1843i    6.6200      12.1270      0.0010
DSA [22]     −1.2419 ± 6.0836i    7.9445      13.5749      0.0153
OGSA [23]    −1.2506 ± 6.1267i    7.9589      13.7845      0.0346
GSA [23]     −1.2323 ± 6.0383i    7.9949      14.0674      0.4315
7.2 Case-II: Damping Ratio Minimization To determine the applicability of the suggested CQOCRO algorithm and to manage the stability of the power system, damping ratio minimizations are taken into account. The algorithms are tested under the aforementioned loading circumstances. The overshoot and undershoot, together with the fitness function, are depicted in Tables 4, 5, and 6 on the basis of final optimization values.
Table 5 Results of damping ratio minimization compared between various techniques for nominal loading condition (P = 1.25; Q = 0.50)

Algorithm    Eigen value          Overshoot   Undershoot   Fitness value
CQOCRO       −1.4251 ± 6.2347i    8.4398      13.6534      0.0032
QOCRO        −1.3951 ± 6.8347i    9.3277      15.6768      0.0028
OCRO [19]    −1.3839 ± 6.7798i    9.3656      15.7411      0.0048
CRO [20]     −1.3568 ± 6.6466i    9.4227      15.8934      0.0712
OCOA [21]    −1.3981 ± 6.8494i    7.3162      13.6500      0.0222
COA [21]     −1.3879 ± 6.8154i    7.9271      13.9846      0.0563
ODSA [22]    −1.3993 ± 6.8551i    8.7143      13.6627      0.0155
DSA [22]     −1.3818 ± 6.7694i    9.3140      14.7523      0.02807
OGSA [23]    −1.3892 ± 6.8057i    9.3425      15.6839      0.0172
GSA [23]     −1.3672 ± 6.6995i    9.4051      16.1162      0.5264
Table 6 Results of damping ratio minimization compared between various techniques for heavy loading condition (P = 1.45; Q = 0.70)

Algorithm    Eigen value          Overshoot   Undershoot   Fitness value
CQOCRO       −0.7623 ± 3.1234i    6.5687      10.3240      0.0033
QOCRO        −0.6960 ± 3.4100i    6.8087      10.8740      0.0052
OCRO [19]    −0.6853 ± 3.3577i    6.8998      10.8801      0.0075
CRO [20]     −0.6841 ± 3.3512i    6.8994      10.8786      0.0811
OCOA [21]    −0.4365 ± 2.1385i    4.0389      8.3912       0.0357
COA [21]     −0.6028 ± 3.4543i    4.9815      8.8371       0.0395
ODSA [22]    −0.4985 ± 2.4421i    1.9867      9.1208       0.0111
DSA [22]     −0.7107 ± 3.4815i    1.9951      10.8733      0.0395
OGSA [23]    −0.6967 ± 3.4130i    2.5969      6.8726       0.0543
GSA [23]     −0.6841 ± 3.3512i    4.80008     9.6810       0.0811
8 Conclusion

In this study, the CQOCRO approach is used in an effort to increase the SMIB system's dynamic stability when including the PSS. Initially, the original CRO is used to find the ideal set of PSS parameters; then, to speed up convergence and increase the stability and accuracy of the solution, the chaotic Q-OBL theory was also included. To assess the superiority of the anticipated approach, two independent objective functions, known as eigen value minimization and damping ratio minimization, are used. The ability of the aforementioned CQOCRO to eliminate the downsides of early convergence and generate the desired optimum solution can be summed up as its best convergence feature.
References 1. Hariri A, Malik OP (1996) A fuzzy logic based power system stabilizer with learning ability. IEEE Trans Energy Convers 11(4):721–727 2. Grigsby LL (2006) Electric power engineering handbook. CRC Press LLC, London 3. Xia D, Heydt GT (1983) Self-tuning controller for generator excitation control. IEEE Trans Power Apparatus Syst 102(6):1877–1885 4. Kundur PS, Malik OP (2022) Power system stability and control. McGraw-Hill Education 5. Snášel V, Rizk-Allah RM, Izci D, Ekinci S (2023) Weighted mean of vectors optimization algorithm and its application in designing the power system stabilizer. Appl Soft Comput, p 110085 6. Klein M, Rogers GJ, Moorty S, Kundur P (1992) Analytical investigation of factors influencing power system stabilizers performance. IEEE Trans Energy Convers 7(3):382–390 7. Sharma KK, Gupta A, Kaur G, Kumar R, Chohan JS, Sharma S, Singh J, Khalilpoor N, Issakhov A (2021) Power quality and transient analysis for a utility-tied interfaced distributed hybrid wind-hydro controls renewable energy generation system using generic and multiband power system stabilizers. Energy Rep 7:5034–5044 8. Khezri Rahmat, Bevrani Hassan (2015) Stability enhancement in multi-machine power systems by fuzzy-based coordinated AVR-PSS. Int J Energy Optim Eng (IJEOE) 4(2):36–50 9. Zhang Y, Chen GP, Pr Malik O, Hope GS (1993) An artificial neural network based adaptive power system stabilizer. IEEE Trans Energy Convers 8(1):71–77 10. Mahabuba A, Abdullah Khan M (2009) Small signal stability enhancement of a multi-machine power system using robust and adaptive fuzzy neural network-based power system stabilizer. Eur Trans Electr Power 19(7):978–1001 11. Peres W, Júnior ICS, Filho JAP (2018) Gradient based hybrid metaheuristics for robust tuning of power system stabilizers. Int J Electr Power Energy Syst 95:47–72 12. Niknam T, Mojarrad HD, Meymand HZ, Firouzi BB (2011) A new honey bee mating optimization algorithm for non-smooth economic dispatch. Energy 36(2):896–908 13. Marmolejo JA, Velasco J, Selley HJ (2017) An adaptive random search for short term generation scheduling with network constraints. PloS ONE 12(2):e0172459 14. Gurung S, Jurado F, Naetiladdanon S, Sangswang A (2020) Comparative analysis of probabilistic and deterministic approach to tune the power system stabilizers using the directional bat algorithm to improve system small-signal stability. Electr Power Syst Res 181:106176 15. Panda S, Yegireddy NK, Mohapatra SK (2013) Hybrid BFOA–PSO approach for coordinated design of PSS and SSSC-based controller considering time delays. Int J Electr Power Energy Syst 49:221–233 16. Abd Elazim SM, Ali ES (2016) Optimal power system stabilizers design via cuckoo search algorithm. Int J Electr Power Energy Syst 75:99–107 17. Xu J, Lam AYS, Li VOK (2010) Chemical reaction optimization for the grid scheduling problem. In: 2010 IEEE international conference on communications. IEEE, pp 1–5 18. Tizhoosh HR (2005) Opposition-based learning: a new scheme for machine intelligence. In: International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06) 19. Paul S, Maji A, Roy PK (2016) The oppositional chemical reaction optimization algorithm for the optimal tuning of the power system stabilizer. 
In: Foundations and frontiers in computer, communication and electrical engineering: proceedings of the 3rd international conference on foundations and frontiers in computer, communication and electrical engineering, 2016 (C2E2-2016) 20. Paul S, Roy PK (2015) Optimal design of single machine power system stabilizer using chemical reaction optimization technique. Int J Energy Optim Eng (IJEOE) 4(2):51–69 21. Paul S, Roy PK (2015) Oppositional cuckoo optimization algorithm for optimal tuning of power system stabilizers. In: Michael Faraday IET international summit 2015. IET, pp 176–181
22. Paul Sourav, Roy Provas (2018) Optimal design of power system stabilizer using a novel evolutionary algorithm. Int J Energy Optim Eng (IJEOE) 7(3):24–46 23. Paul S, Roy PK (2014) Optimal design of power system stabilizer using oppositional gravitational search algorithm. In: 2014 1st International conference on non conventional energy (ICONCE 2014)
Chapter 30
Network Intrusion Detection System for Cloud Computing Security Using Deep Neural Network Framework Munish Saran and Ritesh Kumar Singh
1 Introduction

Cloud computing has emerged as a very widespread area and platform in today's time. It aims to provide on-demand services that users or any organizations may access over the Internet [1]. Currently, this technology finds its role in widespread applications, being used by people in day-to-day life, business organizations, social media applications, banking, etc. Cloud computing service providers are the service vendors that offer or expose different services via different service or deployment models; these services can be accessed anywhere and anytime with the help of the Internet while offering features such as rapid elasticity, broad network access, measured services, etc. [2, 3]. The software as a service (SaaS) service model allows the end users to directly consume the cloud-deployed services offered by the service providers. Platform as a service (PaaS) allows developers to develop cloud-based services by offering them several platforms, frameworks, programming languages, etc. Infrastructure as a service (IaaS) allows organizations to set up the infrastructure resources required for their business needs. The services can be deployed by the service providers via the deployment models supported by this paradigm, i.e. public, private, hybrid, or community mode. Figure 1 shows the NIST model for cloud computing. Every service model of this computing paradigm suffers from several intrusions or attacks that lead to loss of sensitive information, data breaches, denial of service, unavailability of services, etc. [4, 5]. The intrusion detection system plays an important role in making this ecosystem fault resistant to a great extent. A significant amount of research work M. Saran (B) Department of Computer Science, DDUGU, Gorakhpur University, Gorakhpur 273001, India e-mail: [email protected] R. K. Singh Department of Computer Science and Engineering, NIET, Greater Noida 201306, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_30
Fig. 1 NIST model of cloud computing
have been carried out in order to develop more advanced intrusion detection systems that can detect known as well as unknown attacks. Artificial intelligence as a technology plays a crucial role in developing such systems. Machine learning and deep learning-based intrusion detection systems have proven to be much smarter in detecting attacks or any form of intrusion that could disrupt cloud-based services [6, 7]. Training such deep learning models to guard against attacks is a crucial task, as a well-trained model achieves higher accuracy in predicting the occurrence of attacks. There are two major intrusion detection approaches, namely anomaly-based detection and signature-based detection. An intrusion detection system that utilizes the signature-based approach is able to detect only those attacks or intrusions which are previously known and is unable to detect any new form of attack whose signature is not saved in the attack databases, while the anomaly-based approach is able to detect known as well as unknown forms of attacks, as it uses machine learning technology at its backend, as shown in Fig. 2. There are several types of intrusion detection systems depending on the placement of their deployment and the nature of the defence mechanism they provide, namely host-based IDS, network-based IDS, hypervisor-based IDS, and distributed IDS.
Fig. 2 Approaches for intrusion detection
2 Literature Review

This section gives a brief overview of some of the latest research work that has been carried out to provide secure frameworks for cloud computing using deep learning. Tang et al. [8] suggested an enhanced technique for constructing an intrusion detection system based on deep learning. The deep neural network-based IDS is trained on the NSL-KDD dataset in order to guard against network-based attacks in software defined networks (SDN). The proposed work uses 0.001 as the learning rate for the neural network in order to achieve more accurate results, and the AUC curve achieved shows the significance of the work. A three-phase intrusion detection system is proposed by Zhou et al. [9] based on a deep neural network in order to classify incoming network traffic as normal or malicious. The proposed work shows its impact over random forest, linear regression and k-nearest neighbour and achieves an accuracy of 96.30%; the support vector machine-based model uses a batch size of 86, 10 epochs and 0.001 as the learning rate. Feng et al. [10] proposed a deep neural network-based intrusion detection system. The proposed approach utilizes a convolutional neural network (CNN) as well as long short-term memory (LSTM) to construct a deep learning model for guarding against cross-site scripting (XSS) attacks, SQL injection attacks and denial of service (DoS). The model is trained on the KDD CUP 99 dataset
and achieved enhanced classification accuracy. Another deep learning framework-based intrusion detection system trained on the KDD CUP dataset is proposed by Kim et al. [11]. The proposed IDS utilizes stochastic gradient descent (SGD) and ReLU as its optimizer and activation function, respectively, which allows it to achieve 99% accuracy. The architecture of the proposed deep neural network consists of one input layer, four hidden layers and one output layer, with 100 neurons in the hidden layers. Kasongo et al. [12] suggested an IDS for detecting malicious incoming network traffic named the feed forward deep neural network (FFDNN). The main emphasis is laid upon the importance of preprocessing the dataset for training the neural network. In order to overcome the major problem of overfitting the neural network, a filter-based feature selection technique is employed to get rid of redundant, irrelevant and noisy features in the dataset. Feature transformation and normalization are also applied to the dataset as part of preprocessing. The neural network trained on the optimized dataset predicts more accurate results, with an accuracy of 0.9969. A recurrent neural network (RNN)-based IDS is proposed by Kim et al. [13]. The RNN utilizes a long short-term memory (LSTM) architecture with a batch size of 50 and 500 epochs. The model attains 98.80% accuracy in detecting intrusions and logging the intrusion events. Another LSTM-based recurrent neural network for detecting intrusions is given by Taylor et al. [14]. The proposed IDS thoroughly examines the incoming data packets and raises intrusion alarms on the detection of any such packet. Yin et al. [15] proposed an intrusion detection framework based on a recurrent neural network. The architecture of the proposed neural network consists of one input layer, one hidden layer and one output layer with 80 hidden-layer neurons and a small learning rate of 0.1. The evaluation of the results was performed based on the false positive rate (FPR), accuracy and true positive rate (TPR) achieved by the model.
3 Proposed Approach

The proposed approach is based on a multilayer backpropagation neural network for attack classification. The dataset is considered the fuel for a neural network, as a model trained on the most optimized form of the dataset is expected to predict the most accurate outcome; for this reason, preprocessing of the dataset is performed. Preprocessing of the dataset involves feature encoding, feature normalization and feature selection. Every dataset is composed of several features or independent variables, among which some are redundant, irrelevant, noisy or highly correlated and are not directly related to the dependent (output) variable. In order to remove these irrelevant features from the NSL-KDD dataset, the proposed approach utilizes the recursive feature elimination (RFE) technique. It is observed that the dataset is reduced to the 13 relevant features given in Table 1, while all the other irrelevant features are dropped, thus overcoming the major problem of overfitting. The proposed backpropagation neural network (BPNN) has
Table 1 Optimized subset of features: src_bytes, diff_srv_rate, Flag_S0, dst_host_same_src_port_rate, wrong_fragment, same_srv_rate, dst_bytes, dst_host_same_srv_rate, count, dst_host_srv_serror_rate, flag_SF, srv_count, protocol_type_icmp
five layers: one input layer, three hidden layers and one output layer, each composed of multiple neurons. During forward propagation, the optimized feature subset is provided as input to the input layer of the BPNN, which in turn assigns a weight to each input neuron and forwards the result to the hidden layers for further processing. The hidden layer neurons are activated and apply the ReLU activation function to the sum of the assigned weights multiplied by the input feature values; the ReLU activation function returns the maximum of 0 and the received input value. The output of the hidden layers is multiplied by the assigned weights and passed to the output layer, which applies the Sigmoid activation function. The Sigmoid activation function transforms the value to between 0 and 1 and uses a threshold of 0.5: the neural network classifies the input as malicious traffic if the calculated value is greater than 0.5 and as normal traffic if it is smaller than 0.5. This predicted value is compared with the actual value in the training dataset, and the loss function is computed if any difference is observed. The architecture of the backpropagation neural network uses 100 epochs and a batch size of 100. The weights of the BPNN are updated (backpropagation) in such a manner that the loss function is driven towards its global minimum so that the predicted and actual values match. The task of reducing the loss function to the minimal possible value is achieved with optimizers, so the choice of optimizer is very crucial. The proposed work makes use of the Adam optimizer, which is a successor of stochastic gradient descent (SGD) and inherits the advantages of both the RMSProp and Adagrad optimizers in order to enhance the accuracy of the BPNN. Unlike the other optimizers, the Adam optimizer updates the learning rate at each iteration of weight adjustment. Figures 3 and 4 show the architecture of the neural network and the workflow of the proposed approach, respectively.
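To make the pipeline concrete, the following is a minimal sketch of how the RFE-based feature selection and the BPNN described above could be wired together with scikit-learn and Keras. The file path, column names, hidden-layer sizes and the estimator driving RFE are not specified in the text and are assumptions here.

```python
# Sketch only: RFE down to 13 features, then a five-layer backpropagation
# network with ReLU hidden layers, a sigmoid output thresholded at 0.5, and
# the Adam optimizer (100 epochs, batch size 100), as described above.
# The CSV path, column names, hidden sizes and RFE estimator are assumptions.
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

df = pd.read_csv("nslkdd_train.csv")                   # path is an assumption
y = (df.pop("label") != "normal").astype(int).values   # 1 = attack, 0 = normal
X = MinMaxScaler().fit_transform(pd.get_dummies(df))   # feature encoding + normalization

# Feature selection: keep the 13 most relevant features.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=13)
X_sel = selector.fit_transform(X, y)

# One input layer, three ReLU hidden layers, one sigmoid output layer.
model = Sequential([
    Dense(64, activation="relu", input_shape=(13,)),
    Dense(32, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_sel, y, epochs=100, batch_size=100, validation_split=0.2)
```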
4 Dataset Description

The dataset is considered the most important component of artificial intelligence-based systems. The proposed deep neural network-based network intrusion detection system for the classification of attacks is trained on the NSL-KDD dataset. NSL-KDD has 125,973 records for training and 22,544 records for testing. The network-level attacks are categorized under four different types of attack categories in the
Fig. 3 Neural network
Fig. 4 Proposed approach workflow
Table 2 Attack types
DoS: Pod, Apache2, Mailbomb, Land, Teardrop, Neptune, Smurf, Back, Processtable, UDPstorm
R2L: Named, Guess_Password, Phf, Imap, SnmpGuess, Sendmail, WarezClient, Xsnoop, Ftp Write, WarezMaster, Spy, SnmpGetAttack, MultiHop
U2R: LoadModule, Httptunnel, Rootkit, Buffer Overflow, Xterm
Probe: Portsweep, Satan, Ipsweep, Mscan, Nmap
Table 3 NSL-KDD feature set (S. No. and feature name)
1 duration; 2 dst_host_rerror_rate; 3 service; 4 is_guest_login; 5 src_bytes; 6 srv_serror_rate; 7 land; 8 same_srv_rate; 9 urgent; 10 dst_host_srv_count; 11 dst_host_diff_srv_rate; 12 logged_in; 13 dst_host_serror_rate; 14 num_compromised; 15 num_root; 16 su_attempted; 17 num_shells; 18 num_file_creations; 19 is_host_login; 20 num_access_files; 21 serror_rate; 22 count; 23 diff_srv_rate; 24 dst_host_count; 25 rerror_rate; 26 dst_host_same_src_port_rate; 27 dst_host_same_srv_rate; 28 root_shell; 29 protocol_type; 30 num_outbound_cmds; 31 flag; 32 srv_count; 33 dest_bytes; 34 srv_rerror_rate; 35 wrong_fragment; 36 srv_diff_host_rate; 37 hot; 38 num_failed_logins; 39 dst_host_srv_diff_host_rate; 40 dst_host_srv_rerror_rate; 41 dst_host_srv_serror_rate; 42 class_label
NSL-KDD dataset, namely probing, denial of service, user to root (U2R) and remote to local (R2L), with several attacks in each category as given in Table 2, and there are 42 features overall, as given in Table 3.
5 Performance Metrics

The performance of the proposed deep neural network-based network intrusion detection system is evaluated against the parameters mentioned below.
1. Accuracy: The ratio of the sum of true positives and true negatives to the total number of observations is defined as the accuracy of the classifier.

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

2. False Alarm Rate: FAR is defined as the ratio of FP to the sum of TN and FP.

$$\mathrm{FAR} = \frac{FP}{TN + FP}$$

3. Precision: Precision is defined as the ratio of TP to the sum of TP and FP predicted by the classifier.

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

4. Recall: The ratio of TP to the sum of TP and FN.

$$\mathrm{DR} = \frac{TP}{TP + FN}$$

5. F1-Score: The F1-score is defined as the harmonic mean of precision and recall.

$$\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$
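As a quick illustration, the five measures can be computed directly from confusion-matrix counts; the counts in this sketch are placeholders, not results from this work.

```python
# Sketch: the five evaluation measures computed from confusion-matrix counts.
# The counts below are made-up placeholders.
def evaluate(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (tn + fp)                      # false alarm rate
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                   # detection rate (DR)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, far, precision, recall, f1

print(evaluate(tp=950, tn=980, fp=20, fn=50))
```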
6 Result

The results obtained from the proposed backpropagation neural network on the NSL-KDD dataset were evaluated against the performance parameters described above. The results show that the proposed backpropagation neural network achieves 98.60% accuracy for the classification of network-level attacks when trained on the NSL-KDD dataset. The BPNN achieves a 99.15% precision rate, 97.50% recall, a 98.72% F1 score and a 1% false alarm rate. These results were compared with intrusion detection models based on machine learning algorithms, namely random forest, support vector machine and decision tree, and it was observed that the proposed deep learning-based backpropagation neural network achieves more accurate results on the basis of the performance metrics, as discussed in Table 4.
Table 4 Performance result comparison
Performance parameter | Decision tree (%) | Support vector machine (%) | Random forest (%) | Proposed backpropagation neural network (%)
Accuracy | 97.46 | 97.00 | 97.88 | 99.54
Recall | 98.43 | 97.33 | 96.98 | 98.50
F1 score | 96.43 | 97.85 | 97.91 | 98.72
False alarm rate | 3 | 5 | 3 | 1
7 Conclusion and Future Directions

The security of the cloud ecosystem is of utmost importance. In today's era, even the large vendors that provide cloud services to end consumers suffer from intrusion attacks, which are reported from time to time. Thus, there is a constant need to upgrade the security framework of the cloud paradigm so that it can accurately classify the malicious intentions of end-user requests. The proposed BPNN achieves higher accuracy, a reduced false positive rate and an increased true positive rate in classifying normal and malicious network traffic, so that malicious requests can be blocked before reaching the cloud servers and disrupting the cloud services. In future, we aim to train and test the proposed framework on more public datasets, which could help in making the neural network-based classification model more robust and accurate.
References 1. Alouffi B, Hasnain M, Alharbi A, Alosaimi W, Alyami H, Ayaz M (2021) A systematic literature review on cloud computing security: threats and mitigation strategies. IEEE Access 9:57792– 57807 2. George SS, Pramila RS (2021) A review of different techniques in cloud computing. Mater Today: Proc 46(17):8002–8008 3. Attaran M, Woods J (2019) Cloud computing technology: improving small business performance using the Internet. J Small Bus Entrep 31(6):94–106 4. Dwivedi RK, Saran M, Kumar R (2019) A survey on security over sensor-cloud. In: 2019 International conference on confluence the next generation information technology summit (confluence), IEEE 5. Saran M, Tripathi UN (2021) A comprehensive review of threats, risks, attacks and secure defence mechanisms in cloud computing. J Xi’an Univ Architect Technol 13(7):674–682 6. Saran M, Yadav RK, Tripathi UN (2022) Machine learning based security for cloud computing: a survey. Int J Appl Eng Res 17(4):332–337 7. Saran M, Yadav RK, Tripathi UN, Mitigation from DDoS attack in cloud computing using Bayesian hyperparameter optimization based machine learning approach. Int J Res Trends Innov 7(11)
8. Tang TA, Mhamdi L, McLernon D, Zaidi SAR, Ghogho M (2016) Deep learning approach for network intrusion detection in software defined networking. In: 2016 International conference on wireless networks and mobile communications (WINCOM), IEEE, pp 258–263 9. Zhou L, Ouyang X, Ying H, Han L, Cheng Y, Zhang T (2018) Cyber-attack classification in smart grid via deep neural network. In: Proceedings of the 2nd international conference on computer science and application engineering, ACM, p 90 10. Feng F, Liu X, Yong B, Zhou R, Zhou Q (2019) Anomaly detection in ad-hoc networks based on deep learning model: a plug and play device. Ad Hoc Netw 84:84–89 11. Kim J, Shin N, Jo SY, Kim SH (2017) Method of intrusion detection using deep neural network. In: 2017 IEEE international conference on Big Data and smart computing (BigComp). IEEE 12. Kasongo SM, Sun Y (2019) A deep learning method with filter based feature engineering for wireless intrusion detection system. IEEE Access 7:38597–38607 13. Kim J, Kim J, Thu HLT, Kim H (2016) Long short term memory recurrent neural network classifier for intrusion detection. In: 2016 International conference on platform technology and service (PlatCon). IEEE, pp 1–5 14. Taylor A, Leblanc S, Japkowicz N (2016) Anomaly detection in automobile control network data with long short-term memory networks. In: 2016 IEEE International conference on data science and advanced analytics (DSAA). IEEE, pp 130–139 15. Yin C, Zhu Y, Fei J, He X (2017) A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 5:21954–21961
Chapter 31
Comparison of Activation Functions in Brain Tumour Segmentation Using Deep Learning Amisha Nakhale and B. V. Rathish Kumar
1 Introduction

The modern healthcare system places a high value on medical imaging techniques to facilitate non-invasive diagnostic procedures [1]. Medical imaging involves developing realistic visual representations of our body organs for medical examination. The various representation types include x-ray-based methods such as conventional x-rays and computed tomography (CT), molecular imaging, ultrasound, and magnetic resonance imaging (MRI) [2]. Technological advancements have made it easier to acquire images, which has resulted in the production of a large number of high-resolution images at very low cost. As a result, biomedical image processing and segmentation techniques have advanced significantly [3]. There is no need for human participation with fully automatic segmentation approaches, which can be achieved through artificial neural networks (ANN) [4]. However, most of these approaches rely on supervised learning methods that utilize training data [5]. Unsupervised learning approaches also require labelled images for validation, which are obtained through manual segmentation [6].

Machine learning techniques like neural networks (NN) or support vector machines (SVM) often use human feature engineering [7]. These methods require a lot of time and are unable to process natural data in its raw form. They also have to be applied again, as they generally do not adapt to new information. On the contrary, deep learning methods eliminate the use of handcrafted

A. Nakhale (B) · B. V. Rathish Kumar Department of Mathematics and Statistics, Indian Institute of Technology Kanpur, Kalyanpur 208016, India e-mail: [email protected]
B. V. Rathish Kumar e-mail: [email protected]
features, as they are able to process raw natural data. These methods have been used effectively for semantic segmentation of natural images, and they have additionally found applications in the field of biomedical image segmentation [8].
2 Overview of Deep Learning

2.1 Convolutional Neural Network (CNN)

A convolutional neural network (CNN) uses an image as input and assigns importance to different objects in the image by learning weight and bias parameters during the training process. This helps the CNN to differentiate between distinct characteristics of the image [9]. Machine learning methods make use of hand-engineered features and are therefore time-consuming. As a CNN can learn these weights and biases through training, it requires much less preprocessing than ML-based methods. In order to train and evaluate the CNN model, each input image is processed by a sequence of layers. These include convolutional layers containing filters (kernels), pooling layers, fully connected (FC) layers, and an output layer with a softmax function. The softmax function allows the model to identify an object based on probabilistic values that range from 0 to 1 [10]. Figure 1 illustrates the architecture of a CNN: it accepts an image as input, processes the image, and then classifies the objects in it. CNNs are widely utilized for classification problems. To use a CNN for semantic segmentation, a different approach is used: the input image is initially partitioned into patches of equal size, the CNN classifies the central pixel of each patch, and the patch is then slid to classify the next central pixel. This process is repeated for all the patches of the input image. However, this method is not recommended because it results in a
Fig. 1 CNN architecture
loss of spatial information as the overlapping features of the sliding patches are not reused. To address this issue, the fully convolutional network (FCN) [11] was proposed. In FCN, the final fully connected layers of the CNN are replaced with transposed convolutional layers [12]. While performing semantic segmentation, these layers recover the original spatial dimensions by upsampling low-resolution feature maps.
2.2 U-Net

U-Net is a type of FCN. Semantic segmentation produces a full, high-resolution image that classifies all pixels. Therefore, the use of a conventional convolutional network containing pooling layers and dense layers would result in the loss of spatial information and solely retain the semantic content, which is not our desired outcome. For the purpose of segmentation, it is essential to have access to both semantic and spatial information, and in order to retrieve the spatial information, it is necessary to up-sample, or convert, the low-resolution image to a high-resolution image [13]. There are numerous techniques in the literature for upsampling an image: bilinear interpolation, cubic interpolation, nearest neighbour interpolation, transposed convolution, and others. However, most state-of-the-art networks prefer the transposed convolution [12] method for upsampling an image.

Transposed Convolution: The connection in position between the input and output values is an important part of the convolution process, e.g. the values located in the top-left position of the input matrix have an influence on the corresponding value situated in the top-left position of the output matrix. More specifically, the kernel serves the purpose of establishing a connection between the input matrix values and the output matrix values. A many-to-one relationship is formed by a convolution operation, and a convolution matrix can be used to express it. Figure 2 shows a convolution matrix: it is a rearranged kernel matrix that allows us to perform convolution operations using matrix multiplication. The 4 × 16 convolution matrix can be matrix-multiplied with the 16 × 1 input matrix (a 16-dimensional column vector), and the output 4 × 1 matrix can be reshaped into a 2 × 2 matrix. The convolution matrix allows us to go from 16 values (4 × 4) to 4 values (2 × 2) because it is 4 × 16.
Fig. 2 Kernel and its convolution matrix
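The matrix view described above can be reproduced in a few lines of NumPy: the sketch below builds the 4 × 16 convolution matrix C for a 3 × 3 kernel on a 4 × 4 input and then applies its transpose to up-sample a 2 × 2 map back to 4 × 4. The kernel and input values are arbitrary.

```python
# Sketch: convolution expressed as a matrix product, and its transpose used
# for upsampling. C is 4x16, so C @ x maps 16 values to 4 (a 2x2 map) and
# C.T @ y maps 4 values back to 16 (a 4x4 map).
import numpy as np

kernel = np.arange(1, 10).reshape(3, 3)            # any 3x3 kernel
C = np.zeros((4, 16))
for row, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):  # output positions
    patch = np.zeros((4, 4))
    patch[i:i + 3, j:j + 3] = kernel               # kernel placed at this sliding position
    C[row] = patch.ravel()

x = np.arange(16.0)                                # a 4x4 input, flattened
down = C @ x                                       # convolution: 16 -> 4
up = C.T @ down                                    # transposed convolution: 4 -> 16
print(down.reshape(2, 2))
print(up.reshape(4, 4))
```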
Transposed Convolution Matrix: Consider transposing the convolution matrix C (4 × 16) to C.T. (16 × 4). We can generate an output matrix (16 × 1) by matrix-multiplying C.T. (16 × 4) with a column vector (4 × 1). We simply up-sampled a smaller (2 × 2) matrix into a larger one. Because of the way the weights are laid out, the transposed convolution maintains the one-to-many relationship, so we can use it to perform upsampling. Furthermore, the weights in the transposed convolution can be learned. There are two paths in the U-Net architecture [14], as shown in Fig. 3. The first path is the contraction path (also known as the encoder), which captures the context of an image. The encoder consists of only convolutional and maximum pooling layers. The size of the input image gradually decreases, while the depth gradually increases in the encoder. This essentially means that the U-Net learns the semantic information in the image but loses the spatial information. Using transposed convolutions, the symmetric expanding path (also known as the decoder) enables precise localization. The network can accept images of any size, as it only has convolutional layers and no dense layers; for this reason, it is an end-to-end fully convolutional network (FCN). The decoder gradually increases the image size while decreasing the depth, and by gradually applying upsampling, the decoder intuitively retrieves the spatial information (precise localization). In order to achieve better accuracy in determining locations, we use skip connections during the decoder process. This is done by merging the output of transposed
Fig. 3 U-Net architecture (example for 32 × 32 pixels in the lowest resolution). Each blue box represents a multi-channel feature map. White boxes correspond to copied feature maps. The x–y size is indicated at the lower left corner of the box. The top edge of the box shows the number of channels. The arrows denote the different operations
convolution layers with feature maps from the encoder that are at the same level. Following each concatenation, we perform two repeated regular convolutions so that the model can acquire the information required to generate more accurate outputs. As a result, the structure is symmetrical and in the shape of a U; hence the name "U-Net". The architecture can be briefly described as follows: Input → Encoder → Decoder → Output.
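The following is a compact Keras sketch of this encoder-decoder pattern with a single skip connection. The input size, channel counts and depth are illustrative assumptions and not the exact configuration trained in this work.

```python
# Sketch: one encoder level, a bottleneck, and one decoder level with a skip
# connection, following the U-Net pattern described above. Filter counts and
# input size are illustrative assumptions.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # two repeated 3x3 convolutions, as in the original U-Net
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

inputs = layers.Input(shape=(128, 128, 2))           # e.g. FLAIR + T1ce channels
e1 = conv_block(inputs, 32)                           # encoder: semantic context
p1 = layers.MaxPooling2D()(e1)                        # spatial size halves
b = conv_block(p1, 64)                                # bottleneck
u1 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)  # upsample
d1 = conv_block(layers.concatenate([u1, e1]), 32)     # skip connection restores detail
outputs = layers.Conv2D(4, 1, activation="softmax")(d1)  # 4 segmentation classes
model = Model(inputs, outputs)
model.summary()
```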
2.3 Activation Functions

The activation function determines whether or not a neuron should be activated by calculating a weighted sum and then adding bias to it. The activation function is used to introduce nonlinearity into a neuron's output [15]. In this paper, we compare the performances of the ReLU, leaky ReLU, and ELU activation functions.

ReLU (Rectified Linear Unit): It is a piecewise linear function that will output the input if it is positive, otherwise, it will output zero. It has become the standard activation function for a variety of neural network types, as models that employ it are easier to train and typically yield better performance. The ReLU function solves the problem of vanishing gradients, enabling models to learn more quickly and perform better.

$$R(z) = \begin{cases} z, & z > 0 \\ 0, & z \le 0 \end{cases}$$

Leaky ReLU: Leaky ReLU is a variant of ReLU. A leaky ReLU permits a small, nonzero, constant gradient (typically α = 0.01) when z < 0. However, it is currently unclear whether the benefit holds true across tasks. Leaky ReLUs are one attempt to address the issue of "dying ReLU" by having a small negative slope.

$$R(z) = \begin{cases} z, & z > 0 \\ \alpha z, & z \le 0 \end{cases}$$

ELU (Exponential Linear Unit): ELU is a function that tends to converge costs to zero rapidly and generate more precise outcomes. ELU includes an additional positive alpha constant. Except for the negative inputs, ELU and ReLU are quite similar. For non-negative inputs, both of these activation functions are identity functions. ELU progressively becomes smooth until its output equals −α for negative inputs, while ReLU becomes smooth quickly. ELU, unlike ReLU, can yield negative results.

$$R(z) = \begin{cases} z, & z > 0 \\ \alpha(e^{z} - 1), & z \le 0 \end{cases}$$
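For reference, the three functions can be written directly in NumPy following the definitions above; the α values follow those used later in the experiments (0.01 for leaky ReLU, 0.1 for ELU).

```python
# Sketch: the three activation functions compared in this paper, implemented
# with NumPy exactly as defined above.
import numpy as np

def relu(z):
    return np.where(z > 0, z, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=0.1):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), leaky_relu(z), elu(z), sep="\n")
```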
2.4 Performance Metrics

The evaluation of the image segmentation efficacy is conducted through established and widely recognized metrics, enabling a comparative analysis with existing techniques. Several performance metrics [8, 16] are provided below to evaluate the segmentation efficacy of deep learning models.

Accuracy: Accuracy refers to the proportion of image pixels that are accurately classified. Despite being a fundamental performance metric, the effectiveness of image segmentation can be misrepresented in cases of class imbalance, which is a drawback. In such a situation, a higher level of accuracy for the dominant class may lead to the overlooking of the other classes, thereby producing results that are biased.

$$\mathrm{Accuracy} = \frac{\text{Correctly Predicted Pixels}}{\text{Total number of Image Pixels}} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision: Precision refers to the ratio of disease pixels in the automated segmentation outcomes that match the disease pixels in the ground truth. The metric of precision is important in evaluating segmentation performance due to its susceptibility to over-segmentation. Over-segmentation leads to low precision scores.

$$\mathrm{Precision} = \frac{\text{Correctly Predicted Disease Pixels}}{\text{Total number of Predicted Disease Pixels}} = \frac{TP}{TP + FP}$$

Recall: Recall is a metric used to indicate the efficiency of automatic segmentation in identifying disease pixels, expressed as the proportion of correctly identified disease pixels in the ground truth. It is sensitive to under-segmentation because it results in low recall scores.

$$\mathrm{Recall} = \frac{\text{Correctly Predicted Disease Pixels}}{\text{Total number of Actual Disease Pixels}} = \frac{TP}{TP + FN}$$

F1-score: Precision and recall can be used together because high values for both measures for a given segmentation result indicate that the predicted segmented regions match the ground truth in terms of location and level of detail. The F1 measure calculates the harmonic mean of precision and recall between predicted and ground truth segmentation.

$$\mathrm{F1\ measure} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
DICE Similarity Coefficient (DSC): DSC takes into account both false alarms and missed values in each class, thereby making it a more effective measure. DICE is also thought to be superior because it not only evaluates the number of correctly labelled pixels, but also the accuracy of the segmentation boundaries.

$$\mathrm{DICE} = \frac{2\,|S_{\text{Ground Truth}} \cap S_{\text{Automated}}|}{|S_{\text{Ground Truth}}| + |S_{\text{Automated}}|} = \frac{2 \times TP}{2 \times TP + FP + FN}$$

Jaccard Similarity Index (JSI): The Jaccard Similarity Index, which is also referred to as intersection-over-union, is a metric that quantifies the degree of similarity between two sets of data. Specifically, it is calculated as the ratio of the area of overlap between the predicted and ground truth segments to the area of union between the predicted and ground truth segments.

$$\mathrm{JSI} = \frac{|S_{\text{Ground Truth}} \cap S_{\text{Automated}}|}{|S_{\text{Ground Truth}} \cup S_{\text{Automated}}|} = \frac{TP}{TP + FP + FN}$$
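A small sketch of the two overlap measures computed from binary masks is given below; the toy masks are made up for illustration.

```python
# Sketch: Dice coefficient and Jaccard index (IoU) computed directly from two
# binary masks, mirroring the set-based definitions above.
import numpy as np

def dice(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def jaccard(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return inter / union

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]], dtype=bool)
truth = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]], dtype=bool)
print(dice(pred, truth), jaccard(pred, truth))
```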
3 Dataset

We used the 2020 Brain Tumour Segmentation Challenge (BraTS) dataset [17]. All BraTS multimodal scans are available as NIfTI files (.nii.gz). The training set consists of MRI images of 369 patients that have been carefully annotated by experts. In contrast, the validation set includes 125 cases of unknown grades [18–22]. The dataset comprises magnetic resonance imaging (MRI) scans obtained from various institutions and contributors. Each patient's scans include modalities such as native (T1), T1 contrast-enhanced (T1ce), T2-weighted (T2), and T2 fluid attenuated inversion recovery (T2-FLAIR) volumes, which are used to identify tumoral subregions. The data is processed to eliminate discrepancies by being skull-stripped, aligned to an anatomical template, and resampled to a 1 mm³ resolution. The volume (dimension) of each sequence is (240, 240, 155). The ground truth images (included in the training set) highlight the three tumour regions: the enhancing tumour (ET), the peritumoral edema/whole tumour (ED), and the necrotic and non-enhancing tumour core (NCR/NET). Figure 4 shows the multimodal image of a single patient in the BraTS-2020 training dataset.
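A minimal sketch of reading one case with nibabel and extracting 2D axial slices of the FLAIR and T1ce volumes is shown below. The directory layout and file names follow the usual BraTS naming convention and are assumptions here.

```python
# Sketch: loading one BraTS case and taking an axial slice of the FLAIR and
# T1ce volumes as a 2-channel input. File names assume the usual BraTS layout.
import nibabel as nib
import numpy as np

case = "BraTS20_Training_001"
flair = nib.load(f"{case}/{case}_flair.nii.gz").get_fdata()   # (240, 240, 155)
t1ce = nib.load(f"{case}/{case}_t1ce.nii.gz").get_fdata()
seg = nib.load(f"{case}/{case}_seg.nii.gz").get_fdata()       # ground truth labels

k = 75                                                        # an axial slice index
x = np.stack([flair[:, :, k], t1ce[:, :, k]], axis=-1)        # 2-channel 2D input
y = seg[:, :, k]                                              # label values as stored
print(x.shape, np.unique(y))
```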
Fig. 4 Multimodal images of a patient from the BraTS-2020 training dataset
4 Methodology

We have used U-Net to segment brain tumours from MRI images. Each MRI image is three dimensional and has a dimension of (240, 240, 155). We have used 2D slices of the MRI images as input to the U-Net, and each image is in greyscale. In the training process, we have merged two different MRI types, namely FLAIR and T1ce. The output of our model corresponds to classes 0 (no tumour), 1 (necrotic/core), 2 (edema), and 3 (enhancing tumour). We have used 65% of the BraTS-2020 data for training, 20% for validation, and 15% for testing. To test whether our model works well on a new dataset, we have also evaluated it on the BraTS 2019 dataset. We have trained different U-Net models using the ReLU, leaky ReLU (α = 0.01), and ELU (α = 0.1) activation functions to compare their performance; a sketch of this comparison set-up is given below. The final layer of the U-Net uses the softmax function. We have used early stopping to compute the training times of the model.
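A sketch of how the three training runs could be organized is given below; build_unet is a hypothetical helper standing in for the U-Net of Sect. 2.2, and the data arrays, loss and epoch budget are assumptions used only for illustration.

```python
# Sketch: training the same architecture three times, changing only the
# activation of the intermediate layers. `build_unet`, x_train/y_train,
# x_val/y_val, the loss and the epoch budget are assumptions.
from tensorflow.keras import layers, callbacks

configs = {
    "relu": "relu",
    "leaky_relu": layers.LeakyReLU(alpha=0.01),
    "elu": layers.ELU(alpha=0.1),
}
results = {}
for name, activation in configs.items():
    model = build_unet(activation=activation)          # hypothetical helper
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                        epochs=50,
                        callbacks=[callbacks.EarlyStopping(patience=5)])
    results[name] = history.history                     # used to compare runs
```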
5 Results

Figure 5 shows some results obtained using the ReLU activation function. The first two columns in both rows show the original FLAIR image and the ground truth image. The next column, labelled "all classes", shows the combined predictions of the three sub-regions of the tumour: the green region corresponds to the necrotic/core, the red region is edema, and the blue region is enhancing tumour. The figures also show the predicted necrotic/core, edema, and enhancing tumour regions. Table 1 gives the various performance scores of our model on the BraTS 2020 and the BraTS 2019 datasets. The scores on the two datasets are comparable, implying that our model is robust to a new dataset; this validates our model. We compare the results of our model with the top-ranking architectures of the BraTS 2019 challenge. It should be noted that those models were trained on the BraTS 2019 training dataset and validated on the BraTS 2019 dataset, whereas we have trained our model on the BraTS 2020 dataset and validated it on the BraTS 2019 dataset.
Fig. 5 Results of U-Net model on the BraTS 2020 dataset
Table 1 Performance of U-Net model on the BraTS 2020 and the BraTS 2019 dataset
Performance metric | BraTS 2020 | BraTS 2019
Loss | 0.0165 | 0.0165
Accuracy | 0.9941 | 0.9946
Mean_IoU | 0.8351 | 0.8339
Dice coefficient | 0.6501 | 0.6665
Precision | 0.9943 | 0.9948
Sensitivity | 0.9927 | 0.9933
Specificity | 0.9981 | 0.9982
Dice coefficient necrotic | 0.6281 | 0.6344
Dice coefficient edema | 0.7901 | 0.7997
Dice coefficient enhancing | 0.6831 | 0.7962
Table 2 gives the dice scores of the challenge participants [22]. The cascaded U-Net used by Jiang et al. [23] achieved the highest scores in the challenge. Zhao et al. [24] utilized a DCNN for segmentation and produced dice coefficients of 0.754, 0.910, and 0.835 for enhancing tumour, edema, and necrotic core, respectively. McKinley et al. [25] developed a CNN and produced dice coefficients of 0.770, 0.909, and 0.830 for enhancing tumour, edema, and necrotic core, respectively. As we can see from Table 2, the dice coefficient of enhancing tumour of our model is higher than that of the other architectures, but the dice coefficients of edema and necrotic core are lower. It should be noted that these architectures are computationally expensive and require a lot of time for training. Our model has comparable dice scores and requires less training time.

Table 2 Comparison of dice coefficient with challenge participants of BraTS 2019 validation set
Authors | Enhancing | Edema | Necrotic
Jiang et al. [23] | 0.802 | 0.909 | 0.864
Zhao et al. [24] | 0.754 | 0.910 | 0.835
McKinley et al. [25] | 0.770 | 0.909 | 0.830
Our model | 0.7962 | 0.7997 | 0.6344

Our focus is on the comparison of activation functions. The results in Table 2 confirm that our algorithm is performing well, so we proceed to use the same U-Net architecture for comparing the performances of the activation functions. We train the U-Net using activation functions like leaky ReLU and ELU and compare their performance. Table 3 gives the various metric scores of our model on BraTS 2020 using the different activation functions. Figure 6 shows the comparison of accuracy, sensitivity, and specificity of the activation functions in the form of bar graphs. Clearly, the performance of ReLU is superior to the other activation functions. From Table 3, we can see that the U-Net model trained with the ReLU function produced better results than the ones with the leaky ReLU and ELU functions. The scores of performance metrics like accuracy, precision, sensitivity, and specificity are very close in the models using ReLU and leaky ReLU, while the model with ELU produced significantly lower scores. The dice coefficients are 0.6501, 0.6306, and 0.2696 for ReLU, leaky ReLU, and ELU, respectively. The leaky ReLU activation function was slightly outperformed by the ReLU activation function, and ELU produced the worst results on our model. Though the ReLU activation function has better results, the time taken to train the model using ReLU is the highest. Leaky ReLU took less time than ReLU for training, but more time than ELU. From the above results, we can say that the ReLU activation function is better than leaky ReLU and ELU for this application in particular.
Table 3 Performance of our model on BraTS 2020 using ReLU, leaky ReLU, ELU
Performance metric | ReLU | Leaky ReLU | ELU
Training time (in seconds) | 7865 | 6481 | 1096
Loss | 0.0165 | 0.0241 | 0.0710
Accuracy | 0.9941 | 0.9930 | 0.9832
Mean_IoU | 0.8351 | 0.8444 | 0.4401
Dice coefficient | 0.6501 | 0.6306 | 0.2696
Precision | 0.9943 | 0.9933 | 0.9832
Sensitivity | 0.9927 | 0.9917 | 0.9832
Specificity | 0.9981 | 0.9977 | 0.9944
Dice coefficient necrotic | 0.6281 | 0.6193 | 0.0421
Dice coefficient edema | 0.7901 | 0.7318 | 0.0964
Dice coefficient enhancing | 0.6831 | 0.6540 | 0.0155
Fig. 6 Comparison of accuracy, sensitivity, and specificity of activation functions
6 Conclusion

• In conclusion, we have seen why convolutional neural networks (CNN) are not efficient for image segmentation. We then described the need for and working of U-Net.
• The U-Net model has generated impressive results in accurately segmenting brain tumours from multimodal MRI scans in the BraTS 2020 challenge, with comparable performance to various other state-of-the-art models.
• We proved that our model is robust to new datasets by testing it on BraTS 2019 data.
• We conclude that the model trained with the ReLU activation function performed the best, followed by leaky ReLU. The model trained with ELU gave poor results.
• The time taken to train the model using ReLU was the highest, followed by leaky ReLU, and ELU took the least time.
• We thus conclude that the ReLU activation function is better than leaky ReLU and ELU for our application.
References 1. Panayides AS, Amini A, Filipovic ND, Sharma A, Tsaftaris SA, Young A, Foran D, Do N, Golemati S, Kurc T, Huang K, Nikita KS, Veasey BP, Zervakis M, Saltz JH, Pattichis CS (2020) AI in medical imaging informatics: current challenges and future directions. IEEE J Biomed Health Inform 24:1837–1857 2. Flower MA (ed) (2012) Webb’s physics of medical imaging, 2nd edn. CRC Press 3. Wang G (2016) A perspective on deep imaging. IEEE Access 4:8914–8924 4. Jain AK, Mao J, Mohiuddin KM (1996) Artificial neural networks: a tutorial. Computer 29:31– 44 5. Osisanwo FY et al (2017) Supervised machine learning algorithms: classification and comparison. Int J Comput Trends Technol 48:128–138 6. Usama M, Qadir J, Raza A, Arif H, Yau KA, Elkhatib Y, Hussain A, Al-Fuqaha A (2019) Unsupervised machine learning for networking: techniques, applications and research challenges. IEEE Access 7:65579–65615 7. Dhal P, Azad C (2021) A comprehensive survey on feature selection in the various fields of machine learning. Appl Intell 52:4543–4581 8. Rizwan I Haque I, Neubert J (2020) Deep learning approaches to biomedical image segmentation. Inform Med Unlocked 18:100297 9. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 International conference on engineering and technology (ICET) 10. Saha S (2023) A comprehensive guide to convolutional neural networks — the ELI5 way. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neuralnetworks-the-eli5-way-3bd2b1164a53. Last accessed 2023/1/23 11. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR) 12. Naoki (2023) Up-sampling with transposed convolution. https://naokishibuya.medium.com/ up-sampling-with-transposed-convolution-9ae4f2df52d0. Last accessed 2023/1/23 13. Lamba H (2023) Understanding semantic segmentation with UNET. https://towardsdatascie nce.com/understanding-semantic-segmentation-with-unet-6be4f42d4b47. Last accessed 2023/ 1/23 14. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. Lect Notes Comput Sci 234–241 15. Kansal S (2023) A quick guide to activation functions in deep learning. https://towardsda tascience.com/a-quick-guide-to-activation-functions-in-deep-learning-4042e7addd5b. Last accessed 2023/1/24 16. Monteiro FC, Campilho AC (2006) Performance evaluation of image segmentation. Lect Notes Comput Sci 248–259 17. Menze, Bjoern H et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging 34:1993–2024 18. Bakas S, Akbari H, Sotiras A, Bilello M, Rozycki M, Kirby JS, Freymann JB, Farahani K, Davatzikos C (2017) Advancing the cancer genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data 4 19. Bakas S et al (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. HAL (Le Centre Pour La Communication Scientifique Directe) 20. Bakas S et al (2017) Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. Cancer Imaging Arch 21. Bakas S et al (2017) Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. Cancer Imaging Arch 22. Ali M, Gilani SO, Waris A, Zafar K, Jamil M (2020) Brain tumour image segmentation using deep networks. 
IEEE Access 8:153589–153598
23. Jiang Z, Ding C, Liu M, Tao D (2020) Two-stage cascaded U-Net: 1st place solution to BraTS challenge 2019 segmentation task. In: Brainlesion: glioma, multiple sclerosis, stroke and traumatic brain injuries, 231–241 24. Zhao Y-X, Zhang Y-M, Liu C-L (2020) Bag of tricks for 3D MRI brain tumor segmentation. In: Brainlesion: glioma, multiple sclerosis, stroke and traumatic brain injuries, 210–220 25. McKinley R, Rebsamen M, Meier R, Wiest R (2020) Triplanar ensemble of 3D-to-2D CNNs with label-uncertainty for brain tumor segmentation. In: Brainlesion: glioma, multiple sclerosis, stroke and traumatic brain injuries, 379–387
Chapter 32
Detection of Alzheimer’s Disease Using Convolutional Neural Network D. J. Jovina
and T. Jayasree
1 Introduction

Alzheimer's disease (AD) is an unavoidable neurological disorder which slowly destroys memory and thinking skills [1]. The death of brain cells shrinks the total brain size, since the brain tissue has fewer nerve cells and connections, and causes memory loss and cognitive decline, which lead to dementia [2]. Although there is no treatment to prevent the progression of Alzheimer's, some medicines are used to alleviate the deterioration in the early stages of the disease [3]. People in the age group of 30–60 are affected by early-onset Alzheimer's disease and represent less than 5 percent of AD patients [4]. The early-onset AD cases are caused by an inherited gene change. In late-onset cases, the disease appears to develop without any specific cause [5]. The AD biomarkers obtained from structural brain imaging [6] with magnetic resonance imaging, molecular neuroimaging with PET, and cerebrospinal fluid analyses, together with the clinical profile of amnesia occurring in the early stage of the disease, make it possible to identify the early stages of AD. The prodromal stage of AD corresponds to the concept of mild cognitive impairment (MCI) [7, 8]. Dubois et al. [6] proposed a new diagnostic framework to identify both the prodromal and the advanced dementia stages of the disease. In the article published by Bhushan et al. [9], AD and its clinical features are briefly discussed, and the four stages of Alzheimer's disease, namely pre-dementia, mild, moderate, and severe, are also discussed. The symptoms of AD usually progress slowly, worsen over time, and become severe [10]. A major risk factor is increasing age, with people aged 65 and older having late-onset AD [11] and living an average of 8 years after symptoms become

D. J. Jovina (B) · T. Jayasree Centre for Medical Electronics, CEG Campus, Anna University, Chennai, India e-mail: [email protected]
T. Jayasree e-mail: [email protected]
apparent to others. They survive for 4 to 20 years, depending on age and other health conditions. Cerebrospinal fluid (CSF) is produced at a constant rate of 0.3 mL/min to circulate essential nutrients in the brain and help remove toxins from it; the rate of CSF production actually decreases in AD patients. Sahyouni et al. [12] stated that although the cause of this decrease is not fully known, it may be involved in the adverse effects of the disease. Various machine learning techniques have been introduced for the early detection of AD [13–15]. Folego et al. [16] deployed advanced deep learning methods to determine whether they can extract AD biomarkers from structural magnetic resonance imaging (sMRI) and classify brain images into the Alzheimer's, mild cognitive impairment (MCI), and cognitively normal (CN) stages. Helaly et al. [17] designed a framework for early detection of AD and used medical image classification for the various AD stages, in which the four stages of the AD spectrum are multi-classified using deep learning approaches. Bayesian networks are also well suited for the early diagnosis of dementia, AD, and MCI [18], and Afzal et al. [19] utilized the random forest approach for the classification of Alzheimer's disease. Sarraf et al. [20] used a CNN architecture, LeNet-5, to classify AD from the cognitively normal brain. Thus, the deep learning approach, especially CNN, has shown great success in AD diagnosis [21]. The convolutional layer in a CNN can be pre-trained with an auto-encoder, and the performance is expected to improve with fine-tuning [22]. Wang et al. [23] proposed a CNN framework that utilizes six convolutional layers for feature extraction and two fully connected layers for classification. The CNN framework proposed in [24] is based on a multi-modal MRI analytical method using functional magnetic resonance imaging (fMRI) data.

The rest of the paper is organized as follows: Sect. 2 describes the structure of the CNN and its layers in detail. Section 3 presents the methodology used in this work and explains how the CNN is used for predicting the stages of Alzheimer's disease from the image dataset. Section 4 illustrates how the CNN predicts the stages of Alzheimer's disease from the original image dataset, and the performance metrics are also analyzed. Finally, Sect. 5 concludes the work.
2 Structure of CNN

A CNN is a deep learning technique which acquires an input image and assigns learnable weights and biases to various objects in the image. The CNN maps image pixels into a neighbourhood space, convolves feature maps with learned filters, and obtains all intermediate representations of the input image. Thus, the CNN performs feature extraction automatically and performs classification using fully connected layers, which gives higher performance than other classifiers. Hence, the CNN is the combination of convolutional layers and a neural network, as shown in Fig. 1. The basic building block of a CNN is a sequence of layers consisting of an input layer, followed by convolutional, max-pooling, and finally dense layers. In image processing, the neural network is composed of input neurons that accept the pixel matrix of
Fig. 1 Structure of CNN
the image and pass the required pixels into the system for further processing by the subsequent layers. The convolutional layer is the main building block of a CNN and is responsible for performing the convolution operation. The element involved in the convolution operation is called the kernel/filter and is applied to an image repeatedly to extract features. This results in an activation map called a feature map, which indicates the locations and strength of the required feature in the input image. In the convolution process, the kernel goes over the input image, performing matrix multiplication element by element. A filter of small size compared to the input data is considered for performing the dot product. The dot product is the element-wise multiplication between the 3 × 3 kernel and the kernel-sized input patch, which is then summed to give a single value; hence, this operation is a scalar product. The addition of empty pixels around the edges of an image, known as padding, preserves the original size of the image when the convolution filter is applied and enables the filter to perform full convolutions on the edge pixels. The stride denotes how many steps we move in each step of the convolution. Based on the stride rate, the kernel makes horizontal and vertical shifts. The idea behind the stride is to skip 2 or 3 pixels when the kernel slides over, in order to reduce the spatial resolution and make the network more computationally efficient. The general rule for the choice of stride is: when the stride is 1, the kernel moves 1 pixel at a time and, with padding, preserves the spatial size; when the stride is 2, the kernel moves 2 pixels at a time, which affects the tensor shape after the convolution and hence the whole network. The max-pooling layer is responsible for dimensionality reduction (i.e., reducing the size of the image) and also helps to decrease the computational power needed to process the image. Max pooling selects the maximum value from the portion of the image covered by the kernel. Therefore, the output of the max-pooling layer is a feature map containing the most salient features from the previous feature map. Each neuron in a dense layer receives input from all neurons in the previous layer; dense layers are used to classify the images based on the features from the convolutional layers.
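The effect of kernel size, stride, padding and pooling on the tensor shape can be checked with a few Keras layers; the sketch below uses the 224 × 224 greyscale input of this chapter, while the filter counts are only illustrative.

```python
# Sketch: how kernel size, stride, padding and max pooling change tensor shape.
from tensorflow.keras import layers, Input

x = Input(shape=(224, 224, 1))
c1 = layers.Conv2D(16, kernel_size=3, strides=1, padding="valid")(x)  # 222x222x16
p1 = layers.MaxPooling2D(pool_size=2)(c1)                             # 111x111x16
c2 = layers.Conv2D(32, kernel_size=3, strides=2, padding="same")(p1)  # stride 2 halves the map: 56x56x32
print(c1.shape, p1.shape, c2.shape)
```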
3 Proposed Framework

Figure 2 depicts the proposed framework used for classifying and evaluating the stages of Alzheimer's disease, which comprises three steps, namely data acquisition, data augmentation, and classification and evaluation. The machine learning approach, CNN, is used for the classification and evaluation of the AD stages. The CNN model uses five two-dimensional convolutional layers (kernel size 3 × 3), five max-pooling layers (size 2 × 2) and five dense layers. There is a dropout layer that is used after four convolutional and max-pooling layers. After the last set of convolutional and max-pooling layers, a flatten layer followed by two dense layers is used. Then, three sets of dropout layers and dense layers are used alternately. The rectified linear unit (ReLU) activation function is used for the intermediate layers, and the final dense layer uses a Softmax activation function to handle the four stages of AD.
3.1 Data Acquisition

The dataset consists of 6400 MRI scan images divided into four classes: Non-Dementia, Very Mild Dementia, Mild Dementia, and Moderate Dementia. The Non-Dementia class contains 3200 images, Very Mild Dementia has 2240 images, Mild Dementia includes 896 images, and the Moderate Dementia class has 64 images. All MRI scan images were considered with a size of 224 × 224 in 2D format.
3.2 Data Augmentation

Data augmentation is important in the deep learning process, as it is not always feasible to collect a large number of images. It helps us to increase the size of the image dataset. The augmentation operations used in this work are rotation, shifting, flipping, and changing the brightness level. The rotation operation rotates the image by a specified rotation degree of 40. Shifting is used to change the orientation of the image; in this work, horizontal and vertical shifts are applied. The flipping process allows us to flip the orientation of the image; we used horizontal flips, vertical flips, and the combination of both. The feature named changing the brightness level is used to
Fig. 2 Block representation of proposed framework
Table 1 Operations performed in image augmentation (the Results column of the table shows an example augmented image for each operation): image rotation, image shifting (hshift, vshift), image flipping (hflip, vflip), image shearing, image brightness
make illumination changes. When we encounter a scenario where some of the images in the dataset have a similar brightness level, augmenting the images ensures that the dataset is robust. In this work, the Moderate Dementia class images in the dataset are not sufficient for training, so these images are increased using the data augmentation techniques; the results are shown in Table 1. A sketch of these operations with Keras is given below.
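```python
# Sketch: the augmentation operations above expressed with Keras'
# ImageDataGenerator. Values stated in the text or Table 4 are used where
# given (rotation of 40, +/-40-pixel shifts, flips, brightness 0.2 to 1);
# the shear value and directory layout are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=40,            # random rotation
    width_shift_range=40,         # horizontal shift in pixels
    height_shift_range=40,        # vertical shift in pixels
    horizontal_flip=True,
    vertical_flip=True,
    shear_range=0.2,              # shearing (value assumed)
    brightness_range=(0.2, 1.0),  # illumination changes
)

flow = augmenter.flow_from_directory(
    "dataset/train", target_size=(224, 224), color_mode="grayscale",
    class_mode="categorical", batch_size=32)   # directory layout is an assumption
```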
3.3 Classification and Evaluation

In this step, the classification of the MRI images is performed based on the different stages of Alzheimer's disease, and the performance metrics are also evaluated. The medical image classification is done with a simple CNN architecture that deals with 2D MRI brain scans based on the 2D convolution process. The model created with this CNN architecture is evaluated according to performance metrics such as accuracy and loss.
4 Results and Discussion

The image dataset contains 6400 MRI images, of which 5121 are used for training and validation and 1279 are used for testing. The data used for training and validation are split in the ratio 80:20; hence, the parameter "validation_split" is set to 0.2. Out of the 5121 images, 4097 images are used for training and 1024 images are used for validation. Table 2 depicts the details of the image dataset used. Each set of images is classified into four classes, namely Non-Demented, Very Mild Demented, Mild Demented, and Moderate Demented. The image size considered is 224 × 224. Some sample images are shown in Fig. 3. The work is implemented using Python, one of the most widely used programming languages. Python has excellent libraries and tools that help in achieving the task of image processing very efficiently; for example, the NumPy library can perform simple image operations such as flipping and can be used for feature extraction and analysis. Table 3 shows the summary of the CNN model used in this project. The optimizer
Table 2 Image dataset
AD stage | Training | Test
Non-demented | 2560 | 640
Very mild demented | 1792 | 448
Mild demented | 717 | 179
Moderate demented | 52 | 12

Fig. 3 Sample images
Table 3
Model: "Sequential"
Layer (type) | Output shape | Param #
Conv2d | (222, 222, 16) | 448
Max_pooling2d | (111, 111, 16) | 0
Conv2d_1 | (109, 109, 32) | 4640
Max_pooling2d_1 | (54, 54, 32) | 0
Conv2d_2 | (52, 52, 64) | 18,496
Max_pooling2d_2 | (26, 26, 64) | 0
Conv2d_3 | (24, 24, 128) | 73,856
Max_pooling2d_3 | (12, 12, 128) | 0
dropout | (10, 10, 256) | 295,168
Max_pooling2d_4 | (5, 5, 256) | 0
flatten | (6400) | 0
dense | (512) | 3,277,312
dense_1 | (256) | 131,328
dropout_1 | (256) | 0
dense_2 | (128) | 32,896
dropout_2 | (128) | 0
dense_3 | (64) | 8256
Dropout_3 | (64) | 0
dense_4 | (32) | 2000
Dense_5 | (4) | 132
"adam" and the "categorical_crossentropy" loss function are used for compiling the CNN model, and the network is trained for 50 epochs. Adam is a first-order gradient-based optimization method for stochastic objective functions that can be used to update the network weights of the weighted layers of the CNN model iteratively based on the training data. Adam adapts the learning rate parameter based on the averages of the first and second moments of the gradients.
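A Sequential model reproducing the layer sizes of Table 3, compiled as described above, could look like the following sketch; the dropout rates and the commented training call are assumptions.

```python
# Sketch: a Sequential CNN matching the layer sizes reported in Table 3,
# compiled with Adam and categorical cross-entropy. Dropout rates are not
# given in the text and are assumptions.
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(224, 224, 1)),  # 222x222x16
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu"),    # 109x109x32
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),    # 52x52x64
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, activation="relu"),   # 24x24x128
    layers.MaxPooling2D(2),
    layers.Conv2D(256, 3, activation="relu"),   # 10x10x256
    layers.MaxPooling2D(2),
    layers.Flatten(),                           # 6400 features
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(4, activation="softmax"),      # four AD stages
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=50, validation_split=0.2)  # arrays assumed
```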
4.1 Image Augmentation

Data augmentation is a technique to increase the size of the training dataset using the existing dataset. The augmentation operations used in this project are rotation, shifting, flipping, and changing the brightness level. These operations are applied as hflip, vflip, hshift, vshift, rotation, shear, hvflip, bright, and bright_rot. The details are mentioned in Table 4, and some of the augmented images are shown in Fig. 4. The training accuracy is compared with the validation accuracy; similarly, the training loss is compared with the validation loss. Figure 5 shows the plots of training
408 Table 4 Augmentation parameters
D. J. Jovina and T. Jayasree
Operation | Value/range
hflip | True
vflip | True
hshift | −40, 40
vshift | −40, 40
Rotation | 90 deg
Bright | 0.2, 1
Fig. 4 Images after augmentation
The training accuracy is compared with the validation accuracy, and the training loss with the validation loss; Fig. 5 shows the plots of training accuracy vs. validation accuracy and training loss vs. validation loss. The training accuracy and validation accuracy reach approximately 96% after 50 epochs, while the test set gives 66% accuracy. The cross-entropy loss function is used to train classification models that classify the data by predicting the probability that a sample belongs to one class or another. Categorical cross-entropy, used in this work, is the loss function for multiclass classification, where two or more output labels are required.
Fig. 5 Comparison of training and validation accuracies and losses
The output label is assigned a one-hot categorical encoding value in the form of 0s and 1s; this encoding is obtained with the keras.utils.to_categorical function. The formula used for categorical cross-entropy is

$-\frac{1}{N}\sum_{s \in S}\sum_{c \in C} \mathbf{1}_{s \in c}\,\log p(s \in c), \qquad (1)$
where S is the set of samples, C the set of classes, N the number of samples, and $\mathbf{1}_{s \in c}$ indicates that sample s belongs to class c. The stages of Alzheimer's disease are considered as follows: "Mild Dementia", "Moderate Dementia", "Non-Dementia", and "Very Mild Dementia". Figure 6 shows the predicted and actual images of each class.
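A small illustrative sketch (assumed, not from the paper) of the one-hot encoding and of Eq. (1) for the four AD stages:

```python
# Minimal sketch: one-hot encoding and categorical cross-entropy, Eq. (1).
import numpy as np
from tensorflow.keras.utils import to_categorical

labels = np.array([0, 2, 3, 1])                 # class indices of 4 sample images
one_hot = to_categorical(labels, num_classes=4)  # e.g. class 2 -> [0., 0., 1., 0.]

probs = np.array([[0.7, 0.1, 0.1, 0.1],          # hypothetical predicted probabilities
                  [0.2, 0.2, 0.5, 0.1],
                  [0.1, 0.1, 0.1, 0.7],
                  [0.3, 0.4, 0.2, 0.1]])

# Average over samples of -log p(true class), as in Eq. (1)
loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
print(one_hot, loss)
```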
Fig. 6 Predicted and actual images based on AD stages
5 Conclusion In this work, the detection of the stages of AD has been performed using a CNN architecture on MRI brain images. The model is trained, validated, and tested with an MRI brain image dataset, and the results are analyzed with the help of various performance metrics. The classification of AD stages using the CNN model has shown promising results. Since the moderate-class images in the dataset are not sufficient for training, these images may be augmented using data augmentation techniques to increase the size of the dataset and improve classification accuracy.
Chapter 33
Performance Evaluation of Multiple ML Classifiers for Malware Detection Md. Masroor Fahim, Mahbuba Sharmin Mim, Tahmid Bin Hasan, and Abu Sayed Md. Mostafizur Rahaman
1 Introduction With the growth of the cyber world, malware has had an increasingly deleterious impact on it. Detecting malware is therefore critical, as it acts as an early warning system for potential intrusions and malicious attacks on computer systems; it protects sensitive information and stops hackers from accessing the system. Malicious software can be created for many different reasons, including pranks, activism, cybercrime, espionage, and other serious crimes; however, the vast majority of malware is created to generate illicit profit. Several studies have detected malware on popular malware datasets with supervised learning techniques, but only with a few classifiers, whereas this paper assesses performance with a larger number of classifiers (13) together with ensemble techniques. Our study provides a framework for recognizing malware and benign apps by evaluating precision, recall, F1-score, specificity, and accuracy.
Md. M. Fahim (B) · M. S. Mim · T. B. Hasan Department of Information and Communication Technology, Bangladesh University of Professionals, Dhaka, Bangladesh e-mail: [email protected] M. S. Mim e-mail: [email protected] T. B. Hasan e-mail: [email protected] A. S. Md. M. Rahaman Department of Computer Science and Engineering, Jahangirnagar University, Dhaka, Bangladesh e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_33
Several studies have been conducted on malware detection, and researchers have contributed substantial insight to this domain; some of the related works are outlined in this section. Rana et al. [1] carried out a comparison study of various supervised ML classifiers. In their model, they used the Drebin dataset, which contains 179 different malware families, and attained an accuracy of 94.33% with the RF classifier. Talha et al. [2] employed the 'APK Auditor' permission-based detection method. They gathered malware samples from several sources, including the Drebin dataset, the Android Malware Genome Project, and Contagio Mobile, and using these datasets they attained 88% accuracy along with 92.5% specificity. Sahs et al. [3] proposed a supervised machine learning strategy that employs Support Vector Machines (SVMs) to find malware on Android-based devices; frameworks such as Androguard and Scikit-learn were used to extract the necessary information from the APKs. Selamat et al. [4] presented a comparative analysis of several machine learning techniques for malware detection, including algorithms such as KNN, DT, and SVM; portable executable (PE) data were used for feature extraction, and DT attained an accuracy of 99%. Nayanshi et al. [5] suggested a method that seeks to attain effective and efficient recognition: they extracted features from different categories and integrated them into a single feature vector, which increased the detection rate, reduced the false-positive rate, and yielded their best performance. Wu et al. [6] proposed DroidDolphin, a framework designed to identify malware on Android systems, which attained an F-score of 87.5% along with a precision of 86.15 using SVM. Akhtar et al. [7] created a model that utilizes machine learning methods to identify malware by measuring the difference in correlation symmetry between the classifiers' integrals and attained 99% with DT. Zahra et al. [8] presented an API call analysis-based method for detecting malware; they incorporated this technique as a feature and assessed how it affected the classification procedure. The evaluation's findings show that, in the best case, the Random Forest (RF) classifier had an accuracy of 98.4%. Based on the related research stated above, we propose a framework that employs thirteen ML classifiers trained on a dataset to differentiate between benign and malicious apps. Various performance metrics, such as recall, precision, specificity, F1-score, and accuracy, have been used to evaluate the overall performance. We used the Variance Threshold, Pearson Correlation, PCA, and Mutual Information techniques for feature selection and extraction in order to enhance the accuracy of the desired classification model. Furthermore, we assessed the efficiency of several machine learning classifiers individually, combined selected classifiers for soft and hard voting, and stacked different ML classifiers on a heuristic basis.
2 Methodology Our malware detection model is depicted in Fig. 1. Firstly, we trained and tested 13 distinct machine learning classifiers individually, using performance metrics including precision, recall, F1-score, specificity, and accuracy. Secondly, several selected classifiers were randomly combined into a voting process in order to assess the accuracy of the soft and hard voting classifiers, respectively. After that, we employed a heuristic method to create stacking classifiers by combining multiple classifiers, and the suggested model exhibited a gain in accuracy. The dataset used in this study is the 'DREBIN-215' dataset. It was obtained from a web page titled 'Android_malware_dataset_for_machine_learning_2' [9] and originates from the 'DroidFusion' paper by Yerima et al. [10]. The dataset is then preprocessed in various ways, including missing-data handling, removal of constant columns, Pearson's correlation, PCA, Mutual Information Gain, and SMOTE-TOMEK. The dataset was stratified and split into two portions, training (80%) and testing (20%), respectively, and the trained model was applied to the test dataset for the evaluation of the performance metrics. The paper makes the following contributions: (1) An assessment of the performance of the ML classifiers on an individual level.
Fig. 1 Proposed methodology
(2) An assessment of the performance of classifiers utilizing (soft and hard) voting methods over randomly chosen groups of four and eight ML classifiers, respectively. (3) An assessment of the performance of classifiers utilizing an ensemble stacking method over the best combinations of ML classifiers.
3 Implementation 3.1 Data Acquisition/Collection The first step in our model creation approach is the acquisition of the dataset, named DREBIN-215.csv. Initially, the structure of this dataset was 15,036 × 215. The features mainly fall into one of the following categories: API call signatures, manifest permissions, command signatures, or intents. After preprocessing, the dataset becomes 15,036 × 70 and occupies 6.6 MB of disk space.
3.2 Data Preprocessing Data preprocessing is an essential part of building a model. Real data typically contains noise, inconsistencies, missing or ambiguous values, and duplicated values or constant columns, all of which create disturbance and hamper performance to a great extent, so we took several approaches to tackle these issues. Variance Threshold. VT is a feature selection method that removes features that do not meet the variance threshold value; it is a technique to exclude low-informative features as well as constant columns. Pearson Correlation Coefficient. The PCC is utilized to measure the strength of the linear relationship between variables; it is generally used to find the correlation between two independent variables, leaving the target variable aside. The PCC ranges from −1 to 1; in our model, we took 0.75 (75%) as the Pearson correlation threshold. Principal Component Analysis. PCA is a method for lowering the dimensionality of the data, i.e., reducing the number of features, in order to mitigate overfitting, achieve better performance, and increase interpretability. Variance Inflation Factor. VIF is a measure of the multi-collinearity that exists in the data. Usually, a VIF value of 5 or more indicates a high correlation with the other independent variables in the dataset; it is also used to gauge the severity of multi-collinearity between the variables.
Mutual Information. Mutual information is a non-negative quantity that measures the dependency between two random variables: it is zero when the two variables are completely independent, and higher values indicate stronger dependency. SMOTE-TOMEK. This technique is used for the class imbalance problem and combines two resampling techniques, SMOTE oversampling and Tomek-links undersampling. In the model, the minority class is balanced by this technique.
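A minimal sketch of these preprocessing steps follows (assumed, not the authors' exact pipeline), using scikit-learn, pandas, and imbalanced-learn; the synthetic data stands in for the binary DREBIN feature matrix and malware/benign labels, and the thresholds mirror those described above.

```python
# Minimal sketch: variance threshold, correlation filter, mutual information, SMOTE-Tomek.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif
from imblearn.combine import SMOTETomek

X, y = make_classification(n_samples=500, n_features=50, weights=[0.8], random_state=0)
X = pd.DataFrame(X)

# 1. Drop near-constant (low-variance) columns
X_vt = pd.DataFrame(VarianceThreshold(threshold=0.01).fit_transform(X))

# 2. Drop one of each pair of highly correlated features (|r| > 0.75)
corr = X_vt.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if (upper[col] > 0.75).any()]
X_corr = X_vt.drop(columns=to_drop)

# 3. Keep the features with the highest mutual information with the label
X_mi = SelectKBest(mutual_info_classif, k=20).fit_transform(X_corr, y)

# 4. Balance the minority class with SMOTE-Tomek
X_bal, y_bal = SMOTETomek(random_state=0).fit_resample(X_mi, y)
print(X_bal.shape, np.bincount(y_bal))
```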
3.3 Data Splitting Our dataset has been divided into two distinct parts: (1) data used to train the model—the training dataset—and (2) data used to test the model—the testing dataset. Here, we used an 80–20% ratio for the training and testing phases, respectively.
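A minimal sketch of the stratified 80–20 split described above; X_bal and y_bal are assumed to be the preprocessed features and labels from the previous sketch.

```python
# Minimal sketch: stratified 80/20 train-test split.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, stratify=y_bal, random_state=42)
print(X_train.shape, X_test.shape)
```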
3.4 Machine Learning Algorithms Different types of ML classifiers have been applied to the model; our target is to recognize the best classifiers for the model through hyperparameter tuning, and the performance metrics are calculated for each classifier. Logistic Regression (LR). Owing to its low computational complexity, it is employed in modeling and prediction; the algorithm estimates the likelihood that a binary (yes/no) outcome will occur. Support Vector Machine (SVM). This algorithm aims to locate a hyperplane in N-dimensional space that can effectively separate the data points. K-Nearest Neighbors (KNN). This is a non-parametric ML algorithm that uses proximity to make predictions about the groups of data points [11]. Naïve Bayes (NB). This is a probabilistic classifier that applies Bayes' theorem under the strong assumption that the features are conditionally independent. Random Forest (RF). It creates a forest in the form of an ensemble of decision trees and introduces extra randomness while growing the trees: each node split searches for the optimal attribute within a randomly chosen subset of attributes, adding diversity to the model and improving performance. Decision Tree (DT). This is a non-parametric supervised algorithm used for both classification and regression problems; it makes predictions based on how a previous set of questions were answered.
Passive Aggressive (PA). This ML algorithm adjusts its weights as new data arrives; it was built to improve on the performance of the Perceptron algorithm. Extreme Gradient Boosting (XGB). It follows a level-wise strategy and checks the gradient values; in XGB, instead of building trees sequentially as in GBDT, they are built in parallel. Light Gradient Boosting (LGM). It is based largely on decision trees grown leaf-wise and is mainly concerned with increasing efficiency and decreasing memory use. Stochastic Gradient Descent (SGD). SGD selects a random batch of samples in each iteration instead of using all the samples, as simple gradient descent does, until a minimum is reached, which makes it far faster than the conventional approach. Perceptron. It is a linear ML algorithm generally used for classification problems, especially binary problems. Extra Trees (ET). It is an ensemble method in which multiple de-correlated decision trees are gathered together to form a forest that generates the classification output; it fits a number of randomized decision trees on random samples and averages their outputs to avoid overfitting and obtain more accurate results. Ridge Classifier (RC). It is a method for linear discriminant models that avoids overfitting through a type of regularization that penalizes the model coefficients [12]. Voting Classifier. The voting classifier is a type of ensemble method that fuses several base models to assemble the final optimal solution [13]. It is categorized into two types—hard and soft: the hard voting method produces its final output based on the majority vote of the combined models, while the soft voting method produces its final output based on the highest average predicted probability. Stacking Classifier. It is an ensemble model, also known as a blending algorithm, that combines several classification models by taking the classifiers that perform well to create a hybrid super-classification model [14].
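A minimal sketch (assumed, not the authors' exact configuration) of soft and hard voting and of stacking with a Logistic Regression final estimator, as used later in this chapter; X_train, y_train, X_test, and y_test are assumed to come from the split sketched earlier, and the three base learners are illustrative.

```python
# Minimal sketch: hard/soft voting and stacking ensembles with scikit-learn.
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              VotingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

base = [("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0))]

hard_vote = VotingClassifier(estimators=base, voting="hard")  # majority vote
soft_vote = VotingClassifier(estimators=base, voting="soft")  # average probabilities
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))

for clf in (hard_vote, soft_vote, stack):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```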
4 Result Analysis 4.1 Measurement Metrics Confusion Matrix. The confusion matrix is a structured tabular view that is commonly used to demonstrate a classification model's effectiveness: every column signifies the occurrences in a given predicted class, whereas every row signifies the occurrences in a given actual class.
Table 1 Confusion matrix

Actual class | Predicted class: −ve | Predicted class: +ve
−ve | TN | FP
+ve | FN | TP
Before we can calculate the parameters, we must first understand the terms that make up the confusion matrix. 1. True Positive (TP): we predicted positive, and the prediction turned out to be correct. 2. True Negative (TN): we predicted negative, and the prediction turned out to be correct. 3. False Positive (FP): we predicted positive, but the prediction turned out to be incorrect. 4. False Negative (FN): we predicted negative, but the prediction turned out to be incorrect. Table 1 depicts the True Positive, True Negative, False Positive, and False Negative cells of the confusion matrix. ROC Curve. This curve graphically shows the performance of the classifier at each potential threshold. Precision–Recall Curve. It shows the trade-off between precision and recall as the decision threshold changes.
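A small sketch (assumed) of deriving the reported metrics from the confusion matrix cells of Table 1; the toy label arrays are placeholders for the true and predicted malware/benign labels.

```python
# Minimal sketch: precision, recall, specificity, F1, accuracy from a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])   # toy ground truth (1 = malware)
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])   # toy predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # sensitivity / true-positive rate
specificity = tn / (tn + fp)          # true-negative rate
f1_score    = 2 * precision * recall / (precision + recall)
accuracy    = (tp + tn) / (tp + tn + fp + fn)
print(precision, recall, specificity, f1_score, accuracy)
```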
4.2 Performance Results The evaluation metrics of the 13 ML classifiers have been measured on the chosen dataset. All the classifiers except Naive Bayes did an excellent job, keeping the accuracy above 90%. The NB model's accuracy falls below 90% because its assumption of conditionally independent features rarely holds for this data and the model cannot capture the data's complexity. It has also been observed that six ML classifiers achieve an accuracy greater than 95%, which indicates that the model fits the data well with these algorithms. Table 2 depicts the performance metrics of all the classification methods used. After evaluating the base classifiers, we analyzed the performance of the soft and hard voting classifiers. Firstly, four classifiers were randomly chosen and blended to see their behavior in combined form and were tested in both the soft and hard voting schemes. The result was not satisfactory for either scheme; the hard voting classifier performed better than the soft one, but the individual ML classifiers still outperformed the combined ones.
Table 2 Performance comparison of the different ML classifiers

S. No. | Classification methods | Precision (%) | Recall (%) | F1-score (%) | Specificity (%) | Accuracy (%)
1 | Random forest | 98.01 | 96.35 | 97.18 | 98.04 | 97.20
2 | Decision tree | 97.30 | 95.25 | 96.26 | 97.35 | 96.30
3 | KNN classifier | 96.36 | 96.41 | 96.38 | 96.38 | 96.38
4 | SVM classifier | 97.95 | 96.14 | 97.04 | 97.99 | 97.07
5 | Logistic regression | 95.71 | 94.24 | 94.97 | 95.77 | 95.01
6 | Naïve Bayes | 76.30 | 98.31 | 85.92 | 69.46 | 83.89
7 | SGD classifier | 95.45 | 94.19 | 94.82 | 95.50 | 94.85
8 | XGB classifier | 95.14 | 94.03 | 94.58 | 95.19 | 94.61
9 | Passive aggressive | 93.99 | 95.03 | 94.51 | 93.92 | 94.48
10 | Extra trees | 98.06 | 96.35 | 97.20 | 98.10 | 97.23
11 | Ridge classifier | 94.14 | 93.34 | 93.74 | 94.18 | 93.77
12 | LGM classifier | 97.84 | 95.72 | 96.77 | 97.88 | 96.80
13 | Perceptron | 97.23 | 89.01 | 92.94 | 97.46 | 93.24
Again, we incorporated four more classifiers alongside the existing four to evaluate the performance metrics, but in vain, as there was no considerable change in the classifiers' performance; in fact, the performance of the hard voting scheme degraded after the integration. The performance comparison of the voting classifier consisting of four classifiers is shown in Table 3, along with its confusion matrix in Fig. 2; for the voting classifier consisting of eight classifiers, the comparison and confusion matrix are shown in Table 4 and Fig. 3, respectively. Since the performance of the voting classifiers was not as good as expected, the stacking classification was implemented last, and heuristics were used to form the eight best models to enhance performance. In this final comparison, the best-performing classifiers were combined to form the eight models, and their performance was then analyzed. In all eight models, we utilized the 'Logistic Regression' classifier as the final estimator. The performance comparison of these eight models is depicted in Table 5. Model-4 (consisting of RF, ET, DT, LGM, and SGD classifiers) provided the best accuracy score among all the stacking models, with an accuracy of 97.30%, and the other metrics improved as well.
Table 3 Performance comparison of voting classifier consisting of four classifiers

SL | Voting classifier | ML (classifiers) | Individual accuracy (%)
1 | Soft | Random forest | 97.20
2 | | Extra trees | 97.23
3 | | XGB classifier | 94.61
4 | | Naive Bayes | 83.89
1 | Hard | Random forest | 97.20
2 | | Extra trees | 97.23
3 | | XGB classifier | 94.61
4 | | Naive Bayes | 83.89

Voting classifier | Precision (%) (class 0 / 1) | Recall (%) (class 0 / 1) | F1-score (%) (class 0 / 1) | Specificity (%) | Accuracy (%)
Soft | 96.82 / 96.83 | 96.46 / 96.47 | 96.64 / 96.65 | 96.46 | 96.65
Hard | 95.58 / 98.31 | 98.36 / 95.45 | 96.95 / 96.86 | 98.36 | 96.91
Fig. 2 Confusion matrix of voting classifier consisting of four classifiers
This indicates an enhancement of the overall performance compared with the individual base classifiers, and a much larger improvement over the accuracy of the voting classifiers seen earlier. Models 3 and 5 also performed well; their accuracy and the other performance evaluation metrics exceed those of the individual ML classifiers.
Table 4 Performance comparison of voting classifier consisting of eight classifiers

SL | Voting classifier | ML (classifiers) | Individual accuracy (%)
1 | Soft | Random forest | 97.20
2 | | Extra trees | 97.23
3 | | XGB classifier | 94.61
4 | | Naive Bayes | 83.89
5 | | Logistic regression | 95.01
6 | | LGM classifier | 96.80
7 | | KNN classifier | 96.38
8 | | SVM classifier | 97.07
1 | Hard | Random forest | 97.20
2 | | Extra trees | 97.23
3 | | XGB classifier | 94.61
4 | | Naive Bayes | 83.89
5 | | Logistic regression | 95.01
6 | | LGM classifier | 96.80
7 | | KNN classifier | 96.38
8 | | SVM classifier | 97.07

Voting classifier | Precision (%) (class 0 / 1) | Recall (%) (class 0 / 1) | F1-score (%) (class 0 / 1) | Specificity (%) | Accuracy (%)
Soft | 96.15 / 97.69 | 97.73 / 96.09 | 96.93 / 96.88 | 97.72 | 96.91
Hard | 95.28 / 98.15 | 98.20 / 95.14 | 96.72 / 96.62 | 98.20 | 96.70
Fig. 3 Confusion matrix of voting classifier consisting of eight classifiers
Table 5 Performance of stacking classifier (final estimator for models M1–M8: LR)

Sl. | Classification methods | Precision (%) | Recall (%) | F1-score (%) | Specificity (%) | Accuracy (%)
1 | RF | 98.01 | 96.35 | 97.18 | 98.04 | 97.2
2 | SVM | 97.95 | 96.14 | 97.04 | 97.99 | 97.07
3 | LR | 95.71 | 94.24 | 94.97 | 95.77 | 95.01
4 | DT | 97.3 | 95.25 | 96.26 | 97.35 | 96.3
5 | KNN | 96.36 | 96.41 | 96.38 | 96.35 | 96.38
6 | GNB | 76.3 | 98.31 | 85.92 | 69.46 | 83.89
7 | SGD classifier | 95.45 | 94.19 | 94.82 | 95.5 | 94.85
8 | XGB classifier | 95.14 | 94.03 | 94.58 | 95.19 | 94.61
9 | Passive aggressive | 93.99 | 95.03 | 94.51 | 93.92 | 94.48
10 | Extra trees | 98.06 | 96.35 | 97.2 | 98.1 | 97.23
11 | Perceptron | 93.24 | 89.01 | 92.94 | 97.46 | 93.24
12 | LGM classifier | 97.84 | 95.72 | 96.77 | 97.88 | 93.77
13 | Ridge classifier | 94.14 | 93.34 | 93.74 | 94.18 | 93.77
14 | Model (version 1) | 97.91 | 96.35 | 97.12 | 97.93 | 97.15
15 | Model (version 2) | 97.8 | 96.41 | 97.1 | 97.83 | 97.12
16 | Model (version 3) | 98.17 | 96.3 | 97.23 | 98.2 | 97.25
17 | Model (version 4) | 98.2 | 96.35 | 97.23 | 98.2 | 97.3
18 | Model (version 5) | 98.17 | 96.3 | 97.23 | 98.2 | 97.25
19 | Model (version 6) | 97.8 | 96.35 | 97.07 | 97.83 | 97.09
20 | Model (version 7) | 97.91 | 96.3 | 97.1 | 97.93 | 97.12
21 | Model (version 8) | 97.79 | 95.93 | 96.85 | 97.83 | 96.88
In Fig. 4, apart from the NB classifier, the ROC curve area of every classifier exceeds 0.92 and all the curves sit close to the top-left corner of the graph, suggesting a good fit with a low FP rate and a high TP rate; likewise, in the precision–recall curve, all classifiers apart from NB are near the top-left corner, suggesting high precision and recall and hence a low FP rate, which indicates strong performance in detecting malware. Figures 5, 6, and 7 illustrate the accuracy, F1-score, and specificity comparisons, respectively. In the accuracy comparison, model-4 achieved the best performance with 97.30%, and ET performed best among the base classifiers. For the F1-score, model-4 attained 97.23%, along with models 3 and 5, and ET again performed best among the base classifiers. For specificity, model-4 again scored the highest with 98.2%, along with models 3 and 5.
Fig. 4 ROC and precision–recall curve of all the classifiers along with the models
Fig. 5 Accuracy comparison curve of all models along with base classifiers
Fig. 6 F1-score comparison curve of all models along with base classifiers
Fig. 7 Specificity comparison curve of all models along with base classifiers
5 Conclusion Malware detection has become an essential part of cyber life with the advancement of technology and the widespread use of mobile and other connected devices. Thus, to ensure safety from malware, various kinds of Android malware datasets have been researched and analyzed. In our research, we evaluated the performance of several classifiers to identify the best-performing classifier for the dataset. Among the individual classifiers, the Extra Trees algorithm attained the highest accuracy along
with the other metrics such as precision, specificity, and F1-score. After analyzing the individual classifiers, we combined the classifiers randomly, first four and then eight at a time, with voting classifiers, but the performance was not as good as expected and was lower than that of some individual classifiers. After that, we combined the best-performing classifiers, those with accuracy above 94.5%, to enhance the overall performance and create a super hybrid classifier; the stacking classifier's model-4 attained the best performance with an accuracy of 97.30%, while for the other metrics, precision, recall, F1-score, and specificity, the model attained 98.2%, 96.35%, 97.23%, and 98.2%, respectively. Although the overall performance increased for the stacking models, their computational complexity increased as well, which is a limitation. Nonetheless, the overall performance improved significantly across all evaluation metrics for the stacking classifiers.
References 1. Rana MS, Gudla C, Sung H (2018) Evaluating machine learning models for android malware detection—a comparison study. In: 2018 8th ICNCC. ACM, Taiwan, pp 17–21 2. Talha KA, Alper DI, Aydin C (2015) APK auditor: permission-based Android malware detection system. Digit Investig 13:1–14 3. Sahs J, Khan L (2012) A machine learning approach to android malware detection. In: 2012 The European intelligence and security informatics conference (EISIC). IEEE, Odense, Denmark, pp 141–147 4. Selamat NS, Ali FHM (2019) Comparison of malware detection techniques using machine learning algorithm. Indonesian J Electr Eng Comput Sci 16(1):435–440 5. Nayanshi M, Gunjan K, Sony S, Kumar A (2021) Malware detection Using ML. Int J Innov Res Technol 8(1):67–71 6. Wu WC, Hung SH (2014) DroidDolphin: a dynamic Android malware detection framework using big data and machine learning. In: 2014 conference on research in adaptive and convergent systems (RACS). ACM, New York, NY, USA, pp 247–252 7. Akhtar MS, Feng T (2022) Malware analysis and detection using machine learning algorithms. Symmetry 14(11):2304. https://doi.org/10.3390/sym14112304 8. Zahra S, Mahboobeh G, Ashkan S (2012) A miner for malware detection based on API function calls and their arguments. In: 16th CSI International symposium on artificial intelligence and signal processing (AISP), (May). IEEE, Shiraz, Iran, pp 563–568 9. Android malware dataset for machine learning 2. https://figshare.com/articles/dataset/And roid_malware_dataset_for_machine_learning_2/5854653. Last accessed 21 Jan 2022 10. Yerima SY, Sezer S (2019) DroidFusion: a novel multilevel classifier fusion approach for Android malware detection. IEEE Trans Cybernet 49(2):453–466 11. Jalal MM, Tasnim Z, Islam MN (2021) Exploring the machine learning algorithms to find the best features for predicting the risk of cardiovascular diseases. In: Vasant P, Zelinka I, Weber GW (eds) Intelligent computing and optimization (ICO) 2020. Advances in intelligent systems and computing, vol 1324. Springer, Cham 12. Ridge-classification-concepts-python-examples for ridge. https://vitalflux.com/ridge-classific ation-concepts-python-examples/. Last accessed 21 Jan 2022 13. The-voting-classifiers. https://www.codingninjas.com/codestudio/library/the-voting-classi fier. Last accessed 21 Jan 2022
14. Tasnim A, Saiduzzaman M, Rahman M, Akhter J, Rahaman ASMdM (2022) Performance evaluation of multiple classifiers for predicting fake news. J Comput Commun 10(9):1–21
Chapter 34
An Analysis of Feature Engineering Approaches for Unlabeled Dark Web Data Classification Ashwini Dalvi , Vedashree Joshi, and S. G. Bhirud
1 Introduction There are several reasons why feature engineering can be beneficial for unlabeled data. A key benefit is that it can reveal hidden patterns and structures in the data that are not immediately apparent; as a result, more informative and relevant features are created, improving the performance of machine learning models. Furthermore, feature engineering can reduce the dimensionality of unlabeled data, making it more manageable and easier to analyze and reducing overfitting. Additionally, feature engineering improves the interpretability of results by enriching them with domain-specific knowledge; e.g., TF-IDF can help identify the essential words in a text document, giving insight into its contents. A feature engineering step is therefore valuable when working with unlabeled data: a good machine learning model can be created by identifying patterns and structures in the data, reducing dimensionality, and improving interpretability. Feature engineering converts the collected raw data into meaningful and insightful features using machine learning/statistical methods. Researchers have attempted diverse
A. Dalvi · S. G. Bhirud Veermata Jijabai Technological Institute, Mumbai 400077, India e-mail: [email protected] S. G. Bhirud e-mail: [email protected] V. Joshi (B) K. J. Somaiya College of Engineering, Vidyavihar, Mumbai 400077, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_34
Fig. 1 Feature engineering tasks
use cases with feature engineering and machine learning [1, 2]. The primary advantage of introducing new features into the training dataset is to improve the model's accuracy and to simplify and speed up data transformations. In addition, one can use visualization to distinguish between the features, or statistical methods to analyze the data. A brief overview of the feature engineering process in classification/prediction is shown in Fig. 1. The first step is obtaining raw data from diverse sources containing structured and unstructured data. Step two is data cleaning and transformation: it is essential to clean the raw data, i.e., remove duplicates, null values, and wrongly formatted values from the dataset. Because this data is still in a primitive form, it is further structured, cleansed, and transformed to create the desired features for modeling. Step three is feature engineering: methods such as Bag of Words, TF-IDF, and BM25 are applied, the resulting features are appended to the training datasets, feature selection and extraction are performed, and the selected features are fed as input to the learning algorithm. In step four, models are created; the learning algorithms are used iteratively to evaluate the quality of the critical features in the model. The final step is to gain meaningful insights and interpretations from the selected features; this may take the form of classification or prediction, using methods such as association rules, clustering, linear regression, and logistic regression. The proposed work applies feature engineering approaches to unlabeled dark web data. The authors propose a method for unlabeled text classification on crawled dark web data using feature engineering techniques such as Bag of Words, TF-IDF, and BM25. A Latent Dirichlet Allocation (LDA) model is trained to detect latent topics using the extracted features, and log perplexity is computed to compare each feature engineering technique's performance. The proposed method is effective at identifying latent topics from unlabeled text data. The paper is organized as follows: Sect. 2 covers background work, Sects. 3 and 4 discuss methodology and results, and Sect. 5 concludes the paper.
2 Background Work Natural language processing (NLP) tasks often use feature engineering approaches such as Bag of Words, TF-IDF, and BM25.
2.1 Bag of Words A Bag of Words involves representing text as a bag of its words, disregarding grammar and order but keeping track of how many times each word appears throughout the text. It is easy to implement and is often used as a starting point for constructing more advanced NLP models [3].
2.2 TF-IDF Another feature engineering approach for NLP tasks is term frequency–inverse document frequency (TF-IDF). It assigns each word a numerical value that measures its importance within a collection or corpus, and the TF-IDF score is calculated as the product of term frequency and inverse document frequency. It is commonly used in text mining and information retrieval to improve the effectiveness of text-based searches. The term frequency denotes the relative frequency with which a term appears in a document compared with the other terms. It is calculated as follows:

$\mathrm{TF} = \dfrac{\text{frequency of the term in the document}}{\text{frequency of all terms in the document}}$

The inverse document frequency is the logarithm of the ratio of the total number of documents in the corpus to the number of documents containing the term in question:

$\mathrm{IDF} = \log\left(\dfrac{\text{number of documents in the corpus}}{\text{number of documents in the corpus containing the term}}\right)$

Finally, the TF-IDF score is calculated as

$\text{TF-IDF} = \mathrm{TF} \times \mathrm{IDF}$

The TF-IDF feature weighting method has been studied for its effectiveness and limitations on unstructured datasets [4].
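A small illustrative example (assumed, not from the paper) of TF-IDF weighting with scikit-learn's TfidfVectorizer; note that scikit-learn applies a smoothed variant of the IDF formula above, and the toy documents are placeholders.

```python
# Minimal sketch: TF-IDF weights for a toy corpus.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["hidden service search index",
        "onion network hidden service",
        "search engine index pages"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)          # sparse document-term matrix

# Terms appearing in fewer documents (e.g. "onion") receive a higher IDF weight
for term, score in zip(vectorizer.get_feature_names_out(), tfidf.toarray()[1]):
    print(f"{term:10s} {score:.3f}")
```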
2.3 Okapi BM25 Okapi BM (Best Match) 25 is quite similar to TF-IDF. It is a ranking function used in information retrieval; e.g., search engines often use it to rank documents against a query. It also captures a word's relevance, rather than just its relative concentration as in TF-IDF. The proposed work refers to the literature that has employed feature engineering approaches for dark web content classification [5–11]. Researchers developed a machine learning method that uses relevant laws and regulations as training data to classify illegal activity on the dark web [12]; the work combined feature engineering techniques like TF-IDF with the Naive Bayes classifier, achieving a high accuracy of 0.935. Researchers have also proposed a model called 'Fusion NN-S3VM' that combines a neural network and support vector machines to improve the accuracy of predicting criminal activity on the dark web [13]. This model is designed to better handle the uncertain and complex nature of dark web data and may provide more accurate predictions of user behavior. Dark web data is often difficult to analyze, consisting of short and extended texts in various languages, jargon, and slang; in addition, the sparsity of short texts frequently challenges natural language processing (NLP). Nevertheless, several methods can deal with sparse data [14]. Through classification, valuable insights can be extracted from the vast amounts of dark web data available to understand the dark web. Further, researchers can attempt efforts like [15] to employ transfer learning to optimize the classifier's performance.
3 Methodology It is challenging to classify dark web data since it lacks labels and is anonymous and constantly changing. Traditional supervised classification frameworks do not work well in dark web environments, so researchers have used unsupervised and semi-supervised methods. Although research on the classification of dark websites is not comprehensive, there is still room for improvement and for new approaches to be developed. The methodology of the experiment conducted in this paper, as shown in Fig. 2, is as follows: 1. Data Preprocessing Firstly, all the unwanted columns are dropped. Then, various data preparation steps are applied, such as removing punctuation, removing stop words, and lemmatization. The raw data obtained has several irrelevant columns; therefore, only the needed column—text in our case—is retained, and the rest are dropped. 2. Preparation of Feature-Specific Corpus (Data Transformation)
Fig. 2 Flowchart of methodology
The dataset consists of unlabeled text data. The authors initially explored three prominent feature engineering techniques—Bag of Words, term frequency–inverse document frequency (TF-IDF), and Okapi BM25—and the corpus and dictionary (id2word) for each feature were thus created. 3. Application of Unsupervised Learning Algorithm—LDA Topic Modeling for Each Feature An unsupervised machine learning algorithm, Latent Dirichlet Allocation (LDA), is applied to each corpus fed into the model. 4. Latent Dirichlet Allocation Latent Dirichlet Allocation is a popular topic modeling technique that clusters documents to discover their topics; it is an unsupervised learning algorithm that derives the key topics in a collection of documents. 5. Performance Evaluation of the Model for Each Feature The models' performance with respect to the features mentioned above is measured using model perplexity. The model perplexity score is a performance evaluation measure that statistically determines the LDA model's predictive power; the lower the value, the better the performance of the model.
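A minimal sketch of steps 2–5 follows (assumed, not the authors' exact code), using Gensim to build the dictionary (id2word), create Bag-of-Words and TF-IDF corpora, train an LDA model on each, and compare log perplexity; "documents" stands in for the preprocessed dark web pages, and a BM25-weighted corpus can be handled the same way.

```python
# Minimal sketch: feature-specific corpora, LDA training, and log perplexity with Gensim.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, TfidfModel

documents = [["hidden", "service", "search", "index"],
             ["onion", "network", "hidden", "service"],
             ["search", "engine", "index", "pages"]]   # toy tokenized pages

id2word = Dictionary(documents)
bow_corpus = [id2word.doc2bow(doc) for doc in documents]
tfidf = TfidfModel(bow_corpus)
tfidf_corpus = [tfidf[bow] for bow in bow_corpus]

for name, corpus in [("Bag of Words", bow_corpus), ("TF-IDF", tfidf_corpus)]:
    lda = LdaModel(corpus=corpus, id2word=id2word, num_topics=2,
                   random_state=0, passes=5)
    print(name, lda.log_perplexity(corpus))   # lower score = better predictive power
```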
4 Results and Discussions 1. Data Preprocessing The data is first cleaned and preprocessed. Further, a dictionary with all the words in the dataset is created, and the words, along with their word ids, are generated. {'abuse': 0, 'access': 1, 'code': 2, 'contribute': 3, 'find': 4, 'hide': 5, 'index': 6, 'indexing': 7, 'information': 8, 'material': 9, 'need': 10, 'network': 11, 'onion': 12, 'possible': 13, 'remove': 14, 'report': 15, 'search': 16, 'see': 17, 'service': 18, 'soon': 19, 'source': 20, 'tor': 21,…….}.
Fig. 3 Sample of the Bag of Words scores for document 0
2. Preparation of Feature-Specific Corpora (Data Transformation) The authors implemented the feature-specific corpora for each feature—Bag of Words, TF-IDF, and Okapi BM25. Bag of Words As shown in Fig. 3, the Bag of Words counts for the first few words from document 0 were recorded; the Y axis represents the frequency of the words, and the X axis represents the words in the document. TF-IDF As shown in Fig. 4, the TF-IDF scores for the first few words from document 0 were recorded; the Y axis represents the weights (TF-IDF scores) of the words, and the X axis represents the words in the document. Okapi BM25 As shown in Fig. 5, the Okapi BM25 scores for the first few words from document 0 were recorded; the Y axis represents the weights (BM25 scores) of the words, and the X axis represents the words in the document. 3. Application of Unsupervised Learning Algorithm: LDA Topic Modeling for Each Feature Now that the features are implemented, the three feature-specific corpora are evaluated based on their performance in the LDA model to determine the best-performing feature engineering technique. The LDA model is built with a number of topics, each derived from a combination of keywords; the keywords of every topic and the significance (weight) of each keyword for each feature were recorded.
Fig. 4 Sample of the TF-IDF scores for document 0
Fig. 5 Sample of the Okapi BM 25 scores for document 0
The distribution of words with their weights for topic 0 for the different features—Bag of Words, TF-IDF, and Okapi BM25—was plotted, as shown in Figs. 6, 7, and 8, respectively. The Y axis represents the weights, and the X axis represents the words belonging to topic 0. 4. Performance Evaluation The LDA model for each feature-specific corpus was evaluated; a low log perplexity score corresponds to high performance. The model gave the log perplexity scores listed in Table 1.
Fig. 6 Sample of weights for topic 0 for Bag of Words
Fig. 7 Sample of weights for topic 0 for TF-IDF
It was observed that the TF-IDF corpus gave the lowest log perplexity score among all the feature corpora. Thus, it was concluded that, of all the feature engineering techniques considered, TF-IDF is the most likely to enhance the model's performance.
Fig. 8 Sample of weights for topic 0 for Okapi BM25
Table 1 Log perplexity scores

Feature | Log perplexity score
Bag of Words | −7.019411393041132
TF-IDF | −9.009516741585388
Okapi BM 25 | −7.5558534250399925
5 Conclusion and Future Work The paper proposed a method for the unsupervised classification of crawled dark web data in its raw HTML format. First, feature engineering techniques such as Bag of Words, TF-IDF, and BM25 are used to extract features from the unlabeled text data; then a Latent Dirichlet Allocation (LDA) model is trained on the extracted features to identify latent topics. On evaluating the performance, the authors concluded that TF-IDF performed best among the selected feature engineering approaches for unlabeled dark web data. As future work, models other than LDA can be created and trained to further examine and analyze the performance of the feature engineering techniques, and as more new features are developed, model performance can be tested by comparing additional feature engineering techniques. Classifying dark web data will help organizations better understand malicious activities and threats.
References 1. Leierzopf E, Kopal N, Esslinger B, Lampesberger H, Hermann E (2021) A massive machine-learning approach for classical cipher type detection using feature engineering. In: International conference on historical cryptology, pp 111–120 2. Yuan R, Xue D, Xu Y, Xue D, Li J (2022) Machine learning combined with feature engineering to search for BaTiO3 based ceramics with large piezoelectric constant. J Alloys Compd 908 3. Qader WA, Ameen MM, Ahmed BI (2019) An overview of bag of words; importance, implementation, applications, and challenges. In: 2019 International engineering conference (IEC). IEEE, pp 200–204 4. Das M, Kamalanathan S, Alphonse PJA (2021) A comparative study on TF-IDF feature weighting method and its analysis using unstructured dataset. In: COLINS, pp 98–107 5. Al Nabki MW, Fidalgo E, Alegre E, De Paz I (2017) Classifying illegal activities on tor network based on web textual contents. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics: volume 1, Long Papers, pp 35–43 6. Ghosh S, Porras P, Yegneswaran V, Nitz K, Das A (2017) ATOL: a framework for automated analysis and categorization of the darkweb ecosystem. In: Workshops at the thirty-first AAAI conference on artificial intelligence 7. Rajawat AS, Rawat R, Mahor V, Shaw RN, Ghosh A (2021) Suspicious big text data analysis for prediction—on darkweb user activity using computational intelligence model. In: Innovations in electrical and electronic engineering. Springer, Singapore, pp 735–751 8. Samtani S, Chai Y, Chen H (2022) Linking exploits from the dark web to known vulnerabilities for proactive cyber threat intelligence: an attention-based deep structured semantic model. MIS Q 46(2):911–946 9. Dalvi A, Raut SM, Joshi N, Bhuta DR, Nalla S, Bhirud SG (2022) Content labelling of hidden services with keyword extraction using the graph decomposition method. In: Using computational intelligence for the dark web and illicit behavior detection. IGI Global, pp 181–205 10. Dalvi A, Siddavatam I, Jain A, Moradiya S, Kazi F, Bhirud SG (2022) ELEMENT: text extraction for the dark web. In: Advanced computing and intelligent technologies. Springer, Singapore, pp 537–551 11. Alaidi AHM, Al_Airaji Roa' AM, Haider TH, Al Rikabi, Aljazaery IA, Abbood SH (2022) Dark web illegal activities crawling and classifying using data mining techniques. Int J Interact Mobile Technol 16(10) 12. He S, He Y, Li M (2019) Classification of illegal activities on the dark web. In: Proceedings of the 2019 2nd International conference on information science and systems, pp 73–78 13. Rajawat AS, Bedi P, Goyal SB, Kautish S, Xihua Z, Aljuaid H, Mohamed AW (2022) Dark web data classification using neural network. Comput Intell Neurosci 14. Pradhan R, Sharma DK (2022) A hierarchical topic modelling approach for short text clustering. Int J Inf Commun Technol 20(4):463–481 15. Pradhan R, Sharma DK (2022) An ensemble deep learning classifier for sentiment analysis on code-mix Hindi–English data. Soft Comput 1–18
Chapter 35
Anomaly Detection to Prevent Sensitive Data Exposure Using GMM Clustering Model Shivangi Mehta, Lataben J. Gadhavi, and Harshil Joshi
1 Introduction Anomaly detection is the process of identifying data points, objects, observations, or events that do not fit a group's expected pattern. These anomalies are uncommon, but they can indicate a major and serious threat, such as a cyber breach or fraud. Anomaly detection employs a variety of strategies and procedures, involving artificial intelligence, machine learning, and statistical approaches, and it is difficult since it necessitates a thorough examination of numerous factors [1, 2]. This study uses cybersecurity analytics to track and mitigate insider threats and targeted attacks, integrating big data tools with threat intelligence. The focus is on analyzing data to generate proactive security measures and to efficiently perform complex analysis that recognizes pattern changes and correlations across various data sources. Confidential data can be exposed through data leakage or sensitive data exposure, which occurs when personal information is unintentionally revealed due to poor encryption or software loopholes, putting sensitive information such as birth dates, bank details, and user account information at risk [3].
S. Mehta Gujarat Technological University, Ahmedabad, Gujarat, India S. Mehta · H. Joshi (B) Charotar University of Science and Engineering, Changa, Gujarat, India e-mail: [email protected] L. J. Gadhavi Government Polytechnic College, Gandhinagar, Gujarat, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_35
Fig. 1 Cybersecurity analytics
Figure 1 illustrates a cybersecurity analysis approach to identify vulnerabilities and possible risks in applications. Correlation analysis can detect brute force attacks and anomalous ports. The challenge for IT professionals is to secure the vast amounts of data they have and will create, and security analytics on big data can prevent intellectual property theft and data vandalism.
2 Related Work Srivastava and Jaiswal [4] outlined a methodology for managing big data in order to create an efficient system and extract critical information from the massive amounts of data held by the sector. To deal with this volume of data, the IT industry has relied on logs, yet data used by IT companies is often kept for only a few days. Because the stored data is valuable, it is necessary to keep it in
the form of logs for a long period. Security analytics, in comparison with other approaches, provides a 'richer' cybersecurity picture by separating what is 'normal' from what is 'abnormal,' i.e., separating the patterns established by a valid customer from those formed by suspect or malicious customers. Using cybersecurity analytics, we can distinguish between regular and irregular network traffic, where malicious traffic indicates a malware injection or an active cyberattack [5]. Gaidarski and Kutinchev [6] stated that a comprehensive approach is required to achieve successful data protection. When it comes to safeguarding sensitive data, typical security solutions are insufficient: a new type of technology is required that can safeguard data from both inside and outside, while also dealing with large amounts of data. Data leak prevention (DLP) systems are examples of solutions that use a data-centric approach. DLP solutions are intended to prevent data leaks from the inside to the outside, whether deliberate or inadvertent as a result of human error. To identify sensitive information, PII, or other forms of information that may be managed, banned, logged, etc., data can be scanned and compared against specified keywords and RegEx expressions. In an approach proposed by Yang et al. [7], it is noted that data masking always leads to a reduction in data quality. In their article, they proposed a system that accurately and securely locates and safeguards sensitive government data in cyberspace. Through the automatic categorization and classification of government data, sensitive information can be quickly located in massive datasets in response to targeted commercial use cases. Data is accurately categorized and its attributes are determined in conjunction with particular business scenarios and data characteristics, in order to implement differentiated data privacy protection using artificial intelligence technology; sensitive data is automatically detected through rule matching, NLP, and artificial intelligence modeling, and its sensitivity level is computed. Multiple masking methods and strategies are compatible with auto-discovery. Mehrotra et al. [8] mentioned that a secure and effective way of transmitting data is still a topic of research. Their work states that some datasets are sensitive while others are not, and they proposed a new secure technique in which sensitive data is transmitted in encrypted form and non-sensitive data is transmitted in clear text. A transmitted query should be mapped to a series of queries over sensitive and non-sensitive data such that no information is leaked. Gruschka et al. [9] stated that structured and unstructured data of large volume can be analyzed to generate value. Because of the large amount, fast production, and variety of the data, special storage and processing technologies are required. While processing big data offers many advantages, it also poses a number of privacy risks when dealing with personal data. Even in datasets that apparently lack personally identifying information, the more data there is, the more likely it is that individuals can be re-identified. Big data analysis might infer new information from 'harmless' personal data that is significantly more important and was not intended to be revealed by the person affected.
As a result, corporations and other organizations have not always been willing to make this effort in the past, but that is changing as new privacy rules and regulations take effect. Blacklisted malicious IP addresses are fed into a detection mechanism by several sources of intelligence. IP blacklists can be used
to match the source and destination addresses of each connection against the IP blacklist. The intelligence streams are automatically updated each day, allowing for real-time detection [10]. Anomalies can result from various factors such as human error, instrument faults, population variations, system behavioral patterns, or flaws, and the handling of an anomaly varies according to the application area. The ability to detect anomalies promptly and in real time is crucial for success, especially in contexts like intrusion monitoring and fraud detection systems [3]. However, due to the complexity of the data, each clustering technique has its own advantages and disadvantages [11]. For example, Chandola et al. [12] and Akoglu et al. [13] surveyed research on anomaly detection in general, where it does not matter whether the data is online or offline, or what it is about, as long as substantial changes/outliers are detected. In contrast, we concentrate on detecting anomalies in a specific sort of data, such as logs, traces, or KPIs tracked on application services, online as it happens. Steinder and Sethi [14], Wong et al. [15], and Sole et al. [16] examine solutions for determining the likely causes of software system failures; Steinder and Sethi [14] and Wong et al. [15], however, focus on computer networks and monolithic programs, which are not multi-service applications.
3 Background Theory
Security analytics is used to control how the website is communicated with by external clients and users, because internal traffic patterns are usually different from external traffic patterns. In the case of phishing frauds, we wanted to make sure that the dataset was updated in a timely manner whenever attackers revised their URL creation approach [17]. The decision to put an IP address on a blacklist involves close consideration of different aspects of packet traffic information, as well as the behavioral history. The bulk of new IP blacklisting security monitoring relies heavily on domain knowledge from experienced specialists [18]. The main reason for blacklisting IP addresses is to prevent malicious users from accessing the website or web application. To detect any connection to or from a malicious IP, we must process the network traffic and match the source and destination IP addresses of each connection against the IP blacklist [19]. To add more effectiveness toward achieving security, deep packet inspection (DPI) can be used to block malware before it exploits endpoints and other network assets. It can help filter out activity from ransomware, viruses, spyware, and worms. It also offers network-wide visibility that can be heuristically analyzed to detect suspicious traffic trends and alert security teams to malicious activity that is representative of current compromises. DPI basically performs signature-based identification, application-based identification, and behavior-based identification [20]. One of the major entities responsible for bringing the bulk of malicious traffic to websites is the onion router (Tor) [21]. There are various ways to block the Tor network, such as detecting and blocking Tor bridges, blocking the IP addresses of nodes, whitelisting, active probing, DPI, and flow analysis [22]. Tor features a
policy that divides routers in the Tor network into three categories: entrance, middle, and exit. An automated machine learning system can be used to identify unreported dangerous domains, predict domains that can be used by attackers in the future, and adapt its identification method [23]. A NAT gateway computer can house a proxy, firewall, and an IDS, as well as a private network [24]. We need a list of all of the identified IPs of Tor exit nodes to detect Tor traffic. We don’t need any relay or entry node information, as they’ll never connect to our website.
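A minimal sketch of the blacklist-matching step described above, in which each connection's source and destination IPs are checked against a blocklist (for example, a daily-updated threat feed or a published Tor exit-node list); the field names and sample addresses are assumptions for illustration.

```python
def load_blocklist(path):
    """Load one IP per line (e.g., a Tor exit-node list or a threat-intelligence feed)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip() and not line.startswith("#")}

def flag_connections(connections, blocklist):
    """Yield connections whose source or destination IP appears on the blocklist."""
    for conn in connections:
        if conn["src_ip"] in blocklist or conn["dst_ip"] in blocklist:
            yield conn

if __name__ == "__main__":
    blocklist = {"203.0.113.7", "198.51.100.23"}   # stand-in for a daily-updated feed
    traffic = [
        {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},
        {"src_ip": "10.0.0.9", "dst_ip": "93.184.216.34"},
    ]
    for hit in flag_connections(traffic, blocklist):
        print("alert on connection:", hit)
```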
4 Results
To analyze the occurrence of various cyberattacks, we have analyzed different open-source datasets that are available on Kaggle: the KDD cyberattack dataset for browsing network traffic to classify or cluster cyberattacks [25], the AWS Honeypot dataset for visualizing cyberattacks on the web [26], the UCI Malware Executable detection dataset for detecting malware from .exe files [27], and ICO security incidents for data breaches [28]. Figure 2 shows the count of cyberattacks in central government institutions by incident type. Figure 3 shows the reasons for data leakage in various organizations such as healthcare, military, web, government, academics, transport, social network, advertising, etc. Figure 4 shows the types of cyberattacks on various sectors such as education, childcare, telecoms, online technology, finance, insurance, credit, etc., by incident type. Figure 5 shows malicious and non-malicious data extracted from .exe files.
Fig. 2 Count of cyberattacks in central government institutions by its incident type
Fig. 3 Reasons for data leakage in various organizations
Fig. 4 Types of cyberattacks on various sectors
Fig. 5 Malicious and non-malicious data extracted from .exe files
With respect to the above-analyzed datasets, several reasons were identified for which data breaches have occurred and sensitive data has been exposed. Figure 6 shows the loopholes, drawn from the entire dataset, through which sensitive data is exposed, and Fig. 7 shows the filtered results that include the most important loopholes through which sensitive data is exposed. Thus, the data is analyzed and visualized based on historical cyberattacks. With the development of more advanced AI and ML algorithms, a new type of cyberattack has evolved in recent years. It is complex because it takes advantage of the abundance of disparate data made available by various online mechanisms, such as social networks, cookies, and the like. Combining data science, artificial intelligence, and machine learning, it enables highly precise and covert attacks. To counteract this trend in cyberattacks, a new generation of cybersecurity systems utilizing AI and machine learning is currently under development [29].
4.1 Applying Gaussian Mixture Models Clustering Algorithm
It is difficult to train a vanilla CNN classifier when one class has many more observations than the other: to achieve high accuracy, the classifier may simply assign all observations to the majority class. One solution for making the data more balanced is to use oversampling or downsampling. It is also a good idea to alter class weights to push the classifier to deal with data from the rare class. When the data is
Fig. 6 Analysis of top word count from entire dataset
Fig. 7 Important word count filtered from entire dataset
extremely unbalanced, however, applying the previous strategies may result in model overfitting. As a result, we’ll look at another way, known as anomaly detection, to cope with this situation. We’ll treat the observations in the main class as regular data
Fig. 8 Results obtained after applying GMM model
and train our model using only these observations. Then we can tell whether a new observation is normal or not. Keras and Scikit-Learn are used to detect anomalies: components such as VGG16, global average pooling 2D, and Gaussian mixture are imported to build the model, so anomaly detection can be implemented within a few lines of code. Furthermore, we transform images into a better feature space by using a pretrained model as the feature representation. We take the output of a chosen layer of a VGG16 model that has been pretrained on ImageNet using Keras, and to reduce dimensionality we run the output through a global average pooling layer. We employ a clustering model from Scikit-Learn, the Gaussian mixture model (GMM), to create a one-class classifier; accordingly, we construct a GMM with only one component. The data closer to the center of the Gaussian distribution are more likely to be normal, as can be seen in Fig. 8. We can assess whether an observation is normal or not by adjusting the distribution's range. For implementing the model, Keras' MNIST dataset is used. We consider '1' to be normal data and '7' to be aberrant; as a result, we only use '1' for training and both '1' and '7' for testing. We can quickly compute the likelihood of data using GMM's score_samples function. We classify our testing data by taking the mean likelihood of the training data minus three times the standard deviation as the threshold. To extract features from the testing data, we employ the VGG model, and the trained GMM is then used to compute the likelihood of the outcomes. Finally, if the likelihood of an observation is less than the threshold, it is flagged as an anomaly. We obtained a normal accuracy of 0.98 and an abnormal accuracy of 0.96. A scattergram is used to show the results, with the x-axis representing the data index and the y-axis representing the score. The threshold is plotted as a black line, with '1' as blue points and '7' as pink points. We can observe that the threshold distinguishes the majority of the points; that is how we spot data that is out of the ordinary. The visualization is shown in Fig. 9.
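The pipeline just described can be condensed into the following sketch, assuming the standard Keras and scikit-learn APIs (ImageNet-pretrained VGG16 features, global average pooling, a one-component GaussianMixture) and a lower-bound threshold on the log-likelihood (mean of the training scores minus three standard deviations); the exact resizing and preprocessing used in the original experiments are assumptions here.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.layers import GlobalAveragePooling2D
from sklearn.mixture import GaussianMixture

# Feature extractor: ImageNet-pretrained VGG16 followed by global average pooling.
base = VGG16(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
pool = GlobalAveragePooling2D()

def extract_features(images):
    # MNIST digits are 28x28 grayscale; resize and repeat channels so VGG16 accepts them.
    x = tf.image.resize(images[..., np.newaxis].astype("float32"), (32, 32))
    x = tf.repeat(x, 3, axis=-1)
    return pool(base(preprocess_input(x), training=False)).numpy()

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
normal_train = x_train[y_train == 1][:2000]          # '1' is treated as the normal class
test_mask = (y_test == 1) | (y_test == 7)            # '7' plays the role of the anomaly
x_eval, y_eval = x_test[test_mask], y_test[test_mask]

feats_train = extract_features(normal_train)
gmm = GaussianMixture(n_components=1).fit(feats_train)    # one-class model of normal data

train_scores = gmm.score_samples(feats_train)              # log-likelihood of training data
threshold = train_scores.mean() - 3 * train_scores.std()   # assumed lower-bound threshold

eval_scores = gmm.score_samples(extract_features(x_eval))
is_anomaly = eval_scores < threshold
print("abnormal detection rate:", is_anomaly[y_eval == 7].mean())
print("normal pass rate:", (~is_anomaly)[y_eval == 1].mean())
```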
Fig. 9 Data visualization
Fig. 10 True positives and false positives
We can also look into the failure scenarios. The model is more prone to make mistakes when ‘1’ is more intricate and ‘7’ is too thin, as shown in Fig. 10.
5 Conclusion and Future Work
The paper has shown how applications are vulnerable to data exposure and how analyzing it can protect information to a certain extent. It has been demonstrated that by studying big data, firms can predict future attacks and formulate appropriate countermeasures. Blacklisting fraudulent IP addresses and filtering the traffic in the network can help prevent cyberattacks. As a result, using an automated strategy for data enrichment, threat evaluation, and reaction allows security teams to focus on the most serious risks, and unusual behavior can be spotted rapidly. Future work includes remote monitoring of threats using SIEM tools. In order to cluster the IP addresses discovered in the dataset, both the GMM and K-means clustering techniques were applied in this research; it was observed that GMM outperformed K-means, notably for destination IP addresses, by sharply defining the ellipsoidal data chunks. A tool can be implemented that combines network analysis with log management so that analysis can be carried out quickly. Along with this, the machine-generated data can be used to gain operational insight into threats, vulnerabilities, and identity information.
References
1. Alabad M, Celik Y (2020) Anomaly detection for cyber-security based on convolution neural network: a survey. In: IEEE International conference on human-computer interaction, optimization and robotic applications (HORA). https://doi.org/10.1109/HORA49413020.9152899
2. Pawar P, Palwe S, Munde S, Gadhave P, Pachouly S (2016) Privacy preservation and detection of sensitive data exposure over cloud. Int J Adv Res Comput Commun Eng
3. Hilal W, Andrew Gadsden S, Yawney J (2022) Financial fraud: a review of anomaly detection techniques and recent advances. Expert Syst Appl 193:116429. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2021.116429
4. Srivastava N, Jaiswal UC (2019) Big data analytics technique in cyber security. In: 3rd International conference on computing methodologies and communication (ICCMC)
5. Joshi H, Rathod D (2022) Internet of Things (IoT)-based distributed denial of service (DDoS) attack using COOJA network simulator. In: Senjyu T, Mahalle P, Perumal T, Joshi A (eds) IOT with smart systems. Smart innovation, systems and technologies, vol 251. Springer, Singapore. https://doi.org/10.1007/978-981-16-3945-6_66
6. Gaidarski I, Kutinchev P (2019) Using big data for data leak prevention. In: Big data, knowledge and control systems engineering (BdKCSE)
7. Yang H, Huang L, Luo C, Yu Q (2020) Research on intelligent security protection of privacy data in government cyberspace. In: IEEE 5th International conference on cloud computing and big data analytics (ICCCBDA)
8. Mehrotra S, Sharma S, Ullman JD, Mishra A (2019) Partitioned data security on outsourced sensitive and non-sensitive data. In: IEEE 35th International conference on data engineering (ICDE)
9. Gruschka N, Mavroeidis V, Vishi K, Jensen M (2018) Privacy issues and data protection in big data: a case study analysis under GDPR. In: IEEE International conference on big data (Big Data)
10. Ghafir I, Prenosil V (2015) Blacklist-based malicious IP traffic detection. In: Proceedings of 2015 Global conference on communication technologies (GCCT)
11. Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2:165–193. https://doi.org/10.1007/s40745-015-0040-1
12. Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):58, Article 15. https://doi.org/10.1145/1541880.1541882
13. Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Disc 29(3):626–688. https://doi.org/10.1007/s10618-014-0365-y
14. Steinder M, Sethi AS (2004) A survey of fault localization techniques in computer networks. Sci Comput Program 53(2):165–194. https://doi.org/10.1016/j.scico.2004.01.010
15. Wong EW, Gao R, Li Y, Abreu R, Wotawa F (2016) A survey on software fault localization. IEEE Trans Softw Eng 42(8):707–740. https://doi.org/10.1109/TSE.2016.2521368
16. Solé M, Muntés-Mulero V, Rana AI, Estrada G (2017) Survey on models and techniques for root-cause analysis. arXiv:1701.08546. Retrieved from https://arxiv.org/abs/1701.08546
17. Anand A, Gorde K, Antony Moniz JR, Par N, Chakraborty T, Chu B-T (2018) Phishing URL detection with oversampling based on text generative adversarial networks. In: IEEE International conference on big data (Big Data)
18. Jeon D, Tak B (2019) BlackEye: automatic IP blacklisting using machine learning from security logs
19. Mayank P, Singh AK (2017) Tor traffic identification. In: 7th International conference on communication systems and network technologies
20. El-Maghraby RT, Abd Elazim NM, Bahaa-Eldin AM (2017) A survey on deep packet inspection. In: 12th International conference on computer engineering and systems (ICCES)
21. Saputra FA, Nadhori IU, Barry BF (2016) Detecting and blocking onion router traffic using deep packet inspection. In: International electronics symposium (IES). IEEE, p 283. ISBN: 978-1-5090-1640-2
22. Shahbar K, Zincir-Heywood AN (2017) An analysis of Tor pluggable transports under adversarial conditions. In: IEEE symposium series on computational intelligence (SSCI)
23. Arnaldo I, Arun A, Kyathanahalli S, Veeramachaneni K (2018) Acquire, adapt, and anticipate: continuous learning to block malicious domains. In: IEEE International conference on big data (Big Data)
24. Ling Z, Luo J, Wu K, Yu W, Fu X (2015) TorWard: discovery, blocking, and traceback of malicious traffic over Tor. IEEE Trans Inf Forens Secur
25. KDD-cyberattack. https://www.kaggle.com/datasets/slashtea/kdd-cyberattack
26. AWS-honeypot-attack-data. https://www.kaggle.com/datasets/casimian2000/aws-honeypot-attack-data
27. Malware executable detection. https://www.kaggle.com/datasets/piyushrumao/malware-executable-detection
28. ICO security incidents. https://www.kaggle.com/kerryisonline/ico-security-incidents
29. Rhode M, Burnap P, Wedgbury A (2021) Real-time malware process detection and automated process killing. Secur Commun Netw
Chapter 36
Design Analysis and Fabrication of Mono Composite Leaf Spring by Varying Thickness Mangesh Angadrao Bidve and Manish Billore
1 Introduction
In the current situation in the automobile sector, conserving natural resources and saving energy and money can be achieved by reducing vehicle weight. Weight reduction can be performed by replacing the existing material with an advanced material, by better design and optimization, and by improved manufacturing processes. The leaf spring is one of the primary candidates for weight reduction in a car, accounting for 10–20% of the unsprung weight, which is not counted in the total sprung mass [1]. Application of a composite leaf spring in an automobile helps to obtain better fuel economy and improved riding quality. The advent of composite (glass and epoxy) materials has made it possible to reduce the weight of the leaf spring without reducing its load-bearing capacity and stiffness, because a composite material has greater strain energy capacity and a higher strength-to-weight ratio. In recent years, composite materials have been used in vehicles to replace metallic components, since composites have greater elastic strain energy storage capacity and a higher strength-to-weight ratio compared with metals; composite materials therefore offer the possibility of significant weight saving. Springs are designed to absorb vibrations, store the energy elastically, and then release it, which is why the strain energy capacity of the composite material matters [2].
M. A. Bidve (B) · M. Billore Department of Mechanical Engineering, Oriental University, Indore, Madhya Pradesh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_36
1.1 Multi Leaf Spring
Multiple leaf springs are widely used in vehicle and rail cushioning. A multi-leaf spring consists of a series of flat plates of uniform thickness and semi-elliptical shape. These flat plates are held together mechanically with a U-bolt fastening; in addition, clips are provided to maintain proper alignment during the operating condition of the vehicle and to avoid lateral movement of the plates. Each individual flat plate is called a leaf, and the longest among them is referred to as the master leaf, which is bent at both ends to form eyes. The spring is supported by the axle of the vehicle chassis at the center position [3]. Sometimes multiple leaf springs are provided with one or more full-length master leaves. The strength of any multi-leaf spring depends on the number of leaves present in the design: the more leaves, the greater the support against the shear force. Such leaf springs are used heavily for the suspension of light vehicles as well as in commercial vehicles such as heavy-duty trucks, tractors, trolleys, etc. Multiple leaf springs are also used in cars for the rear wheel suspension. The leaf spring is located below the chassis frame, between the two tires of the vehicle. Whenever the vehicle meets an obstacle on the road, the vehicle body moves upwards and deflects the spring; at the same time, energy is stored or released by the spring. This happens because of the elastic nature of the material used for the leaf spring, and the bouncing capacity depends upon the strength of the material being used [4].
1.2 Introduction of Composite Material
Combining two or more constituents yields a new material with enhanced properties [5]. Materials represent almost 60–70% of a vehicle's cost and contribute to the quality and overall performance of the car, so even a small reduction in vehicle weight can have a wider economic impact. The strain energy of the material becomes a prime consideration in the design of springs. The specific strain energy can be expressed as

U = σ² / (ρE)
where U = energy storing capacity, σ = P/A is the allowable stress, ρ = mass density of the material, and E = modulus of elasticity of the material. The energy storing capacity is directly proportional to the square of the maximum allowable stress and inversely proportional to the modulus of elasticity and the mass density [6]. Figure 1 shows that the proposed material has sufficient energy storing capacity and the required properties of the metallic component it replaces; for this reason composite materials were selected for the leaf spring [7].
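As a rough numerical illustration of this relation, the sketch below compares the specific strain energy of steel and a glass/epoxy composite; the property values are indicative textbook figures assumed for illustration, not the measured properties of the materials used in this work.

```python
def specific_strain_energy(stress_mpa, density_kg_m3, modulus_gpa):
    """U = sigma^2 / (rho * E), returned in J/kg (SI units are used internally)."""
    sigma = stress_mpa * 1e6          # Pa
    modulus = modulus_gpa * 1e9       # Pa
    return sigma**2 / (density_kg_m3 * modulus)

# Indicative (assumed) property values for illustration only
steel = specific_strain_energy(stress_mpa=650, density_kg_m3=7850, modulus_gpa=200)
glass_epoxy = specific_strain_energy(stress_mpa=650, density_kg_m3=2000, modulus_gpa=40)

print(f"steel       : {steel:8.1f} J/kg")
print(f"glass/epoxy : {glass_epoxy:8.1f} J/kg")   # higher value -> more energy stored per unit mass
```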
Fig. 1 Proposed material
Fig. 2 Hand lay-up technique
2 Methodology
Figure 2 shows the manufacturing method used in this research, the hand lay-up technique, which is well established and has been used for many years for the production of composite materials. This technique uses a mold on which resin is placed; a gel coating is then applied, the reinforcing material is added, and a hand roller is used to apply the catalyst. A continuous film of resin and glass is formed in a number of layers, and layers of different materials can be added [8]. Layers of different materials can be combined to further enhance the overall characteristics of the laminated composite fiber [9].
3 Fabrication Procedure
Table 1 gives the dimensions used for fabrication of the mold.
Table 1 Dimensions of plywood mold

S. No. | Parameter     | Value in mm
1      | Length of arc | 1160
2      | Length        | 1010
3      | Width         | 45
4      | Height of arc | 130
3.1 Steps to Prepare Composite Fibre Material Leaf Spring
1. Prepare the resin solution by mixing in 10–20% Hardener 758
2. Apply the glass fiber material on the mold
3. Place the epoxy with the help of a brush
4. Avoid air traps
5. Allow the solution to solidify
6. Make the same solution again
7. Add the second layer of glass fiber material
8. Add the second layer of epoxy with the brush
9. Repeat the same procedure till the required thickness is achieved
10. Allow to cure for 24 h
11. Remove unwanted material with a blade
12. Take the required leaf spring out of the mold.
3.2 Resin Selection
Table 2 gives the properties and uses of the resin used in the current research work, together with the proportion used between the resin and the hardener [10]. Dobeckot 520 is used as the resin and Hardener 758 is used as the hardener.

Table 2 Selection criteria for resin

Resin        | Proportion | Properties and uses
Dobeckot 520 | 100:55     | High viscosity, lengthy pot life, high adhesive power and chemical resistance
Hardener 758 | –          | Good mixing characteristic, low curing time
4 Result and Conclusion
4.1 Analytical Results of Conventional and Composite Leaf Spring
Table 3 gives the analytical results for the conventional and composite leaf springs, with deflection, stresses and stiffness at various thicknesses. The results, obtained by varying the thickness under a constant load, show that the deflection of the composite (fiber) leaf spring is much lower than that of the conventional leaf spring, the stresses in the composite leaf spring differ markedly from those in the conventional spring, and the stiffness of the composite leaf spring is greater than that of the conventional spring [11].
Table 3 Analytical result of steel and composite leaf spring

Thickness | Deformation A | Deformation B | Stresses (N/mm²) A | Stresses (N/mm²) B | Stiffness (N/mm) A | Stiffness (N/mm) B
28        | 154.52        | 128.97        | 877.97             | 252.75             | 19.04              | 22.81
30        | 120.32        | 104.85        | 743.10             | 220.18             | 24.45              | 28.06
32        | 99.33         | 86.40         | 653.27             | 193.51             | 29.62              | 34.06

where A = leaf spring (steel) and B = leaf spring (composite)

4.2 FEA Result
Table 4 gives the FEA results for the conventional and composite leaf springs, with deflection, stresses and stiffness at various thicknesses. The results, obtained by varying the thickness under a constant load, show that the deflection of the composite leaf spring is less than that of the conventional leaf spring, the stresses in the composite leaf spring differ markedly from those in the conventional spring, and the stiffness of the composite leaf spring is greater than that of the conventional leaf spring [9].
Table 4 FEA result of steel and fiber resin leaf spring

Thickness | Deformation A | Deformation B | Stresses (N/mm²) A | Stresses (N/mm²) B | Stiffness (N/mm) A | Stiffness (N/mm) B
28        | 154.5         | 128.94        | 875.97             | 278.25             | 19.29              | 22.82
30        | 119.49        | 104.99        | 741.29             | 246.20             | 24.62              | 28.03
32        | 96.33         | 86.56         | 651.27             | 219.11             | 30.55              | 34.00

where A = leaf spring (steel) and B = leaf spring (composite)
References
1. Jadhav KK et al Experimental investigation and numerical analysis of composite leaf spring. Int J Eng Sci Technol (IJEST) 4759–4764
2. Ravindra P et al (2014) Modeling and analysis of carbon fiber epoxy based leaf spring under the static load condition by using FEA. Int J Emerg Sci Eng (IJESE) 2(4):39–42. ISSN: 2319-6378
3. Saini P et al (2013) Design and analysis of composite leaf spring for light vehicles. Int J Innov Res Sci Eng Technol 2(5):1–9
4. Mahesh J et al (2012) Performance analysis of two mono leaf spring used for Maruti 800 vehicle. Int J Innov Technol Explor Eng (IJITEE) 2(1):65–70. ISSN: 2278-3075
5. Ghodake AP et al (2013) Analysis of steel and composite leaf spring for vehicle India. IOSR J Mech Civ Eng (IOSR-JMCE) 5(4):68–76. e-ISSN: 2278-1684. www.iosrjournals.org
6. Patunkar MM et al (2011) Modelling and analysis of composite leaf spring under the static load condition by using FEA. Int J Mech Ind Eng 1(1):1–4
7. Pozhilarasu V et al (2013) Performance analysis of steel leaf spring with composite leaf spring and fabrication of composite leaf spring. Int J Eng Res Sci Tech 2(3):102–109. ISSN: 2319-5991. www.ijerst.com
8. Ravindra AP et al (2014) Performance analysis of carbon fiber with epoxy resin based composite leaf spring. Int J Current Eng Technol 4(2):536–541. E-ISSN: 2277-4106, P-ISSN: 2347-5161
9. Shaikh NS, Rajmane SM (2014) Modelling and analysis of suspension system of TATA SUMO by using composite material under the static load condition by using FEA. Int J Eng Adv Technol (IJEAT) 3(3):53–60. ISSN: 2249-8958
10. Hou JP, Cherruault JY, Nairne I, Jeronimidis G, Mayer RM (2007) Evolution of the eye-end design of a composite leaf spring for heavy axle loads. Compos Struct 78(3):351–358
11. Kaw AK (2006) Mechanics of composite materials, 2nd edn. Taylor & Francis Group, LLC
Chapter 37
Real-Time Driver Drowsiness Detection System Using Machine Learning Apash Roy and Debayani Ghosh
1 Introduction
Injuries and deaths due to road accidents are a deep concern for governments and people irrespective of geographical region. According to the road accident report published on the website of the Ministry of Road Transport and Highways, Government of India, there were 366,138 road accidents in India in the year 2020, as per the latest update [1]. One of the major causes is lack of alertness of the driver, especially during night time. With the advancement of automated vehicle technology, research is also ongoing on different aspects of safety; the approach of detecting driver drowsiness is one of them. Here, we are in search of an automated system that can detect drowsiness and alert the driver. Machine learning is used in diversified areas like handwritten character recognition [2–4] and medical image processing [5–7]. In this work, we use a machine learning technique to detect drowsiness and raise an alert in a computer vision fashion. The system uses a camera to capture a real-time picture of the vehicle driver and spots the regions of interest, the face and the eyes. Then, using a pre-trained CNN classifier, it identifies open and closed eyes. If the eyes are found closed for a time duration exceeding the defined threshold value, which is 3 s in this case, it rings an alarm. The system is tested in different scenarios, e.g., male and female drivers, day and night, with or without glasses. We note that the initial results are quite interesting. The document starts by discussing the work done so far in the related field. Then, the proposed methodology is discussed, followed by the implementation, the dataset, and the results of various trials.
A. Roy (B) NSHM Knowledge Campus, Durgapur, West Bengal, India e-mail: [email protected] D. Ghosh Thapar Institute of Engineering and Technology, Patiala, Punjab, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_37
2 Study of the Literature
Over recent years, detection of driver fatigue has generated quite a lot of research interest and can be broadly classified into four categories [8]. The first category deals with methods based on physiological signals obtained from the driver, for example, electroencephalograph (EEG), electrocardiograph (ECG), and electrooculogram (EOG) signals [9, 10]. It has been shown that these methods have good predictive ability; however, obtaining clean datasets in this case poses a considerable challenge in designing such methods [11, 12]. The second category relies on the behaviour of the driver: a decrease in the grip strength on the steering wheel or a lack of ability to control the steering wheel both provide a measure of the driver's fatigue [13]. The departure of the vehicle from the intended trajectory, that is, deviation of the vehicle state, can also be a good measure of the driver's fatigue and forms the foundation of the third category [14]. Finally, the driver's drowsiness can also be detected through physiological reactions from the driver, such as eyes closed over a duration, which is the focus of the current work. Frame-wise facial expressions and their ratios are used for detecting drowsiness in [15]. Another work based on facial expressions [16] claims 95.58% sensitivity and 100% accuracy for off-line detection with an SVM classifier. Eye closure and yawning ratios are also used as facial expressions and classified through machine learning algorithms to detect drowsiness [17]. That considering the eyes as the only facial cue is sufficient to detect sleepiness has been established in many works [18, 19].
3 Proposed Methodology
The methodology adopted in the current work is shown in Fig. 1. The first stage consists of a webcam that captures real-time video of the driver. It then locates the face as the first region of interest (ROI) and, from it, the eye, which is the second region of interest. The second stage consists of a previously trained convolutional learning model that is used as a classifier; the job of the classifier is to classify the state of the eyes as 'open' or 'close'. The final stage of the proposed system is to ring an alarm if the eyes are found closed for some threshold value (3 s in this case). We now briefly explain the Convolutional Neural Network model, which we use here for classifying the images of the eye as open or closed. A CNN model is a feedforward neural network which consists of the following layers: (i) an input layer, (ii) hidden layers, and (iii) an output layer. The hidden layers can consist of convolution layers, ReLU layers, and pooling layers. In particular, the workflow of our CNN model is as shown in Fig. 2. The hidden layers consist of three convolution layers to extract significant features from the input images, along with three ReLU layers to rectify the feature maps. Note that the ReLU activation function is as follows:

f(x) = max(0, x)    (1)
Fig. 1 Overview of the system
Each convolution layer is succeeded by a pooling layer that reduces the dimensionality of the feature map. Here, we have implemented a max pool operation in the pooling layer. Finally, the fully connected layer classifies the eye as ‘open’ or ‘close’ based on the input feature map. The CNN model is first pre-trained with a set of 70,000 eye images, for classification. Then, in real time, the same CNN model is used to classify the state of the eyes from the frames of the video, captured through the webcam.
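A minimal Keras sketch of a CNN of the kind described, with three convolution/ReLU blocks, each followed by max pooling, and a fully connected head; the filter counts, kernel sizes, and 24 x 24 input resolution are assumptions for illustration rather than the exact architecture trained by the authors.

```python
from tensorflow.keras import layers, models

def build_eye_state_cnn(input_shape=(24, 24, 1)):
    """Binary eye-state classifier: three conv+ReLU blocks with max pooling, then a dense head."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # 1 = open, 0 = closed (assumed label convention)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_eye_state_cnn()
model.summary()
```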
Fig. 2 Learning model
4 Results and Discussion
4.1 Implementation
The implementation runs in real time. It captures real-time video frames through the camera and spots the eye as the region of interest. With the help of the pre-trained CNN model, it classifies whether the eye is open or closed, and it sounds an alarm if the eye is found closed for a predefined threshold time (10 s in our case). A webcam is used to capture the image, and several Python packages are used: OpenCV to detect the face and eye, Keras to build the model, TensorFlow as the Keras backend, and finally Pygame to play the alarm sound.
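A simplified sketch of such a detection loop, assuming OpenCV's bundled Haar cascades for face and eye localization, a saved Keras eye-state model, and pygame for the alarm; the file names, the 24 x 24 input size, and the threshold value are illustrative assumptions (the chapter mentions both 3 s and 10 s, so the threshold is treated as configurable).

```python
import time
import cv2
from tensorflow.keras.models import load_model
import pygame

DROWSY_SECONDS = 3.0                          # eyes-closed threshold; assumed configurable
model = load_model("eye_state.h5")            # pre-trained open/closed eye classifier (assumed file name)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

pygame.mixer.init()
alarm = pygame.mixer.Sound("alarm.wav")       # assumed alarm clip

cap = cv2.VideoCapture(0)
closed_since = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    eyes_open = False
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        roi = gray[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi):
            eye = cv2.resize(roi[ey:ey + eh, ex:ex + ew], (24, 24)) / 255.0
            prob_open = float(model.predict(eye.reshape(1, 24, 24, 1), verbose=0)[0][0])
            eyes_open = eyes_open or prob_open > 0.5
    if eyes_open:
        closed_since = None
    else:
        closed_since = closed_since or time.time()
        if time.time() - closed_since >= DROWSY_SECONDS:
            alarm.play()                      # eyes have stayed closed past the threshold
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```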
4.2 Dataset
The CNN model is trained on a dataset of 7000 eye images, available at https://www.kaggle.com/datasets/serenaraju/yawn-eye-dataset-new. It consists of open and closed eye images taken from different persons, both male and female. Various lighting conditions are also covered.
4.3 Trials
A total of 1000 trials by ten volunteer drivers (seven males and three females) was taken into consideration. The tests were conducted during day and night, with and without glasses. Figure 3 shows the summary of the results. With good light conditions during the day, and if the driver is without glasses, the system shows an outstanding result of 100% accuracy. But, if the driver is wearing glasses, or the light condition is not
appropriate during night times, the system needs refinement. However, considering all scenarios for day and night, drivers with or without glasses, male and female, the overall accuracy is 83.9%, which is quite encouraging.

Drowsiness detection trial results:

Driver | Gender | Trials per condition | Success in day with glasses | Success in day without glasses | Success at night with glasses | Success at night without glasses | Total trials | Total success
D1  | M | 25 | 24 | 25 | 12 | 21 | 100 | 82
D2  | M | 25 | 23 | 25 | 16 | 21 | 100 | 85
D3  | M | 25 | 23 | 25 | 14 | 24 | 100 | 86
D4  | M | 25 | 23 | 25 | 16 | 24 | 100 | 88
D5  | M | 25 | 24 | 25 | 12 | 24 | 100 | 85
D6  | M | 25 | 23 | 25 | 12 | 24 | 100 | 84
D7  | M | 25 | 24 | 25 | 10 | 21 | 100 | 80
D8  | F | 25 | 23 | 25 | 15 | 23 | 100 | 86
D9  | F | 25 | 23 | 25 | 10 | 21 | 100 | 79
D10 | F | 25 | 24 | 25 | 11 | 24 | 100 | 84
Total | – | 250 | 234 | 250 | 128 | 227 | 1000 | 839
Total percentage (% of success) | – | – | 93.6 | 100 | 51.2 | 90.8 | – | 83.9
5 Conclusions
There is a huge number of road accidents recorded due to drowsiness of drivers. To overcome this, people are looking for an automated system which can detect drowsiness and alert the driver. In this work, one such approach is proposed: a simple architecture with a camera and a CNN trained on a large dataset is used to detect drowsiness. The camera captures real-time video; the system spots the face and finally the eyes as the region of interest. The eye region is then fed into the CNN, which is pre-trained to classify the eyes as closed or open. If the eyes are found closed for 3 s at a stretch, the system rings an alarm and the driver is alerted. The trials show an outstanding result during the day when the driver is not wearing glasses. But when the light conditions are tough at night or the driver is wearing glasses, the system needs refinement, which is the focus of our future work.
References
1. https://morth.nic.in/road-accident-in-india
2. Kumar J, Roy A (2022) DograNet—a comprehensive offline Dogra handwriting character dataset. J Phys 2251:012008. https://doi.org/10.1088/1742-6596/2251/1/012008
3. Roy A, Ghosh D (2021) Pattern recognition based tasks and achievements on handwritten Bengali character recognition. In: 2021 6th International conference on inventive computation technologies (ICICT), Coimbatore, India, 2021. IEEE, pp 1260–1265. https://doi.org/10.1109/ICICT50816.2021.9358783
4. Roy A (2019) Handwritten Bengali character recognition—a study of works during current decade. Adv Appl Math Sci 18(9):867–875 (0974-6803)
5. Deshmukh S, Roy A (2022) Early detection of diabetic retinopathy using vessel segmentation based on deep neural network. In: Proceedings of international conference on recent advances in materials, manufacturing and machine learning (RAMMML-2022), Nagpur, 26–27 April 2022
6. Mushtaq S, Roy A, Teli TA (2021) A comparative study on various machine learning techniques for brain tumor detection using MRI. In: Global emerging innovation summit (GEIS-2021). Bentham Science Publishers, pp 125–137. https://doi.org/10.2174/97816810890101210101
7. Deshmukh SV, Roy A, An empirical exploration of artificial intelligence in medical domain for prediction and analysis of diabetic retinopathy: review. J Phys: Conf Ser 1831:012012. https://doi.org/10.1088/1742-6596/1831/1/012012
8. Wang L, Wu X, Yu M (2007) Review of driver fatigue/drowsiness detection methods. J Biomed Eng 24(1):245–248
9. Borghini G, Astolfi L, Vecchiato G, Mattia D, Babiloni F (2014) Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neurosci Biobehav Rev 44:58–75
10. Myllylä T, Korhonen V, Vihriälä E et al (2012) Human heart pulse wave responses measured simultaneously at several sensor placements by two MR-compatible fibre optic methods. J Sens 2012:8p, Article ID 769613
11. Simon M, Schmidt EA, Kincses WE et al (2011) EEG alpha spindle measures as indicators of driver fatigue under real traffic conditions. Clin Neurophysiol 122(6):1168–1178
12. Lal SKL, Craig A (2005) Reproducibility of the spectral components of the electroencephalogram during driver fatigue. Int J Psychophysiol 55(2):137–143
13. Jap BT, Lal S, Fischer P, Bekiaris E (2009) Using EEG spectral components to assess algorithms for detecting fatigue. Expert Syst Appl 36(2):2352–2359
14. Huang CX, Zhang WC, Huang CG, Zhong YJ (2008) Identification of driver state based on ARX in the automobile simulator. Technol Econ Areas Commun 10(2):60–63
15. Chinthalachervu R, Teja I, Ajay Kumar M, Sai Harshith N, Santosh Kumar T (2022) Driver drowsiness detection and monitoring system using machine learning. J Phys: Conf Ser 2325:012057. https://doi.org/10.1088/1742-6596/2325/1/012057
16. Haribabu J, Navya T, Praveena PV, Pavithra K, Sravani K (2022) Driver drowsiness detection using machine learning. J Eng Sci 13(06)
17. Prasath N, Sreemathy J, Vigneshwaran P (2022) Driver drowsiness detection using machine learning algorithm. In: 2022 8th International conference on advanced computing and communication systems (ICACCS), pp 01–05. https://doi.org/10.1109/ICACCS54159.2022.9785167
18. Cheerla S, Reddy DP, Raghavesh KS (2022) Driver drowsiness detection using machine learning algorithms. In: 2022 2nd International conference on artificial intelligence and signal processing (AISP), pp 1–6. https://doi.org/10.1109/AISP53593.2022.9760618
19. Al Redhaei A, Albadawi Y, Mohamed S, Alnoman A (2022) Realtime driver drowsiness detection using machine learning. In: 2022 Advances in science and engineering technology international conferences (ASET), pp 1–6. https://doi.org/10.1109/ASET53988.2022.9734801
Chapter 38
Nature-Inspired Information Retrieval Systems: A Systematic Review of Literature and Techniques Bhushan Inje, Kapil Nagwanshi, and Radhakrishna Rambola
1 Introduction

1.1 Information Retrieval (IR)

1.1.1 Information Retrieval (IR) Approaches
In information systems, IR plays a vital role in almost every field. The meaning of IR is very broad: searching in Google, which returns results in terms of web information, is nothing but IR. It is defined as "Information retrieval (IR) is finding material (usual documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers)" [1]. IR is popular nowadays because it is the process of extracting meaningful knowledge from text documents; this knowledge does not occur explicitly in the available text documents and can be retrieved with the help of text mining approaches. IR is a multidisciplinary field, involving information retrieval, text analysis, information extraction, clustering, categorization, visualization, database technology, machine learning, and data mining. Text data is collected from many sources such as social networks, patient records, health care insurance data, and news stories from newswires. It is a challenging issue to find useful and accurate information and knowledge to help
B. Inje (B) Amity School of Engineering and Technology, Amity University, Jaipur, India e-mail: [email protected]
K. Nagwanshi Computer Science and Engineering, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, India
B. Inje · R. Rambola NMIMS University MPSTME Shirpur, Dhule, Maharashtra, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_38
464
B. Inje et al.
the user to find the solution for their need. Following methods gives solution of the problem of IR applications like market analysis, business intelligence, etc. are using these approaches for extracting the knowledge and use full information from a huge amount of the data [2].
1.1.2 Various Tasks Performed in IR
Multimedia retrieval—Content-based retrieval of text, speech, images, sound, and combinations thereof, using structure to improve effectiveness.

Multilingual retrieval—Retrieving text in any language, and retrieving documents written in different languages simultaneously.

Interactive retrieval—Effectively including the user in the search, together with the associated evaluation issues [3].

Classification—An important data mining technique used to retrieve important data and give relevant information to the user by classifying data into different classes; it also plays a very important role in document classification. Once knowledge of the data is obtained, the data can be categorized or classified. Definitions of classification given by different researchers are as follows: "Classification use as the task of assigning predefined class labels to previously unseen data objects according to values of their features" [4]; "Classification is the process to classify new observation based on the predetermined classes, i.e., supervised learning" [5].

Clustering—One of the important data mining tasks, used to divide data objects or a population into a number of groups; groups of data objects are called clusters, and objects belong either to the same group or to different groups. A definition of clustering given in the literature is: "The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering" [6]. The most popular and fast-growing area of clustering is text clustering, the application of clustering to text-based documents. It is used to categorize unstructured text documents, representing the text data using machine learning (ML) and natural language processing (NLP).

Information Extraction (IE)—A subfield of IR and text mining used to extract information in structured form from unstructured documents. It is defined as "the process of selectively structuring and combining data that are explicitly stated or implied in one or more natural language documents" [7].

Text Analysis—One of the fast-growing areas of text mining; text analysis is the process of deriving high-quality information from unstructured textual data.

Categorization—In text mining, categorization is used to sort documents into groups, using NLP to sort huge numbers of documents without reading them. It is also used to retrieve interesting patterns from the documents and manage the documents according to pre-defined classes [8].

Visualization—Visually representing text documents, especially their content, is one of the popular and important tasks of text mining. Text visualization represents text data in terms of word count, number of paragraphs, patterns in the text, and pattern taxonomy, but it cannot visualize the text directly.

In the rest of this paper, we perform an extensive survey on hybrid IR, classification, and clustering techniques in Sections 2, 3, and 4; Sect. 5 gives a bibliometric data analysis for hybrid clustering methods; and Sect. 6 presents the discussion and future directions, followed by conclusions.
2 Nature Inspired Algorithms (NIA)
IR is the data mining process used to find useful information from unstructured sources, usually text documents. The basic aim of IR is to facilitate access to the required knowledge; it is not used to analyze documents or find hidden patterns in them. In the paper "Bees swarm optimization guided by data mining techniques for document information retrieval," Djenouri Youcef et al. proposed pre-processing methods that differ from the traditional pre-processing techniques followed in data mining, and the effect of this improves the overall run-time performance and the quality of the returned documents. First, pre-processing is carried out using k-means clustering and closed frequent pattern mining (FPM): k-means is used to generate k clusters, and FPM is then applied on each cluster to generate useful information or knowledge from the text documents; because a closed frequent pattern algorithm is used, only the most frequent patterns are retrieved. Two bees swarm optimization algorithms are proposed to explore the search space, and the knowledge extracted in the pre-processing step guides these bees in the search space [9]. The mathematical relations defined in [10] are as follows.

K-means Algorithm

J = Σ_{j=1}^{k} Σ_{n∈S_j} ||x_n − μ_j||²    (1)
where x_n is a vector that represents the nth data point and μ_j is the centroid of the data points in S_j. K-means clustering is a variant of the partitional clustering method that partitions N data points into k clusters S_j, each with its own centroid.
Representation for Documents
A vector space model is used, and each document term is weighted with TF-IDF (term frequency–inverse document frequency) [10]:

w_ij = tf_ji × idf_ji    (2)

where w_ij = weight of term i in document j, tf_ji = number of occurrences of term i in document j, and

idf_ji = log2(m / df_ji)    (3)

where idf_ji = inverse document frequency of term i in document j and df_ji = document frequency of term i in the collection of m documents.

Similarity Computation

cos(d_i, d_j) = (d_i^T · d_j) / (|d_i| |d_j|)    (4)

where d_i^T · d_j = dot product of the document vectors d_i and d_j, and |d_i| = length of the vector d_i.

New Centroid Representation

g_i = (1 / |C_i|) Σ_{j=1}^{|C_i|} d_j    (5)

where g_i = new centroid of cluster c_i, |C_i| = number of documents in cluster c_i, and d_j = the document vectors belonging to c_i.
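The relations in Eqs. (1)–(5) map directly onto standard scikit-learn components, as the following sketch shows; the toy documents and the choice of k are assumptions for illustration (note that scikit-learn's TF-IDF uses the natural logarithm rather than log2).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

docs = [
    "bees swarm optimization for document retrieval",
    "frequent pattern mining over text documents",
    "leaf spring design with composite materials",
]

# Eqs. (2)-(3): TF-IDF document-term weights
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Eq. (4): cosine similarity between document vectors
print("cosine similarities:\n", cosine_similarity(X))

# Eqs. (1) and (5): k-means minimizes the within-cluster sum of squares and
# recomputes each centroid as the mean of its cluster's document vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster labels:", km.labels_)
```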
3 Nature Inspired Hybrid Classification
Classification and prediction are text mining processes used to extract knowledge from large data repositories; classification predicts labeled, categorical data. For example, with the help of a classifier, bank credit card approval data can be classified according to whether a customer is safe or risky. Many classification algorithms have been proposed recently; here we discuss those classifiers that are used for optimization problems. Figure 1 gives an overview of different types of classifiers.
Classificaon
Stastacal Based
Distance Based
Decision Based
Naive Bays Classifier
ID3
Nural Network Based
KNN
Rule Based
Backpropogaon
IF-THEN Rule
Fig. 1 Overview of different classification techniques and their algorithms
According to Fig. 2, the second most widely used classification method is the decision tree, i.e., ID3 (Iterative Dichotomiser 3). The ID3 method is a well-known decision tree technique that has obtained excellent results in data categorization mining. However, ID3 has significant drawbacks, such as biasing toward multi-valued attributes, excessive complexity, and enormous tree sizes. Let us consider a few cases to understand the statistical classifiers in more detail. Case 1: Consider you are a medical practitioner; in your hospital a few patients are admitted, one suffering from a heart attack and another with lung cancer detected at the primary stage. Will the heart attack patient have any chance of another attack in the next few months, and what are the chances of survival for the lung cancer patient? Case 2: Consider you are a branch manager in a bank; a customer wants to take out a personal loan, and you must judge whether he/she will default on the loan. Problems like these can be solved by considering what data is available and which of it is important. For example, in problem 1 the doctor looks at the full medical history of the patient with different parameters, i.e., age, count of previous attacks, demographics, diet and exercise routine, and other clinical parameters. Or in the case of problem 2, you
Fig. 2 Statistics of datasets for hybrid clustering
may always look at certain parameters before approving a loan, such as credit score, the number of credit cards an applicant holds, CIBIL score, etc. These are classification problems, and they are solved with statistical methods and predictive models that classify the data points [11]. In the following section, we identify literature that contributed to optimization in classification problems using nature-inspired techniques.

Nguyen et al. use the naïve Bayes classifier for the book classification problem, targeting the book rating prediction problem. In this paper, the authors use optimization in knowledge representation with feature selection. The Naïve Bayes (NB) classifier is used for prediction by applying different feature selection strategies; NB performs the classification, and feature selection helps give an optimized result compared with other classifiers, which is used in the application of a book recommendation system [12] (a generic sketch of this idea appears at the end of this subsection).

Banchhor et al. introduce the MapReduce model and show that the classification problem can be solved with the Cuckoo–Grey wolf-based Correlative Naive Bayes classifier and MapReduce Model (CGCNB-MRM). The authors propose a modified Correlative Naive Bayes (CNB) classifier and also propose the integration of the Cuckoo Search (CS) algorithm into the Grey Wolf Optimizer (GWO), arriving at a new variation called Cuckoo–Grey Wolf based Optimization (CGWO). Classification is done with the help of a probability index table together with posterior probability data. The performance of the proposed algorithm is measured with the common metrics of accuracy, sensitivity, and specificity. This paper contributed research work in the following steps.
1. Obtain the fitness-based maximum posterior probabilities on correlation, mean, and variance within upper and lower bounds by developing a hybrid optimization algorithm combining CS and GWO.
2. Design the classifier for big data classification by integrating CGWO and optimizing the CNB model using a model parameter.
3. Apply CGCNB-MRM to the MapReduce problem and perform the classification using a probability index table [13].

Dubey et al. propose a hybrid framework based on Ant colony optimization (ACO) and particle swarm optimization (PSO). First, data pre-processing, a necessary step of data mining, is achieved by assigning weights on the basis of size, keywords, and content. The research work is carried out in the following steps.
1. Data pre-processing is done by assigning weights, providing automated data pre-processing.
2. Content and keyword classification is applied using feature selection with the help of a hybrid ACO-PSO model.
3. A weight matrix is defined on the basis of read and write queries.
4. A decision-making system is developed using simple additive weighting [14].
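A generic sketch of the Naive Bayes-with-feature-selection idea discussed for [12]; the toy book-review texts, the chi-squared selector, and k = 5 are stand-ins chosen for illustration, not the exact pipeline of the cited work.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy "book description -> rating class" data (illustrative only)
texts = [
    "thrilling plot with a great detective",
    "dull and repetitive storyline",
    "wonderful characters and pacing",
    "boring prose, weak ending",
]
ratings = ["high", "low", "high", "low"]

clf = make_pipeline(
    CountVectorizer(),
    SelectKBest(chi2, k=5),          # feature selection step before the NB classifier
    MultinomialNB(),
)
clf.fit(texts, ratings)
print(clf.predict(["great plot but weak characters"]))
```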
In 2020, Hassib et al. proposed an efficient hybrid approach for classification, using the Whale Optimization Algorithm (WOA) to find the best set of features. The overall contribution is as follows.
1. Feature selection: in this phase, WOA is used to prune redundant and irrelevant features to increase the accuracy of the classifier.
2. Pre-processing: in this phase, the Synthetic Minority Oversampling Technique (SMOTE) is used for sampling the dataset [15].
3. The dataset is split into training and testing data, and the model is trained using the bidirectional recurrent neural network (BRNN) classifier, one of the deep learning approaches.
4. Classification using the hybrid approach, i.e., the WOA–BRNN framework.
In the first phase, feature selection, the WOA is used to eliminate irrelevant and redundant features to improve classification accuracy. In the second phase, pre-processing, the class imbalance problem is addressed by pre-processing the datasets to obtain a balance between the dataset classes. In the third phase, classification, a deep learning approach called the bidirectional recurrent neural network (BRNN) is used to classify the pre-processed dataset. The BRNN classifier has two important sets of parameters, weights and biases, which have a huge effect on its performance [16].

Afizi et al., in Artificial Bee Colony based Data Mining Algorithms for Classification Tasks, proposed the use of the ABC algorithm as a new tool for data mining, particularly for classification tasks, and indicated that the ABC algorithm is competitive not only with other evolutionary techniques but also with industry-standard algorithms such as PART, SOM, Naive Bayes, classification trees, and Nearest Neighbour (kNN) [17].

Kamila et al. proposed Pareto-based multi-objective optimization (PBMOO) to overcome the problem of data mining classification, and this method increased the impact on the evaluation of selected constraints. The method is used to solve class/sub-class problems on sensitive data. First, multi-objective optimization (MOO) and single-objective optimization (SOO) problems are addressed, and it is then shown how PBMOO improves Pareto optimality in the case of biological data such as cancer data, which has different classes and subclasses. The approach was experimented with on six datasets: Audiology, Arrhythmia, Dermatology, Yeast, Glass, and Ecoli [18].

Pathak et al. proposed rule-based classification using Nature Inspired Algorithms (NIA) to find accurate rules using GA and ACO. Here the authors address the problem that the decision rules generated by a classification model capture obvious information, and unseen instances lead to misclassification. The decision tree generates rules with low support count that deviate from the obvious behavior, known as exceptions; such rules are not discovered by any rule-based measure in the context of knowledge discovery. In this paper, these problems are overcome with the help of NIA rule mining and exception mining, and suggestions are also made on existing algorithms.
According to their suggestions, exceptions can be accommodated in the information retrieval model by the CAnt-MinerPB algorithm [19].

Jyoti et al. investigated rule-based classification algorithms using genetic algorithms and discovered rules in CNF form, using conjunction over all attributes because there are few disjunctions in the values of an attribute. They also proposed effective rulesets using an encoding scheme, genetic operators with syntactic constraints, and a fitness function used to measure the effectiveness of the rules. Along with that, Kapila et al. proposed a GA with entropy-based filtering bias for automated rule mining using the initial population method [20, 21].

Kapila et al. investigated automated rule mining with an enhancement of GA. It reduces the search space because a probabilistic initialization algorithm is used, which reduces the number of fitness evaluations, resulting in fit rules and an improvement in run time. During the experiments, the proposed enhanced GA was applied to various UCI machine learning repository datasets, such as the Tic-Tac-Toe dataset. This dataset includes nine attributes, each corresponding to one of the nine squares of the Tic-Tac-Toe game, and the winning player defines the two classes. The scope of this research is that, to avoid objective function evaluations for duplicate rules, a GA with long-term memory is used to store the fitness scores [21].

S. Dehuri et al. proposed a multi-objective algorithm for mining large databases and predicting based on rule-based classification; the proposed model is called the elitist multi-objective genetic algorithm (EMOGA) [22].

Punitha et al. developed a PSO-based expert prediction system using rule-based classification. The focus is financial forecasting; in this context, stock market prediction (SMP) analysis is performed using PSO, and classification rule mining (CRM) is used to increase the effectiveness of the prediction. The CRM-PSO model is first evaluated over the relevant factors and then used for prediction on different companies, with the total work carried out in steps [23].

Freitas extensively studied all the Evolutionary Algorithms (EAs) used in data mining; according to this survey, evolutionary algorithms are categorized into many types, such as Genetic Algorithms (GA), Genetic Programming (GP), Classifier Systems (CS), Evolution Strategies (ES), Evolutionary Programming (EP), Estimation of Distribution Algorithms (EDA), etc. [26]. Each of these methods has contributed; for example, Sandeep Kumar et al. [27] extensively studied GA and pointed out some pros and cons of the method. A bio-inspired associative classification algorithm was proposed by Omar S. Soliman et al. for finding association rules and for selecting the best subset of rules, based on a Quantum-Inspired Artificial Immune System (QAIS), for building a rule-based classifier [24].
4 Nature Inspired Hybrid Clustering

Clustering plays a very important role in data mining as well as in information retrieval (IR); because it can improve the main IR parameters, precision and recall, it is widely used in IR. Clustering is an unsupervised learning method, and there are two main families: partitional clustering and hierarchical clustering. In partitional clustering, documents from the dataset are grouped into multiple clusters based on their similarity. K-means is an example of a partitional approach and is a simple yet powerful algorithm in IR [28]. K-means is a centroid-based approach in which the clusters are built by minimizing the distance between the centre and the objects [29]. The k-means method has a few disadvantages: it is difficult to choose the value of K, it does not perform well when the data form one global cluster, different initial partitions can result in different final clusters, and it does not work well with clusters (in the original data) of different sizes and different densities. On the other hand, hierarchical clustering groups similar objects into clusters using one of two widely used approaches, the divisive approach (top-down) and the agglomerative approach (bottom-up); as the names say, the divisive approach repeatedly divides the dataset into smaller clusters [30, 31]. From the above discussion, the most common problem in the k-means clustering algorithm is the choice of the initial k centres, which leads to an optimization problem. Steinbach et al. [32] proposed bisecting k-means clustering and argue that it is the best variant compared with the regular k-means method for solving the document clustering problem; the quality of the clusters is measured with the help of entropy, the F-measure, and overall similarity. Here the k value is chosen for the documents, and the bisecting k-means algorithm starts with a single cluster of documents and works as follows: Step 1, select a cluster to split; Step 2, find a pair of sub-clusters using the bisecting step; Step 3, repeat Step 2 until the split with the highest similarity is found; Step 4, repeat Steps 1-3 until the desired number of clusters is formed. A minimal code sketch of this procedure is given below, after this discussion. Many clustering methods have been proposed by researchers, but not all of them give promising results; Table 2 lists the clustering methods that are most frequently used in the fields of information retrieval and data mining. The partitioning clustering methods [33] are characterized by five properties: the first is the selection of the starting seeds, which most clustering algorithms (for example, k-means) do at random; the second and third describe the number of passes and the type of clusters; and the last two describe whether the resulting clusters are fixed or variable. The most popular hierarchical clustering method was proposed in [34]; its time complexity is O(n² log n) (Table 1).
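Returning to the bisecting k-means steps outlined above, the sketch below is a minimal illustration built on scikit-learn's KMeans. It is not the implementation of [32]: the cluster to split is chosen simply as the largest one, whereas [32] selects the split by similarity/SSE, and the random document vectors merely stand in for TF-IDF features.

# Minimal bisecting k-means sketch over document-style feature vectors (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, random_state=0):
    clusters = [np.arange(X.shape[0])]             # start with one cluster of all documents
    while len(clusters) < k:                       # Step 4: repeat until k clusters are formed
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)                # Step 1: pick a cluster to split
        km = KMeans(n_clusters=2, n_init=10, random_state=random_state).fit(X[members])
        for lbl in (0, 1):                         # Steps 2-3: bisect into two sub-clusters
            clusters.append(members[km.labels_ == lbl])
    labels = np.empty(X.shape[0], dtype=int)
    for ci, members in enumerate(clusters):
        labels[members] = ci
    return labels

# Example with random vectors standing in for TF-IDF document features
X = np.random.RandomState(42).rand(100, 20)
print(np.bincount(bisecting_kmeans(X, k=4)))       # sizes of the four resulting clusters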
Table 1 List of the NIA in classification

Work | NIA algorithm | IR method | Purpose of data mining | Dataset | Performance results
[14] | ACO-PSO | Simple additive weighting (SAW) | Classification | UCI Repository | Accuracy: PSO-SAW 98%, ACO-SAW 95%
[13] | CGCNB-MRM | Correlative Naïve Bayes classifier | Big data classification | Localization and skin dataset | Accuracy = 79.468%
[19] | GA and ACO | Rule-based | Classification | UCI Repository | –
[20] | GA | Rule-based classifier | Classification | UCI dataset repository | –
[21] | GA | Automated rule mining | Classification | Tic-Tac-Toe dataset | Fitness = 0.0629218
[22] | EMOGA | Rule-based classifier | Classification | UCI machine repository: Zoo, Nursery, Adult | –
[23] | CRM-PSO | Rule-based classifier | Classification | Federal Reserve Bank of St Louis, Big Charts Historical Stock Quotes, and annual reports of companies | Accuracy = 90%
[24] | QAIS | Rule-based classifier | Associative classification | Adult, Nursery, Iris and Breast-Cancer | –
[25] | LFNN | Text classification | Classification | RCV1 | Accuracy = 95%
Table 2 List of clustering methods

Sr. No | Clustering method | Year | Types | Description
1 | Partitioning methods | 1987 [33] | K-means, k-medoids, PAM, CLARA | Partitions the data into a set of non-overlapping clusters
2 | Hierarchical methods | 1984 [34] | BIRCH, ROCK, Chameleon, UPGMA | Creates a nested set of clusters that are organized as a tree
3 | Density-based methods | 2009 [36] | DBSCAN, OPTICS, DENCLUE | Groups the data objects with arbitrary shapes
4 | Grid-based methods | 2004 [37] | STING, WaveCluster | Uses a multiresolution grid structure to cluster the data objects
5 | Model-based methods | 2001 [38] | Expectation–maximization, COBWEB, SOM | Uses a model for each cluster and determines the fit of the data to the given model
6 | Frequent pattern-based clustering | 2003 [39] | pCluster, MineClus, PROCLUS | Uses patterns that are extracted from subsets of dimensions to group the data objects
7 | Constraint-based clustering | 2001 [40] | – | Uses user constraints such as the user's requirements and forms the clusters on user-specified or application-specific constraints
Majhi et al. [35] identify that the efficiency of k-means clustering depends on the cluster centres and apply swarm-intelligence techniques, in particular Ant Lion Optimization (ALO), which is based on stochastic global optimization models. Here k-means is integrated with ALO to improve the efficiency of the clustering operation, and the result is compared with Particle Swarm Optimization based clustering (PSO-clustering); the effectiveness of the algorithm is calculated with the sum of intra-cluster distances and the F-measure. According to the authors, every clustering algorithm must respect the important priorities of velocity, volume, variety, and value. According to Jensi et al., many soft computing techniques have been proposed in the context of document clustering; many of them can search effectively locally, but a globally optimal solution can only be reached with a fast, high-quality optimization technique. Performing a globalized search of the search space is the specialty of the optimization algorithms introduced in their paper. They argue that algorithms such as PSO and ACO have contributed to document clustering and improved it, which is the reason optimization techniques are used to improve the quality of clustering [41]. The most popularly used algorithms are as follows (a small PSO-based sketch is given after this list):
1. Genetic Algorithm (GA) [42]
2. Bees Algorithm (BA)
3. Particle Swarm Optimization (PSO)
4. Ant Colony Optimization (ACO)
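As a concrete illustration of how such swarm methods are coupled with clustering, the sketch below moves a set of candidate centroids with simplified PSO velocity updates and scores each candidate by the sum of intra-cluster distances. It is a toy version written for this survey, not the algorithm of any single cited paper, and the parameter values (inertia, acceleration coefficients) are generic defaults.

# Toy PSO-style centroid optimization for clustering (illustrative, not from a cited paper):
# each particle is a full set of k centroids, scored by the sum of intra-cluster distances.
import numpy as np

def intra_cluster_sum(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # point-to-centroid
    return d.min(axis=1).sum()                                         # assign to nearest

def pso_clustering(X, k=3, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = X[rng.integers(0, len(X), size=(n_particles, k))]     # particles: (P, k, dims)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([intra_cluster_sum(p, X) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([intra_cluster_sum(p, X) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()                # best centroid set so far
    return gbest, pbest_val.min()

X = np.random.default_rng(1).normal(size=(150, 2))              # dummy data for the demo
centroids, score = pso_clustering(X)
print("best intra-cluster sum:", round(score, 3))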
Zou et al. proposed the Cooperative Artificial Bee Colony (CABC) algorithm; its basic aim is to solve the clustering problem, and its results are compared with the PSO, CPSO, and ABC algorithms. CABC works on D-dimensional vectors because a single individual does not give the best solution: two vectors are formed, and a good solution vector, called gbest, is obtained that is better overall than those of GA, PSO, and ABC. A total of six benchmark functions are used to evaluate the proposed method: the Sphere, Rosenbrock, Griewank, Rastrigin, Ackley, and Schwefel functions. Most of them have a global minimum value of 0 at (0, 0, ..., 0), except the Rosenbrock function, whose global minimum of 0 is at (1, 1, ..., 1), and the Schwefel function, whose global minimum of 0 is at (420.9867, 420.9867, ..., 420.9867); all the functions are used to obtain the global solution, and the time is measured for each function [43]. Kumar et al. proposed an improved Artificial Bee Colony combined with Fuzzy c-means, the IABCFCM algorithm, to tackle the local-optima problem: FCM is the most popular fuzzy clustering method, but it tends to get stuck in local optima at the initial stage, and this weakness of FCM is overcome with IABCFCM, since ABC is a well-known algorithm for overcoming infeasible solutions. The results show that the hybrid method performs well compared with the FCM and ABC algorithms, although on some datasets its performance is only average [44]. Another research article studies the various bio-inspired clustering methods using different distance measures; it compares PSO clustering with the Chebyshev, Euclidean, and Manhattan distance measures and observes that Chebyshev gives better fitness values than Euclidean and Manhattan. Some traditional algorithms generally get stuck in local optima, but this method performs poorly on high-dimensional data. Several recently proposed and popular clustering algorithms are listed in the following tables [45]. Aboubi et al. proposed a BAT-inspired algorithm for clustering large applications; in their article they show that the BAT-CLARA algorithm is a substitute for the k-medoids algorithm in partitional clustering applications. The algorithms are compared on two datasets, Concrete and Wisconsin Breast Cancer, and the proposed BAT-CLARA algorithm outperforms all the other types of clustering methods considered. Partition Around Medoids (PAM), also called the k-medoids algorithm, suffers from many issues, notably efficiency problems [46].
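The benchmark functions named above are standard test functions. For reference, a small sketch of three of them is given below with their standard definitions; the global minima quoted in the text (0 at the all-zeros point, and at the all-ones point for Rosenbrock) are checked numerically.

# Three of the benchmark functions mentioned above (standard definitions), with their
# known global minima verified numerically.
import numpy as np

def sphere(x):      # global minimum 0 at (0, 0, ..., 0)
    return np.sum(x ** 2)

def rosenbrock(x):  # global minimum 0 at (1, 1, ..., 1)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rastrigin(x):   # global minimum 0 at (0, 0, ..., 0)
    return 10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))

d = 5
print(sphere(np.zeros(d)), rosenbrock(np.ones(d)), rastrigin(np.zeros(d)))  # 0.0 0.0 0.0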
5 Dataset

Analyzing the robustness of the clustering approaches involves some of the most frequently used UCI machine learning datasets. In this part, the specifics of the datasets are briefly reviewed and their numerical characteristics are summarized below.
• Artificial Data Set 1 is a two-featured problem with four distinct classes; its 600 data samples are drawn from four independent bivariate normal distributions.
• Artificial Data Set 2 has 250 data samples distributed over three features and five distinct classes.
• The Contraceptive Method Choice (CMC) dataset consists of 1473 data samples with 10 attributes from the different classes.
• The Iris dataset contains 150 data samples of flowers from three classes, each with four attributes.
• The Seeds dataset contains 210 data samples with 7 attributes from 3 classes.
• The Vowel dataset contains 871 data samples with three input features and six distinct vowel classes.
6 Discussion and Future Direction

This section presents a discussion of the various algorithms mentioned in Tables 1, 2, 3 and 4. Initially, many NIA-based methods were proposed. In [14], ACO-PSO, a new hybrid variation of PSO, is proposed for pre-processing and data classification. Data pre-processing takes place on the output of PSO, i.e., weights assigned based on size, content, and keywords. The weights are assigned through an unbiased random mechanism, and ACO and PSO are then applied with different computations for minimizing and maximizing different aspects such as uniform distribution, random initialization, iterations, and time constraints. The simple additive weighting (SAW) method is then used for ranking, and the effect of this improves the classification accuracy to 98%. In [23], CRM-PSO is proposed; this method is used for rule-based classification (RBC) and addresses the financial forecasting area, predicting the flow changes of the stock market. According to the results, the model is good for stock market prediction (SMP). According to [12, 13, 15–17], rule-based classification is the most popular form of classification, and most of the hybridization work in RBC is performed with the genetic algorithm (GA). Table 1 lists hybrid RBC methods using NIA algorithms, and it is found that most of the research in this field is carried out with GA- and GP-based NIA, with 28 publications on this topic. RBC is a variation of the classification task in data mining that makes the class decision based on IF–THEN rules; because these rules are easily interpretable, this type of classifier is nowadays used in descriptive analytics. One problem in RBC is that the rules are not mutually exclusive, which makes it difficult to decide which rule applies to a given record. The solution to this problem is either to order the rules or to assign weights to each class rule while leaving the rules otherwise unchanged. A small sketch of an ordered IF–THEN rule classifier is given below.
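To make the ordered-rule idea concrete, a tiny sketch of an IF–THEN rule list with first-match (ordered) conflict resolution is shown below; the rules, attributes, and records are invented purely for illustration.

# Tiny ordered IF-THEN rule classifier (rules and records are invented for illustration).
# Rules are tried in order; the first matching rule decides the class, which resolves the
# non-mutually-exclusive-rules problem discussed above. A default class handles no match.
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"windy": "true"}, "no"),
]

def classify(record, rules=RULES, default="yes"):
    for conditions, label in rules:
        if all(record.get(a) == v for a, v in conditions.items()):
            return label            # first matching rule wins (ordered rule list)
    return default                  # no rule fired

print(classify({"outlook": "sunny", "humidity": "high", "windy": "false"}))   # -> no
print(classify({"outlook": "rain", "humidity": "normal", "windy": "false"}))  # -> yes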
According to Table 2, which lists the popular clustering algorithms, the method most frequently used in the field of IR, and the most preferred one, is the hierarchical method introduced in 1984 [34]; the most recently developed family is the density-based methods [36], proposed in 2009 but hardly used in IR research; and the clustering method most popular among IR researchers is the partition-based method. Many new hybrid clustering algorithms have been proposed recently. Table 3 lists hybrid k-means clustering algorithms; according to that table, most of the hybridization is performed on the k-means algorithm, which is used extensively in almost every field of research and every application of data engineering. The major challenge addressed by researchers here is the determination of the optimal centroids: an optimization algorithm is needed to move the centroids away from local optimum points, and another motivation is the increase in dimensionality, whose complexity affects the performance of the clustering algorithms [66]. Many hybrid classification and clustering algorithms have been proposed for information retrieval; some have achieved success and some have failed because of their limitations. Owing to the unsupervised learning nature of clustering algorithms, the random selection of centroids at the initial stage is difficult, a few algorithms are not suitable for non-convex data, and a few are sensitive to outliers. Many clustering algorithms are easily trapped in local optima, and some require the number of clusters to be preset, which makes the algorithms quite sensitive. Hierarchical clustering can handle big data well but has a relatively high time complexity, and the total number of clusters must be preset; fuzzy clustering algorithms additionally have low scalability and are easily trapped in local optima. Because data keep increasing in size, the scalability of clustering must be a future focus for data scientists and researchers. In data mining algorithms, computational time is a big issue for datasets of any size.
Table 3 List of hybrid K-means clustering algorithms

Sr. No. | Algorithm | Variation of algorithm | Application | Measures used | Discussion
1 | K-MEANS | Fuzzy K-means | Soil science [47] | Elbow analysis, silhouette analysis, precision, specificity, sensitivity, recall | Too computationally expensive; the k-means algorithm converges towards the global optimal direction
 | | Hybrid K-means | Agricultural grape sampling variables [48]; DNA microarray data clustering [49] | |
 | | Bee Colony K-means clustering | Recommendation system [50] | |
 | | Deep K-means | t-SNE visualization method for 2D [51] | |
 | | Quantum-inspired ant lion optimized hybrid K-means | Intrusion detection [52] | |
 | | K-means with ALO | Comparison of different hybrid K-means algorithms on UCI repository datasets [53] | Average of the sum of intra-cluster distances; F-measure; iterations; best, average and worst values; standard deviation | The proposed algorithm achieves 90% accuracy
 | | K-means–PSO | Content-centric network [54] | DR (recall), FPR, precision, F-measure | This algorithm can work at low FPR and gives 99% satisfactory DR
 | | K-means–ABC | Image analysis [55] | Je, dmin, dmax, fitness | Euclidean distance
2 | K-MEDOIDS | BAT-CLARA | Breast cancer detection [46] | Numbers of neighbours | –
3 | DBSCAN | l-DBSCAN | Hierarchical leaders clustering [56] | Rand index, time ratio | With the same parameters, l-DBSCAN takes less time than DBSCAN
NIA algorithm
ALO
CABC
IABCFCM
BAT-CLARA
CSO
Work
[35]
[43]
[44]
[46]
[57]
Clustering
Clustering
Clustering
Clustering
Clustering
IR method
K-means
K-clustering
c-means
K-means
K-means
Purpose of IR
Table 4 List of hybrid clustering algorithms as follows
UCI repository
Best case
F-measure
Iris (k = 3) Glass (k = 6) Luncancer (k = 3) Soyabean (k = 4) wine (k = 3) Vowel (k = 6) Concrete Wisconsin BreastCancer
Accuracy:
Optimal Values
Performance parameter
Motor Cycle Iris Wine CMC Cancer Glass
Iris, wine, breast cancer, seed, diabetes, vehicle
Dataset
Iris-96.94 Cancer-985.1 CMC-5712.78 Wine-6431.76 Glass-256.53 (continued)
Numbers of neighbors
Iris-0.8991 Glass-0.7672 Lung Cancer-0.5683 Soyabean-0.9129 Wine-0.7325 Vowel-0.5754
Motor Cycle-2.0606e 003 Iris.-9.4603e 001 Wine-5.6937e 003
Iris-2.3926, Wine-1.1633, breast cancer 39.3384, seed-4.3598, diabetes 459.7587, vehicle-579.0972
Performance results
NIA algorithm
ICSO
PSO
ACO
CSO
Work
[58]
[59]
[60]
[57]
Table 4 (continued)
Clustering
Clustering
Clustering
Clustering
IR method
K-means
K-means
K-means
K-means
Purpose of IR
UCI repository
UCI repository
UCI repository
UCI repository
Dataset
Accuracy:
Accuracy
Accuracy
Accuracy
Performance parameter
Best Case Iris-96.94 Cancer-2985.16 CMC-5712.78 Wine-16,431.76 Glass-256.53
Best Case Iris-96.89 Cancer-0.778 CMC-5756.42 Wine-16448.35 Glass-273.22
Best Case Iris-96.48 Cancer-2978.68 CMC-5792.4 Wine-16,483.61 Glass-264.56
(continued)
Best Case Iris-95.78 Cancer-2943.24 CMC-5654.11Wine-16296.44 Glass-249.25
Performance results
NIA algorithm
TLBO
HRKM
HWCA
WOATS
WGC
Work
[61]
[62]
[63]
[64]
[65]
Table 4 (continued)
Clustering
Clustering
Clustering
Clustering
Clustering
IR method
Optimal centroid
Partitional clustering
Partitional clustering
K-means
K-means
Purpose of IR
UCI repository
UCI repository
UCI repository
UCI repository
UCI repository
Dataset
F-measure Rand coefficient Jaccord coefficient
Accuracy: Silhouette index criteria
Accuracy
Accuracy
Accuracy
Performance parameter
Iris F-measure-0.9634Rand coefficient-0.9695 Jaccord coefficient -0.8949 Wine dataset F-measure 0.8703 Rand coefficient 0.8831 Jaccord coefficient 0.7778 MSE 7297.52
Iris-0.7435 Wine-0.7391Cancer-0.6980 CMC-0.6829 Glass-0.3963 Ecoli-0.3171
Iris-94.27 Cancer-96.64 Wine-73.03 Car evaluation-50.84
Iris-110.32 Wine-27,826.00 Glass-570.96
Best Case Iris-96.56 Cancer-2876.28 CMC-5778.61 Wine-16,578.42 Glass-246.89
Performance results
7 Conclusion

This paper presents an extensive survey of hybrid data mining and IR algorithms. To summarize, we surveyed the two major data mining tasks used in IR, namely classification and clustering. We first examined the classification methods; according to our survey, there is considerable scope for researchers to improve the existing classifiers and replace them with NIA, especially in RBC, where recently developed NIA algorithms can still be used to improve performance. We then addressed the different clustering algorithms and carried out an extensive bibliographic survey of them. Another contribution of this paper is the identification and study of hybrid clustering algorithms; to the best of our knowledge, no existing work focuses on the optimization of document clustering using ACO, PSO, CSO, TLBO, GOA, and WOA. The main purpose of this paper is to present the basic and core idea of each commonly used NIA classification and clustering algorithm, to specify the source of each, and to mention the different measures on which it is evaluated. It is very difficult to cover all the NIA used for classification and clustering because of the diversity of the information, so some popularly known algorithms are studied extensively in this paper and only a few of them are discussed in detail; readers can nevertheless gain a systematic and clear understanding of the important hybrid classification and clustering methods.
References 1. Chiranjeevi HS, Shenoy MK (2021) Advanced text documents information retrieval system for search services Advanced text documents information retrieval system for search services. Cogent Eng 7:1856467. https://doi.org/10.1080/23311916.2020.1856467 2. Babu SS, Jayasudha K (2020) A survey of nature-inspired algorithm for partitional data clustering. J Phys Conf Ser 1706:012163. https://doi.org/10.1088/1742-6596/1706/1/012163 3. Widyassari AP, Rustad S, Shidik GF et al (2022) Review of automatic text summarization techniques & methods. J King Saud Univ—Comput Inf Sci 34:1029–1046. https://doi.org/10. 1016/j.jksuci.2020.05.006 4. Zorarpacı E, Özel SA (2020) Differentially private 1R classification algorithm using artificial bee colony and differential evolution. Eng Appl Artif Intell 94:103813. https://doi.org/10.1016/ j.engappai.2020.103813 5. Gupta MK, Chandra P (2020) A comprehensive survey of data mining. Int J Inf Technol 12:1243–1257. https://doi.org/10.1007/s41870-020-00427-7 6. Han J, Kamber M, Pei J (2012) Data mining: concepts and solution manual 7. Moens MF (2006) Information extraction: algorithms and prospects in a retrieval context 8. Mohd Sharef N, Kasmiran KA (2012) Examining text categorization methods for incidents analysis. Lect Notes Comput Sci 7299:154–161. https://doi.org/10.1007/978-3-642-30428-6_ 13 9. García J, Crawford B, Soto R, Astorga G (2019) A clustering algorithm applied to the binarization of Swarm intelligence continuous metaheuristics. Swarm Evol Comput 44:646–664. https://doi.org/10.1016/j.swevo.2018.08.006
10. Djenouri Y, Belhadi A, Belkebir R (2018) Bees swarm optimization guided by data mining techniques for document information retrieval. Expert Syst Appl. https://doi.org/10.1016/j. eswa.2017.10.042 11. Kossack CF (2010) Statistical classification techniques. IBM Syst J 2:136–151. https://doi.org/ 10.1147/sj.1963.5388521 12. Nguyen TTS, Do PMT (2020) Classification optimization for training a large dataset with Naïve Bayes. J Comb Optim 40:141–169. https://doi.org/10.1007/s10878-020-00578-0 13. Banchhor C, Srinivasu N (2020) Integrating Cuckoo Search-Grey wolf optimization and correlative naive Bayes classifier with map reduce model for big data classification. Data Knowl Eng 127:101788. https://doi.org/10.1016/j.datak.2019.101788 14. Dubey AK, Kumar A, Agrawal R (2020) An efficient ACO-PSO-based framework for data classification and preprocessing in big data. Evol Intell. https://doi.org/10.1007/s12065-02000477-7 15. Lawrence O (2006) Hall WPKNVCKWB snopes.com: Two-Striped Telamonia Spider. J Artif Intell Res 2009:321–357 16. Hassib EM, El-Desouky AI, Labib LM, El-kenawy ESM (2020) WOA + BRNN: an imbalanced big data classification framework using Whale optimization and deep neural network. Soft Comput 24:5573–5592. https://doi.org/10.1007/s00500-019-03901-y 17. Afizi M, Shukran M (2011) Artificial bee colony based data mining algorithms for classification tasks. 5:217–231. https://doi.org/10.5539/mas.v5n4p217 18. Kamila NK, Jena L, Bhuyan HK (2016) Pareto-based multi-objective optimization for classification in data mining. Cluster Comput 19:1723–1745. https://doi.org/10.1007/s10586-0160643-0 19. Pathak A, Vashistha J (2016) Classification rule and exception mining using nature inspired algorithms 20. Vashishtha J, Kumar D, Ratnoo S, Kundu K (2011) Mining comprehensible and interesting rules: a genetic algorithm approach. Int J Comput Appl 31:39–47. https://doi.org/10.5120/ 3792-5221 21. Kapila S, Kumar D, Kanika A (2010) A genetic algorithm with entropy based initial bias for automated rule mining. Int Conf Comput Commun Technol 2010:491–495. https://doi.org/10. 1109/ICCCT.2010.5640477 22. Dehuri S, Patnaik S, Ghosh A, Mall R (2008) Application of elitist multi-objective genetic algorithm for classification rule generation. Appl Soft Comput J 8:477–487. https://doi.org/10. 1016/j.asoc.2007.02.009 23. Punitha S, Jeyakarthic M (2020) Particle swarm optimization based classification algorithm for expert prediction systems. Int Conf Inven Comput Technol 2020:671–675. https://doi.org/ 10.1109/ICICT48043.2020.9112392 24. Soliman OS, Bahgat R, Adly A (2012) Associative classification using a bio-inspired algorithm. Conf Res Pract Inf Technol Ser 134:119–125 25. Ranjan NM, Prasad RS (2018) LFNN: lion fuzzy neural network-based evolutionary model for text classification using context and sense based features. Appl Soft Comput J 71:994–1008. https://doi.org/10.1016/j.asoc.2018.07.016 26. Freitas AA (2006) Evolutionary algorithms for data mining. Data Min Knowl Discov Handb 20:435–467. https://doi.org/10.1007/0-387-25465-x_20 27. Nayyar A, Le DN, Nguyen NG (2018) Advances in swarm intelligence for optimizing problems in computer science. CRC Press, Boca Raton 28. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31:651–666. https://doi.org/10.1016/j.patrec.2009.09.011 29. Mohammed AJ, Yusof Y, Husni H (2014) Nature inspired data mining algorithm for document clustering in information retrieval. Lect Notes Comput Sci 8870:382–393. https://doi.org/10. 
1007/978-3-319-12844-3_33 30. Hu G, Zhou S, Guan J, Hu X (2008) Towards effective document clustering: a constrained Kmeans based approach. Inf Process Manag 44:1397–1409. https://doi.org/10.1016/j.ipm.2008. 03.001
31. Gil-García R, Pons-Porrata A (2010) Dynamic hierarchical algorithms for document clustering. Pattern Recognit Lett 31:469–477. https://doi.org/10.1016/j.patrec.2009.11.011 32. Mokriš I, Skovajsová L (2008) Comparison of two document clustering techniques which use neural networks. ICCC 2008—IEEE international conference on computational cybernetics. IEEE, New York, pp 75–78 33. Milligan GW, Cooper MC (1987) Methodology review: clustering methods. Appl Psychol Meas 11:329–354. https://doi.org/10.1177/014662168701100401 34. Day WHE, Edelsbrunner H (1984) Efficient algorithms for agglomerative hierarchical clustering methods. J Classif 1:7–24. https://doi.org/10.1007/BF01890115 35. Majhi SK, Biswal S (2019) Kmeans and ant lion optimization. Springer, Singapore 36. Nasibov EN, Ulutagay G (2009) Robustness of density-based clustering methods with various neighborhood relations. Fuzzy Sets Syst 160:3601–3615. https://doi.org/10.1016/j.fss.2009. 06.012 37. Park NH, Lee WS (2004) Statistical grid-based clustering over data streams. SIGMOD Rec 33:32–37. https://doi.org/10.1145/974121.974127 38. Meilˇa M, Heckerman D (2001) Experimental comparison of model-based clustering methods. Mach Learn 42:9–29. https://doi.org/10.1023/A:1007648401407 39. Yiu ML, Mamoulis N (2003) Frequent-pattern based iterative projected clustering. Proc—IEEE Int Conf Data Mining, ICDM 689–692. https://doi.org/10.1109/icdm.2003.1251009 40. Tung AKH, Han J, Lakshmanan LVS, Ng RT (2001) Constraint-based clustering in large databases. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 1973:405–419. https://doi.org/10.1007/3-540-44503-x_26 41. Jensi R, Jiji DGW (2013) A Survey on Optimization Approaches to Text Document Clustering. Int J Comput Sci Appl 3(31):44. https://doi.org/10.5121/ijcsa.2013.3604 42. Baccichetti F, Bordin F, Carlassare F (1979) λ-Prophage induction by furocoumarin photosensitization. Experientia 35:183–184. https://doi.org/10.1007/BF01920603 43. Zou W, Zhu Y, Chen H, Sui X (2010) A clustering approach using cooperative artificial bee colony algorithm. Discret Dyn Nat Soc https://doi.org/10.1155/2010/459796 44. Kumar A, Kumar D, Jarial SK (2017) A hybrid clustering method based on improved artificial bee colony and fuzzy C-means algorithm. Int J Artif Intell 15:40–60 45. Jafar OAM, Sivakumar R (2013) A study of bio-inspired algorithm to data clustering using different distance measures. Int J Comput Appl 66:33–44 46. Aboubi Y, Drias H, Kamel N (2016) BAT-CLARA: BAT-inspired algorithm for clustering LARge applications. IFAC-PapersOnLine 49:243–248. https://doi.org/10.1016/j.ifacol.2016. 07.607 47. Heil J, Häring V, Marschner B, Stumpe B (2019) Advantages of fuzzy k-means over k-means clustering in the classification of diffuse reflectance soil spectra: a case study with West African soils. Geoderma 337:11–21. https://doi.org/10.1016/j.geoderma.2018.09.004 48. Al Kindhi B, Sardjono TA, Purnomo MH, Verkerke GJ (2019) Hybrid K-means, fuzzy Cmeans, and hierarchical clustering for DNA hepatitis C virus trend mutation analysis. Expert Syst Appl 121:373–381. https://doi.org/10.1016/j.eswa.2018.12.019 49. Kumar A, Kumar D, Jarial SK (2018) A novel hybrid K-means and artificial bee colony algorithm approach for data clustering. Decis Sci Lett 7:65–76. https://doi.org/10.5267/j.dsl. 2017.4.003 50. Ruihong Z, Zhihua H (2020) Collaborative filtering recommendation algorithm based on bee colony K- means clustering model. Microprocess Microsyst 103424. 
https://doi.org/10.1016/ j.micpro.2020.103424 51. Moradi Fard M, Thonet T, Gaussier E (2020) Deep k-Means: Jointly clustering with k-Means and learning representations. Pattern Recognit Lett 138:185–192. https://doi.org/10.1016/j.pat rec.2020.07.028 52. Chen J, Qi X, Chen L et al (2020) Quantum-inspired ant lion optimized hybrid k-means for cluster analysis and intrusion detection. Knowledge-Based Syst 203:106167. https://doi.org/ 10.1016/j.knosys.2020.106167
53. Majhi SK, Biswal S (2018) Optimal cluster analysis using hybrid K-means and ant lion optimizer. Karbala Int J Mod Sci 4:347–360. https://doi.org/10.1016/j.kijoms.2018.09.001 54. Karami A, Guerrero-Zapata M (2015) A fuzzy anomaly detection system based on hybrid PSOKmeans algorithm in content-centric networks. Neurocomputing 149:1253–1269. https://doi. org/10.1016/j.neucom.2014.08.070 55. Hancer E, Ozturk C, Karaboga D (2012) Artificial bee colony based image clustering method. IEEE Congr Evol Comput CEC 2012:1–5. https://doi.org/10.1109/CEC.2012.6252919 56. Viswanath P, Pinkesh R (2006) L-DBSCAN: A fast hybrid density based clustering method. Proc—Int Conf Pattern Recognit 1:912–915. https://doi.org/10.1109/ICPR.2006.741 57. Chu SA, Tsai PW, Pan JS (2006) Cat swarm optimization. Lect Notes Comput Sci 4099:854– 858. https://doi.org/10.1007/11801603_94 58. Kumar Y, Singh PK (2018) Improved cat swarm optimization algorithm for solving global optimization problems and its application to clustering. Appl Intell 48:2681–2697. https://doi. org/10.1007/s10489-017-1096-8 59. Toreini E, Mehrnejad M (2011) Clustering data with particle swarm optimization using a new fitness. Conf Data Min Optim 266–270. https://doi.org/10.1109/DMO.2011.5976539 60. Kolhe SR, Sawarkar SD (2017) A concept driven document clustering using WordNet. Int Conf Nascent Technol Eng ICNTE 2017:1–5. https://doi.org/10.1109/ICNTE.2017.7947888 61. Naik A, Satapathy SC, Parvathi K (2012) Improvement of initial cluster center of c-means using teaching learning based optimization. Procedia Technol 6:428–435. https://doi.org/10. 1016/j.protcy.2012.10.051 62. Liu C, Wang C, Hu J, Ye Z (2017) Improved K-means algorithm based on hybrid rice optimization algorithm. 2017 9th IEEE international conference on intelligent data acquisition and advanced computing systems: technology and applications (IDAACS). IEEE, New York, pp 788–791 63. Wu ZX, Huang KW, Girsang AS (2018) A whole crow search algorithm for solving data clustering. Conf Technol Appl Artif Intell TAAI 2018:152–155. https://doi.org/10.1109/TAAI. 2018.00040 64. Ghany KKA, AbdelAziz AM, Soliman THA, Sewisy AAEM (2020) A hybrid modified step Whale Optimization Algorithm with Tabu search for data clustering. J King Saud Univ— Comput Inf Sci. 34:832–839. https://doi.org/10.1016/j.jksuci.2020.01.015 65. Jadhav AN, Gomathi N (2018) WGC: Hybridization of exponential grey wolf optimizer with whale optimization for data clustering. Alexandria Eng. J. 57:1569–1584 66. van Eck NJ, Waltman L (2010) Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84:523–538. https://doi.org/10.1007/s11192-009-0146-3
Chapter 39
Deep Learning Based Smart Attendance System Prabhjot Kaur, Mridul Namboori, Aditi Pandey, and Keshav Tyagi
1 Introduction

Attendance plays a very important role in educational organizations, whether school or college. It also reflects the seriousness of a student towards education, as poor attendance can hamper the student's academic record. At present the most common method of taking attendance is the manual system, in which teachers and lab assistants mark the attendance on registers; this results in a lot of paperwork and lost time, and there are also high chances of falsified attendance. Each class requires about 5–6 minutes for the process, so over a whole semester the time wasted is almost equal to the duration of a lecture or more, and the process does not end there: the institute then has to enter these records into its systems, so more time is required for counting and adding records. Since almost everything is slowly being automated, automating the attendance process would save a lot of time and paperwork for teachers as well as students and would prevent fake attendance. There are many methods for automating attendance, such as Radio Frequency Identification (RFID), fingerprint identification, and face recognition. The RFID system requires a card, and students can sometimes forget the card [1], so it is not always effective; for fingerprints, the students need to form a line to mark their attendance, which is also time consuming. In a facial recognition system, faces are detected and recognized from a given image or video; the system uniquely identifies a person by facial features, detecting the face and then matching it with the images in the database. In the last decade, facial recognition systems have gone through drastic changes, making them more efficient and effective. Researchers have proposed many systems
to take attendance using face recognition: Hapani et al. [2], Sanli and Ilgen [3], Sarkar et al. [4], Arsenovic et al. [5], Fu et al. [6], Dev and Patnaik [7], Zhang et al. [9], Deng et al. [10], Nguyen et al. [11], Varsha and Chitra [12], Raj et al. [13], Reddy et al. [14], and Ali et al. [15]. However, for more challenging images, such as faces taken in uncontrolled environments or with long-distance cameras, face recognition is not perfect, and for low-quality images the results are not yet satisfactory. Even so, face recognition is the most efficient, appropriate, and fastest technique for an attendance management system, and it also reduces the chance of proxy attendance. The aim of this paper is to develop an automatic attendance management system that uses the MTCNN model to detect students' faces in images, extracts facial features with the ArcFace model, and classifies them using cosine similarity against the encodings stored in the database. Attendance is marked after the faces are successfully recognized.
2 Literature Review

Hapani et al. [2] proposed a system whose objective was to detect and recognize faces from videos. They used the Viola–Jones algorithm for detection and the Fisherface algorithm for recognition; an accuracy of 45–50% was achieved, which can be increased by improving the training process. Sanli and Ilgen [3] presented a system that uses Haar-filtered AdaBoost for face detection and the Principal Component Analysis (PCA) and Local Binary Pattern Histogram (LBPH) algorithms for face identification; their accuracy rate lies in the range of 75–90%. Sarkar et al. [4] in 2019 presented a system employing deep learning frameworks; this automated attendance system was designed around a convolutional neural network. Their proposed system has an accuracy rate of 98.67% on the LFW dataset and 100% on classroom datasets, and they further improved the accuracy of facial verification by learning the alignment of faces using a spatial transformer network. Arsenovic et al. [5] proposed a deep learning based face recognition attendance system that uses a CNN cascade for face detection and a CNN for generating face embeddings; the practical use of these deep learning approaches for face recognition tasks was the main objective of the research. For a small dataset of original face photos of employees in a real-time setting, the overall accuracy was 95.02%. Fu et al. [6] in 2017 proposed a university classroom automatic attendance system; by combining the MTCNN face detection and Center-Face face recognition deep learning algorithms, it achieved an accuracy rate of 98.87%. Dev and Patnaik [7] in 2020 proposed a system evaluated with three different algorithms; in their analysis, KNN ranked first with 99.27% accuracy, CNN second with an overall accuracy of 95%, and SVM third with an accuracy of 88%.
3 Scope of the Work

Tracking student attendance is a major concern in many educational institutions. The traditional system of managing attendance manually is a time-consuming and tedious task for crowded classrooms. Many methods for automating attendance have been proposed earlier, such as Radio Frequency Identification (RFID), fingerprint identification, and face recognition, but each has certain disadvantages: the RFID system requires a card that students can forget, so it is not always effective, while for fingerprints the students need to form a line to mark their attendance, which is also time consuming. This paper proposes an effective approach to monitoring attendance using the human face, intended to reduce the shortcomings of the traditional attendance management system. The system gave good results on the datasets used, saving a lot of time, especially when there is a large number of students, and it demonstrates the use of image processing methods in a classroom. The system can be enhanced by putting it into real-time use: grabbing photos from live video, identifying faces in the constructed frames, and then recording attendance is the next task. Sentiment analysis and emotion analysis may also be added in the future: through sentiment analysis it can be found at what time and on which topics the students' concentration is highest, and through emotion analysis the faculty can get feedback on their class and adapt their methods.
4 Proposed Methodology

This paper proposes a face detection and recognition based automatic attendance management system. The suggested system captures images of the students from classroom photographs, recognises their faces from those images, and marks their attendance automatically after successful recognition. The proposed system architecture is divided into two phases. In the first phase, each student is registered in the system using face images taken with different orientations and different expressions; the faces are stored in folders named after the serial number assigned to each student, after which encodings are generated and stored in the respective folders. After the database is created, in the second phase an input image of the classroom is given to the model, which extracts the faces from it, processes them to obtain the encodings, and compares each encoding with the encodings stored in the database. Once the best score is calculated for each face, the scores are checked against a threshold value: if a score crosses the threshold, attendance is marked, otherwise it is not. Figure 1 shows the flowchart of the proposed system.
Fig. 1 Architecture for the proposed attendance system
4.1 Database Development

In the first phase, each face is captured in different orientations and with different expressions, and the required illumination conditions are maintained while capturing the face images of the individuals. The faces are stored in folders named after the serial number of the subject in the database. The encodings are then generated from these faces and stored in the database along with the serial number of the respective student. Figure 2 shows the algorithm used for developing the database by converting raw images into the corresponding encodings.
Fig. 2 Algorithm for conversion of raw images into corresponding embeddings
Input: Declare empty lists 'X', 'Y' and 'image'
Output: Encodings of subjects
Step 1: Process each directory of subjects from 1 to N
        Apply MTCNN over each image
        If a face is detected then
            Crop the face and store it in the 'image' list
            Store the serial number in the 'Y' list
        else continue
Step 2: Apply ArcFace over the 'image' list and store the embeddings in the 'X' list
Step 3: Convert 'X' and 'Y' to NumPy arrays
Step 4: Save 'X' and 'Y' on the local computer
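The paper does not publish its implementation, so the following Python sketch is only one way the Fig. 2 procedure could be realized. The `mtcnn` PyPI package is assumed for detection, and `arcface_model` stands for any pre-trained ArcFace network exposing a Keras-style predict() that maps a (1, 112, 112, 3) input to a (1, 512) embedding; the normalization constants are a common ArcFace convention, not taken from the paper.

# Minimal sketch of the Fig. 2 database-building step (assumptions noted above).
import os
import numpy as np
from PIL import Image
from mtcnn import MTCNN

detector = MTCNN()

def build_database(root_dir, arcface_model, out_prefix="db"):
    X, Y = [], []                                   # embeddings and subject serial numbers
    for serial in sorted(os.listdir(root_dir)):     # one folder per subject serial number
        subject_dir = os.path.join(root_dir, serial)
        if not os.path.isdir(subject_dir):
            continue
        for fname in os.listdir(subject_dir):
            img = np.asarray(Image.open(os.path.join(subject_dir, fname)).convert("RGB"))
            faces = detector.detect_faces(img)      # list of dicts, each with a 'box'
            if not faces:
                continue                            # no face found, skip this image
            x, y, w, h = faces[0]["box"]
            face = Image.fromarray(img[max(y, 0):y + h, max(x, 0):x + w])
            face = np.asarray(face.resize((112, 112)), dtype=np.float32)
            face = (face - 127.5) / 128.0           # common ArcFace input normalization
            emb = arcface_model.predict(face[None, ...])[0]
            X.append(emb)
            Y.append(serial)
    np.save(out_prefix + "_X.npy", np.asarray(X))   # Step 4: persist database arrays
    np.save(out_prefix + "_Y.npy", np.asarray(Y))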
4.2 Input

Images of the students in the classroom serve as the model's input. The images should be taken in such a way that all the students and their faces are captured effectively. The pictures are then uploaded to the system so that the model can perform further processing.
4.3 Image Processing

Feature Detection. Face detection is the process of locating and extracting faces from photographs, and it is performed here with MTCNN, a state-of-the-art deep learning face detection model described by Brownlee [8] and Zhang et al. [9]. The faces in the photographs are localized and bounding boxes are drawn around them. The initial step is therefore to recognise faces in the photos and condense the dataset to a set of faces: the photos are loaded as NumPy arrays and a fresh dataset is obtained [1]. Figure 3 shows a sample of the reduced dataset; here the faces are detected from each image of one class and the dataset is reduced to a series of faces only.

Feature Extraction. Face embeddings are created in this step. Face embeddings are vectors representing the features extracted from a face [1]; the vectors created for the other faces are then compared with these embeddings. The face embeddings are produced by the ArcFace model proposed by Deng et al. [10]. Before the images are fed to the model, each image is resized to a fixed size of (112, 112), converted to an RGB image, and normalized. Each face embedding is a vector of 512 numbers.
Fig. 3 Sample of dataset created after successfully detecting faces using MTCNN
4.4 Face Classification

When the face embeddings have been generated from the detected faces using ArcFace, they are compared with the embeddings stored in the database using cosine similarity. After the best score is obtained for each embedding, the score is checked against the minimum threshold value set for recognition. If the threshold is crossed, attendance is marked for the corresponding serial number; otherwise the face is rejected and marked as '0'. Figure 4 shows the algorithm for marking the attendance of subjects from group session image(s).
Input: Declare empty lists 'image', 'embeddings', 'subjects'
Output: Attendance sheet
Step 1: Process each session image 1 to N
        Apply MTCNN over each image
        If face(s) are detected then
            Crop the face(s) and store them in the 'image' list
        else continue
Step 2: Apply ArcFace over the 'image' list and store the encodings in the 'embeddings' list
Step 3: Load the database arrays 'X' and 'Y'
Step 4: Compute the maximum cosine similarity for each face against every face embedding in the database from 1 to M
        If value > 0.5 then
            the serial number of the corresponding subject is stored in the 'subjects' list
        else '0' is stored in the 'subjects' list
Step 5: Use the 'subjects' list to compile the attendance sheet
Fig. 4 Algorithm for marking attendance of subjects in a classroom
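A minimal NumPy sketch of the matching step in Fig. 4 is shown below. The function and variable names are illustrative, not from the paper; the 0.5 threshold follows the figure, and db_X/db_Y stand for the database arrays produced in the first phase.

# Minimal sketch of the Fig. 4 matching step (names are illustrative, not from the paper).
import numpy as np

def mark_attendance(query_embs, db_X, db_Y, threshold=0.5):
    """query_embs: (Q, 512) embeddings of faces found in the session image(s);
    db_X: (M, 512) database embeddings; db_Y: (M,) subject serial numbers."""
    db_norm = db_X / np.linalg.norm(db_X, axis=1, keepdims=True)
    q_norm = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    sims = q_norm @ db_norm.T                 # cosine similarity matrix of shape (Q, M)
    subjects = []
    for row in sims:
        best = int(np.argmax(row))            # most similar database face
        subjects.append(db_Y[best] if row[best] > threshold else "0")
    return subjects                            # '0' marks an unrecognized face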
4.5 Testing

For testing, we took images of our classroom with all the registered students present, together with a subject who has not been registered in the database. After processing the images, face embeddings are generated and used to recognize the faces and mark attendance. The output is an Excel sheet listing each student's name and whether they are present or not.
5 Results and Discussions

5.1 Datasets

For database development, the faces of the subjects were captured from different positions and with different expressions under the necessary lighting conditions. The images of the subjects are stored in different folders named after the serial numbers of the subjects. Figure 5 shows a sample of the database images of subject '4', which are later used for attendance marking. For this system we created our own dataset of 5 subjects; the dataset is summarized in Table 1. For testing, 134 group session images were taken with all subjects along with an unrecognized subject from outside the database.
Fig. 5 Sample of database of subject ‘4’
Table 1 Number of images of each subject used to build the database, taking different orientations of the faces into account

Serial number of subject | 1 | 2 | 3 | 4 | 5
Front | 542 | 395 | 594 | 867 | 176
Left | 638 | 584 | 414 | 640 | 175
Right | 594 | 266 | 425 | 920 | 174
Total | 1773 | 1245 | 1433 | 2426 | 525
5.2 Experimental Results

The group image used as input is shown in Fig. 6. The confusion matrix depicted in Fig. 7 shows the evaluation of the 5 subjects over 134 session images, where the serial numbers of the subjects are in the range [1, 5] and '0' represents one or more unrecognized subjects in a session image. After successful detection and recognition of the subjects, the attendance of the recognized faces is marked in an Excel document (Table 2). The performance metrics are shown in Table 3. Hence, the proposed system manages attendance efficiently.
Fig. 6 A sample session image given as an input to the proposed system
Fig. 7 Confusion matrix showing the performance of the proposed system over 134 session images
Table 2 Attendance marked in the Excel document

S. No | Name | Attendance status
1 | Aditi Pandey | Present
2 | Kavya Sharma | Present
3 | Keshav Tyagi | Present
4 | Mridul Namboori | Present

Table 3 Performance metrics

Performance metric | Performance score
Accuracy | 0.97
Recall | 0.99
Precision | 0.99
F1 score | 0.99
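Metrics of the kind reported in Table 3 can be reproduced from per-face predictions with scikit-learn; the snippet below is purely illustrative, with hypothetical label lists rather than the actual test results.

# Illustrative computation of Table 3 style metrics from per-face predictions.
# y_true / y_pred are hypothetical label lists; '0' stands for an unrecognized face.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = ["1", "2", "3", "4", "5", "0", "1", "2"]   # ground-truth serial numbers
y_pred = ["1", "2", "3", "4", "5", "0", "1", "0"]   # serial numbers returned by the system

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="weighted", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="weighted", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="weighted", zero_division=0))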
6 Conclusion

The proposed attendance system provides a better alternative to the traditional attendance system, as it minimizes disadvantages such as wasted time and proxy attendance associated with the manual (traditional) system. The system was developed using the concepts of face detection and recognition, and it worked well with the given dataset of 5 subjects, giving an accuracy of 97.01%. It performed well even when it was fed with different facial scenarios.
The proposed method is highly scalable and can be implemented in larger organizations. Future applications may also include sentiment analysis and emotion analysis: through sentiment analysis it can be found at what time and on which topics the students' concentration is highest. Because of the large number of students, the classroom learning environment differs from one-on-one online tutoring and offline tutoring environments in that the teacher cannot pay attention to each student's emotional state and provide constant feedback while keeping to the course schedule.
References 1. Arsenovic M, Sladojevic S, Anderla A, Stefanovic D (2017) FACETIME—deep learning based face recognition attendance system. In: 2017 IEEE 15th international symposium on intelligent systems and informatics (SISY), pp 000053–000058 2. Hapani S, Prabhu N, Parakhiya N, Paghdal M (2018) Automated attendance system using image processing. In: 2018 fourth international conference on computing communication control and automation (ICCUBEA), pp 1–5 3. Sanli O, Ilgen B (2018) Face detection and recognition for automatic attendance system. In: Proceedings of SAI intelligent systems conference, pp 237–245 4. Sarkar PR, Mishra D, Subhramanyam GRS (2019) Automatic attendance system using deep learning framework. In: Machine intelligence and signal analysis, pp 335–346 5. Arsenovic M, Sladojevic S, Anderla A, Stefanovic D (2017) FaceTime—deep learning based face recognition attendance system. 2017 IEEE 15th International symposium on intelligent systems and informatics (SISY). Piscataway, IEEE, pp 000053–000058 6. Fu R, Wang D, Li D, Luo Z (2017) University classroom attendance based on deep learning. 2017 10th international conference on intelligent computation technology and automation (ICICTA). Piscataway, IEEE, pp 128–131 7. Dev S, Patnaik T (2020) Student attendance system using face recognition. 2020 international conference on smart electronics and communication (ICOSEC). Piscataway, IEEE, pp 90–96 8. Brownlee J (2020) Machine learning algorithms in python, machine learning mastery. Available: https://machinelearningmastery.com/machine-learning-with-python/. Accessed 8 June 2020 9. Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503 10. Deng J, Guo J, Xue N, Zafeiriou S (2019) Arcface: additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4690–4699 11. Nguyen DD, Le MH, Nguyen XH, Ngo HT, Nguyen MS (2022) Smart desk in hybrid classroom: automatic attendance system based on face recognition using MTCNN and ARCFACE. In: 2022 international conference on multimedia analysis and pattern recognition (MAPR). Piscataway, IEEE, pp 1–6 12. Varsha M, Chitra Nair S (2022) Automatic attendance management system using face detection and face recognition. In: IoT and analytics for sensor networks: proceedings of ICWSNUCA 2021, Singapore. Springer, pp 97–106 13. Raj AA, Shoheb M, Arvind K, Chethan KS (2020) Face recognition based smart attendance system. 2020 international conference on intelligent engineering and management (ICIEM). Piscataway, IEEE, pp 354–357 14. Reddy NS, Sumanth MV, Babu SS (2018) A counterpart approach to attendance and feedback system using machine learning techniques. JETIR-Int J Emerg Technol Innov Res. ISSN:23495162
15. Ali M, Zahoor HU, Ali A, Qureshi MA (2020) Smart multiple attendance system through single image. 2020 IEEE 23rd international multitopic conference (INMIC). Piscataway, IEEE, pp 1–5
Chapter 40
Prediction of Clinical Depression Through Acoustic Feature Sampling Using Deep Learning and Random Forest Technique Based on BDI-II Scale of Psychiatry Pratiksha Meshram
and Radha Krishna Rambola
1 Introduction

Clinical depression can be the cause of a variety of disorders in the human body. In medical research there are several autoimmune diseases for which no exact cause and no exact remedy are available. Disease generation often starts from a person's thinking patterns, buried sorrows, past shocks, and the circumstances and conditions he or she has experienced. The coronavirus outbreak has strongly affected people's psychological condition. Mental health is now a frequent concern that affects everyone, and anxiety and depression disorders are the most prevalent issues across all age groups, from adolescence to the elderly. Machine learning is the most popular technique for analysing such data, since depression can be scaled based on mood swings, expressions, speech, and body language [1, 2]. Nowadays, gender-specific and vowel-based depression detection is also being investigated, and a connection between dialogue structure and the performance of classifiers has been observed in several depression-detection studies [3, 4].
Depression is the reason behind suicidal tendencies for many people, and it is quite evident when a person's current state is monitored. Inability to focus and lack of interest are the main symptoms, but severe depression can also bring migraines and suicidal thoughts, which causes the death toll to climb. Accordingly, it has been shown that depression is mostly on the rise in teenagers aged 13–20, so early diagnosis is crucial to prevent the development of further illness [5, 6]. Databases of voice recordings are used for comparison and for continuous monitoring of depression, with context and speech content considered important parameters. Depression has been proven to affect vocal features, including the pitch of the voice, pace, hoarseness, intensity, flatness of the voice, and protracted pauses [7–9]. Thanks to recent advances in machine learning and improved sensor technology, several efforts have been made to predict the level of depression from such voice qualities [10], and depression has also been detected by combining speech analysis with facial expression geometry [8, 11]. As a result, therapists and medical professionals may find an automated computer program highly helpful when treating individuals with clinical depression [12, 13]. This paper proposes a mechanism for detecting depression from speech using a machine learning approach. Section 1 introduces the proposed research methodology; Section 2 contains a detailed literature survey analysing the existing approaches and methodologies for depression detection; Section 3 outlines the proposed methodology; Section 4 presents the results and discussion; and Section 5 concludes the work, followed by the references used in this research.
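The chapter title mentions acoustic feature sampling and a random forest. As a purely illustrative sketch, and not the authors' pipeline, the snippet below extracts simple MFCC summary statistics with librosa and trains a scikit-learn random forest; the wav_paths and labels arguments are placeholders for whatever labelled recordings are available.

# Illustrative acoustic-feature + random-forest sketch (not the authors' implementation).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(path, n_mfcc=13):
    """Per-recording summary: mean and standard deviation of each MFCC coefficient."""
    y, sr = librosa.load(path, sr=None)                      # keep the native sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_depression_rf(wav_paths, labels):
    """wav_paths: list of recording files; labels: 1 = depressed, 0 = healthy (placeholder)."""
    X = np.array([mfcc_features(p) for p in wav_paths])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
    clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)                        # model and held-out accuracy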
2 Literature Survey

Various studies have been conducted to replace the current survey-based approach, in which depression is detected under the supervision of a psychiatric practitioner, with an automated speech analysis method. Studies have found that the speech (acoustic) features are prosodic, glottal, cepstral, and spectral, and that they fall into two major classes: perceptual features (prosodic, glottal, and cepstral) and physiological features (the spectral aspects). A building-block architecture was developed by the researchers to categorize depressed patients. The primary phase of the architecture's step-by-step categorization method is the database of individuals' speech, which entails collecting perceptual features from depressed as well as healthy patients to form the database. Pre-processing is an additional step in which the audio files are cleaned of background noise and the voices of the patients and physicians are separated so that features can be extracted for analysis. The feature extraction step then identifies multiple aspects of the speech using different feature extraction methods and determines which elements are essential for describing depression. The characteristics collected from the dataset are used for training the classification model, which is trained by applying different ML algorithms. The decision step incorporates the trained model, which is useful in categorizing whether a
For the purpose of tracking mental illnesses including depression, anxiety, and bipolar disorder, sensor data is employed along with machine learning. The World Mental Health Survey Consortium reports that mental health issues are more common in industrialized nations than in developing ones, yet the demand for treatment is less likely to be met in the latter [1]. Data was gathered across several nations to explore the simultaneous classification of depression and concurrent mental illness using a multi-dimensional Bayesian network classification, also known as the MBC approach [5]. High-risk depressed patients were used in an experiment, along with majorly depressed and healthy participants, in which vocal jitter and the glottal voice-flow spectrum were measured from the excitation in conversational speech. Vocal jitter measures frequency instability, and the glottal spectrum of voice flow reveals how the intensity of air flow affects the spectrum [7]. Other researchers identified MDD using an "on-body" system in order to evaluate its treatment; in this study, the dataset was generated from telephone conversations recorded during patients' office visits, and the 17-item Hamilton Depression Rating Scale (HAM-D) was used to grade the severity of the depressed patients [10, 13, 14]. Another researcher made use of the theory that unrestricted (spontaneous) speech, rather than read speech, is better at characterizing clinical depression. They used a cross-validation strategy known as leave-one-out cross-validation with a Support Vector Machine (SVM). A few Low-Level Descriptors (LLDs) were identified as frame-by-frame descriptions and were selected using an application referred to as openSMILE. Because the SVM produced a better classification, unrestricted speech was often recognized more readily than read speech [8, 15]. Some researchers introduced a framework that discards the acoustic features of younger patients and then divides the remaining data into two classes, depressed and control, before applying machine learning techniques, namely the Gaussian Mixture Model (GMM) and the Support Vector Machine (SVM). The audio dataset obtained from the Oregon Research Institute, consisting of vocal samples of 139 individuals, was subjected to feature extraction. Approximately 14 auditory features were selected to clearly illustrate the variations between the speech of depressed and control patients. Numerous investigations were conducted on both male and female patients while combining various features and machine learning processes; the average accuracy was 83% for male patients and 75% for female patients [8]. Another survey proposed a method whose primary goal was discovering a set of speech features useful for separating depressed individuals from others through comparative analysis with feature selection, retaining the top-k features; for feature selection they combined the minimal-redundancy maximal-relevance (mRMR) measure as the filter approach with Sequential Forward Floating Selection (SFFS).
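As an illustration of the leave-one-out evaluation mentioned above, a minimal scikit-learn sketch is given below; it is not taken from the cited works, and the synthetic feature matrix and labels merely stand in for openSMILE-style acoustic descriptors.

# Illustrative sketch (not the original authors' code): leave-one-out
# cross-validation with an SVM classifier on acoustic feature vectors.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 88))        # placeholder feature functionals per speaker
y = rng.integers(0, 2, size=40)      # placeholder labels: 0 = control, 1 = depressed

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy:", scores.mean())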
The selected feature set can later be used to classify the severity of the depressive disorder, to predict it further through follow-up examination, and to find which interview style works better; based on these findings, they implemented an effective sensing and emotionally
supportive network to help clinicians. Classification was later implemented using K-means clustering and the Support Vector Machine (SVM) [16]. A limitation observed in the feature-selection stage is that a highly exhaustive search algorithm tends to incur a large computational cost, while the data-collection procedure combined passage reading, picture description, and interviews, with no mention of accuracy. Another researcher developed a technique to identify depression using both speech and facial expressions. Principal Component Analysis (PCA) was employed for both video and audio feature selection, and endpoints on the Beck depression scale were used to draw conclusions [17]. The researchers also discussed developing a chatbot that identifies the causes of sadness while simultaneously analyzing user speech to determine whether the user is depressed, finding the reason using a Radial Basis Function Network. Other research involved gathering data physically or through an organization (such as DVAC), preprocessing, choosing acoustic features, and then classifying the data using ML algorithms such as K-means clustering, SVM, fivefold cross-validation, regression, and mathematical methods [17, 18]. Limitation: the data-gathering technique included reading a text, looking at pictures, and interviewing, and the quality and features of the audio and video data depend on the gender of the individual. Facial images contribute widely to scaling depression using the facial action coding system (FACS); a CNN is applied for feature selection, and random forest outperformed the other machine learning algorithms [38]. A variety of approaches can still be debated and put into practice, and each had its own outcome. The first approach was the analysis of sample data collected through lifeline numbers [19]. Another way involves using the k-means procedure, which, although producing outcomes that narrow down the correct direction, was extremely tough to measure and monitor [20]. After learning to recognize "red warning signals," a CNN model was developed that could illustrate the acuteness of depression [21]. Although a logistic regression model was effective, it was skewed against men [22]. With the limitation of poor precision and the necessity of questionnaires, the final technique relied on the BDI questionnaire and voice recognition using a hybridization of k-means and the Google API [23, 24]. It is additionally conceivable that the language measure is a depression-related construct that is independent of the other measured factors and is excluded from the depression scale. It would likewise be interesting to explore whether the identified language activity reflects the presence or absence of depression in an individual's present depressive state (traits). Another issue to consider is whether low-pitch language activity reflects a current depressive condition or something that distinguishes people who are prone to depression from those who are not. The model may determine the BDI scale based on both the number of depressive side-effects and the severity of the mental disease, since self-reported depression varies [2, 25]. To the best of our knowledge, there is little investigation into the use of language features to detect depression-related language.
However, using language features offers the possibility of automating the detection of depression and increasing screening capacity, as speech tests and questionnaires can be administered easily. We believe voice can be used to help prevent and treat depression,
and, in the long run, we aim to build a comprehensive voice biomarker for depression to determine the presence or absence of depression in individuals with a wide range of mental illnesses [11, 26, 27]. Third, the attractive results of this investigation suggest that complex brain-connectivity properties can characterize unusual brain activity for an accurate diagnosis of depression. In addition, unobtrusive measures to screen depression symptoms in everyday life could have great benefits for the early detection of depression and for clinical treatment. Alexithymia predicts psychological well-being, which is estimated by the presence or absence of depressive symptoms in depressive patients with a wide range of mental illnesses [25, 27, 28]. Social media text patterns and the posts people publish on their walls can also be used as a modality for depression analysis [29]. PD, the most prevalent type of mental disorder in the United States, has been studied in individuals; this dataset explains the function of dopaminergic neurotransmitters in the depression of PD patients. Depressive symptoms are substantially less frequent in PD patients than in healthy patients, which shows that PD and depression play a significant role in early diagnosis and in providing solutions for depression treatment, despite the fact that prosodic motor symptoms are a strong predictor for starting dopamine therapy to heal the patients [27]. We all realize that depression will strike, yet what we do not understand is that it can go hand in hand with longer periods of despondency. The emotional condition of an individual with depression can influence the acoustic quality of speech. However, it is still unclear whether acoustic parameters are significant for recognizing depression in its initial phases, precisely for the diagnosis of depression. Every kind of personality and every person can experience sadness, and this does not always lead to a state deeper than depression [26, 30]. On one level, it is not surprising that depressive symptoms are strongly associated with a genetic risk of major depression. The fact that variations in ventral brain-structure volume are genetically linked with the risk of recurrent severe depression suggests that these traits are influenced by the same genetic factors, and this in turn contributes to the development of depression as a mental illness. On the other hand, this does not necessarily lead to depression, but it suggests that people are affected by some of the traits that are influenced by the same genetic factor [25]. From one viewpoint, it is not surprising that depressive symptoms are strongly associated with a genetic risk of severe depression; indeed, past research has demonstrated that part of this development may be due to an inclination toward pessimism in participants with depression [2, 25]. Surveys indicate that multiple modalities, such as text, audio, video, patient history, brain MRI, and writing patterns, can also be used for the identification of depression with any kind of psychiatric scale [31–33]. Regarding voice-based automatic depression detection, most developed frameworks follow one of two procedures. Research reports applying a linear regression model to predict the severity of depression, which increased substantially in performance compared with basic regression models. Here the
model gave a continuous BDI score, and we likewise utilized STEDD-20, which, depending on language and emotion, is the best instrument for identifying depression in an identical dataset used for comparison [25, 26, 34]. The Hamilton Anxiety Scale indicates how hard it is to perceive and describe feelings in patients with depression and in healthy subjects. On the PERM scale, difficulty in identifying feelings predicted the PERM only vaguely; theatrical and paranoid styles predicted externally oriented thinking, and the avoidant style predicted difficulty in describing emotions. Difficulties in identifying feelings were predicted by the solitary style of the PERM and the PERM-dependent style of healthier subjects, but not by the theatrical or paranoid style [28]. It was additionally found that individuals with ADHD frequently display ADHD traits, such as hyperactivity, lack of caution, and diminished empathy, as well as a propensity to self-harm. Individuals with ADD/ADHD were also found to frequently show hyperactive behaviour, excessive impulsiveness, hyper-energy, low self-esteem, poor self-restraint, and significant degrees of nervousness, anxiety disorders, depression, or depression-like symptoms.
3 Design Methodology

3.1 Audio Dataset

In this architecture, we have used the AVEC-2019 dataset for training the ML model. It contains the recorded sessions of depression patients: their recorded interactions with an AI-based assistant. The average age was 32.5 years, with a range of 16–64 years. Each sample varies from eight to 23 minutes in length, with a mean recording duration of eighteen minutes. In all, there were 170 recorded samples. The recorded clips are pre-processed to extract the audio files. Referring to the transcript file, we extracted the portions containing the voice of the depressed individual and discarded the rest, using a voice activity detection algorithm from a MATLAB toolkit. After cleaning and extracting the useful portion of the data, we further extracted two sets of audio characteristics. To extract the characteristics from the dataset, we used an open-source program called openSMILE. The first set contained statistical characteristics of low-level descriptors, whereas the other set comprised discrete cosine transform coefficients for each descriptor of the respective audio segment. By keeping the coefficients from the second set and normalizing the features from the first set, we were able to simplify the difficulty of the estimated features. In all, 2300 characteristics were retrieved, of which 35 related to spectra, mass, and energy, such as loudness, harmonicity, and skewness. Ten characteristics covered jitter, F0 score, shimmer, and other acoustic and vocal-related aspects. For selecting and ranking features, we randomized the dataset and applied algorithms that select and rank important features according to their priority and
importance. We divided the database into three groups in the ratio 80:10:10 for training, testing, and validation. To grade pitch-related information and obtain a better classification than with other variables, we employed Mel-frequency cepstral coefficients. To improve the effectiveness of classification, visualization, and selection of the characteristics collected from the dataset, we employed a genetic algorithm. The dataset does not contain spectrograms, so the audio was first divided into short chunks of 7–10 s, with successive windows shifted ahead by one second. Segmentation was performed while computing the mean of the right and left channels. Additionally, we normalized the signal by limiting the frequency to 15 kHz and mapping the amplitude range between −1 and 1. We used the Pysox audio manipulation library to sample and segment audio files according to their duration, capturing acoustic samples and ignoring background noise. After segmenting the speech samples, we created a spectrogram for each segment before training and validating convolutional neural networks on those segments.
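As a concrete illustration of this segmentation and normalization step, the following minimal sketch uses the librosa library rather than the MATLAB and Pysox tooling mentioned above; the file name, chunk length, hop, and MFCC count are illustrative assumptions, not values taken from the paper.

# Minimal sketch: chunk a recording, normalize amplitude, and build per-segment
# spectrograms and MFCCs for the later CNN stage.
import numpy as np
import librosa
import librosa.feature

y, sr = librosa.load("session_001.wav", sr=None, mono=True)   # mono = mean of channels
y = y / (np.max(np.abs(y)) + 1e-9)                            # scale amplitude to [-1, 1]

chunk_len, hop = 10 * sr, 1 * sr                              # ~10 s windows, shifted by 1 s
segments = [y[s:s + chunk_len] for s in range(0, len(y) - chunk_len, hop)]

spectrograms = [librosa.feature.melspectrogram(y=seg, sr=sr) for seg in segments]
mfccs = [librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=13) for seg in segments]
print(len(segments), "segments prepared")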
3.2 Speech Segments: Extraction of Features

To make the dataset meet the needs of our model, we first applied data pre-processing techniques. Dimensionality reduction is vital for the extraction of deep features from the audio dataset after segmenting the audio data into manageable chunks. To prevent using additional memory, we shrink the size of the feature vector by transforming the input parameters. For the purpose of computing the depression scale, we employed a regression approach. By doing this, we can capture the dynamic data while decreasing the dimensionality of the feature vector. As shown in Fig. 1, we turn the audio segments into spectrograms after fragmenting them and estimating their periodogram spectral estimate. After computing the estimations, we use the deep feature extraction approach to extract the Mel-frequency cepstral coefficients (MFCC) and combine them with the spectrogram of audio characteristics, as shown in Fig. 2. The sequential data for each pattern are divided into short audio chunks for pre-processing in a deep-feature procedure. The proposed segments are scaled and subtracted. For the purpose of extracting significant characteristics, the produced segments are then further processed by a deep network. By transforming patterns into 0s and 1s, features are ranked and normalized in accordance with the FDH set of pattern rules. One row of a vector is delivered as the output. For the specified design, a pre-trained CNN is employed with residual learning techniques. To prepare the classifier, we fine-tune the parameters of a ready-made ResNet architecture. We used a tiered technique in which a layer of dimensions appropriate to the dataset replaced the outermost layer of the design. The newly added layer has two classes, depressed and non-depressed, in which the number of outputs is equal to the number of class predictions. The final layer of the fully convolutional network in this model is replaced by the above layer, and the remaining layers of the ready-made ResNet design are frozen. For the purpose of extracting features from the audio dataset, we used the ResNet-50
Fig. 1 Acoustic segments converted into spectrogram containing acoustic features
architecture. The ResNet-50 architecture consists of a five-stage, 50-layer convolutional network. Three convolutional layers make up each convolutional block and identity block. Adapting the pre-trained design to our audio dataset is only possible by tuning two hyper-parameters: the learning rate and the number of training epochs. We treated one epoch as a single pass through the dataset and set the starting learning rate to 0.01 or 0.001.
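For illustration, adapting a pre-trained ResNet-50 to the two-class (depressed/non-depressed) task described here can be sketched in PyTorch as follows; this is a minimal sketch under assumed settings, not the authors' implementation, and the torchvision weights identifier is simply the standard ImageNet checkpoint.

# Replace the final fully connected layer with a 2-class head and freeze the rest.
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for param in model.parameters():                 # freeze the pre-trained layers
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)    # new depressed / non-depressed output layer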
Fig. 2 Feature extraction for audio samples
Fig. 3 Network architecture for the ResNet-50 classifier
Figure 3 describes the network architecture of the ResNet-50 classifier. Refining the hyper-parameters allows the classifier to learn high-level properties by adapting the model to the dataset, which is not possible when training the model in the ordinary way. We fit the model to the dataset using stochastic gradient descent, resuming the computations. We enabled data augmentation on the previously computed activation cache used for training the convolutional neural network. We locate the highest learning rate at which the value of the loss function keeps decreasing. We stabilize and freeze all the convolutional layers except the last layer. We train the last layer for a couple of epochs using the previously computed activation values. We also train the last layer for a few epochs with data augmentation and the cycle-length parameter set to one. After completing the training of the last layer, we unfreeze the frozen convolutional layers and set the learning rate three to ten times lower for the first layers compared with the last layer. We again compute the highest learning rate at which the loss keeps decreasing. We train the whole CNN for multiple marked cycles until the network begins to over-fit.
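Continuing the sketch above, the staged procedure just described (train the new head first, then unfreeze with smaller learning rates for earlier layers) might look roughly as follows; the dummy data loader, epoch counts, and learning-rate ratios are assumptions for illustration only.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder spectrogram batches standing in for the real AVEC-2019 segments
train_loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 2, (16,))),
    batch_size=4)
criterion = nn.CrossEntropyLoss()

# Stage 1: train only the new final layer for a few epochs
head_optim = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
for epoch in range(2):
    for x, y in train_loader:
        head_optim.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        head_optim.step()

# Stage 2: unfreeze everything; earlier layers get a 3-10x smaller learning rate
for param in model.parameters():
    param.requires_grad = True
optimizer = torch.optim.SGD([
    {"params": model.layer1.parameters(), "lr": 0.001},
    {"params": model.layer2.parameters(), "lr": 0.001},
    {"params": model.layer3.parameters(), "lr": 0.003},
    {"params": model.layer4.parameters(), "lr": 0.003},
    {"params": model.fc.parameters(), "lr": 0.01},
], momentum=0.9)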
4 Result and Discussion

Although social media offers a way to record an individual's current mental state, sentiments or thoughts may depend on one or more indirect reasons, so such data cannot be used exclusively for depression diagnosis. Therefore, in order to identify depression from audio signals, we employed the AVEC-2019 dataset. Along with in-depth questionnaire answers, audio–video recordings were also collected as part of the dataset. The information extracted from the AVEC dataset was transcribed and annotated for changes in typical verbal and nonverbal aspects. The AVEC-2019 dataset, which was further processed by USC's Institute for Creative Technologies and distributed as part of the 2019 Audio/Visual Emotion Challenge, contained all the recorded audio sessions as well as accompanying data and relation metrics.
As shown in Table 1, the data has been separated into 11 separate files for training purposes. Each folder is trained on the model separately, and the overall outcomes are aggregated or averaged for testing purposes. Only 10% of the audio data, randomly selected from the data of every patient, has been utilized for training. Also, the numeric data types are reduced from 64-bit to 32-bit floats. After training, each model's data frame is removed to free space so that enough RAM is available for training each of the generated folders. Data pre-processing removes the rows with 50% or more of the values as zeroes, as they are of no use. In the dataset, there were around 190 recorded sessions ranging from 8 to 35 min, with an average duration of 17 min; hence, it might be possible to get biased results due to an unbalanced dataset. A recurring finding was that, due to individual differences in attributes or personalities, some aspects that an individual emphasizes may apply only to them. Participants with non-depressed class labels appeared more commonly than those with depressed class labels, so we used a sampling process to rebalance and reorder the uneven dataset. To determine the connections between the various audio elements and how they may affect one another, a correlation matrix was created; the obtained correlation coefficients range from 0 to 0.4. There are 189 sessions in the dataset. The recorded audio files of the whole computer-based interview sessions were included in the AVEC dataset. Using the COVAREP toolkit from GitHub, we retrieved the features from the audio dataset over the course of each session at 12-ms intervals. As shown in Table 2, to choose the features with the greatest influence on the dataset, we applied feature selection to the extracted features. F0, NAQ, QOQ, H1H2, PSP, MDQ, peak slope, Rd, Rd Conf, MCEP 0–24, HMPDM 1–24, HMPDD 1–12, and Formants 1–3 were the characteristics chosen. The transcript file includes information such as total time, participant speaking time, and a format file. Pre-processing of the data has been done to exclude the rows with 50% or more of the values as zeroes. Further, a BDI-II score column is added in each of the files for training of the model. This demonstrates the individuality of the characteristics of different audio samples. In addition, the effect of each distinctive feature used to predict a score is also examined. We predicted the BDI-II scale for the tested subjects using an RF regressor with 40 estimators in comparison with the suggested model. Furthermore, the assumption that a person whose predicted value falls within the depression range of the scale is depressed, and otherwise not depressed, is taken into account when examining the model's accuracy. A depressed person is represented as "1" in the binary classification column that has been added.

Table 1 Deep net features on the development/validation and test sets of the audio-segment dataset (performance metrics)

Partition | Methods       | Segment type | RMSE score | MAE score
Training  | Deep learning | Waveform     | 9.2589     | 7.6549
          |               | Spectrogram  | 10.0124    | 8.8523
Testing   |               | Waveform     | 11.2365    | 9.3698
          |               | Spectrogram  | 8.4589     | 7.1278

Table 2 Extracted features for the audio dataset from the AVEC-2019 dataset

Our methods                   | RMSE  | MAE
C3D F0                        | 10.68 | 8.46
RNN-C3D NAQ                   | 9.85  | 8.60
C3D QOQ                       | 10.86 | 8.76
RNN-C3D                       | 10.67 | 8.69
C3D H1H2 & PSP (2 models)     | 10.45 | 8.34
RNN-C3D MCEP 0–24 (2 models)  | 10.91 | 8.23
By comparing the classifications with the readily available participant labels, we were able to gauge the model's performance. We carried out a machine and a manual investigation at the same time: if a question was answered "yes" in the recorded session and the model assigned the same class with good confidence, we counted it as a positive response, while any disagreement or controversy about the categorization counted as a negative one. For measuring the model loss while predicting the BDI-II scale for a patient, we estimate the root mean square error and the average mean error; the corresponding ROC curve is shown in Fig. 4. As discussed in Table 3, the random forest method, which had a mean average error of 8.4235 and a mean error of 8.5696, was found to be the best-performing algorithm. Thus, we eventually reach the conclusion that we used the properties of handwriting, audio, and drawing samples to predict the depression scale. Further, we analyzed the performance of the Random Forest algorithm in detail.
Fig. 4 Receiver operating characteristic curve for depression detection using audio segments
The algorithm reduces the issue of overfitting in decision trees, improving model accuracy. It supports the classification and prediction problem of the depressive class and the BDI-II scale, and it handles both continuous and categorical values in the dataset. In case the dataset is incomplete or inconsistent, missing entries are filled in with plausible values as a pre-processing step using the random forest algorithm. As the algorithm uses a rule-based approach, it does not require normalization of the dataset. Analysis of the Random Forest algorithm's performance during classification and regression on handwriting, drawing, and voice samples revealed specificity, sensitivity, accuracy, and precision values of 86.13, 86.55, 88.97, and 87.46% (Table 6). We have also analyzed accuracy values for the various scores and categories of depression: for 0–13, the anxiety-or-stress class, the accuracy was found to be 87.56%; for 14–19, the mild depression/anxiety class, it was 88.74%; for 20–28, the moderate depression/anxiety class, the accuracy was 87.30%; and for 29–63, the severe depression/anxiety class, the accuracy was 89.45%. Using the AVEC-2019 dataset, we looked at and evaluated the individual studies for the majority of techniques and architectural performances. By analyzing the variations in the patterns shown in the recorded sessions of the dataset, the challenge predicts the depression score using the BDI-II scale. We separated the recorded sessions into three equal groups for training, testing, and validation. For training and validation, we used supervised learning methods; for testing, we used unsupervised methods. To select the most appropriate method with the greatest accuracy, many ML techniques were applied to the dataset, and the model that performed best on the dataset was determined to be random forest. Using training and testing of the datasets with minimal model losses, the Random Forest method demonstrated high accuracy. In addition to summarizing the random forest method's results, Figs. 5 and 6 show the model loss using the random forest algorithm.
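To make the regression-plus-thresholding evaluation described above concrete, the following sketch (not the authors' original code) trains a 40-estimator random forest regressor on placeholder features, predicts BDI-II scores, and reports the error and accuracy metrics used in this section; the synthetic data, split, and the cut-off of 14 between minimal and mild depression are assumptions for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(189, 74))            # placeholder COVAREP-style feature vectors
y_bdi = rng.integers(0, 64, size=189)     # placeholder BDI-II scores (0-63)
X_train, X_test, y_train, y_test = train_test_split(X, y_bdi, test_size=0.2, random_state=0)

reg = RandomForestRegressor(n_estimators=40, random_state=0)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("MAE :", mean_absolute_error(y_test, pred))

threshold = 14                            # assumed depressed / not-depressed cut-off
print("Accuracy:", accuracy_score(y_test >= threshold, pred >= threshold))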
Table 3 Random forest classifier performance metrics

Folds           | Specificity | Sensitivity | Accuracy | Precision
F-I             | 0.7954      | 0.7869      | 0.7832   | 0.7971
F-II            | 0.7756      | 0.7874      | 0.7730   | 0.7945
F-III           | 0.7631      | 0.7656      | 0.7834   | 0.7764
F-IV            | 0.7807      | 0.7723      | 0.7954   | 0.7821
Overlapping     | NULL        | NULL        | NULL     | NULL
Depressing      | 0.7754      | 0.7565      | 0.7625   | 0.7752
Not-depressing  | 0.7682      | 0.7609      | 0.7771   | 0.7721
Mean            | 0.7613      | 0.7655      | 0.7897   | 0.7746
Fig. 5 Accuracy testing by using random forest by training and testing
Fig. 6 Model loss using random forest
5 Conclusion

In this study, we have created a framework for depression prediction and classification that scores depression on the BDI-II scale, which psychiatrists commonly follow, using various audio samples. We trained the model on audio samples from the AVEC-2019 dataset.
This dataset includes the audio recordings of depressed people talking to an Artificial Intelligence assistant during their documented sessions. From the recorded sessions, we have taken audio samples. A dimensionality reduction technique, PCA, is applied for feature scaling and extraction of deep features from the speech dataset after segmenting the audio data into manageable chunks. A voice activity detection method is applied through a MATLAB toolbox, so that we collect only the part of each recording containing the depressed individual's voice and reject the rest. After computing the estimations, we use the spectrogram's deep feature extraction approach to extract the Mel-frequency cepstral coefficients (MFCC) and integrate the audio features. The audio dataset's features were extracted using the ResNet-50 architecture, which comprises a five-stage, 50-layer convolutional network. With the use of several deep neural networks, we have collected characteristics from the audio recordings. Following the extraction of the characteristics from the audio recordings, we use several machine learning algorithms to categorize the depressed patients and compare the outcomes. With an accuracy of 88.23% and a precision of 87.46%, the Random Forest ML algorithm was found, based on the observations, to be the best performing compared with the other algorithms. Thus, we may say that we used the properties of audio to predict the depression scale. Furthermore, we use several machine learning methods to predict the BDI-II scale for depressed individuals and compare the outcomes. The random forest method, which had a mean average error of 7.4235 and a mean error of 7.5696, was found to be the best-performing algorithm. Thus, we can say that we were successful in predicting the depression scale using audio sample attributes.
References 1. Garcia-Ceja E, Riegler M, Nordgreen T, Jakobsen P, Oedegaard KJ, Tørresen J (2018) Mental health monitoring with multimodal sensing and machine learning: a survey. Pervasive Mob Comput 51:1–26 2. Deshpande Y, Patel S, Lendhe M, Chavan M, Koshy R (2021) Emotion and depression detection from speech 3. Jiang H, Hu B, Liu Z, Wang G, Zhang L, Li X, Kang H (2018) Detecting depression using an ensemble logistic regression model based on multiple speech features. Hindawi 2018:6508319 4. Deshpande M (2019) Depression detection using speech recognition, BDI and image processing. Int J Res Appl Sci Eng Technol 7:136–137 5. Low LS (2011) Detection of clinical depression in adolescents’ speech during family interactions. IEEE Trans Biomed Eng 58:3 6. Deshpande Y, Patel S, Lendhe M, Chavan M, Koshy R (2021) Emotion and depression detection from speech 7. Horwitz R, Quatieri TF, Helfer BS, Yu B, Williamson JR, Mundt J (2013) On the relative importance of vocal source, system, and prosody in human depression. In: 2013 IEEE international conference on body sensor networks. IEEE, New York, pp 1–6 8. Pampouchidou A, Simantiraki O, Vazakopoulou CM et al (2017) Facial geometry and speech analysis for depression detection. IEEE, New York 9. Kumar R, Nagar SK, Shrivastava A (2020) Depression detection using stacked autoencoder from facial features and NLP. IJOSTHE. 7:1–7
10. Long H, Wu X, Guo Z, Liu J, Hu B (2017) Detecting depression in speech: a multi-classifier system with ensemble pruning on kappa-error diagram. J Health Med Inform. 8 11. Jiang H, Hu B, Liu Z, Wang G, Zhang L, Li X, Kang H (2018) Detecting depression using an ensemble logistic regression model based on multiple speech features. Comput Math Methods Med 6508319:9. https://doi.org/10.1155/2018/6508319 12. Pampouchidou A, Simantiraki O, Vazakopoulou CM, Chatzaki C, Pediaditis M, Maridaki A, Tsiknakis M (2017) Facial geometry and speech analysis for depression detection. IEEE, New York 13. Ashraf A, Gunawan TS, Rahman FDA, Kartiwi M, Ismail N (2020). A summarization of the visual depression databases for depression detection. In: 2020 6th International Conference on Wireless and Telematics (ICWT). IEEE, New York, pp 1–6 14. Lin C, Hu P, Su H, Li S, Mei J, Zhou J, Leung H (2020). Sensemood: depression detection on social media. In: Proceedings of the 2020 international conference on multimedia retrieval, pp 407–411 15. Zeng S, Niu J, Zhu J, Li X (2019) A study on depression detection using eye tracking 16. Mantri S, Agrawal P, Dorle SS, Patil D, Wadhai VM (2013) Clinical depression analysis using speech features. In: 2013 6th international conference on emerging trends in engineering and technology. IEEE, New York, pp 111–112 17. Liu Z, Hu B, Yan L, Wang T, Liu F, Li X, Kang H (2015) Detection of depression in speech. In: 2015 international conference on affective computing and intelligent interaction (ACII). IEEE, New York, pp 743–747 18. Pampouchidou A, Simantiraki O, Vazakopoulou CM, Chatzaki C, Pediaditis M, Maridaki A, Tsiknakis M (2017) Facial geometry and speech analysis for depression detection. In: 2017 39th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, New York, pp 1433–1436 19. Shweta O (2017) Depression detection and analysis at the AAAI spring symposium on wellbeing AI: from machine learning to subjectivity oriented computing technical report. 20. Voice analysis aids fight against suicide depression. https://www.cio.com.au/article/648076/ voice-analysis-aid-fight-against-suicide-depression/ 21. Nield D (2016) Computer program tells when someone is depressed by speech patterns at sciencealert 22. Matheson R (2022) Neural network model to detect depression in conversations at MIT news office 23. Raut P et al (2018) Depression detection using BDI and speech recognition at IJRASET. 24. Kalra V, Sharma S, Chaudhary P (2021) Depression detection in cancer communities using affect analysis. In: Mobile radio communications and 5g networks: proceedings of MRCN 2020. Springer, Singapore, pp 649–657 25. Jiang H (2018) Detecting depression using an ensemble logistic regression model based on multiple speech features at Hindawi 26. Balbuena J, Samamé H, Almeyda S, Mendoza J, Pow-Sang JA (2021) Depression detection using audio-visual data and artificial intelligence: a systematic mapping study 27. Mande A (2019) Emotion detection using audio data samples. Int J Adv Res Comput Sci. 10:13–20. https://doi.org/10.26483/ijarcs.v10i6.6489. 28. Deshpande M, Rao V (2017) Depression detection using emotion artificial intelligence. 858– 862. 29. Pratiksha M, Radhakrishna R (2020) Depression prediction analysis using deep learning: a surveybioscience. Biotechnol Res Commun 13:0974–6455 30. Vázquez-Romero A, Gallardo-Antolín A (2020) Automatic detection of depression in speech using ensemble convolutional neural networks. Entropy 22(6):688 31. 
Zhu J, Wang Z, Gong T, Zeng S, Li X, Hu B, Zhang L (2020) An improved classification model for depression detection using EEG and eye tracking data. IEEE Trans Nanobiosci 19(3):527–537 32. Balbuena J, Samamé H, Almeyda S, Mendoza J, Pow-Sang JA (2021) Depression detection using audio-visual data and artificial intelligence: a systematic mapping study
33. Deshpande M, Rao V (2017) Depression detection using emotion artificial intelligence. In: 2017 international conference on intelligent sustainable systems (ICISS). IEEE, New York, pp 858–862 34. Ozkanca Y, Göksu Öztürk M, Ekmekci MN, Atkins DC, Demiroglu C, Hosseini Ghomi R (2019) Depression screening from voice samples of patients affected by Parkinson's disease. Digital Biomarkers 3(2):72–82 35. Bhayani A, Meshram P, Desai B, Garg A, Jha S (2021) Scaling depression level through facial image processing and social media analysis. Commun Intell Syst Proceed ICCIS 2020:921–933 36. Meshram P, Rambola RK. Diagnosis of depression level using multimodal approaches using deep learning techniques with multiple selective features. International Journal Expert Systems, Wiley Publications (John Wiley & Sons)
Chapter 41
Optical Character Recognition and Text Line Recognition of Handwritten Documents: A Survey Prarthana Dutta and Naresh Babu Muppalaneni
1 Introduction

Digitization is a way of preserving and storing documents, handwritten or printed, in a secure and advanced manner after the invention of digital devices. Storage on such electronic devices proves to be a safer and more secure way of preserving, accessing, and even transferring information and data. The conversion of image-based handwritten content in a document or text into text-based (editable) content is called "Optical Character Recognition", or OCR [36]. Initially, the recognition mechanism faced challenges in learning handwriting patterns across various scripts, languages, and writers, but the latest surge in Deep Learning, in addition to Machine Learning tools, has led to the development of new methodologies that helped researchers overcome the complexities and difficulties in learning the various patterns in handwriting [9]. OCR has been able to draw the attention of researchers even in the present paperless world. Sometimes people prefer the traditional mode of writing on paper with a pen or pencil; at the same time, prescriptions, invoices, postal stamps, bank forms, etc., also involve writing on paper in one's own handwriting. Secondly, maintaining various historical documents, scriptures, etc., is accomplished through the paperless mode in digitized form [6]. An exciting area of research at the present age, character recognition was initially explored by Grimsdale et al. in 1956 [13]. Several literature reviews and surveys are available for OCR on handwritten and printed documents across various languages [5, 23, 35, 39], etc. Comparatively fewer research directions and surveys are witnessed in the literature for text line recognition in OCR for handwritten documents [4, 32]. So the authors have presented this survey work to bring before the researchers in this P. Dutta (B) · N. B. Muppalaneni National Institute of Technology Silchar, Assam, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8_41
field, a vivid view of the works and approaches carried out in the domain of offline handwritten text line recognition. This survey discusses the literature on the first level of extracting and identifying individual lines from handwritten documents. Thus, a fundamental motivation of the survey is to bring together the numerous roadmaps and experimental works conducted for the recognition of text lines, and for the isolation or segmentation of text lines from offline handwritten documents. The remaining sections of the paper are structured as follows: Sect. 2 presents literature on the different types of OCR along with the categories of OCR and its stages in a recognition system. Section 3 discusses the various methods and methodologies employed for individual line recognition in handwritten texts and presents the various challenges in the domain. Finally, Sect. 4 concludes the survey by providing future research directions in the domain of text line recognition from handwritten documents.
2 Background Literature has witnessed diverse ways in which OCR studies are carried out. For this, the OCR is grouped into several categories. This is dependent on their utility and other criteria, such as the mode of acquisition or the mode of writing of the document. These will be demonstrated in the subsequent phases of the survey.
2.1 Different Types of Optical Character Recognition There are two ways of acquiring the document image whose content characters are to be recognized [28]. These are either in offline mode or in online mode. The writing mode in OCR can be categorized into either handwritten or printed. The visualization is shown in Fig. 1.
Fig. 1 Various categorization of optical character recognition (by mode of acquisition: online or offline; by mode of writing: handwritten or printed)
Mode of Acquisition: When the OCR system works on recognizing the characters based on how the image is acquired for processing, there can be two possibilities— Online or Offline [40]. 1. When the images are taken by a real-time device and are used for processing and recognition, it is the online approach to recognition. This mode considers the stroke of the pen movements on a digital surface as a function of time. The handwriting is thus recognized at the time of writing using the stylus pen through the touchpad. 2. On the other hand, when the input acquired for processing is through scanned images captured through a digital device, it is an offline mode of acquisition. Both these acquisition modes have different processing and recognition methods that are studied and experimented with in literature across several languages. Mode of Writing: Another category of OCR is the mode in which the text or document is written, which can be either handwritten or printed. When a handwritten text is processed, and its patterns are learned and recognized, it is called handwritten OCR, while processing and recognizing the printed text documents is called printed OCR. Compared to the printed style, character recognition in handwritten documents comes up as a challenging task due to several reasons, such as different writing styles of people, uneven spacing, overlapping, thickness, [18]. This study mainly focuses on the recognition of “handwritten” documents or texts where the input is acquired in the “offline” mode. Thus we narrow down our study to the objective of recognizing or extracting individual text lines from offline handwritten documents. The OCR for handwriting recognition consists of a number of stages. These stages incorporate Image Acquisition, Pre-processing, Feature Extraction, Recognition, and Post-processing [29]. These stages are discussed in Sect. 2.2.
2.2 Various Stages in OCR for Handwriting Recognition The recognition of any handwritten or printed document comprises of various processing stages or phases. Each phase has its own significance in the final recognition. These stages or phases are shown in Fig. 2. 1. Image Acquisition: The initial step toward character recognition is the acquisition of the images to be studied and analyzed. Images of handwritten documents or texts are acquired via various modes, such as scanners and photographs, by directly
writing on the computer using a stylus or digital pens. The mode of acquisition of an image can be either online or offline, as explained in Sect. 2.1.

Fig. 2 Various stages in OCR for handwriting recognition (raw handwritten document → pre-processing → segmentation → feature extraction → recognition → post-processing)

2. Pre-processing: The original images acquired through various means may be loaded with unwanted information irrelevant to the recognition process. This unwanted information must be removed from the initially acquired images for better processing. Some of the pre-processing methods include noise removal, size normalization, binarization, skew correction, smoothing, etc. (a brief illustrative sketch of two such steps is given after this list).
3. Segmentation: Separating the handwritten text into its constituent parts, such as lines, words, and characters, is called segmentation. It is a crucial stage in the correct recognition of handwritten documents, as it affects the recognition rate to a large extent.
4. Feature Extraction: Extracting the most relevant parts of the image for classification and recognition is called feature extraction. It is also an essential phase in the recognition task, as correctly extracted features can reduce the chances of misclassification. Some feature extraction approaches comprise Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), etc.
5. Recognition: Once the relevant features are obtained, they are fed into a learning process to learn the feature patterns in the input data and train the model. The learning process can be supervised, unsupervised, semi-supervised, or reinforcement learning. The trained model is utilized to experiment on new test data and obtain the trained model's recognition accuracy [8].
6. Post-processing: To check the correctness of the recognized characters, post-processing may sometimes be applied to escalate the accuracy further. Some post-processing steps are syntax and semantic analysis employing specific dictionary-based systems.
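As a brief illustration of the pre-processing stage, the following sketch applies Otsu binarization and a simple skew-correction heuristic with OpenCV; the file name, threshold choice, and deskewing rule are assumptions for illustration and are not taken from any surveyed work.

# Binarize a scanned page and rotate it by an estimated skew angle.
import cv2
import numpy as np

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

coords = np.column_stack(np.where(binary > 0)).astype(np.float32)  # ink pixel coordinates
angle = cv2.minAreaRect(coords)[-1]                                # dominant orientation estimate
angle = angle - 90 if angle > 45 else angle                        # map to a small correction angle
h, w = binary.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_NEAREST)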
2.3 Various Levels of Recognition for Handwriting Documents
Handwriting text recognition studies have been named and visualized in several ways by researchers [25]. Recognition of handwritten documents, texts, or characters is also performed at various levels [11].

Fig. 3 Hierarchy of recognition from handwritten documents (raw handwritten document/text image → pre-processing → clean document/text image → text-line segmentation → isolated lines → word-level segmentation → isolated words → character-level segmentation → isolated characters → feature extraction and classification → recognition)
This can be better understood from the visualization diagram in Fig. 3.
1. Text line level: When a handwritten document or text is segmented into isolated or individual text lines, it attains the text line level of handwriting recognition. The survey presented in this paper revolves around this level of recognition.
2. Word level: When isolated or individual words are extracted from the individual lines, it attains the second level of recognition.
3. Character level: When the words are further isolated into individual characters, it attains the third level of recognition. After attaining this level, various character recognition methodologies are employed for recognizing the characters; hence, a character recognition mechanism can eventually be developed.
3 Methodologies

Isolating individual lines from handwritten documents and using them for further processing is an area of ongoing concern. The extracted lines are further utilized for obtaining the words and characters. Hence, when text line recognition attains good accuracy, the final character recognition task also runs smoothly; when line recognition is not as desired, it causes difficulty in recognizing the individual words and characters contained in the lines. Most of the noted literature divides the text line recognition approaches into several categories [25]. These are better understood from the diagram in Fig. 4. These categories of approaches were proposed by Rani in [31]. Basically, these approaches are classified as top-down, bottom-up, and hybrid approaches. Approaches under top-down include projection profile-based methods, dynamic programming, and level set methods. The bottom-up approaches include methods such as Hough transformation, smearing-based, MST, grouping, and energy map methods.
Fig. 4 Various text line recognition approaches used for handwritten documents, as proposed in [31] (top-down: projection profile, dynamic programming, level set methods; bottom-up: Hough transformation, smearing-based methods, minimum spanning tree, grouping methods, energy maps; hybrid: multiple methods)
Alaei, Nagabhushan, and Pal attempted to segment unconstrained handwritten Persian text using a piece-wise painting scheme [2]. The block of text was decomposed into stripes and painted with an average (gray) intensity value, which is further processed to obtain the PPSL, the Piece-wise Potential Separating Lines, which help obtain the segmented lines. Rohini et al. tried to segment touching, overlapping, and skewed lines of the IAM dataset using a top-down approach [33]. Initially, they classify the upper and lower text lines using run length; later, the short and skewed lines were segmented recursively using a distance metric and connected components. The advantages of the connected component method and the projection approach were explored in [1] by Hande et al. on the Ottoman dataset of printed and handwritten documents, attaining a good accuracy of around 92%. A fringe map-based approach was utilized in [15] by Jetley et al. on printed and handwritten document text of ICFHR-DIBCO 2009, 2010, and 2011, attaining accuracies of 77.77%, 85.52%, and 91.93%, respectively, on the three handwritten datasets. Segmentation of historical text lines of the Gurumukhi script was carried out in [16] using the piece-wise projection profile method; this algorithm attained very good recognition accuracy but failed on characters in the lower and upper zones of the dataset. In the work of Messaoud et al. [24], the historical documents of the IAM (IAM-HistDB) and ICFHR 2010 datasets are explored for segmentation. The work experimented with three different text line segmentation approaches based on (i) vertical projection (a bottom-up approach), (ii) grouping connected components (a top-down approach), and (iii) detection of the nearest neighbor. Saabni et al. utilized the concept of energy maps to extract text line segments from the ICDAR 2009 dataset and a private dataset [34]. Another noted work on text line and character segmentation of palm leaf manuscript images was conducted by Kesiman et al. [17], where a binarization-free scheme was proposed; the recognition accuracy attained by this binarization-free approach is 78.57%, which is good for the discolored, poor-contrast images. Arabic handwritten documents can be said to have been explored the most for recognition and segmentation purposes over the years. Among the various approaches, a pixel-based method was utilized by Neche et al. [26] to partition text lines and background pixels by adopting the RU-net architecture. Another noted work on segmenting Arabic handwritten text was performed by Ali et al. [3]. An efficient pixel-counting-based segmentation task was performed by Malik et al. on the Urdu language [22]; this approach to segmentation was based on the detection of the headline and baseline. Hurdles faced in text line recognition systems, such as overlapping, touching characters, or skewed lines in the text, were tackled using Generative Adversarial Networks in the work by Kundu et al. [19]; Chinese and ICDAR datasets were utilized for the experiment, and the authors claim to attain recognition accuracies of 99.63 and 98.67% on these two datasets. Suleyman et al. used a projection-based approach to segment text lines of handwritten Uyghur text [38]. Peak locations of the probable text lines were identified using the projection profile method; after this, a thresholding mechanism leads to the detection of the text lines of the handwritten document image.
The experiment attained a text line recognition accuracy of 98.05%.
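A minimal sketch of a projection-profile line segmentation, in the spirit of the approaches discussed above, is given below; the ink-threshold heuristic and file name are assumptions, and practical systems add skew correction and smoothing on top of this.

# Sum ink pixels per row, find low-ink valleys, and cut the page at those rows.
import cv2
import numpy as np

img = cv2.imread("handwritten_page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

profile = binary.sum(axis=1)                   # horizontal projection profile
is_text_row = profile > 0.05 * profile.max()   # rows carrying enough ink

lines, start = [], None
for r, flag in enumerate(is_text_row):
    if flag and start is None:
        start = r                              # a text line begins
    elif not flag and start is not None:
        lines.append(binary[start:r, :])       # a text line ends
        start = None
if start is not None:
    lines.append(binary[start:, :])
print(f"Detected {len(lines)} text lines")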
A new counting approach to determine the individual lines of text in handwritten documents was proposed by Li et al. [21]. The authors called this the line counting approach, which counts the number of lines at each pixel location. The experiment was conducted on three publicly available datasets and attained satisfactory results. Aiming to deal with challenges in handwritten documents containing overlapping, sloped, and touching lines, Rajyagor and Rakholia used the horizontal profile method to isolate lines [30]; an accuracy of 87% was obtained on a Gujarati text dataset, and good recognition accuracy was also achieved for the subsequent word- and character-level segmentation. Semantic segmentation has been used for segmenting handwritten and printed document images of various datasets in the work in [10]. An unsupervised method for text line segmentation was proposed in 2021 by Barakat et al. for an Arabic language dataset [7]; they generated random patches of the document image with some predefined height and width, and these patches are used for training the model and obtaining the text lines. The various works focusing on the extraction of isolated lines from handwritten text documents are summarized in Table 1.
3.1 Challenges in Text Line Recognition in Handwritten Documents

All of the methods in Table 1 are applicable to handwritten text documents. Though most of these methods are able to attain a good recognition accuracy, there exist some limitations and drawbacks, which are mentioned below:
1. The presence of skewed text segments poses a hindrance to text line recognition. Some written text might contain more than one skew angle, which makes it difficult to segment into lines. A sample dataset with different angular orientations is shown in Fig. 5.
2. When some character contents, mostly in the upper baseline, touch the lower baseline components, overlapping is encountered, which becomes difficult to segment. This can be witnessed in a popular dataset named PHDIndic_11 [27] (Fig. 6).
3. Sometimes texts might contain unusual gaps within the text, which becomes a challenge in the text line recognition task. This can be witnessed in the GNHK dataset, which contains handwritten text with unusual gaps and spaces [20] (Fig. 7).
4. Text line recognition also becomes difficult due to the presence of dots and other punctuation in the characters or words.
5. Handwriting styles and sizes vary a lot from person to person. This also makes the text line recognition task somewhat more difficult compared with printed documents.
Table 1 Summary of the various text line recognition works on handwritten text

Author | Language | Methodology | Accuracy (%) | Remark
Alaei et al. [2] | Persian text | Piece-wise painting algorithm | 92.35 | Difficulty in determining the belongingness of some overlapping characters
Rohini et al. [33] | IAM database along with other writers | Run-length, distance metrics, and CC | 91.92 | Feedback from word-level recognition can be utilized
Adiguzel et al. [1] | Ottoman datasets | Hybrid (connected component and projection based) | 92 | NA
Jetley et al. [15] | ICFHR-DIBCO 2009, 2010 and 2011 databases (HW) | Fringe map-based | ≈ 85 | Binarization method can be improvised
Jindal et al. [16] | Gurumukhi Script | Piece-wise projection profile | ≈ 100 | Lower/upper zone characters may lead to incorrect segmentation
Messaoud et al. [24] | IAM-HistDB, ICFHR 2010 | Multi-level segmentation framework | 97 | New methods could be added during selection
Saabni et al. [34] | Arabic, English and Spanish | Energy map | ≈ 97 | Sometimes fails to determine dots and diacritics
Kesiman et al. [17] | Corpus of palm leaf manuscripts | Binarization-free scheme | 78.57 | Performed well on discolored parts, poor contrast, etc.
Souhar et al. [37] | Arabic text | Watershed method | 89.4 | Linearity constraint, noise, touching lines
Vo et al. [41] | ICDAR 2013 | Line adjacency graph (horizontal and vertical) | 97.73 and 98.68 | Touching lines need improvement
Neche et al. [26] | KHATT (Arabic) | BLSTM-CTC | 96.7 | Relevant post-processing steps are to be applied
Ali et al. [3] | Arabic (AHDB and IFN/ENIT) | Hough transform | 99.1 and 97.4 | NA
Malik et al. [22] | Urdu | Modified headline and baseline detection mechanism | 98.1 | Dots and diacritics are to be handled efficiently
Kundu et al. [19] | Handwritten Chinese test dataset (HIT-MW), ICDAR 2013 | GAN | 99.96 and 98.66 | The developed model is a robust one
Suleyman et al. [38] | Uyghur | Projection-based adaptive thresholding mechanism | 98.05 | Unable to segment skewed lines; incorrect peak extraction
Li et al. [21] | ICDAR 2013-HSC, HIT-MW, VML-AHTE | Line counting formulation | 95.9, 98.5 and 93.8 | Architecture and hyperparameters are not fully optimized
Rajyagor et al. [30] | Gujarati | Horizontal projection | 87 | Works well on Gujarati handwritten text with diacritics
Dutta et al. [10] | Alireza, IAM, etc. | CNN (encoder-decoder based) | ≈ 95 | Methodology can be fine-tuned
Barakat et al. [7] | Arabic (VML-AHTE), ICDAR 2017 and ICFHR 2010 | Unsupervised embedding | 93.95, 99.33, 97.72 | Wide space between words and uneven heights
Fig. 5 a Dataset with different degrees of orientations [14], b sample of skewed Bangla handwritten document of the PHDIndic_11 dataset, c sample of a skewed Arabic handwritten document [12]
Fig. 6 Sample datasets of a Devanagari, b Telugu, c Gurumukhi, and d Gujarati language taken from the PHDIndic_11 dataset that is difficult to segment into isolated lines [27]
Fig. 7 a–d Samples from the GNHK dataset [20] that contain unusual gaps and spaces within the text, which also makes segmenting the exact text lines challenging
4 Conclusion and Future Directions

Handwriting recognition has seen a surge in the modern age with the application of various learning models. Text line identification or segmentation directly affects word and character segmentation, since much of the literature takes the detected text lines as input for word and character extraction. Although many surveys and reviews relevant to OCR and handwriting recognition in general are available [22], to the best of the authors' knowledge no review has focused specifically on text line recognition. This paper contributes a critical overview of the Optical Character Recognition system, focusing mainly on offline handwritten documents in various languages. It discusses the types of OCR and the approaches to character recognition from handwritten documents used in the literature, and it critically describes the text line recognition approaches developed over the years. Some future directions in this context are: (i) although existing methodologies attain good recognition accuracy, certain cases, such as touching and overlapping lines, are still not segmented efficiently; (ii) numerous works target English, Roman, Devanagari, Chinese, Latin, etc., but very few focus on native Indian languages such as Assamese, Telugu, Kannada, Tamil, and Gujarati, so researchers can also work on text line segmentation for these Indic scripts. These regional languages usually have character components in the upper and lower baseline zones, which makes text line recognition challenging: the chance of overlap between two adjacent text lines is very high, and handling it is a demanding task that needs more attention and research in the future.
References
1. Adıgüzel H, Şahin E, Duygulu P (2012) A hybrid approach for line segmentation in handwritten documents. In: 2012 International conference on frontiers in handwriting recognition, pp 503–508
2. Alaei A, Nagabhushan P, Pal U (2011) Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents. Pattern Anal Appl 14:381–394
3. Ali AAA, Suresha M (2019) Efficient algorithms for text lines and words segmentation for recognition of Arabic handwritten script. In: Emerging research in computing, information, communication and applications: ERCICA 2018, vol 1. Springer, pp 387–401
4. Ali AAA, Suresha M (2020) Survey on segmentation and recognition of handwritten Arabic script. SN Comput Sci 1(4):192
5. Balaha HM, Ali HA, Badawy M (2021) Automatic recognition of handwritten Arabic characters: a comprehensive review. Neural Comput Appl 33:3011–3034
6. Balakrishnan N, Reddy R, Ganapathiraju M, Ambati V (2006) Digital library of India: a testbed for Indian language research. TCDL Bulletin 3(1)
7. Barakat BK, Droby A, Alaasam R, Madi B, Rabaev I, Shammes R, El-Sana J (2021) Unsupervised deep learning for text line segmentation. In: 2020 25th international conference on pattern recognition (ICPR). IEEE, pp 2304–2311
8. Chhajro M, Khan H, Khan F, Kumar K, Wagan A, Solangi S (2020) Handwritten Urdu character recognition via images using different machine learning and deep learning techniques. Indian J Sci Technol 13(17):1746–1754
9. Dargan S, Kumar M, Ayyagari MR, Kumar G (2020) A survey of deep learning and its applications: a new paradigm to machine learning. Arch Comput Meth Eng 27(4):1071–1092
10. Dutta A, Garai A, Biswas S, Das AK (2021) Segmentation of text lines using multi-scale CNN from warped printed and handwritten document images. Int J Doc Anal Recogn (IJDAR) 24(4):299–313
11. Dutta P, Muppalaneni NB (2022) A survey on image segmentation for handwriting recognition. In: Third international conference on image processing and capsule networks: ICIPCN 2022. Springer, pp 491–506
12. Farooq F, Govindaraju V, Perrone M (2005) Pre-processing methods for handwritten Arabic documents. In: Eighth international conference on document analysis and recognition (ICDAR'05). IEEE, pp 267–271
13. Grimsdale R, Sumner F, Tunis C, Kilburn T (1959) A system for the automatic recognition of patterns. Proc IEE-Part B Radio Electron Eng 106(26):210–221
14. Hiremath P, Pujari JD, Shivashankar S, Mouneswara V (2010) Script identification in a handwritten document image using texture features. In: 2010 IEEE 2nd international advance computing conference (IACC). IEEE, pp 110–114
15. Jetley S, Belhe S, Koppula VK, Negi A (2012) Two-stage hybrid binarization around fringe map based text line segmentation for document images. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 343–346
16. Jindal S, Lehal GS (2012) Line segmentation of handwritten Gurmukhi manuscripts. In: Proceeding of the workshop on document analysis and recognition, pp 74–78
17. Kesiman MWA, Burie JC, Ogier JM (2016) A new scheme for text line and character segmentation from gray scale images of palm leaf manuscript. In: 2016 15th international conference on frontiers in handwriting recognition (ICFHR). IEEE, pp 325–330
18. Khobragade RN, Koli NA, Lanjewar VT (2020) Challenges in recognition of online and off-line compound handwritten characters: a review. Smart Trends Comput Commun Proc SmartCom 2019:375–383
19. Kundu S, Paul S, Bera SK, Abraham A, Sarkar R (2020) Text-line extraction from handwritten document images using GAN. Expert Syst Appl 140:112916
20. Lee AW, Chung J, Lee M (2021) GNHK: a dataset for English handwriting in the wild. In: Document analysis and recognition–ICDAR 2021: 16th international conference, Lausanne, Switzerland, Sept 5–10, 2021, Proceedings, Part IV 16. Springer, pp 399–412
21. Li D, Wu Y, Zhou Y (2021) LineCounter: learning handwritten text line segmentation by counting. In: 2021 IEEE international conference on image processing (ICIP). IEEE, pp 929–933
22. Malik SA, Maqsood M, Aadil F, Khan MF (2020) An efficient segmentation technique for Urdu optical character recognizer (OCR). In: Advances in information and communication: proceedings of the 2019 future of information and communication conference (FICC), vol 2. Springer, pp 131–141
23. Memon J, Sami M, Khan RA, Uddin M (2020) Handwritten optical character recognition (OCR): a comprehensive systematic literature review (SLR). IEEE Access 8:142642–142668
24. Messaoud IB, Amiri H, El Abed H, Märgner V (2012) A multilevel text-line segmentation framework for handwritten historical documents. In: 2012 international conference on frontiers in handwriting recognition. IEEE, pp 515–520
25. Narang SR, Jindal MK, Kumar M (2020) Ancient text recognition: a review. Artif Intell Rev 53:5517–5558
26. Neche C, Belaid A, Kacem-Echi A (2019) Arabic handwritten documents segmentation into text-lines and words using deep learning. In: 2019 international conference on document analysis and recognition workshops (ICDARW), vol 6. IEEE, pp 19–24
27. Obaidullah SM, Halder C, Santosh K, Das N, Roy K (2018) PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification. Multimedia Tools Appl 77:1643–1678
28. Plamondon R, Srihari SN (2000) Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans Pattern Anal Mach Intell 22(1):63–84
29. Purohit A, Chauhan SS (2016) A literature survey on handwritten character recognition. Int J Comput Sci Inf Technol (IJCSIT) 7(1):1–5
30. Rajyagor B, Rakholia R (2021) Tri-level handwritten text segmentation techniques for Gujarati language. Indian J Sci Technol 14(7):618–627
31. Rani S (2015) Recognition of Gurmukhi handwritten manuscripts. Ph.D. thesis, Punjabi University, Patiala
32. Razak Z, Zulkiflee K, Idris MYI, Tamil EM, Noor MNM, Salleh R, Yaakob M, Yusof ZM, Yaacob M (2008) Off-line handwriting text line segmentation: a review. Int J Comput Sci Netw Secur 8(7):12–20
33. Rohini S, RS UD, Mohanavel S (2012) Segmentation of touching, overlapping, skewed and short handwritten text lines. Int J Comput Appl 49(19)
34. Saabni R, Asi A, El-Sana J (2014) Text line extraction for historical document images. Pattern Recognit Lett 35:23–33
35. Singh S, Garg NK (2021) Review of optical Devanagari character recognition techniques. In: Intelligent system design: proceedings of intelligent system design: India 2019. Springer, pp 97–106
36. Singh S (2013) Optical character recognition techniques: a survey. J Emerg Trends Comput Inf Sci 4(6)
37. Souhar A, Boulid Y, Ameur E, Ouagague MM (2017) Watershed transform for text lines extraction on binary Arabic handwritten documents. In: Proceedings of the 2nd international conference on big data, cloud and applications, pp 1–6
38. Suleyman E, Hamdulla A, Tuerxun P, Moydin K (2021) An adaptive threshold algorithm for offline Uyghur handwritten text line segmentation. Wireless Netw 27:3483–3495
39. Tamhankar PA, Masalkar KD et al (2020) A novel approach for character segmentation of offline handwritten Marathi documents written in Modi script. Procedia Comput Sci 171:179–187
40. Vashist PC, Pandey A, Tripathi A (2020) A comparative study of handwriting recognition techniques. In: 2020 international conference on computation, automation and knowledge management (ICCAKM). IEEE, pp 456–461
41. Vo QN, Kim SH, Yang HJ, Lee GS (2018) Text line segmentation using a fully convolutional network in handwritten document images. IET Image Process 12(3):438–446
Chapter 42
Neural Style Preserving Visual Dubbing Masooda Modak, Anirudh Venugopal, Karthik Iyer, and Jairaj Mahadev
1 Introduction

Dubbing is a technique for translating video content from one language to another. State-of-the-art visual dubbing techniques fail to directly copy facial expressions from source to target actors and do not consider identity-specific features such as a unique type of smile. When a video source needs to be dubbed in a regional language with very demanding facial expressions, these expressions have to be reproduced on the actor's face to capture the essence of the language. Videos are becoming one of the most consumed sources of information. When a person receives information through both audio and visual stimuli, that information is retained more strongly. Most video content, however, is produced in a single language, and dubbing is used so that the information can spread without a language barrier. Dubbing replaces the audio of the video content with audio in the required language, and visual dubbing refers to syncing the visual elements with the new audio, which helps maintain the immersion of the video. Visual dubbing will not only help eliminate the language barrier but also maintain complete immersion in the video content. Unlike traditional voice dubbing, visual dubbing will help retain the
facial gestures and expressions required for expressing certain thoughts in various languages.
2 Literature Review

The objective of paper [1] is to provide a web-based tool for identifying faces in a real-time environment such as online classes. It uses concepts such as the Local Binary Pattern Histogram (LBPH), Convolutional Neural Networks (CNNs), and a Haar cascade classifier with boosting. The model recognizes only one face at a time; recognizing multiple faces simultaneously would make it considerably more useful. The CNN achieves an accuracy of 95%, while LBPH achieves 78%. The proposed method is economical and can be deployed on commodity laptops.

The purpose of paper [2] was to provide a video face replacement system that requires only two videos, of a source actor and a target actor, captured with a single digital camera. It uses a Modified Poisson Blending (MPB) method, which reduces the bleeding problems of Poisson blending in images and videos and gives good results compared with other image cloning techniques for face replacement.

A blending method for a face replacement system based on a weight map that adapts to pose is provided in paper [3]. The method, called the pose adaptive weight map, gives a more natural effect than blending the target face with the reference face using region-fixed linear blending, although it can struggle when there are many pose changes.

An extensive empirical evaluation of CNNs in [4] on large-scale video classification, using a new dataset of 1 million YouTube videos belonging to 487 classes, shows significant performance improvements over strong feature-based baselines (55.3–63.9%), but only a surprisingly modest improvement over single-frame models (59.3–60.9%).

Paper [5] proposes a method to convert a single image into a short video by generating facial expressions with generative deep neural networks. The system implements a user-controllable CNN model that generates video clips of various lengths and different target expressions from a single face image; experiments and user studies verify the effectiveness of this approach.

The method proposed in paper [6] transfers facial expressions from one image (the donor) to another (the recipient) using StyleGAN. It is a simple, effective, highly scalable, and automatic expression transfer method, but it does not include head position re-enactment for head movement.

The DeepFake framework is used in [7] for face swapping, providing easy-to-use tools and a complete DeepFake pipeline built on a heatmap-based facial landmark algorithm, 2DFAN, and PRNet. It frees users from laborious and complicated data processing and from trivial, detailed work in training and conversion, and it also makes the workflow easier.
The aim of paper [8] is to build a model for visual dubbing that preserves the signature style of the target actor, including facial emotions and mouth expressions. It uses Generative Adversarial Network (GAN), RNN, and LSTM concepts. The model may fail under extreme illumination conditions or head poses, in which case the facial expressions cannot be robustly recovered and the neural face renderer cannot be reliably trained.

The authors of paper [9] built a model for audiovisual translation and dubbing from a source language to a target language, using concepts such as Convolutional Neural Networks (CNNs) and K-means clustering. The proposed system translates both the audio and the visual content of a target video, modifying the lip movements of the speaker to match the translated audio and creating a seamless viewing experience in the target language.

Generation of a full talking-head video with emotional facial expressions and rhythmic head movements from a time-aligned text representation is implemented in paper [10]. The approach is restricted to speakers uttering English or Chinese, although the model handles dynamic backgrounds and complex upper-torso movements.

Paper [11] proposes a method for generating a realistic video of a target person from an audio sequence of a source person or digital assistant. However, the method fails when multiple voices are present in the audio stream and when the target actor's talking style is not constant.

Similar to [10], the objective of paper [12] is to generate videos of target speakers with facial expressions from audio input. The model bypasses the generation of unconscious head movements and expressions, and, as with [11], it fails when multiple voices are present in the audio stream or when the target actor does not have a constant talking style.
3 Proposed System

In current dubbing systems, a dubbing artist translates the original video into a different language. This causes a lack of synchronization between the video and the new audio: the lip movements and facial expressions in the target video do not match the dubbed audio. The problem is especially visible in dubbed educational videos and movies in regional languages, where the actor's lip movements and facial expressions do not match the audio and dialogues in the other language. For a better viewing experience, the facial expressions and lip movements should match the audio. Keeping this in mind, we aim to create a system that produces a coherent dubbed video capturing not only the audio of the required regional language but also the expressions that must be portrayed to preserve the immersion of the video.
Fig. 1 Overview of our style-preserving visual dubbing approach
The input video, when dubbed into a different language, would yield an output video with lip movements and facial expressions obtained from the dubbing actor and superimposed on the face of the actor in the original video. An overview of our approach is shown in Fig. 1. The main modules of the framework are:
3.1 Image Preprocessing

In this step, each frame of the target video is taken and processed. An example target video frame is shown in Fig. 2. In preprocessing, we resize the image so that landmarks can be extracted from it. The resized frame is then passed to a program that extracts the landmarks of the face present in the video. These landmarks cover features such as the eyes, lips, jaw, eyebrows, and the position of the face, as shown in Fig. 3.
Fig. 2 Single frame of target video after processing
Fig. 3 Landmarks of the target actor
Once these landmarks are extracted for each frame, we can move on to the next step.
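The landmark step can be sketched as follows. This is a minimal illustration, not the authors' exact code; it assumes OpenCV plus dlib's publicly available 68-point predictor file, and the frame path and resize dimensions are placeholders.

```python
import cv2
import dlib

# Assumed paths; the 68-point model file must be downloaded separately.
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)

frame = cv2.imread("target_frame_0001.png")
frame = cv2.resize(frame, (256, 256))            # resize before landmark extraction
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for face in detector(gray, 1):                   # detect faces in the frame
    shape = predictor(gray, face)                # 68 landmarks: jaw, brows, eyes, nose, lips
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    landmark_img = frame.copy() * 0              # black canvas for the landmark image
    for x, y in points:
        cv2.circle(landmark_img, (x, y), 1, (255, 255, 255), -1)
    cv2.imwrite("target_landmarks_0001.png", landmark_img)
```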
3.2 Dataset Creation

In this step, we prepare the dataset in the format required by the model. The model requires the original frame and the landmark image of that frame placed side by side in a single image. We take the original frame and the landmark image from the previous step and combine them side by side. For training, the number of frames is passed as a parameter during processing so that the required number of data records is produced. The combined dataset then contains the new images required by the model; a combined image is shown in Fig. 4.
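A minimal way to build the side-by-side training pairs described above is a horizontal concatenation; the directory layout and file names in this sketch are assumptions.

```python
import os
import cv2
import numpy as np

os.makedirs("combined", exist_ok=True)

# Combine each original frame with its landmark image, side by side,
# which is the paired format expected by pix2pix-style translation models.
for i in range(500):                              # 500 frames, as used later in Sect. 5
    frame = cv2.imread(f"frames/{i:04d}.png")
    landmarks = cv2.imread(f"landmarks/{i:04d}.png")
    landmarks = cv2.resize(landmarks, (frame.shape[1], frame.shape[0]))
    combined = np.hstack([frame, landmarks])      # original on the left, landmarks on the right
    cv2.imwrite(f"combined/{i:04d}.png", combined)
```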
Fig. 4 Combined image example
3.3 GAN Model Training

Generative Adversarial Networks, or GANs for short, approach generative modeling through deep learning methods such as Convolutional Neural Networks. GAN training is an unsupervised learning method that learns the regularities in the input data and can produce outputs that seem plausibly drawn from the original dataset [8]. A GAN has two modules, a generator and a discriminator. The generator produces new outputs that are passed to the discriminator, which tries to distinguish the generator's outputs from real data and thereby penalizes the generator for producing implausible results.
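Since Sect. 6 identifies the trained model as pix2pix, the usual conditional-GAN objective can be sketched as below. This is the standard pix2pix-style formulation under our own assumptions (for example the LAMBDA weight), not the authors' exact training code.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # assumed weight for the pixel-wise term

def generator_loss(disc_generated_output, gen_output, target):
    # Adversarial term: the generator wants the discriminator to output "real" (1).
    gan_loss = bce(tf.ones_like(disc_generated_output), disc_generated_output)
    # Pixel-wise term: mean absolute error between generated and target frames.
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    return gan_loss + LAMBDA * l1_loss

def discriminator_loss(disc_real_output, disc_generated_output):
    # Real frames should be classified as 1, generated frames as 0.
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    generated_loss = bce(tf.zeros_like(disc_generated_output), disc_generated_output)
    return real_loss + generated_loss
```

This weighted sum of the adversarial term and the pixel-wise MAE is the same decomposition of the generator's total loss that Sect. 6 discusses.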
4 GAN Model Testing

Once all the above steps are carried out successfully, we can test the model by providing an input: the landmark image of the dubbing artist for the required frame. We then check whether the generated output frame meets our expectations. If it does not, we can increase the number of training frames to enlarge the dataset or preprocess the image frames more suitably.
5 Implementation The first step is to extract the target video frames. The image shown below is a sample frame of the target video (Fig. 5).
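Frame extraction from the target (and later the dubber's) video can be done with OpenCV; a minimal sketch, where the video file name and output directory are assumptions.

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)

# Extract individual frames from the target video.
cap = cv2.VideoCapture("target_video.mp4")
count = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break                                    # no more frames
    cv2.imwrite(f"frames/{count:04d}.png", frame)
    count += 1
cap.release()
print(f"Extracted {count} frames")
```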
Fig. 5 Single frame extracted from the target video
For each extracted target frame, we process the image to find the landmarks of the detected face. Figure 6 gives an example of the extracted landmarks for the frame shown in Fig. 5. For training the GAN model, the input image must be a combination of the original frame and the frame holding the facial landmarks, so we convert each frame and its landmarks into a combined image to obtain the combined dataset, which has 500 records in total (Fig. 7). In the next step, we extract each individual frame from the dubber's video, again obtaining a total of 500 frames, and compute the landmarks of the dubber's face for each frame. Figure 8 shows a sample dubber frame together with the corresponding image of the dubber's face landmarks.
Fig. 6 Landmarks of the single frame of the target video extracted in the previous step
Fig. 7 Single image of the combined dataset
Fig. 8 Single frame of the dubber actor combined with the landmarks of the respective extracted frame
A total of 500 frames is extracted from the dubber's video and then combined with the respective landmarks to create the dataset used for producing the desired output. Once the GAN model has been trained on the combined dataset, we provide the dubber landmarks as input to obtain the output; the output for the frames shown above appears in Fig. 9. Similarly, 500 frames are generated from the dubber's landmarks and combined to form a continuous video.
6 Results

The final generated output contains 500 frames, which are then combined to form the final output video. Each frame is generated from the frames of the target video and the facial expressions, i.e., the landmarks, of the dubber, as shown in Fig. 10.
Fig. 9 Output frames
The pix2pix model was trained for around 200 epochs. From Fig. 11, we can see that the model converges after about 100 epochs; the results for 100, 150, and 200 epochs are similar. According to the MAE loss on the validation set the model appears to overfit after 100 epochs, but Fig. 11 shows that this is not actually the case. For a pix2pix model, the total loss of the generator is a weighted sum of two components: (i) the pixel-wise MAE and (ii) the discriminator's loss with respect to the generated image. Figure 12 gives an idea of the total loss produced by the pix2pix model. In Fig. 13, we compare the expressions of the target actor (first row) with the expressions of the dubbing artist (second row); the third row shows the output frame generated by our model. The model generates output frames that match the target actor while carrying the facial expressions of the dubbing artist. In the same way, multiple generated frames are combined to form the final continuous video.
Fig. 10 Comparison of a single frame of dubber’s facial expressions and its final generated output frame
Fig. 11 Generated MAE loss of pix2pix model
Fig. 12 Generated total loss of pix2pix model
Fig. 13 Comparison between the expressions of target actor, the dubbing artist, and the output frame generated by our model
7 Conclusion and Future Scope

We have obtained satisfactory results that match the provided input. The output video can be used as a better-dubbed substitute for the original video and improves on a dubbed video with only the audio replaced: synchronization between audio and facial expressions is better, giving a more immersive experience. The proposed system can be useful for dubbing online video lectures for educational purposes or visual guides. Once the system is more robust, it can also be used for visual dubbing of movies and other content. As future scope, better results could be obtained using 3D morphological masks for better clarity and synchronization.
References
1. Archana MCP, Nitish CK, Harikumar S (2022) Real time face detection and optimal face mapping for online classes. J Phys Conf Ser 2161(1). IOP Publishing
2. Afifi M, Hussain KF, Ibrahim HM, Omar NM (2014) Video face replacement system using a modified Poisson blending technique. In: 2014 International symposium on intelligent signal processing and communication systems (ISPACS)
3. Zhang X, Park J-I (2018) Adaptive face blending for face replacement system. In: 2018 international conference on network infrastructure and digital content (IC-NIDC)
4. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: 2014 IEEE conference on computer vision and pattern recognition
5. Fan L et al (2019) Controllable image-to-video translation: a case study on facial expression generation. Proc AAAI Conf Artif Intell 33(01)
6. Fan L, Huang W, Gan C, Huang J, Gong B (2019) Controllable image-to-video translation: a case study on facial expression generation. In: Proceedings of the AAAI conference on artificial intelligence
7. Perov I et al, DeepFaceLab: integrated, flexible and extensible face-swapping framework
8. Kim H, Elgharib M, Zollhöfer M, Seidel H-P, Beeler T, Richardt C, Theobalt C (2019) Neural style-preserving visual dubbing. ACM Trans Graphics (TOG) 38(6):1–13
9. Yang Y, Shillingford B, Assael Y, Wang M, Liu W, Chen Y, de Freitas N (2020) Large-scale multilingual audio-visual dubbing
10. Li L et al (2021) Write-a-speaker: text-based emotional and rhythmic talking-head generation. Proc AAAI Conf Artif Intell 35(3)
11. Thies J et al (2020) Neural voice puppetry: audio-driven facial reenactment. In: European conference on computer vision. Springer, Cham
12. Chen L, Maddox RK, Duan Z, Xu C (2019) Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Chapter 43
Advanced Pointer-Generator Networks Based Text Generation Narayana Darapaneni, Anwesh Reddy Paduri, B. G. Sudha, Adithya Kashyap, Roopak Mayya, C. S. Thejas, K. S. Nagullas, Ashwini Kulkarni, and Ullas Dani
1 Introduction

The growth of technology, particularly IoT, has led to an increase in data collection and transmission. As per the International Data Corporation, 180 zettabytes of digital data will be transmitted globally by 2025 [1]. Meetings are vital but time-consuming, and with the outbreak of COVID-19, the use of videoconferencing has led to longer and more frequent meetings, causing fatigue and leaving participants less time for information processing. Automatic meeting summaries are becoming increasingly popular as they can greatly improve productivity and efficiency. By using text summarization models to automatically transcribe meetings and extract important information [2], we aim to streamline the process of obtaining meeting minutes. The goal of these models is to extract relevant information from video, audio recordings, or meeting transcripts, which include statements made by various participants on various topics. By at least partially replacing manual note-taking, these models can help both meeting participants and non-participants review crucial material in less time and potentially save a significant amount of time [3]. Methods for extracting data from unstructured text for summarization can be divided into two categories: extractive and abstractive. Extractive methods only include the most important passages from the text [4], while abstractive methods
use advanced NLP techniques to understand the meaning of the text and create an insightful summary. However, extractive methods may not capture all key information of a meeting as discussions are multi-faceted, while abstractive methods require a large number of parameters and data, making them harder to train. In this paper, we present a methodology for building a summarization corpus using an encoder–decoder-based pointer-generator network model with attention.
2 Related Work

We analyzed various papers and observed several proposed architectures for automatic text summarization, including pointer-generator networks, convolutional neural networks, and attentive seq2seq models. These architectures aim to improve summarization performance by incorporating mechanisms such as a coverage mechanism, topic-level attention, and convolutional layers that capture long-range dependencies between words in a document. The models are trained on various datasets, such as the ICSI meeting corpus and the X-SUM dataset, and are shown to outperform competitive baselines on metrics such as ROUGE and METEOR. The literature also discusses the limitations of ROUGE as an evaluation metric for text summarization and proposes word embeddings, specifically word2vec, as an alternative way to evaluate the semantic similarity of the words used in summaries [5]. The correlation of this method with human assessments is found to be very good, suggesting that it can improve the evaluation of ROUGE and extend its applicability to abstractive summarization [6].

While looking at other architectures, we found HMNet, which was proposed for automatic summarization of meeting transcripts. Its hierarchical structure reduces the computational burden and captures both token-level understanding within each turn and turn-level understanding across the whole meeting. The model also incorporates the role of each speaker to encode the different semantic styles and standpoints of the participants. Experiments show that HMNet achieves state-of-the-art performance in both automatic metrics and human evaluation [7]. However, the system sometimes summarizes salient information from parts of the meeting different from the reference summaries and produces summaries at a high level rather than in detail [8].

More specifically, the limitations of the ROUGE evaluation metric are that it is biased toward surface lexical similarities and makes no provision for the readability or fluency of generated summaries. As an alternative, word embeddings, specifically word2vec, can be used to compute the semantic similarity of the words used in summaries. The correlation of this method with human assessments, measured by the Spearman and Kendall rank coefficients on the TAC AESOP dataset, is found to be very good, which suggests that word embeddings can improve the evaluation of ROUGE and extend its applicability to abstractive summarization.
3 Approach

3.1 Flow of the Model

We propose a method to generate meeting minutes by utilizing speech recognition software. The process begins by converting an audio recording of the meeting into a text file using the Python SpeechRecognition library; specifically, we use the Google Speech Recognition engine to transcribe the audio file into text. For small to medium-sized audio files, this is sufficient. For larger audio files, the preprocessing [9] is more complex: the audio file must first be segmented into smaller chunks using the pydub library's split_on_silence method, which analyzes the audio and splits it at the specified silence threshold. Each chunk is then transcribed into text with the Google Speech Recognition engine, and the transcribed chunks are concatenated and stored in a folder for further processing by the text summarization model. Additionally, many modern meeting tools have a built-in transcription feature, whose output can also be used as input to the summarization model.

The input file is processed by the EDA segment of the model, which converts it to the required format. This begins with the removal of stop words such as greetings, 'the', 'then', and other common filler words typically used when people think or formulate sentences [10]. The next step is stemming or lemmatization. Stemming may be useful depending on the use case, but it generally affects accuracy because stemmed words can end up with different meanings; we therefore focused on lemmatization, which preserves sentence meaning better than stemming. Other activities such as lowercase conversion and whitespace handling are also carried out as part of preprocessing.

The preprocessed data is then passed through the core text summarization model, which produces the summarized data as the outcome; the model is discussed in detail in later sections. The newly generated summary may not be in a human-readable format such as meeting minutes, so post-processing is necessary to convert it into a meeting summary format. The model flow, or architecture, is shown in Fig. 1.
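The audio-to-text step described above can be sketched with the SpeechRecognition and pydub packages; the file names and silence parameters are assumptions, and the Google Web Speech API requires an internet connection.

```python
import os
import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

recognizer = sr.Recognizer()
audio = AudioSegment.from_wav("meeting.wav")          # assumed input recording

# Split the long recording on silences so each chunk stays small enough
# for the Google Speech Recognition engine.
chunks = split_on_silence(audio,
                          min_silence_len=700,        # ms of silence marking a boundary
                          silence_thresh=audio.dBFS - 14,
                          keep_silence=300)

os.makedirs("chunks", exist_ok=True)
transcript = []
for i, chunk in enumerate(chunks):
    path = os.path.join("chunks", f"chunk{i}.wav")
    chunk.export(path, format="wav")
    with sr.AudioFile(path) as source:
        recorded = recognizer.record(source)
    try:
        transcript.append(recognizer.recognize_google(recorded))
    except sr.UnknownValueError:
        continue                                       # skip chunks that cannot be transcribed

with open("transcript.txt", "w") as f:
    f.write(" ".join(transcript))
```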
3.2 Summarization Datasets

The "Amazon Fine Food Reviews" dataset (https://snap.stanford.edu/data/web-FineFoods.html) consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all 500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review.
Fig. 1 Flow of the model
Table 1 Table of datasets' info

| Dataset | Text-summary pairs | Average length text | Average length summary |
|---|---|---|---|
| Amazon fine-food review | 568,454 | 80 | 8 |
| CNN/daily mail | 311,971 | 692 | 52 |
It also includes reviews from all other Amazon categories. The reviews are by 256,059 users and cover 74,258 products. Preliminary information about the dataset is tabulated in Table 1.

The CNN/daily mail dataset (https://github.com/abisee/cnndailymail) consists of more than 300K news articles, each paired with several highlights, known as multi-sentence summaries. The dataset has two main versions: the first anonymizes named entities, while the second keeps the original texts. For our study, we used the second version and obtained the processed data. The dataset contains online news articles (781 tokens on average) paired with multi-sentence summaries (3.75 sentences or 56 tokens on average). The processed version contains 287,226 training pairs, 13,368 validation pairs, and 11,490 test pairs. Statistical properties such as summary length, text length, and the number of sentences in the summaries and texts are compared in Figs. 2, 3, and 4.
3.3 Text Preprocessing

In the proposed approach, the input text file is preprocessed in the following ways:
1. The input file is tokenized to obtain tokens of the individual terms.
2. The individual tokens are converted into lowercase, so that "TEXT" and "Text" both become "text".
3. Certain summaries in the dataset are blank, so these null values are dropped.
Fig. 2 Sentence length in the Amazon food reviews’ summary and text
Fig. 3 Average number of sentences per review in Amazon food reviews’ text
4. Punctuation is removed, which helps treat each text equally.
5. Depending on the use case, numbers often carry no essential information in the text, so removing them is preferable to keeping them.
6. Stop words like "a", "and", "the", and "of" are removed to reduce the size of the text and avoid irrelevant information.
7. Stemming or lemmatization is performed on the remaining keywords; stemming is the process of producing morphological variants of a root/base word.
A small illustrative sketch of these steps follows the list.
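The sketch below illustrates steps 1 to 7 with NLTK; it assumes NLTK with its punkt, stopwords, and wordnet resources installed, and is not the exact pipeline used in the experiments (step 3, dropping blank summaries, is a dataframe-level operation and is omitted).

```python
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> list:
    tokens = word_tokenize(text)                                 # step 1: tokenize
    tokens = [t.lower() for t in tokens]                         # step 2: lowercase
    tokens = [t for t in tokens if t not in string.punctuation]  # step 4: drop punctuation
    tokens = [t for t in tokens if not t.isdigit()]              # step 5: drop numbers
    tokens = [t for t in tokens if t not in stop_words]          # step 6: drop stop words
    return [lemmatizer.lemmatize(t) for t in tokens]             # step 7: lemmatize

print(preprocess("The dogs were barking loudly at 3 passing cars!"))
# ['dog', 'barking', 'loudly', 'passing', 'car']
```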
Fig. 4 Average number of sentences per review in Amazon food reviews’ summary
4 Methodology

The pointer-generator network is composed of three main components: an encoder, a decoder, and a pointer [11]; Fig. 5 depicts the generic architecture. The encoder is responsible for converting the input text into a dense representation that captures the meaning of the text [12]. The decoder then generates the summary by attending to the encoder's output and the input text [13]. The pointer is used to point to specific words in the input text, allowing the network to include these words in the summary. The encoder is typically implemented as a recurrent neural network (RNN), such as a long short-term memory (LSTM) or a gated recurrent unit (GRU). In a question-answering problem [14], the input sequence consists of all the words of the question, each denoted $x_i$, where $i$ is the position of the word. The hidden states are calculated as in Eq. (1):

$$h_t = f\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right) \qquad (1)$$
This straightforward formula describes a typical recurrent neural network: we simply apply the appropriate weights to the previous hidden state $h_{t-1}$ and the input vector $x_t$; a typical working model of the encoder is depicted in Fig. 6. The decoder is also typically implemented as an RNN, and it takes as input the encoder's output and the previous word in the summary. In our experiment, we used an RNN model in the encoder setup. In the question-answering problem, the output sequence is the collection of all words of the answer, each represented as $y_i$, where $i$ is the position of that word. Any hidden state $h_t$ is computed using the formula in Eq. (2).
Fig. 5 Architecture of an encoder–decoder seq-to-seq model
Fig. 6 Working model of an encoder
$$h_t = f\left(W^{(hh)} h_{t-1}\right) \qquad (2)$$
Here we simply compute the next hidden state from the previous one. The output $y_t$ at time step $t$ is computed using the formula in Eq. (3).
$$y_t = \mathrm{softmax}\left(W^{S} h_t\right) \qquad (3)$$
Using the hidden state at the current time step and the corresponding weight matrix $W^{S}$, we compute the output; applying softmax yields a probability vector from which the next word can be predicted. The pointer-generator network uses attention to determine which parts of the input text are most relevant for the summary. Attention allows the network to focus selectively on certain parts of the input text rather than considering the entire text at once, which helps it identify the most important information and include it in the summary. The typical pointer-generator network combines dot-product and location-based attention, but due to hardware limitations we implemented only location-based attention. During training, the network is trained to maximize the likelihood of the reference summary given the input text and to balance between copying words from the input text and generating new words. The pointer-generator network also uses a technique called coverage to ensure that it does not repeat information from the input text.
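For completeness, the copy/generate mixture and the coverage term that the paragraph above describes can be written out following See et al. [11]; the symbols follow that paper's notation and are not defined elsewhere in this chapter, so treat this as a reference sketch rather than the exact formulation used here.

```latex
% Generation probability and final word distribution of the pointer-generator
% network (notation as in See et al. [11]): h_t^* is the context vector,
% s_t the decoder state, x_t the decoder input, a^t the attention weights.
p_{\mathrm{gen}} = \sigma\left(w_{h^*}^{\top} h_t^* + w_s^{\top} s_t + w_x^{\top} x_t + b_{\mathrm{ptr}}\right)

P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + (1 - p_{\mathrm{gen}}) \sum_{i\,:\, w_i = w} a_i^t

% Coverage penalty discouraging repeated attention to the same source tokens:
\mathrm{covloss}_t = \sum_i \min\!\left(a_i^t,\; c_i^t\right), \qquad c^t = \sum_{t'=0}^{t-1} a^{t'}
```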
5 Results

We tested an LSTM model as the baseline, along with an LSTM model with attention, a bidirectional LSTM with attention, and a pointer-generator network, using Google Colab on the Amazon Fine Food Reviews dataset. Our pointer-generator model was set up with the following parameters:
(1) Hidden units: 100.
(2) Word embedding dimension: 128.
(3) Batch size: 16.
(4) Encoder dropout: 0.01.
(5) Decoder dropout: 0.01.
(6) N-gram: 3.
(7) Number of iterations: 100,000.
Table 2 summarizes the results observed in our experiments. As can be seen from Table 2 and Fig. 7, the pointer-generator network achieves a better ROUGE-2 score, while the bidirectional LSTM with attention performs better in terms of ROUGE-1 and ROUGE-L. Figure 8 shows the training loss against the number of training iterations. Tables 3, 4, and 5 show some of the outputs from our experiments.
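The ROUGE-1/2/L recall, precision, and F-scores reported in Table 2 can be reproduced for any (reference, prediction) pair with the rouge-score package; a minimal sketch, where the example strings are purely illustrative.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "every morning with coffee"     # gold summary (illustrative)
prediction = "great coffee"                 # model output (illustrative)

scores = scorer.score(reference, prediction)
for name, result in scores.items():
    # Each result carries precision, recall and F-measure for that ROUGE variant.
    print(f"{name}: P={result.precision:.3f} R={result.recall:.3f} F={result.fmeasure:.3f}")
```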
Table 2 Summarization of ROUGE scores

| Model type | Metric | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|
| LSTM model | Recall | 0.1408 | 0.0183 | 0.1408 |
| | Precision | 0.1105 | 0.015 | 0.1105 |
| | F-score | 0.1186 | 0.0162 | 0.1186 |
| LSTM with attention | Recall | 0.2078 | 0.0317 | 0.2028 |
| | Precision | 0.2078 | 0.0283 | 0.1269 |
| | F-score | 0.1491 | 0.0292 | 0.1469 |
| Bidirectional LSTM with attention | Recall | 0.299 | 0.0733 | 0.299 |
| | Precision | 0.2045 | 0.0417 | 0.2045 |
| | F-score | 0.2395 | 0.0484 | 0.2305 |
| Pointer-generator network | Recall | 0.159 | 0.102 | 0.133 |
| | Precision | 0.120 | 0.116 | 0.1133 |
| | F-score | 0.143 | 0.100 | 0.125 |
Fig. 7 ROUGE score visualization of various models
6 Discussion and Future Work

At present, the highest importance is given to the utterances repeated most often, but in many cases this is not appropriate. For example, a critical task might be brought up once in a meeting, a team member volunteers to take responsibility for it, and it is never spoken about again during the rest of the meeting.
Fig. 8 Pointer-generator model iterations versus loss
Table 3 Few results of LSTM model

| Original | Predicted |
|---|---|
| Every morning with coffee | Great taste |
| Best coconut water | The best |
| Really nice surprise | My dogs love these |
| Not hate because it is overpriced here | Great for |
| Disappointed | Not too strong |

Table 4 Few results of LSTM model with attention

| Original | Predicted |
|---|---|
| Every morning with coffee | Great coffee |
| Best coconut water | Coconut water |
| Really nice surprise | My dogs love these |
| Not hate because it is overpriced here | Good product |
| Disappointed | Not good |
However, such a task, due to its critical nature, still needs to appear in the minutes of the meeting (MOM). Another direction is to capture the attendees' facial expressions, gestures, head nods, and so on, which in some cases convey more information than words. This would require dedicated video cameras focused on each member, which is challenging in terms of infrastructure and, for virtual meetings, network bandwidth, yet it plays an extremely important role in obtaining accurate information about the meeting.
Table 5 Few results of Pointer-Generator model Original
Predicted
I drink nothing but half caff coffee this coffee is so smooth and rich you can Smooth till the drink it late into the evening and still be able to go to sleep without the jitters end This is an excellent choice for summer refreshment i appreciate the Refreshing ingredients as they are a healthy alternative to traditional sodas the ingredients organic are br br sparkling filtered water organic evaporated cane juice organic root beer flavor organic lemon juice concentrate organic vanilla extract The ad says 30 regular greenies but the package i received says br 27 greenies Less than I ordered I think they would have been good really wish i could have experienced them Never received they ve never arrived this order Best healthy snack chips available white cheddar is preferred flavor but note they are not gluten free only sea salt flavor is gluten free which is good but not as good as white cheddar
Gluten-free healthy snack
We can also think of an AI model that will keep track of all the action items and their stakeholders. Once a certain task is completed by a team member, all stakeholders get an automatic update about the same and the task is marked with current status on a global (company level) backlog of all pending tasks.
7 Conclusion The task of summarizing text automatically is intricate and involves various subtasks, each of which has the potential to produce high-quality summaries. Initial efforts to create a proper summary used basic models such as RNN, but issues were encountered and work shifted to LSTM models. As models involving Attention became more complex, results improved, but performance and infrastructure requirements increased. The pointer-generator network which includes encoders, decoders, and attention models addresses issues of repetition in the summary. The pointer-generator network combines extractive and abstractive summarization techniques to create concise, meaningful summaries. It extracts key information and uses abstractive generators to generate apt text sequences. With proper training, pointergenerator networks have shown promise in generating state-of-the-art results. We have trimmed the training data and reduced the number of epochs and depth of the models due to infrastructural challenges and time constraints. We conclude that with proper training data and training the model, we will be able to get the state-of-the-art results.
548
N. Darapaneni et al.
References 1. Rydning DRJGJ, Reinsel J, Gantz J (2018) The digitization of the world from edge to core. Framingham Int Data Corp16 2. Narayan S, Cohen SB, Lapata M (2018) Don’t give me the details, just the summary! topicaware convolutional neural networks for extreme summarization. arXiv preprint arXiv:1808. 08745 3. X. Feng, X. Feng, B. Qin and X. Geng (2020) Dialogue discourse-aware graph model and data augmentation for meeting summarization. arXiv preprint arXiv:2012.03502 4. Madhuri JN, Kumar RG (2019) Extractive text summarization using sentence ranking. In: 2019 international conference on data science and communication (IconDSC), pp 1–3 IEEE 5. Ng JP, Abrecht V (2015) Better summarization evaluation with word embeddings for ROUGE. arXiv preprint arXiv:1508.06034 6. Shang G, Ding W, Zhang Z, Tixier AJP, Meladianos P, Vazirgiannis M, Lorré JP (2018) Unsupervised abstractive meeting summarization with multi-sentence compression and budgeted submodular maximization. arXiv preprint arXiv:1805.05271 7. Zhu C, Xu R, Zeng M, Huang X (2020) A hierarchical network for abstractive meeting summarization with cross-domain pretraining. arXiv preprint arXiv:2004.02016 8. Tardy P, Janiszek D, Estève Y, Nguyen V (2020) Align then summarize: automatic alignment methods for summarization corpus creation. arXiv preprint arXiv:2007.07841 9. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2019) Bart: denoising sequence-to-sequence pre-training for natural language generation, translation and comprehension. arXiv preprint arXiv:1910.13461 10. Koay JJ, Roustai A, Dai X, Burns D, Kerrigan A, Liu F (2020) How domain terminology affects meeting summarization performance. arXiv preprint arXiv:2011.00692 11. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. arXiv preprint arXiv:1704.04368 12. Liu Y, Lapata M (2019) Text summarization with pretrained encoders. arXiv preprint arXiv: 1908.08345 13. Khandelwal U, Clark K, Jurafsky D, Kaiser L (2019) Sample efficient text summarization using a single pre-trained transformer. arXiv preprint arXiv:1905.08836 14. Zhong M, Yin D, Yu T, Zaidi A, Mutuma M, Jha R, Awadallah AM, Celikyilmaz A, Liu Y, Qiu X, Radev D (2021) QMSum: a new benchmark for query-based multi-domain meeting summarization. arXiv preprint arXiv:2104.05938
Author Index
A Abhishek Sri Sai Tammannagari, 151 Adithya Kashyap, 537 Aditi Pandey, 485 Akhilesh Tiwari, 329 Akila, K., 263 Ambhore, P. B., 1 Amisha Nakhale, 387 Amogh Katti, 303 Amruta Lipare, 281 Amudha, P., 163 Ananya, S. T., 205 Anirudh Venugopal, 525 Anjana Rani, 127 Ankush R. Deshmukh, 1 Anuja, E., 221 Anwesh Reddy Paduri, 537 Apash Roy, 457 Arun Kumar, 25 Ashok Kumar Yadav, 25 Ashwini Dalvi, 429 Ashwini Kulkarni, 537 Ayushi Maurya, 25 Aziz, Md. Abdul, 99
B Baby Maruthi, P., 51 Bacanin, Nebojsa, 63, 347 Balacumaresan, Harshanth, 99 Balghouni, Amira, 63 Bhargav Narayanan, P., 177 Bhirud, S. G., 429 Bhushan Inje, 463 Bright Keswani, 191
C Catherine Aurelia, C. A., 205 Chinmoy Kakoty, 87 Choudhury, Tanveer, 99
D Debayani Ghosh, 457 Devendran, V., 249 Dipu, Abu Jafar Md Rejwanul Hoque, 235
F Fahim, Md. Masroor, 413
G Geethanjali, D., 221 Gondhalekar, S. K., 119
H Harshil Joshi, 439 Hasan, Tahmid Bin, 413 Honey Sengar, 329
I Imteaz, Monzur, 99
J Jacily Jemila, S., 339 Jairaj Mahadev, 525 Janaki, R., 205
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 A. K. Tripathi et al. (eds.), Proceedings of World Conference on Artificial Intelligence: Advances and Applications, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-99-5881-8
549
550
Author Index
Jarin, Mahnaz, 235 Jaswitha Sai Ayinapuru, 291 Jayasree, T., 401 Jovanovic, Luka, 63, 347 Jovina, D. J., 401 Jyothsna Manchiraju, 51
P Petrovic, Aleksandar, 347 Prabhjot Kaur, 485 Prarthana Dutta, 513 Pratiksha Meshram, 497 Provas Kumar Roy, 363
K Kanade, D. M., 119 Kapil Nagwanshi, 463 Karthik Iyer, 525 Keshav Tyagi, 485 Kljajic, Maja, 347 Krishnan, L. R. K., 35 Kulkarni, R. D., 119
Q Qadir, Imran, 249
L Lakshmi, D., 205 Lataben J. Gadhavi, 439
M Madhavan, R., 221 Mangesh Angadrao Bidve, 451 Manish Billore, 451 Marevic, Vladimir, 63 Mary Cynthia, S., 339 Masooda Modak, 525 Meenalochini, M., 163 Merlin Livingston, L. M., 339 Mim, Mahbuba Sharmin, 413 Mishu, Mehedi Hasan, 235 Mizdrakovic, Vule, 347 Monika Saxena, 127 Mostafizur Rahaman, A. S. M., 235 Mridul Namboori, 485 Mrunalini, T., 221 Munish Saran, 377
N Nagullas, K. S., 537 Narayana Darapaneni, 537 Naresh Babu Muppalaneni, 513 Nivas Kodali, 291 Nomi Baruah, 87
O Oviya, I. R., 79
R Radha Krishna Rambola, 463, 497 Rahaman, Abu Sayed Md. Mostafizur, 413 Rajesh Pudi, 151 Rashmi Agarwal, 13, 303 Rathish Kumar, B. V., 387 Ritesh Kumar Singh, 377 Rituraj Phukan, 87 Ritwik Shivam, 177 Roopak Mayya, 537
S Sai Gruheeth, N., 177 Sarthak Yadav, 79 Sashreek Krishnan, 35 Sathya, P., 317 Satyam R. D. Dwivedi, 177 Saurav Gupta, 87 Senthil Kumar, K., 205 Shivangi Mehta, 439 Shruti Sharma, 139 Shruti Taksali, 281 Shweta Saraswat, 191 Sishil Surendran, 13 Siva Jyothi Natha Reddy, B., 79 Sneha Sultana, 363 Sonam Maurya, 281 Sourav Paul, 363 Sri Silpa Padmanabhuni, 151 Srujana Pesaramalli, 151 Stankovic, Marko, 63 Subashini, V., 205 Subhankar Ghosh, 87 Sudha, B. G., 537 Sujeetha, R., 263 Syed Nazim Afrid, 87
T Tarun Madduri, 291 Thejas, C. S., 537
Author Index U Ullas Dani, 537 V Vamsi Mohan Prattipati, 291 Vedashree Joshi, 429 Venkatakrishnan, R., 79 Vimala Kumari Jonnalagadda, 291 Vinodha, K., 177 Vrishit Saraswat, 191
551 Y Yamini Gujjidi, 303 Yash Jaiswal, 25 Yogesh Kumar Gupta, 139
Z Zivkovic, Miodrag, 63, 347